Developer Tools B

  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities
    INI-VPINN: a variational PINN for multi-material domains
    Deep Learning · Neural Network
    INI-VPINN is a weak-form physics-informed neural network that naturally incorporates Neumann boundary and interface conditions into a variational formulation, targeting multi-material domains with geometric singularities.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI
    LegalHalluLens audits typed legal-AI hallucinations with calibrated debate
    Retrieval-Augmented Generation (RAG)
    Legal-AI systems hallucinate at aggregate rates near 52%, but averages hide where and how errors concentrate. LegalHalluLens is an auditing framework pairing typed hallucination auditing with calibrated multi-agent debate to give compliance officers actionable signals for trustworthy legal AI.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions
    A T-API-compliant ReAct agentic loop for optical networks
    The paper presents the first T-API-compliant ReAct loop for intent-driven, closed-loop optical network management, reporting that domain-specific composite tools achieve 90% oracle-validated correctness with threefold token savings versus generic tools.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Differential Privacy of Gaussian Process Posterior Sampling
    Differential privacy of Gaussian process posterior sampling
    Inference
    The paper studies privacy when releasing posterior sample paths from a Gaussian process where the entire training set is private. Unlike DP mechanisms that add external noise, it shows the intrinsic randomness of posterior sampling itself yields differential-privacy guarantees.
    Read original (arXiv cs.LG (Machine Learning)) ↗
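    For reference, guarantees of this kind are usually stated against the standard (ε, δ) definition of differential privacy, written out below. Whether the paper uses exactly this variant is an assumption; here the mechanism M would be the map from the private training set D to a released posterior sample path.

```latex
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
  \quad \text{for all neighboring datasets } D, D' \text{ and all measurable sets } S .
\]
```

    The point of the summary is that the randomness inside M comes from posterior sampling itself, with no separately added noise term.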
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Recover Semantics First, Generate Better: Improved Latent Modeling for 3D MRI Reconstruction and Cross-Contrast Synthesis
    Improved latent modeling for 3D MRI reconstruction and synthesis
    The paper proposes an improved latent modeling approach for 3D MRI reconstruction and cross-contrast synthesis, addressing the heavy computational cost of large 3D volumes by recovering semantics first to better infer absent MRI contrasts.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training
    STAR: spatiotemporal adaptive reward allocation for text-to-image RL
    Reinforcement Learning
    The paper proposes STAR, a spatiotemporal adaptive reward allocation method for text-to-image RL post-training, replacing a single scalar advantage applied uniformly with rewards that account for the temporal and spatial structure of generation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Learning task-specific subspaces via interventional post-training of speech foundation models
    Learning task-specific subspaces in speech foundation models
    Neural Network · Retrieval-Augmented Generation (RAG) · Speech Processing
    Speech foundation models produce general-purpose representations that encode salient variables in a distributed way, while downstream tasks use only part of that variability. The paper learns task-specific subspaces via interventional post-training of speech foundation models.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation
    CoT-enhanced reasoning for semi-supervised medical image segmentation
    Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    Semi-supervised medical image segmentation mitigates annotation scarcity via consistency regularization but relies mostly on pixel-level visual matching. The paper adds chain-of-thought-enhanced reasoning to go beyond visual cues for segmentation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model
    Hybrid Ret-DNN with XGBoost for e-commerce behavior forecasting
    Deep Learning · Neural Network
    E-commerce platforms struggle to understand customer behavior and predict future purchases. The study proposes predictive analytics using a hybrid Ret-DNN combined with an XGBoost model to forecast customer behavior.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions
    ChLogic evaluates logical reasoning robustness in Chinese
    LLMs do well on standardized logical reasoning benchmarks, but whether this holds beyond English is unclear. ChLogic is an English-Chinese aligned benchmark testing whether models preserve logical reasoning when the same latent structure is expressed in Chinese.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models
    Dynamic rollout editing reduces overthinking in RL reasoning models
    Neural Network · Reinforcement Learning · Software Engineering
    Long chain-of-thought reasoning helps, but models often keep generating unnecessary reasoning after reaching a correct answer. Framing this as overthinking in GRPO-style RL post-training, the paper proposes dynamic rollout editing to reduce it.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Dimensionality Controls When Modularity Helps in Continual Learning
    Dimensionality controls when modularity helps in continual learning
    Reinforcement Learning
    Compositional learning systems must balance plasticity and stability. The paper analyzes when modularity helps in continual learning and shows that the dimensionality of representations controls whether modular structure is beneficial.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?
    GameCraft-Bench: can agents build playable games end-to-end?
    AI Agents
    Game generation is an emerging coding-agent application requiring natural-language specs to become playable interactive systems. GameCraft-Bench evaluates whether agents can build games end-to-end inside a real game engine, where scripts, scenes, assets, rendering and runtime must cohere.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Hacker News (Front Page) · EN Developer Tools extract
    SpaceX Is Buying Cursor
    Report: SpaceX is said to be acquiring AI coding tool Cursor
    A headline report states that SpaceX is acquiring Cursor, the AI-assisted code editor. No article body, deal terms, timing, or rationale is available, so this is noted neutrally as an unverified report rather than a confirmed transaction.
    Read original (Hacker News (Front Page)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Meta-classification of one-class classification models using ranking correlation and nearest neighbor
    Meta-classification of one-class models via ranking correlation and kNN
    Algorithms & Theory · Machine Learning · Meta
    ML has been applied widely, but applying ML to ML models is underexplored. Treating models as approximable by one-class classification (OCC), the paper proposes meta-classification of OCC models using ranking correlation and nearest-neighbor methods.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    WallZero: Mastering the Game of WallGo with Strategic Analysis
    WallZero masters the board game WallGo with strategic analysis
    Meta · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    WallGo is a recently introduced strategic board game. WallZero masters WallGo through an approach incorporating strategic analysis, demonstrating game-playing performance and strategic insights.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Perceptual compensation for tonal context in self-supervised speech models
    Perceptual compensation for tonal context in self-supervised speech models
    Embeddings · Retrieval-Augmented Generation (RAG) · Speech Processing
    The study examines the extent to which self-supervised speech models exhibit perceptual compensation for tonal context, analyzing how context effects seen in human speech perception are reflected in the models' learned representations.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    When Multiple Scripts Matter: Evaluating ASR in Clinical Settings
    Evaluating ASR in clinical settings when multiple scripts matter
    Meta · Speech Processing
    Automatic speech recognition in non-English clinical settings faces multiscript variability, where a term appears in multiple valid orthographies. String-matching metrics treat variants as errors and underestimate performance; the paper studies ASR evaluation when multiple scripts matter.
    Read original (arXiv cs.CL (Computation and Language)) ↗
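    The effect described in this summary can be reproduced with a small word error rate (WER) calculation. The sketch below is not the paper's metric: the `variants` table, the function names, and the mixed-script sample sentence are all hypothetical, chosen only to show how a strict string match penalizes a valid alternative spelling.

```python
def word_errors(reference, hypothesis, variants=None):
    """Word-level edit distance between two token lists.

    A hypothesis word counts as a match when it equals the reference word
    or appears in that word's (hypothetical) set of accepted variants.
    """
    variants = variants or {}

    def same(ref_word, hyp_word):
        return hyp_word == ref_word or hyp_word in variants.get(ref_word, set())

    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        cur = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            substitution = prev[j - 1] + (0 if same(ref_word, hyp_word) else 1)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis, variants=None):
    """Word error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis, variants) / len(reference)


# Hypothetical example: the same clinical term written in two scripts.
reference = ["patient", "has", "डायबिटीज"]
hypothesis = ["patient", "has", "diabetes"]
accepted = {"डायबिटीज": {"diabetes"}}

strict_wer = wer(reference, hypothesis)             # variant counted as an error
lenient_wer = wer(reference, hypothesis, accepted)  # variant accepted as correct
```

    With strict matching, one of the three words is scored as wrong even though the transcript is clinically correct; with the variant table, the same transcript scores zero errors.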
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    A Framework for Evaluating Agentic Skills at Scale
    A framework for evaluating agentic skills at scale
    AI Agents · Deep Learning · Reinforcement Learning
    Agent skills, structured reusable knowledge artifacts that augment LLM agents, have been rapidly adopted, yet their cross-domain impact and a reusable methodology for evaluating individual skills are lacking. The paper presents a framework for evaluating agentic skills at scale.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors
    Auditing deployment-interface exposure of CLIP backdoors
    Neural Network · Reinforcement Learning
    CLIP models are reused across downstream interfaces including feature extraction, retrieval, reranking and selection. Existing CLIP backdoors are validated on small attack-native tasks; the paper audits backdoor exposure across deployment interfaces beyond native success.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports
    The Slop Paradox: AI-rewritten radiology reports erode clinical uncertainty
    AI clinical documentation tools increasingly summarize and reformat radiology reports with LLMs. Using 450 chest X-ray reports from the Indiana University dataset, the paper measures resulting information degradation, showing erosion of clinical uncertainty and cross-modal alignment in AI-rewritten reports.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Toward Accessible Psychotherapy Training Using AI-Driven Interactive Patient Avatars
    AI-driven patient avatars for more accessible psychotherapy training
    GPT
    Training psychotherapists in evidence-based interventions like Acceptance and Commitment Therapy needs repeated practice with feedback, limited by ethical, logistical and resource constraints. The paper introduces AI-driven interactive patient avatars to make such training more accessible.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Hacker News (Front Page) · EN Developer Tools extract
    SpaceX to buy Cursor for $60B
    SpaceX to acquire Cursor (Anysphere) for $60B, per Reuters
    Reuters reports that SpaceX plans to acquire the AI coding tool Cursor (Anysphere) for $60 billion, and the item has drawn attention for the deal's scale and rationale. The deal value and details come from that reporting and are unverified.
    Read original (Hacker News (Front Page)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    LLMs Infer Cultural Context but Fail to Apply It When Responding
    LLMs infer cultural context but fail to apply it when responding
    Inference · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering
    LLMs are known to overrepresent dominant, often Western cultures while marginalizing others. The paper evaluates how this affects culturally adapted response generation, finding that models can infer cultural context but fail to apply it when responding.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
    From trainee to trainer: LLM-designed RL training environments
    Gemini · GPT · Reinforcement Learning
    RL pipelines for LLM training often rely on manually redesigned environments between stages, forcing heuristic guesses about good configurations. The paper has the LLM itself design training environments for reinforcement learning with multi-agent reasoning, moving from trainee to trainer.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs
    Prompt perturbation for reliable LLM evaluation over comparison graphs
    Evaluating LLMs is important but can be fragile to small prompt changes. The paper proposes using prompt perturbation to achieve more reliable LLM evaluation over comparison graphs.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Lobste.rs (AI tagged) · EN Developer Tools extract
    Why adding ontologies to LLMs won't yield machine intelligence
    Talk: adding ontologies to LLMs won't yield machine intelligence
    Machine Intelligence
    A video shared via the lobste.rs AI feed argues that adding ontologies (explicit symbolic knowledge structures) to LLMs will not yield genuine machine intelligence. It treats symbolic augmentation and LLMs' statistical processing as fundamentally distinct and concludes that ontology integration alone is insufficient for intelligence. This summary is based on the title and context, as the excerpt is minimal.
    Read original (Lobste.rs (AI tagged)) ↗
  • Simon Willison's Weblog · EN Developer Tools extract
    Cloudflare CAPTCHA on at least one ampersand
    Cloudflare WAF: fire CAPTCHA only on search URLs with an ampersand
    Claude · Reinforcement Learning
    A Simon Willison TIL: to stop crawlers hammering his faceted search engine he used Cloudflare's WAF Managed Challenge, but plain single-term searches kept triggering it. Working with Claude Code, he added a custom rule so the CAPTCHA only fires when a search URL contains at least one ampersand, letting single-keyword queries through.
    Read original (Simon Willison's Weblog) ↗
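    The rule's logic can be sketched as a plain predicate. This is an illustration in Python and not the actual Cloudflare rule: the `/search/` path prefix and the function name are assumptions, since the summary does not give the real URL layout. In Cloudflare's own rule language the same test would presumably be built from the `http.request.uri.path` and `http.request.uri.query` fields.

```python
from urllib.parse import urlsplit

# Hypothetical path prefix for the faceted search pages (not from the post).
SEARCH_PATH_PREFIX = "/search/"


def should_challenge(url: str) -> bool:
    """Return True when a request should receive the managed challenge."""
    parts = urlsplit(url)
    is_search = parts.path.startswith(SEARCH_PATH_PREFIX)
    # An ampersand in the query string means at least two parameters,
    # i.e. a faceted query and not a plain single-keyword search.
    return is_search and "&" in parts.query
```

    A single-term URL such as `https://example.com/search/?q=llm` passes through unchallenged, while `https://example.com/search/?q=llm&tag=python` is challenged. The trade-off is that crawlers walking facet combinations hit the CAPTCHA and ordinary one-keyword searches do not.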
  • NVIDIA Developer Blog · EN Training & Fine-tuning extract
    Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes
    NVIDIA details LoRA fine-tuning of biological foundation models via BioNeMo
    Fine-tuning · NVIDIA
    An NVIDIA developer blog post explains how to efficiently fine-tune biological foundation models using LoRA, illustrated with the company's BioNeMo Recipes. These are models pretrained on large protein or genomic sequence corpora, with the ESM2 protein language model given as an example. A technical piece on applying foundation models in computational biology.
    Read original (NVIDIA Developer Blog) ↗
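    As background on why LoRA is cheap, the general technique freezes the pretrained weight matrix and trains only a low-rank correction. The formulation below is the standard one from the LoRA literature, not something taken from the blog post; the symbols d, k, r and α are generic.

```latex
\[
  W \;=\; W_0 + \frac{\alpha}{r}\, B A,
  \qquad W_0 \in \mathbb{R}^{d \times k},\;
  B \in \mathbb{R}^{d \times r},\;
  A \in \mathbb{R}^{r \times k},\;
  r \ll \min(d, k)
\]
\[
  \text{trainable parameters per adapted matrix: } r\,(d + k)
  \quad \text{instead of} \quad d\,k
\]
```

    Only A and B receive gradients, so for a small rank r the number of trained parameters is a small fraction of the full matrix.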
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    The Value Axis: Language Models Encode Whether They're on the Right Track
    LLMs encode a 'value axis' tracking if their strategy works
    Fine-tuning · Reinforcement Learning · Reinforcement Learning from Human Feedback (RLHF)
    Researchers built a 'value axis' for Qwen3-8B that captures whether its current strategy is likely to reach its goal. The axis separates high- and low-confidence rollouts, backtracking, and correct vs. corrupted code; steering it up suppresses self-correction while steering down induces exploration. DPO can raise the internal value of rewarded behaviors.
    Read original (arXiv cs.CL (Computation and Language)) ↗
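    "Steering" along an axis is commonly done by adding a scaled direction vector to a hidden state. The sketch below shows that generic idea only; how the paper actually builds or applies its value axis is not stated in this summary, and the function names and toy vectors are hypothetical.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    return [x / norm for x in vector]


def value_score(hidden, axis):
    """Read-out: projection of a hidden state onto the unit-length axis."""
    return sum(h * a for h, a in zip(hidden, unit(axis)))


def steer(hidden, axis, alpha):
    """Shift a hidden state by alpha along the axis.

    Positive alpha steers 'up', negative alpha steers 'down'.
    """
    return [h + alpha * a for h, a in zip(hidden, unit(axis))]


# Toy 2-dimensional example with made-up numbers.
axis = [3.0, 4.0]
hidden = [1.0, 1.0]
steered_up = steer(hidden, axis, 2.0)
steered_down = steer(hidden, axis, -2.0)
```

    Because the axis is normalized, steering by `alpha` changes the projected score by exactly `alpha`, which makes the strength of an intervention easy to compare across layers or prompts.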