Developer Tools B

  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    Context-Aware RL for Agentic and Multimodal LLMs
    ContextRL rewards picking the right context to ground answers
    AI Agents Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering
    ContextRL is a context-aware RL method that improves long-horizon and multimodal reasoning via an indirect objective: instead of supervising only the final answer, it rewards selecting the context that supports a query-answer pair, encouraging fine-grained grounding. Trained on contrastive coding-trajectory and image data, it gains an average +2.2% over standard GRPO.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    Geometric Action Model for Robot Policy Learning
    GAM reuses a geometric foundation model for robot control
    Computer Vision Reinforcement Learning
    The Geometric Action Model (GAM) is a language-conditioned manipulation policy that repurposes a pretrained geometric foundation model as a shared substrate for perception, temporal prediction, and action decoding. It splits the model at an intermediate layer: shallow layers act as an observation encoder, while a causal future predictor forecasts latent tokens from language, proprioception, and action history.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio
    A benchmark for LLM agents on Nature Portfolio meta-analyses
    AI Agents Meta Retrieval-Augmented Generation (RAG)
    This work introduces a benchmark that evaluates LLM agents on meta-analysis articles from Nature Portfolio. The article excerpt was unavailable, so this summary is limited to a neutral description based on the title.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers
    Phase, not magnitude, carries identity inside image classifiers
    Meta Neural Network Retrieval-Augmented Generation (RAG)
    Revisiting Oppenheim and Lim's 1981 finding that images stay recognizable from Fourier phase alone, the authors test whether trained classifiers reproduce this internally. Transplanting one image's phase onto another's magnitude, predictions in PRISM2D, GFNet, and ViT-B/16 follow the phase donor, and removing image-specific magnitude barely changes accuracy. ResNet-50 also shows a strong latent sign code when intervened before its ReLUs.
    Read original (arXiv cs.LG (Machine Learning)) ↗
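The classical Oppenheim-Lim phase transplant that the entry above refers to can be sketched at the image level with NumPy. This is only the input-space version on random arrays, not the paper's internal intervention on classifier representations; the function name `transplant_phase` is a hypothetical helper.

```python
import numpy as np

def transplant_phase(magnitude_donor, phase_donor):
    """Build a hybrid image: Fourier magnitude from one image, phase from another."""
    spec_mag = np.fft.fft2(magnitude_donor)
    spec_phase = np.fft.fft2(phase_donor)
    # Keep |F| of the magnitude donor and the angle of the phase donor.
    hybrid_spec = np.abs(spec_mag) * np.exp(1j * np.angle(spec_phase))
    # For real inputs the hybrid spectrum stays conjugate-symmetric,
    # so the imaginary part of the inverse is rounding noise.
    return np.real(np.fft.ifft2(hybrid_spec))

rng = np.random.default_rng(0)
a = rng.random((32, 32))  # stand-in for the magnitude donor image
b = rng.random((32, 32))  # stand-in for the phase donor image
hybrid = transplant_phase(a, b)
```

With natural images in place of the random arrays, the hybrid typically looks like the phase donor, which is the effect the paper probes inside trained networks.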
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis
    A mathematical review of shape space analysis for geometric data
    Computer Vision Deep Learning Machine Learning Neural Network Reinforcement Learning
    This survey synthesizes the fast-growing literature on shape space analysis, a framework for data whose observations carry rich geometric form across biology, medicine, anthropology and vision. Drawing on differential geometry, statistics and ML, it organizes the work around a shared pipeline of shape representation, parameterization and metric construction.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Filtered Conformal Ellipsoids for Graph-Native Time Series
    Filtered conformal ellipsoids for multivariate time-series prediction sets
    Retrieval-Augmented Generation (RAG)
    The paper builds joint prediction sets for multivariate time series via filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and split-conformal calibration on the Mahalanobis scores sets the radius. The filter picks the ellipsoid shape and calibration the scale, avoiding Gaussian tail assumptions.
    Read original (arXiv cs.LG (Machine Learning)) ↗
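The calibration step described in the entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the filter is replaced by a fixed mean and covariance, and the helper names (`mahalanobis_sq`, `calibrate_radius`, `in_ellipsoid`) are hypothetical. The quantile rule is the standard split-conformal one.

```python
import numpy as np

def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance of y from the predictive mean."""
    diff = y - mean
    return float(diff @ np.linalg.solve(cov, diff))

def calibrate_radius(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])

def in_ellipsoid(y, mean, cov, radius_sq):
    """The prediction set is {y : squared Mahalanobis score <= radius_sq}."""
    return mahalanobis_sq(y, mean, cov) <= radius_sq

# Toy calibration set; in the paper's setting the mean and covariance would
# come from a frozen state-space filter at each time step (assumed fixed here).
rng = np.random.default_rng(0)
mean = np.zeros(2)
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
calib = rng.multivariate_normal(mean, cov, size=500)
scores = np.array([mahalanobis_sq(y, mean, cov) for y in calib])
radius_sq = calibrate_radius(scores, alpha=0.1)
print(in_ellipsoid(np.array([0.1, -0.2]), mean, cov, radius_sq))
```

The covariance fixes the ellipsoid's orientation and axis ratios, while the calibrated quantile sets its size, so no Gaussian tail assumption enters the radius.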
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification
    NEXIS identifies causal, interpretable heterogeneous treatment effects
    The paper proposes NEXIS (Neural EXposure Interaction Search), a method for causally identifying heterogeneous treatment effects (HTE) in controlled experiments. By leveraging multi-modal pre-treatment measurements and scalable representations, it reframes HTE identification as Markov-blanket discovery over a sufficient, aligned representation, aiming to ease the expressivity-interpretability trade-off.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Analytic Torsion and Spectral Gap Capture Persistent-Laplacian Performance
    Three invariants capture persistent-Laplacian predictive power compactly
    The paper proposes a compact, fixed-length spectral representation that distills the persistent Laplacian into three invariants - Betti numbers, the spectral gap, and analytic torsion - addressing the high dimensionality and varying-length problems of the full eigenspectrum. On benchmarks like MNIST and QM-3D, it matches or exceeds full-spectrum performance while cutting computational overhead.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Infrastructure & Hardware extract
    Stable Menus of Public Goods: AI-Enabled Progress
    Study tests AI research workflows on an open economics problem
    Retrieval-Augmented Generation (RAG)
    Using an open problem from an EC 2025 paper as a testbed, the paper studies AI-for-economics research workflows. It reports that prompting with human intuition and multi-turn interaction can help, while finding an LLM slightly less effective than a first-year PhD student on the task.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Policy & Regulation extract
    Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification
    Agentic LLM framework for tariff (HTS) code classification
    The paper proposes an agentic LLM framework for Canadian 10-digit Harmonized Tariff Schedule code classification in maritime logistics. It integrates multi-agent retrieval, semantic search over official tariff documents, evidence-grounded reasoning, consensus-based validation, confidence estimation, and human-in-the-loop escalation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy
    Fixed-size neural nets achieve arbitrary-accuracy Sobolev approximation
    Neural Network
    This work studies new activation functions enabling arbitrary-accuracy Sobolev approximation by fixed-size neural networks. It first shows any function in W^{2,inf} can be approximated to arbitrary accuracy in the W^{1,inf} norm via the Elementary Universal Activation Function (EUAF). To extend this to higher-order spaces W^{s,inf}, the authors introduce a smooth activation DUAF_inf and prove arbitrary-accuracy approximation in the W^{s-1,inf} norm, with sigmoidal variants constructed.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers
    Decade-long study of 56,800 AI papers finds rising code/data sharing
    Reinforcement Learning
    Analyzing 56,800 papers from five leading AI conferences over 2014-2024, the study reports that sharing both code and data rose nearly sixfold, from 11% to 64%. Based on documentation practices, it estimates reproducibility increased from 28% to 64%, with gains predating reproducibility checklists.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    How Much Do Reviews Really Contribute? A Study on Text-Enriched Matrix Factorization for Recommendations
    How much do reviews help? A study of text-enriched matrix factorization
    Embeddings Reinforcement Learning
    Incorporating textual reviews into recommender systems is a popular way to enrich collaborative signals with semantic information, yet their actual contribution remains unclear against strong collaborative baselines. The authors systematically investigate text's impact on matrix factorization by introducing and comparing three enrichment strategies over a common collaborative backbone, including a learnable gating mechanism that adaptively balances collaborative and textual signals.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    Probing Low Frame Rate Degradation in Neural Audio Codecs
    Probing why neural audio codecs degrade at low frame rates
    Inference Reinforcement Learning Speech Processing
    Low frame rates in neural audio codecs are attractive for autoregressive speech synthesis, where cost scales with sequence length. Codecs can run at 12.5 Hz and below, but the mechanisms behind low-frame-rate degradation are unclear. Through a controlled frame-rate ablation, the authors reproduce a quality cliff at 6.25 Hz and test two candidate explanations, phonemic collisions and codebook saturation, finding that neither poses a fundamental barrier. The cliff instead stems from suboptimal training choices such as fixed clip duration.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data
    A causal auditing framework to detect synthetic-data privacy disclosures
    Generative AI Inference Reinforcement Learning
    Generative AI and LLMs have made synthetic data a popular privacy-preserving substitute for sensitive datasets, yet the models that generate it can memorize and reproduce private training data. The authors propose a customizable empirical framework distinguishing "true disclosures" (direct reproduction of user data) from "phantom disclosures" (incidental generation). Using training/holdout partitioning and statistical hypothesis testing, it checks whether disclosures match strict privacy baselines like zero-learning.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals
    Contrastive latent mapping of nanopore signals into molecular coordinates
    Nanopores are versatile single-molecule sensors, but stochastic translocation dynamics warp encoded information, limiting their utility. The paper shifts from time-domain analysis to a learned latent-space mapping via a contrastive encoder trained only on simulated signals from a physics-informed model. It maps nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system that responds to structural parameters but stays invariant to acquisition conditions.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • NVIDIA Developer Blog · EN Developer Tools extract
    Boosting MoE Training Throughput with Advanced Fusion Kernels
    NVIDIA details advanced fusion kernels to boost MoE training throughput
    Deep Learning Generative AI Machine Learning Mixture of Experts (MoE) NVIDIA
    On its developer blog, NVIDIA explains advanced fusion-kernel techniques aimed at boosting training throughput for Mixture-of-Experts (MoE) models. Noting that MoE has rapidly become a foundational component of modern large-scale AI systems, the post outlines kernel-level optimizations for more efficient training.
    Read original (NVIDIA Developer Blog) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    A Causal Model of Theory of Mind in Conflict for Artificial Intelligence
    A structural causal model for when AI should engage theory of mind in conflict
    Inference
    Theory of mind (ToM), the ascription of mental states to others for prediction and inference, is widely assumed essential for human-machine integration. Existing AI-ToM models address how to mentalize but leave the question of when to do so largely unaddressed. The paper asks under what situational and agent-level conditions ToM engagement is causally warranted in conflict, presenting a structural causal model as a directed acyclic graph that treats ToM as a mechanism activated by conditions rather than an always-on capacity.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    A nonparametric two-sample test using a parametric integral probability metric
    A nonparametric two-sample test via a single-node parametric IPM
    Machine Learning Neural Network Reinforcement Learning
    Detecting distributional differences between two independent samples is fundamental in statistics and machine learning. Nonparametric two-sample testing decides whether two samples come from the same distribution without assuming a parametric form. The paper proposes a new test statistic based on an integral probability metric (IPM) defined via a specially designed parametric discriminator class using a single neural-network node, and analyzes the resulting test's properties.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    CrossMaps: Confidence-Aware Open-Vocabulary Semantic Mapping for Rover Navigation
    CrossMaps: confidence-aware open-vocabulary semantic mapping for rovers
    Embeddings
    Rovers rely on perception to maintain spatial maps encoding objects and sensor quality (range reliability, lighting artifacts, data density) to guide fusion, embedding updates, and navigation under partial observability. The paper presents CrossMaps, a real-time confidence-aware open-vocabulary semantic mapping pipeline that builds language-queryable maps from RGB-D data, extending VLMaps-style approaches with multi-scale CLIP embeddings, confidence-aware fusion, and a dual-memory architecture.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    MA-SBI: Misspecification-Aware Simulation-Based Inference via Side-Channel Guidance
    MA-SBI: misspecification-aware inference via side-channel guidance
    Inference Neural Network Reinforcement Learning
    Simulation-based inference (SBI) is often hindered by simulator misspecification, the mismatch between simulated and real observations. The recent robust method RoPE uses optimal transport between learned representations but needs ground-truth calibration pairs, which are unavailable in the settings where SBI is needed. Practitioners instead have unstructured side-information such as regime labels, instruction text, and policy bulletins. The authors propose Misspecification-Aware SBI (MA-SBI) to exploit this guidance.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    IMPACTeen: Intentions, Manipulation, Persuasion, Annotations, and Consequences in Teen Communication Dataset
    IMPACTeen: a teen-context dataset of social-influence scenarios and labels
    Neural Network
    The paper introduces IMPACTeen, a dataset of textual social-influence scenarios in adolescent interpersonal, media, and digital settings. It contains 1,021 texts and 5,100 annotation records labeled from five perspectives (teens, parents, psychologists, communication experts, teachers), built via constrained LLM generation plus two-step human editing, with Polish and English versions.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    LESS Is More: Mutual-Stability Sampling for Diffusion Language Models
    LESS: a training-free adaptive sampler for diffusion language models
    Deep Learning Inference Neural Network Retrieval-Augmented Generation (RAG) Transformer
    The paper presents LESS, a training-free, model-agnostic adaptive sampler for diffusion LLMs that frames token commitment as an online stopping problem. Its mutual-stability rule unmasks a position only when its top-1 prediction is confident, persists across recent steps, and is distributionally stable, as measured by the top-K inter-step Jensen-Shannon (JS) divergence. It is evaluated on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B.
    Read original (arXiv cs.CL (Computation and Language)) ↗
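A commit rule of the kind described in the entry above might look like the sketch below. The three checks mirror the summary (confidence, persistence, top-K JS stability), but the thresholds, the window length, and the names `js_divergence` and `should_commit` are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two renormalized distributions."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log((a + eps) / (b + eps))))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_commit(history, conf_threshold=0.9, persist_steps=3,
                  js_threshold=0.01, top_k=5):
    """history: per-step probability vectors for one masked position, oldest first.
    All thresholds here are hypothetical defaults, not values from the paper."""
    if len(history) < persist_steps:
        return False
    recent = history[-persist_steps:]
    current = recent[-1]
    top1 = int(np.argmax(current))
    # 1. Confidence: the current top-1 probability must be high enough.
    if current[top1] < conf_threshold:
        return False
    # 2. Persistence: the same token must have been top-1 over the window.
    if any(int(np.argmax(p)) != top1 for p in recent):
        return False
    # 3. Distributional stability on the current top-K tokens.
    idx = np.argsort(current)[-top_k:]
    for prev, nxt in zip(recent[:-1], recent[1:]):
        if js_divergence(prev[idx], nxt[idx]) > js_threshold:
            return False
    return True

stable = [np.array([0.95, 0.03, 0.02])] * 3
print(should_commit(stable))
```

A sampler would call such a rule for every masked position at each denoising step and unmask only the positions where it returns true.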
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences
    LOGOS: a general-purpose generative foundation model for natural sciences
    Neural Network
    This report presents LOGOS (Language Of Generative Objects in Science), a generative language model unifying heterogeneous natural-science tasks in one autoregressive framework over a shared scientific grammar. It encodes scientific objects and their spatial contacts/constraints as discrete tokens, casting tasks as next-token prediction without explicit coordinates or geometric networks, and reportedly matches or beats domain-specific baselines.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Factorized Neural Operators Decompose Dynamic and Persistent Responses
    FaNO: factorized neural operators splitting dynamic and persistent responses
    Deep Learning
    Physical systems often combine fast-evolving dynamics with persistent structures, which existing neural operators struggle to capture because a single dominant inductive bias couples distinct responses into one representation. The authors introduce a unified Green's-function framework and propose Factorized Neural Operators (FaNO), decomposing spectral representations into equivariant dynamic responses and invariant persistent responses to better model multiscale physical behavior.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization
    Hyperball: an optimizer wrapper fixing Frobenius norms to speed up pretraining
    Matrix-based optimizers like Muon accelerate LLM pretraining, but their edge over AdamW shrinks at larger model and data scales under standard constant decoupled weight decay. The paper proposes Hyperball, a simple wrapper that fixes the Frobenius norms of weight matrices and their optimizer updates to constants. On Qwen3-style models up to 1.2B parameters, Muon-Hyperball reports a 20-30% token-equivalent speedup over weight-decay baselines.
    Read original (arXiv cs.LG (Machine Learning)) ↗
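The norm-fixing idea in the entry above can be illustrated with a small NumPy sketch. The exact Hyperball update is not given in the summary, so the rule below (rescale the raw update to a constant Frobenius norm, apply it, then rescale the weights back to a constant Frobenius norm) is an assumption, and `rescale_to_norm` and `hyperball_step` are hypothetical names.

```python
import numpy as np

def rescale_to_norm(matrix, target_norm, eps=1e-12):
    """Scale a matrix so its Frobenius norm equals target_norm."""
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return matrix * (target_norm / (np.linalg.norm(matrix) + eps))

def hyperball_step(weight, raw_update, weight_norm, update_norm):
    """One wrapped step (assumed form): fixed-norm update, then fixed-norm weights."""
    step = rescale_to_norm(raw_update, update_norm)
    return rescale_to_norm(weight - step, weight_norm)

rng = np.random.default_rng(0)
w = rescale_to_norm(rng.standard_normal((4, 3)), 1.0)
g = rng.standard_normal((4, 3))  # stand-in for a base optimizer's update, e.g. Muon's
w_next = hyperball_step(w, g, weight_norm=1.0, update_norm=0.05)
print(np.linalg.norm(w_next))
```

Under this form the weights stay on a sphere of fixed radius, so the ratio of update norm to weight norm stays roughly constant instead of drifting as it can under weight decay.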
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures
    CKA_Delta reveals concept-specific alignment across LLM architectures
    Neural Network
    The paper introduces contrastive-difference CKA (CKA_Delta), a training-free diagnostic for characterizing whether different LLM architectures encode high-level concepts compatibly. It reports a dissociation between geometric and functional universality: moderate geometric convergence alongside near-perfect functional transfer.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Beyond Weights and Gradients: A Taxonomy of Federated Learning Messages
    A formal definition and taxonomy of federated learning messages
    Deep Learning
    Federated learning now exchanges more than weights and gradients, including synthetic data and analytics. This paper gives a formal mathematical definition of a federated message capturing utility and privacy, and a taxonomy of three categories—model structures, statistical summaries, and data-conditioned representations—evaluated on compute, communication, and privacy. A review of 202 papers shows a shift toward diverse messaging.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering
    Reasoning hop-count predicts clinical AI failure in EHR QA
    Claude GPT OpenAI Software Engineering Transformer
    The paper reports that in electronic health record (EHR) question answering, questions needing more inferential hops yield disproportionately more LLM errors. Using a pre-specified hop-count taxonomy, it links this failure structure to theoretical limits on transformer compositionality.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM
    IMA fuses MMM and Bayesian attribution for privacy-safe measurement
    Neural Network Retrieval-Augmented Generation (RAG)
    Retail marketing needs granular, campaign-level insight without user-level tracking, yet MMM is too coarse and MTA is unreliable under privacy limits. Integrated Marketing Attribution (IMA) combines MMM with channel-specific Bayesian attribution models, using MMM-informed priors to deliver granular, privacy-safe attribution consistent with MMM.
    Read original (arXiv cs.LG (Machine Learning)) ↗