Developer Tools B
-
Context-Aware RL for Agentic and Multimodal LLMs
ContextRL rewards picking the right context to ground answers
ContextRL is a context-aware RL method that improves long-horizon and multimodal reasoning via an indirect objective: instead of supervising only the final answer, it rewards selecting the context that supports a query-answer pair, encouraging fine-grained grounding. Trained on contrastive coding-trajectory and image data, it gains an average +2.2% over standard GRPO.
-
Geometric Action Model for Robot Policy Learning
GAM reuses a geometric foundation model for robot control
The Geometric Action Model (GAM) is a language-conditioned manipulation policy that repurposes a pretrained geometric foundation model as a shared substrate for perception, temporal prediction, and action decoding. It splits the model at an intermediate layer: shallow layers act as an observation encoder, while a causal future predictor forecasts latent tokens from language, proprioception, and action history.
-
Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio
A benchmark for LLM agents on Nature Portfolio meta-analyses
This work introduces a benchmark that evaluates LLM agents on meta-analysis articles from Nature Portfolio. The article excerpt was unavailable, so this summary is limited to a neutral description based on the title.
-
The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers
Phase, not magnitude, carries identity inside image classifiers
Revisiting Oppenheim and Lim's 1981 finding that images stay recognizable from Fourier phase alone, the authors test whether trained classifiers reproduce this internally. When one image's phase is transplanted onto another's magnitude, predictions in PRISM2D, GFNet, and ViT-B/16 follow the phase donor, and removing image-specific magnitude barely changes accuracy. ResNet-50 also shows a strong latent sign code when intervened on before its ReLUs.
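As a rough illustration of the phase transplant described above, here is a minimal NumPy sketch of the classic image-space swap. It is not the paper's internal intervention on classifier activations; the function name `transplant_phase` and the random test arrays are hypothetical.

```python
import numpy as np

def transplant_phase(magnitude_donor, phase_donor):
    """Combine the Fourier magnitude of one image with the phase of another."""
    spec_mag = np.fft.fft2(magnitude_donor)
    spec_phase = np.fft.fft2(phase_donor)
    # Keep |F| from the first image and the angle from the second.
    hybrid_spec = np.abs(spec_mag) * np.exp(1j * np.angle(spec_phase))
    # For real inputs the hybrid spectrum is conjugate-symmetric, so the
    # imaginary part of the inverse transform is only rounding noise.
    return np.real(np.fft.ifft2(hybrid_spec))

# Stand-in "images": random arrays, used only to exercise the function.
rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = rng.random((32, 32))
hybrid = transplant_phase(a, b)  # magnitude from a, phase from b
```

Feeding `hybrid` to a classifier and checking whether the prediction follows `a` or `b` would be the image-space analogue of the test the summary describes.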
-
Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis
A mathematical review of shape space analysis for geometric data
This survey synthesizes the fast-growing literature on shape space analysis, a framework for data whose observations carry rich geometric form across biology, medicine, anthropology and vision. Drawing on differential geometry, statistics and ML, it organizes the work around a shared pipeline of shape representation, parameterization and metric construction.
-
Filtered Conformal Ellipsoids for Graph-Native Time Series
Filtered conformal ellipsoids for multivariate time-series prediction sets
The paper builds joint prediction sets for multivariate time series via filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and split-conformal calibration on the Mahalanobis scores sets the radius. The filter determines the ellipsoid's shape and calibration determines its scale, avoiding Gaussian tail assumptions.
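The calibration step described above can be sketched as follows. This is a minimal illustration of generic split-conformal calibration on squared Mahalanobis scores, assuming the filter's predictive means and covariances are already available; the function names are hypothetical and the paper's exact procedure may differ.

```python
import numpy as np

def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance of y from a predictive mean/covariance."""
    diff = np.asarray(y, dtype=float) - np.asarray(mean, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff))

def calibrate_radius(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set is unbounded.
        return float("inf")
    return float(scores[k - 1])

def in_ellipsoid(y, mean, cov, radius):
    """True if y lies in the set {y : score(y) <= radius}."""
    return mahalanobis_sq(y, mean, cov) <= radius
```

In use, one would compute `mahalanobis_sq` for each held-out calibration step, pass those scores to `calibrate_radius`, and then test new observations with `in_ellipsoid`. The covariance fixes the ellipsoid's orientation and proportions, while the calibrated radius alone controls its size.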
-
From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification
NEXIS identifies causal, interpretable heterogeneous treatment effects
The paper proposes NEXIS (Neural EXposure Interaction Search), a method for causally identifying heterogeneous treatment effects (HTE) in controlled experiments. By leveraging multi-modal pre-treatment measurements and scalable representations, it reframes HTE identification as Markov-blanket discovery over a sufficient, aligned representation, aiming to ease the expressivity-interpretability trade-off.
-
Analytic Torsion and Spectral Gap Capture Persistent-Laplacian Performance
Three invariants capture persistent-Laplacian predictive power compactly
The paper proposes a compact, fixed-length spectral representation that distills the persistent Laplacian into three invariants - Betti numbers, the spectral gap, and analytic torsion - addressing the high dimensionality and varying-length problems of the full eigenspectrum. On benchmarks like MNIST and QM-3D, it matches or exceeds full-spectrum performance while cutting computational overhead.
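To make the three quantities concrete, here is a small sketch that reads them off the spectrum of a single symmetric Laplacian matrix. It rests on assumptions: the paper works with persistent Laplacians, and analytic torsion is normally built from log-determinants combined across dimensions, so the per-matrix log pseudo-determinant below is only a building block. The name `spectral_summary` and the tolerance are hypothetical.

```python
import numpy as np

def spectral_summary(laplacian, tol=1e-8):
    """Return (zero-eigenvalue count, smallest nonzero eigenvalue,
    log of the product of nonzero eigenvalues) for a symmetric PSD matrix."""
    eigvals = np.linalg.eigvalsh(laplacian)   # ascending order
    nonzero = eigvals[eigvals > tol]
    betti = int(np.sum(eigvals <= tol))       # dimension of the kernel
    gap = float(nonzero[0]) if nonzero.size else 0.0
    log_pdet = float(np.sum(np.log(nonzero))) # log pseudo-determinant
    return betti, gap, log_pdet

# Graph Laplacian of a 3-vertex path; its eigenvalues are 0, 1, and 3.
path_laplacian = np.array([[ 1.0, -1.0,  0.0],
                           [-1.0,  2.0, -1.0],
                           [ 0.0, -1.0,  1.0]])
betti, gap, log_pdet = spectral_summary(path_laplacian)
```

Whatever the size of the input matrix, the output has fixed length, which is the property the summary highlights relative to the full eigenspectrum.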
-
Stable Menus of Public Goods: AI-Enabled Progress
Study tests AI research workflows on an open economics problem
Using an open problem from an EC 2025 paper as a testbed, the paper studies AI-for-economics research workflows. It reports that prompting with human intuition and multi-turn interaction can help, while finding an LLM slightly less effective than a first-year PhD student on the task.
-
Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification
Agentic LLM framework for tariff (HTS) code classification
The paper proposes an agentic LLM framework for Canadian 10-digit Harmonized Tariff Schedule code classification in maritime logistics. It integrates multi-agent retrieval, semantic search over official tariff documents, evidence-grounded reasoning, consensus-based validation, confidence estimation, and human-in-the-loop escalation.
-
Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy
Fixed-size neural nets achieve arbitrary-accuracy Sobolev approximation
This work studies new activation functions enabling arbitrary-accuracy Sobolev approximation by fixed-size neural networks. It first shows any function in W^{2,inf} can be approximated to arbitrary accuracy in the W^{1,inf} norm via the Elementary Universal Activation Function (EUAF). To extend this to higher-order spaces W^{s,inf}, the authors introduce a smooth activation DUAF_inf and prove arbitrary-accuracy approximation in the W^{s-1,inf} norm, with sigmoidal variants constructed.
-
The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers
Decade-long study of 56,800 AI papers finds rising code/data sharing
Analyzing 56,800 papers from five leading AI conferences over 2014-2024, the study reports that sharing both code and data rose nearly sixfold, from 11% to 64%. Based on documentation practices, it estimates reproducibility increased from 28% to 64%, with gains predating reproducibility checklists.
-
How Much Do Reviews Really Contribute? A Study on Text-Enriched Matrix Factorization for Recommendations
How much do reviews help? A study of text-enriched matrix factorization
Incorporating textual reviews into recommender systems is a popular way to enrich collaborative signals with semantic information, yet their actual contribution remains unclear against strong collaborative baselines. The authors systematically investigate text's impact on matrix factorization by introducing and comparing three enrichment strategies over a common collaborative backbone, including a learnable gating mechanism that adaptively balances collaborative and textual signals.
-
Probing Low Frame Rate Degradation in Neural Audio Codecs
Probing why neural audio codecs degrade at low frame rates
Low frame rates in neural audio codecs are attractive for autoregressive speech synthesis, where cost scales with sequence length. Codecs can run at 12.5 Hz and below, but the mechanisms of low-frame-rate degradation are unclear. Through a controlled frame-rate ablation, the authors reproduce a quality cliff at 6.25 Hz and test two proposed explanations, phonemic collisions and codebook saturation, finding no fundamental barrier. The cliff instead stems from suboptimal training choices such as fixed clip duration.
-
Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data
A causal auditing framework to detect synthetic-data privacy disclosures
Generative AI and LLMs have made synthetic data a popular privacy-preserving substitute for sensitive datasets, yet it can memorize and reproduce private training data. The authors propose a customizable empirical framework distinguishing "true disclosures" (direct reproduction of user data) from "phantom disclosures" (incidental generation). Using training/holdout partitioning and statistical hypothesis testing, it checks whether disclosures match strict privacy baselines like zero-learning.
-
Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals
Contrastive latent mapping of nanopore signals into molecular coordinates
Nanopores are versatile single-molecule sensors, but stochastic translocation dynamics warp encoded information, limiting their utility. The paper shifts from time-domain analysis to a learned latent-space mapping via a contrastive encoder trained only on simulated signals from a physics-informed model. It maps nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system that responds to structural parameters but stays invariant to acquisition conditions.
-
Boosting MoE Training Throughput with Advanced Fusion Kernels
NVIDIA details advanced fusion kernels to boost MoE training throughput
On its developer blog, NVIDIA explains advanced fusion-kernel techniques aimed at boosting training throughput for Mixture-of-Experts (MoE) models. Noting that MoE has rapidly become a foundational component of modern large-scale AI systems, the post outlines kernel-level optimizations for more efficient training.
-
A Causal Model of Theory of Mind in Conflict for Artificial Intelligence
A structural causal model for when AI should engage theory of mind in conflict
Theory of mind (ToM), ascribing mental states to others for prediction and inference, is widely assumed essential for human-machine integration. Existing AI-ToM models address how to mentalize but leave the question of when largely unaddressed. The paper asks under what situational and agent-level conditions ToM engagement is causally warranted in conflict, presenting a structural causal model as a directed acyclic graph that treats ToM as a mechanism activated by conditions rather than an always-on capacity.
-
A nonparametric two-sample test using a parametric integral probability metric
A nonparametric two-sample test via a single-node parametric IPM
Detecting distributional differences between two independent samples is fundamental in statistics and machine learning. Nonparametric two-sample testing decides whether two samples come from the same distribution without assuming a parametric form. The paper proposes a new test statistic based on an integral probability metric (IPM) defined via a specially designed parametric discriminator class using a single neural-network node, and analyzes the resulting test's properties.
-
CrossMaps: Confidence-Aware Open-Vocabulary Semantic Mapping for Rover Navigation
CrossMaps: confidence-aware open-vocabulary semantic mapping for rovers
Rovers rely on perception to maintain spatial maps encoding objects and sensor quality (range reliability, lighting artifacts, data density) to guide fusion, embedding updates, and navigation under partial observability. The paper presents CrossMaps, a real-time confidence-aware open-vocabulary semantic mapping pipeline that builds language-queryable maps from RGB-D data, extending VLMaps-style approaches with multi-scale CLIP embeddings, confidence-aware fusion, and a dual-memory architecture.
-
MA-SBI: Misspecification-Aware Simulation-Based Inference via Side-Channel Guidance
MA-SBI: misspecification-aware inference via side-channel guidance
Simulation-based inference (SBI) is often hindered by simulator misspecification, the mismatch between simulated and real observations. The recent robust method RoPE uses optimal transport between learned representations but needs ground-truth calibration pairs unavailable where SBI is needed. Practitioners instead have unstructured side-information such as regime labels, instruction text, and policy bulletins. The authors propose Misspecification-Aware SBI (MA-SBI) to exploit this guidance.
-
IMPACTeen: Intentions, Manipulation, Persuasion, Annotations, and Consequences in Teen Communication Dataset
IMPACTeen: a teen-context dataset of social-influence scenarios and labels
The paper introduces IMPACTeen, a dataset of textual social-influence scenarios in adolescent interpersonal, media, and digital settings. It contains 1,021 texts and 5,100 annotation records labeled from five perspectives (teens, parents, psychologists, communication experts, teachers), built via constrained LLM generation plus two-step human editing, with Polish and English versions. Summarized neutrally from the abstract.
-
LESS Is More: Mutual-Stability Sampling for Diffusion Language Models
LESS: a training-free adaptive sampler for diffusion language models
The paper presents LESS, a training-free, model-agnostic adaptive sampler for diffusion LLMs that frames token commitment as an online stopping problem. Its mutual-stability rule unmasks a position only when its top-1 prediction is confident, persists across recent steps, and is distributionally stable (as measured by the inter-step JS divergence over the top-K tokens). It is evaluated on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B. Summarized neutrally from the abstract.
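The three conditions in the mutual-stability rule can be sketched for a single masked position as below. This is an illustrative reading of the summary, not the authors' implementation: the function names, the window length, and every threshold are hypothetical, and the renormalization over the top-K tokens is an assumption.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two (renormalized) distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_commit(history, conf_threshold=0.9, window=3, top_k=3,
                  js_threshold=0.01):
    """history: per-step probability vectors for one position, oldest first."""
    if len(history) < window:
        return False
    recent = [np.asarray(p, dtype=float) for p in history[-window:]]
    current, previous = recent[-1], recent[-2]
    top1 = int(np.argmax(current))
    # 1. Confidence: the top-1 probability must clear the threshold.
    if current[top1] < conf_threshold:
        return False
    # 2. Persistence: the same token must be top-1 across the window.
    if any(int(np.argmax(p)) != top1 for p in recent):
        return False
    # 3. Distributional stability: small JS divergence over the top-K tokens.
    idx = np.argsort(current)[-top_k:]
    return js_divergence(current[idx], previous[idx]) <= js_threshold
```

A sampler built on this check would call `should_commit` for each still-masked position after every denoising step and unmask only the positions that return `True`.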
-
Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences
LOGOS: a general-purpose generative foundation model for natural sciences
This report presents LOGOS (Language Of Generative Objects in Science), a generative language model unifying heterogeneous natural-science tasks in one autoregressive framework over a shared scientific grammar. It encodes scientific objects and their spatial contacts/constraints as discrete tokens, casting tasks as next-token prediction without explicit coordinates or geometric networks, and reportedly matches or beats domain-specific baselines. Summarized neutrally from the abstract.
-
Factorized Neural Operators Decompose Dynamic and Persistent Responses
FaNO: factorized neural operators splitting dynamic and persistent responses
Physical systems often combine fast-evolving dynamics with persistent structures, which existing neural operators struggle to capture because a single dominant inductive bias couples distinct responses into one representation. The authors introduce a unified Green's-function framework and propose Factorized Neural Operators (FaNO), decomposing spectral representations into equivariant dynamic responses and invariant persistent responses to better model multiscale physical behavior.
-
Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization
Hyperball: an optimizer wrapper fixing Frobenius norms to speed up pretraining
Matrix-based optimizers like Muon accelerate LLM pretraining, but their edge over AdamW shrinks at larger model and data scales under standard constant decoupled weight decay. The paper proposes Hyperball, a simple wrapper that fixes the Frobenius norms of weight matrices and their optimizer updates to constants. On Qwen3-style models up to 1.2B parameters, Muon-Hyperball reports a 20-30% token-equivalent speedup over weight-decay baselines.
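One way to read "fixing the Frobenius norms of weight matrices and their updates" is sketched below. The summary does not give the exact update rule, so the form used here (normalize the base optimizer's update, take a step scaled by the radius, then project the weights back to the fixed norm) is an assumption, as are the names `hyperball_step`, `radius`, and `lr`.

```python
import numpy as np

def hyperball_step(weight, update, radius, lr, eps=1e-12):
    """Apply one norm-constrained step to a weight matrix.

    `update` is whatever the wrapped optimizer (e.g. Muon or AdamW) proposes;
    only its direction is used.
    """
    # Fix the update's Frobenius norm to lr * radius.
    unit_update = update / (np.linalg.norm(update) + eps)
    stepped = weight - lr * radius * unit_update
    # Project back so the weight's Frobenius norm stays at `radius`.
    return radius * stepped / (np.linalg.norm(stepped) + eps)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
w = 2.0 * w / np.linalg.norm(w)      # start on the radius-2 sphere
g = rng.standard_normal((4, 3))      # stand-in for an optimizer update
w_next = hyperball_step(w, g, radius=2.0, lr=0.1)
```

Because the norm is held constant by construction, a scheme like this needs no separate weight-decay term to control weight growth, which is consistent with the summary's comparison against weight-decay baselines.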
-
Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures
CKA_Delta reveals concept-specific alignment across LLM architectures
An arXiv paper introduces contrastive-difference CKA (CKA_Delta), a training-free diagnostic, to characterize whether different LLM architectures encode high-level concepts compatibly. It reports a geometric-functional universality dissociation: moderate geometric convergence alongside near-perfect functional transfer. Neutral, abstract-based summary.
-
Beyond Weights and Gradients: A Taxonomy of Federated Learning Messages
A formal definition and taxonomy of federated learning messages
Federated learning now exchanges more than weights and gradients, including synthetic data and analytics. This paper gives a formal mathematical definition of a federated message capturing utility and privacy, and a taxonomy of three categories (model structures, statistical summaries, and data-conditioned representations) evaluated on compute, communication, and privacy. A review of 202 papers shows a shift toward diverse messaging.
-
Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering
Reasoning hop-count predicts clinical AI failure in EHR QA
An arXiv paper shows that in electronic health record (EHR) question answering, questions needing more inferential hops yield disproportionately more LLM errors. Using a pre-specified hop-count taxonomy, it links this failure structure to theoretical limits on transformer compositionality. Neutral, abstract-based summary.
-
Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM
IMA fuses MMM and Bayesian attribution for privacy-safe measurement
Retail marketing needs granular, campaign-level insight without user-level tracking, yet MMM is too coarse and MTA is unreliable under privacy limits. Integrated Marketing Attribution (IMA) combines MMM with channel-specific Bayesian attribution models, using MMM-informed priors to deliver granular, privacy-safe attribution consistent with MMM.