New Model Releases

  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment
    CORA aligns reasoning and answers in multimodal RLVR
    Computer Vision Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering
    CORA analyzes the gap between a model's reasoning and its final answer when extending verifiable-reward RL to multimodal settings. It proposes consistency-oriented reasoning alignment to bridge that gap.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    A Complexity Measure for Active Learning in Multi-group Mean Estimation
    A complexity measure for active multi-group mean estimation
    The paper studies active learning for multi-group mean estimation framed as a d-armed bandit minimizing max-risk. It introduces a complexity measure characterizing the difficulty of adaptive budget allocation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets
    Optimal hidden-target learning for online inventory optimization
    The work casts online inventory optimization as online convex optimization with memory, where carryover makes the feasible set history-dependent. It develops an optimal hidden-target learning method on general convex sets.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition
    AgentSpec dissects embodied agent scaffolds via controlled composition
    AI Agents Machine Learning Reinforcement Learning
    AgentSpec uses controlled composition to study scaffolded LLM agents that combine reasoning, memory, reflection, and action. It aims to isolate how each component contributes to overall performance.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows
    Direct latent-space synthesis for parallel branches in LLM-agent workflows
    AI Agents Neural Network
    LLMs serve as execution engines for agentic systems, yet they still consume context through a sequential text interface, a poor match for modern structured workflows with independent parallel branches. The paper explores synthesizing such parallel branches directly in latent space.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing
    Route-specialized dual adapters for memory-assisted knowledge editing
    Embeddings Inference Llama
    This work targets knowledge editing that updates selected facts while preserving nearby behavior in a memory-assisted setting. It proposes route-specialized dual adapters that decide when to write and when to suppress edits.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Giving AI a Headache: Acoustic Adversarial Attacks to Computer Vision Applications
    Acoustic adversarial attacks that disrupt computer vision systems
    Computer Vision Deep Learning Reinforcement Learning
    As AI automates real-world computer vision applications such as autonomous vehicle control, this paper demonstrates acoustic adversarial attacks that can disrupt CV systems, highlighting a new physical, sound-based attack surface.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Abstracting Cross-Domain Action Sequences into Interpretable Workflows
    Abstracting cross-domain action sequences into interpretable workflows
    Deep Learning Inference Microsoft Reinforcement Learning
    Time-stamped interaction logs objectively record digital app usage, but their granularity and noise obscure meaningful insights into work. The paper proposes abstracting cross-domain action sequences into interpretable workflows.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Online Convex Optimization with Sublinear Noisy Probes
    Online convex optimization using sublinear noisy probes
    Machine Learning
    The paper studies online convex optimization over a convex set where the learner may use only a sublinear number of noisy probes. It provides theoretical guarantees under this limited-probe setting.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Expert-Driven Survival Machines: Improving Stratification and Interpretability in Multiple Clinical Cohorts
    Expert-driven survival machines for stratification across clinical cohorts
    Mixture of Experts (MoE) Neural Network Reinforcement Learning
    Survival prediction is central for healthcare providers and clinical researchers. The paper introduces expert-driven survival machines that improve risk stratification and interpretability across multiple clinical cohorts.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations
    LoSoNA benchmarks local social norm adaptation in group chats
    AI Agents Claude Gemini Software Engineering
    Online group chats follow local conversational norms that are rarely stated explicitly. LoSoNA is a benchmark measuring whether LLM-based agents can recognize and adapt to these local social norms.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Cluster LOCO: Feature Importance For Interpreting Clusters
    Cluster LOCO gives feature importance to interpret clusters
    Algorithms & Theory
    Clustering is widely used but its outputs are hard to interpret and audit. Cluster LOCO provides feature-importance scores to explain what distinguishes each cluster.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Sensitivity Shaping for Latent Modeling
    Sensitivity shaping for detecting OOD transitions in dynamics models
    Neural Network
    Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution transitions. The paper proposes sensitivity shaping for latent modeling to improve such OOD detection.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    A Temporal Planning Framework for Disruption Aware Dynamic Route Optimization in Heterogeneous Railway Systems
    A temporal planning framework for disruption-aware railway routing
    Deep Learning Meta Neural Network Reinforcement Learning
    Route optimization is vital for safety and punctuality in railway operations, especially in heterogeneous multi-gauge networks. The paper proposes a temporal planning framework for disruption-aware dynamic route optimization.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    CARE: Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation
    CARE: auditable evidence review to control LLM-generated policies
    Machine Learning
    Giving LLMs direct control over costly, irreversible experiments invites unsafe exploration, while discarding their creativity sacrifices optimization. CARE controls LLM-generated policies through auditable review of evidence in scientific experimentation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model
    SIMMER: benchmarking latent failures in LLM executable planning
    AI Agents Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    LLMs are increasingly deployed as planners for autonomous agents in household environments. Whereas existing benchmarks only check whether generated plans execute, SIMMER uses a world model to benchmark their latent failures.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance
    StreamMemBench: streaming evaluation of agent memory for assistance
    A core role of personal-agent memory is turning stored information and prior interactions into future-oriented assistance. StreamMemBench provides a streaming evaluation of agent memory using cues from what the agent observes and how users interact.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Regional Climate Model Emulation with Diffusion Approaches: What is the Added Value of Generative Machine Learning?
    Added value of diffusion-based generative ML for climate model emulation
    Deep Learning Machine Learning Neural Network Reinforcement Learning
    Emulators cheaply reproduce the downscaling performed by regional climate models, mapping global-model predictors to high-resolution fields. The paper assesses the added value of diffusion-based generative machine learning for regional climate model emulation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    ORCA: A Platform for Open-Source Dexterity Research
    ORCA: an open-source platform for dexterity research
    Neural Network Retrieval-Augmented Generation (RAG) Robotics
    Two-finger grippers dominate manipulation research but are limited by their form factor. ORCA is an open-source platform to support research on more dexterous robotic manipulation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    TRACE: Trajectory-Routed Causal Memory for Delayed-Evidence Visuomotor Imitation
    TRACE: trajectory-routed causal memory for delayed-evidence imitation
    Reinforcement Learning
    Autonomous robots may need to make decisions based on evidence that is no longer visible. For delayed-evidence tasks, where an early cue disappears before a later decision, TRACE introduces trajectory-routed causal memory for visuomotor imitation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    The Risk Shadow of Principal Component Analysis: When 99.9999% Variance Preservation Causes Catastrophic Decision Errors
    PCA's risk shadow: variance preservation can hide catastrophic risk
    Reinforcement Learning
    PCA preserves variance, not the information needed to detect rare catastrophic events. The paper proves the existence of a "risk shadow", in which even very high variance preservation can cause catastrophic decision errors.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    BayLing-Duplex: Native Full-Duplex Speech Dialogue with a Single Autoregressive LLM
    BayLing-Duplex: native full-duplex speech dialogue from one LLM
    Deep Learning Fine-tuning Llama Reinforcement Learning from Human Feedback (RLHF) Speech Processing
    BayLing-Duplex enables native full-duplex speech interaction with a single autoregressive LLM, letting it listen and speak simultaneously. It handles natural phenomena such as overlap and hesitation.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
    From shield to target: DoS attacks on LLM-based agent guardrails
    AI Agents Claude DeepSeek Gemini GPT
    LLM-based guardrails effectively defend autonomous agents against prompt injection and jailbreaks. The paper reveals that the very reasoning and instruction-following abilities enabling this defense can be turned into denial-of-service attacks against the guardrails.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results
    Every Eval Ever: a unifying schema and repository for AI evaluations
    Meta Neural Network
    AI evaluations are widely used to track progress, but inconsistencies across evaluators hinder analysis and comparison. The paper proposes a unifying schema and a community repository, Every Eval Ever, for AI evaluation results.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    PepALD: Macrocyclic Peptide Generation via Autoregressive Latent Diffusion
    PepALD generates macrocyclic peptides via autoregressive latent diffusion
    Embeddings Neural Network
    Macrocyclic peptides are promising for intracellular targets but require joint control of non-natural chemistry, ring topology, and permeability. PepALD generates them using autoregressive latent diffusion.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge
    GitOfThoughts: version-controlled reasoning and agent memory
    AI Agents Neural Network Reinforcement Learning Software Engineering
    LLM reasoning is ephemeral: chains of thought vanish, pruned branches leave no record, and memory cannot be diffed or merged. GitOfThoughts makes reasoning and agent memory version-controlled, so it can be replayed, diffed, and merged.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions
    Manipulating audio-model explanations while predictions stay unchanged
    Retrieval-Augmented Generation (RAG)
    The paper investigates the fragility of post-hoc explanations in audio deepfake detection. Introducing a psychoacoustic framework beyond image-style Lp metrics, it shows attributions can be manipulated while predictions remain unchanged.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗