New Model Releases A
-
CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment
CORA aligns reasoning and answers in multimodal RLVR
CORA analyzes the gap between a model's reasoning and its final answer when extending verifiable-reward RL to multimodal settings. It proposes consistency-oriented reasoning alignment to bridge that gap.
-
A Complexity Measure for Active Learning in Multi-group Mean Estimation
A complexity measure for active multi-group mean estimation
The paper studies active learning for multi-group mean estimation framed as a d-armed bandit minimizing max-risk. It introduces a complexity measure characterizing the difficulty of adaptive budget allocation.
-
Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets
Optimal hidden-target learning for online inventory optimization
The work casts online inventory optimization as online convex optimization with memory, where carryover makes the feasible set history-dependent. It develops an optimal hidden-target learning method on general convex sets.
-
AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition
AgentSpec dissects embodied agent scaffolds via controlled composition
AgentSpec studies scaffolded LLM agents that combine reasoning, memory, reflection, and action through controlled composition. It aims to isolate how each component contributes to overall performance.
-
Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows
Direct latent-space synthesis for parallel branches in LLM-agent workflows
LLMs serve as execution engines for agentic systems yet still consume context through a sequential text interface, mismatching modern structured workflows with independent parallel branches. The paper explores synthesizing such parallel branches directly in latent space.
-
When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing
Route-specialized dual adapters for memory-assisted knowledge editing
This work targets knowledge editing that updates selected facts while preserving nearby behavior in a memory-assisted setting. It proposes route-specialized dual adapters that decide when to write and when to suppress edits.
-
Giving AI a Headache: Acoustic Adversarial Attacks to Computer Vision Applications
Acoustic adversarial attacks that disrupt computer vision systems
As AI automates real-world computer vision applications such as autonomous vehicle control, this paper demonstrates acoustic adversarial attacks that can disrupt CV systems, highlighting a new physical, sound-based attack surface.
-
Abstracting Cross-Domain Action Sequences into Interpretable Workflows
Abstracting cross-domain action sequences into interpretable workflows
Time-stamped interaction logs objectively record digital app usage, but their granularity and noise obscure meaningful insights into work. The paper proposes abstracting cross-domain action sequences into interpretable workflows.
-
Online Convex Optimization with Sublinear Noisy Probes
Online convex optimization using sublinear noisy probes
The paper studies online convex optimization over a convex set where the learner may use only a sublinear number of noisy probes. It provides theoretical guarantees under this limited-probe setting.
-
Expert-Driven Survival Machines: Improving Stratification and Interpretability in Multiple Clinical Cohorts
Expert-driven survival machines for stratification across clinical cohorts
Survival prediction is central for healthcare providers and clinical researchers. The paper introduces expert-driven survival machines that improve risk stratification and interpretability across multiple clinical cohorts.
-
LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations
LoSoNA benchmarks local social norm adaptation in group chats
Online group chats have local conversational norms that are rarely stated. LoSoNA is a benchmark measuring whether LLM-based agents can recognize and adapt to these local social norms.
-
Cluster LOCO: Feature Importance For Interpreting Clusters
Cluster LOCO gives feature importance to interpret clusters
Clustering is widely used but its outputs are hard to interpret and audit. Cluster LOCO provides feature-importance scores to explain what distinguishes each cluster.
-
Sensitivity Shaping for Latent Modeling
Sensitivity shaping for detecting OOD transitions in dynamics models
Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution transitions. The paper proposes sensitivity shaping for latent modeling to improve such OOD detection.
-
A Temporal Planning Framework for Disruption Aware Dynamic Route Optimization in Heterogeneous Railway Systems
A temporal planning framework for disruption-aware railway routing
Route optimization is vital for safety and punctuality in railway operations, especially in heterogeneous multi-gauge networks. The paper proposes a temporal planning framework for disruption-aware dynamic route optimization.
-
CARE: Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation
CARE: auditable evidence review to control LLM-generated policies
Giving LLMs direct control over costly, irreversible experiments invites unsafe exploration, while discarding their creativity sacrifices optimization. CARE controls LLM-generated policies through auditable review of evidence in scientific experimentation.
-
SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model
SIMMER: benchmarking latent failures in LLM executable planning
LLMs are increasingly deployed as planners for autonomous agents in household environments. Whereas existing benchmarks only check whether generated plans execute, SIMMER uses a world model to benchmark their latent failures.
-
StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance
StreamMemBench: streaming evaluation of agent memory for assistance
A core role of personal-agent memory is turning stored information and prior interactions into future-oriented assistance. StreamMemBench provides a streaming evaluation of agent memory using cues from what the agent observes and how users interact.
-
Regional Climate Model Emulation with Diffusion Approaches: What is the Added Value of Generative Machine Learning?
Added value of diffusion-based generative ML for climate model emulation
Emulators cheaply reproduce regional climate models' downscaling, linking global-model predictors to high-resolution fields. The paper assesses the added value of diffusion-based generative machine learning for regional climate model emulation.
-
ORCA: A Platform for Open-Source Dexterity Research
ORCA: an open-source platform for dexterity research
Two-finger grippers dominate manipulation research but are limited by their form factor. ORCA is an open-source platform to support research on more dexterous robotic manipulation.
-
TRACE: Trajectory-Routed Causal Memory for Delayed-Evidence Visuomotor Imitation
TRACE: trajectory-routed causal memory for delayed-evidence imitation
Autonomous robots may need to make decisions based on evidence that is no longer visible. For delayed-evidence tasks, where an early cue disappears before a later decision, TRACE introduces trajectory-routed causal memory for visuomotor imitation.
-
The Risk Shadow of Principal Component Analysis: When 99.9999% Variance Preservation Causes Catastrophic Decision Errors
PCA's risk shadow: variance preservation can hide catastrophic risk
PCA preserves variance, not the information needed to detect rare catastrophic events. The paper proves a risk shadow where even very high variance preservation can cause catastrophic decision errors.
-
BayLing-Duplex: Native Full-Duplex Speech Dialogue with a Single Autoregressive LLM
BayLing-Duplex: native full-duplex speech dialogue from one LLM
BayLing-Duplex enables native full-duplex speech interaction with a single autoregressive LLM, letting it listen and speak simultaneously. It handles natural phenomena such as overlap and hesitation.
-
From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
From shield to target: DoS attacks on LLM-based agent guardrails
LLM-based guardrails effectively defend autonomous agents against prompt injection and jailbreaks. The paper reveals that the very reasoning and instruction-following abilities enabling this defense can be turned into denial-of-service attacks against the guardrails.
-
Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results
Every Eval Ever: a unifying schema and repository for AI evaluations
AI evaluations are widely used to track progress, but inconsistencies across evaluators hinder analysis and comparison. The paper proposes a unifying schema and a community repository, Every Eval Ever, for AI evaluation results.
-
PepALD: Macrocyclic Peptide Generation via Autoregressive Latent Diffusion
PepALD generates macrocyclic peptides via autoregressive latent diffusion
Macrocyclic peptides are promising for intracellular targets but require joint control of non-natural chemistry, ring topology, and permeability. PepALD generates them using autoregressive latent diffusion.
-
GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge
GitOfThoughts: version-controlled reasoning and agent memory
LLM reasoning is ephemeral: chains of thought vanish, pruned branches leave no record, and memory cannot be diffed or merged. GitOfThoughts makes reasoning and agent memory version-controlled, so it can be replayed, diffed, and merged.
-
The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions
Manipulating audio-model explanations while predictions stay unchanged
The paper investigates the fragility of post-hoc explanations in audio deepfake detection. Introducing a psychoacoustic framework beyond image-style Lp metrics, it shows attributions can be manipulated while predictions remain unchanged.