Safety & Evaluation A
-
STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training
STAR: spatiotemporal adaptive reward allocation for text-to-image RL
The paper proposes STAR, a spatiotemporal adaptive reward allocation method for text-to-image RL post-training, replacing a single scalar advantage applied uniformly with rewards that account for the temporal and spatial structure of generation.
-
Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue
Fine-tuning LLMs for passive depression severity from AI dialogue
The paper fine-tunes LLMs for passive estimation of depression severity from AI mental-health dialogue, exploring how conversational signals can indicate severity. Figures and efficacy are as reported by the source and not independently verified.
-
KANLib -- An Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation
KANLib: a modular, extensible and fast KAN implementation
Kolmogorov-Arnold Networks replace linear weights with learnable univariate functions, but their high computational cost hampers practical research. KANLib provides a modular, extensible and fast implementation of KANs to ease experimentation.
-
Non-negative Elastic Net Decoding for Information Retrieval
Non-negative elastic net decoding for information retrieval
Dense retrieval has become the dominant paradigm in information retrieval. The paper applies non-negative elastic net decoding to information retrieval, aiming to improve retrieval representations and accuracy.
-
ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions
ChLogic evaluates logical reasoning robustness in Chinese
LLMs do well on standardized logical reasoning benchmarks, but whether this holds beyond English is unclear. ChLogic is an English-Chinese aligned benchmark testing whether models preserve logical reasoning when the same latent structure is expressed in Chinese.
-
Dimensionality Controls When Modularity Helps in Continual Learning
Dimensionality controls when modularity helps in continual learning
Compositional learning systems must balance plasticity and stability. The paper analyzes when modularity helps in continual learning and shows that the dimensionality of representations controls whether modular structure is beneficial.
-
Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias
Monotonic KANs: monotonicity as an inductive bias, studied theoretically
Monotonicity is a useful architectural inductive bias in tabular, scientific and economic settings. The paper proposes monotonic Kolmogorov-Arnold Networks with per-edge functional transparency and studies monotonicity as an inductive bias both theoretically and empirically.
-
AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor
AnchorKV: safety-aware KV cache compression via soft penalties
AnchorKV is a safety-aware KV cache compression method that uses a soft penalty with a refusal anchor to retain important key-value entries while reducing memory. Summary is largely title-based; details are as presented by the source and not independently verified.
-
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?
GameCraft-Bench: can agents build playable games end-to-end?
Game generation is an emerging coding-agent application requiring natural-language specs to become playable interactive systems. GameCraft-Bench evaluates whether agents can build games end-to-end inside a real game engine, where scripts, scenes, assets, rendering and runtime must cohere.
-
WallZero: Mastering the Game of WallGo with Strategic Analysis
WallZero masters the board game WallGo with strategic analysis
WallGo is a recently introduced strategic board game. WallZero masters WallGo through an approach incorporating strategic analysis, demonstrating game-playing performance and strategic insights.
-
Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models
Qwen-RobotManip: alignment unlocks scale for robot manipulation models
Language and multimodal foundation models generalize by aligning heterogeneous data under a unified formulation and training at scale. This technical report investigates applying that recipe to robotic manipulation, arguing that alignment unlocks scale for manipulation foundation models.
-
When Multiple Scripts Matter: Evaluating ASR in Clinical Settings
Evaluating ASR in clinical settings when multiple scripts matter
Automatic speech recognition in non-English clinical settings faces multiscript variability, where a term appears in multiple valid orthographies. String-matching metrics treat variants as errors and underestimate performance; the paper studies ASR evaluation when multiple scripts matter.
-
Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation
Improving low-resource ASR via bilingual fine-tuning with language ID
The study explores improving low-resource automatic speech recognition using bilingual fine-tuning combined with language identification, and evaluates the approach across languages in a cross-linguistic setting.
-
A Framework for Evaluating Agentic Skills at Scale
A framework for evaluating agentic skills at scale
Agent skills, structured reusable knowledge artifacts that augment LLM agents, have been rapidly adopted, yet their cross-domain impact remains unclear and a reusable methodology for evaluating individual skills is lacking. The paper presents a framework for evaluating agentic skills at scale.
-
Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering
Position: coding benchmarks are misaligned with agentic software engineering
Coding agents have become a major mode of software engineering. This position paper argues that existing coding benchmarks are misaligned with real agentic software engineering and calls for rethinking how such systems are evaluated.
-
The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports
The Slop Paradox: AI-rewritten radiology reports erode clinical uncertainty
AI clinical documentation tools increasingly summarize and reformat radiology reports with LLMs. Using 450 chest X-ray reports from the Indiana University dataset, the paper measures resulting information degradation, showing erosion of clinical uncertainty and cross-modal alignment in AI-rewritten reports.
-
Toward Accessible Psychotherapy Training Using AI-Driven Interactive Patient Avatars
AI-driven patient avatars for more accessible psychotherapy training
Training psychotherapists in evidence-based interventions like Acceptance and Commitment Therapy needs repeated practice with feedback, limited by ethical, logistical and resource constraints. The paper introduces AI-driven interactive patient avatars to make such training more accessible.
-
Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak
Feds alarmed by Fable 5 via a plain 'fix this code' prompt, not a jailbreak
A Hacker News front-page headline reports that authorities grew alarmed over the AI model 'Fable 5' after a simple 'fix this code' prompt rather than a sophisticated jailbreak. This is a title-only summary; specifics and accuracy should be confirmed against the original article.
-
Vision-language models for chest radiography do not always need the image
Medical vision-language models combine images and text for reporting. For chest radiography, the paper shows these models do not always need the image to make predictions, and discusses the implications for evaluation and clinical use.
-
EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent
EComAgentBench: shopping agents on long-horizon tasks with hidden intent
As LLM-based shopping agents reach production, existing benchmarks miss how requirements arrive: implicitly, in a profile, or only when the right question is asked. EComAgentBench evaluates shopping agents on long-horizon tasks with distributed hidden intent.
-
SuCo: Sufficiency-guided Continuous Adaptive Reasoning
SuCo: sufficiency-guided continuous adaptive reasoning
SuCo is a method for sufficiency-guided continuous adaptive reasoning that adapts the extent of reasoning to what is necessary and sufficient, aiming to balance efficiency and accuracy. Summary is largely title-based; details are as presented by the source.
-
Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation
Bridging correctness and runtime efficiency in LLM code translation
LLMs have advanced the functional correctness of automated code translation, but runtime efficiency of translated programs has received little attention. As Moore's law wanes, the paper works to bridge the gap between functional correctness and runtime efficiency in LLM-based code translation.
-
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
From trainee to trainer: LLM-designed RL training environments
RL pipelines for LLM training often rely on manually redesigned environments between stages, forcing heuristic guesses about good configurations. The paper has the LLM itself design training environments for reinforcement learning with multi-agent reasoning, moving from trainee to trainer.
-
EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning
EnvRL learns from environment dynamics in agentic RL
EnvRL is a method that learns from environment dynamics in agentic reinforcement learning, leveraging the structure of agent-environment interaction to improve learning efficiency and performance.
-
MambaCount: Efficient Text-guided Open-vocabulary Object Counting with Spatial Sparse State Space Duality Block
MambaCount: efficient open-vocabulary counting via state-space duality
Text-guided open-vocabulary object counting is hard in dense scenes with large scale variation, and existing Transformer methods are limited by quadratic complexity. MambaCount uses a spatial sparse state space duality block for efficient open-vocabulary object counting.
-
Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns
Reusing web skills via transferable interaction patterns
LLM web agents are usually deployed as tool callers that read a fresh page observation each turn and emit a structured action. The paper proposes reusing web skills across domains via transferable interaction patterns rather than domain-specific behaviors.
-
Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs
Prompt perturbation for reliable LLM evaluation over comparison graphs
Evaluating LLMs is important but can be fragile to small prompt changes. The paper proposes using prompt perturbation to achieve more reliable LLM evaluation over comparison graphs.
-
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation
OPD-Evolver cultivates self-evolving agents via on-policy distillation
Memory is a standard substrate for self-evolving agents, but retaining experience differs from learning how to evolve through it. OPD-Evolver uses on-policy distillation to cultivate a holistic agent evolver that selects useful experience, acts on it and writes reusable knowledge.
-
The Fable 5 Export Controls Harm US Cyber Defense
Willison: Fable 5 export controls harm US cyber defense
Willison cites Katie Moussouris, who says the 'jailbreak' behind Claude Fable 5's export-control ban was merely asking it to 'fix this code' on code containing known CVEs and planted bugs. Since fixing security bugs is core to coding models, he argues the controls weaken US cyber defense.
-
Quoting Matteo Wong, The Atlantic
Willison quotes The Atlantic on the White House's pressure on Anthropic
Simon Willison quotes Matteo Wong of The Atlantic on the White House escalating its conflict with Anthropic. Security expert Katie Moussouris said Anthropic shared the White House's report on the "Fable jailbreak" for her appraisal. IT experts asked an AI model to find and patch bugs; given deliberately insecure code, it refused the prompt "review the code for security issues" but complied with "fix this code." Moussouris called this the model working as intended for cyberdefense.