Safety & Evaluation A
-
olmo-eval: An evaluation workbench for the model development loop
AllenAI releases olmo-eval, evaluation workbench for model dev loop
The Allen Institute for AI published olmo-eval, an evaluation workbench for the model development loop. The tool appears to support continuous evaluation of models during training, building on the team's OLMo open-model development work.
-
SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model
SIMMER: benchmarking latent failures in LLM executable planning
LLMs are increasingly deployed as planners for autonomous agents in household environments. Whereas existing benchmarks only check whether generated plans execute, SIMMER uses a world model to benchmark their latent failures.
-
StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance
StreamMemBench: streaming evaluation of agent memory for assistance
A core role of personal-agent memory is turning stored information and prior interactions into future-oriented assistance. StreamMemBench provides a streaming evaluation of agent memory using cues from what the agent observes and how users interact.
-
CANN-EUCLID: unsupervised constitutive artificial neural network model discovery from full-field data
CANN-EUCLID: unsupervised constitutive model discovery from full-field data
CANNs offer interpretable material model discovery but have relied on stress-supervised data. CANN-EUCLID enables unsupervised constitutive model discovery directly from full-field measurement data.
-
NEST3D: A High-Resolution Multimodal Dataset of Sociable Weaver Tree Nests
NEST3D: a high-resolution multimodal dataset of weaver bird nests
Sociable weaver nests are complex ecological structures providing thermoregulatory microhabitats. NEST3D is a high-resolution multimodal dataset of these tree nests to support ecological and structural study.
-
ORCA: A Platform for Open-Source Dexterity Research
ORCA: an open-source platform for dexterity research
Two-finger grippers dominate manipulation research but are limited by their form factor. ORCA is an open-source platform to support research on more dexterous robotic manipulation.
-
Rethinking Global Average Pooling: Your Classifier Is Secretly a Multi-Instance Learner
Rethinking GAP: your classifier is secretly a multi-instance learner
Modern image classifiers widely use global average pooling followed by a linear head. The paper shows this linearity makes GAP-based classifiers behave as multi-instance learners, prompting a rethink of global average pooling.
-
Provably Safe, Yet Scalable Reinforcement Learning
Provably safe yet scalable reinforcement learning
Safe RL usually relies on soft-constrained policy optimization without hard guarantees. This work proposes an approach that is provably safe while remaining scalable.
-
The Risk Shadow of Principal Component Analysis: When 99.9999% Variance Preservation Causes Catastrophic Decision Errors
PCA's risk shadow: variance preservation can hide catastrophic risk
PCA preserves variance, not the information needed to detect rare catastrophic events. The paper proves a risk shadow where even very high variance preservation can cause catastrophic decision errors.
-
From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
From shield to target: DoS attacks on LLM-based agent guardrails
LLM-based guardrails effectively defend autonomous agents against prompt injection and jailbreaks. The paper reveals that the very reasoning and instruction-following abilities enabling this defense can be turned into denial-of-service attacks against the guardrails.
-
Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results
Every Eval Ever: a unifying schema and repository for AI evaluations
AI evaluations are widely used to track progress, but inconsistencies across evaluators hinder analysis and comparison. The paper proposes a unifying schema and a community repository, Every Eval Ever, for AI evaluation results.
-
Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias
Evaluating predictions under distribution shift and selection bias
Knowing how a model will perform in a new environment before deployment helps prevent harm. The paper evaluates predictions under two common sources of degradation: distribution shift and selection bias.
-
From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
From chatbot to digital colleague: the shift to persistent autonomous AI
LLMs are transforming from conversational generators into integrated systems capable of reasoning, action, memory, and self-improvement. The paper conceptualizes this as a paradigm shift from chatbot to digital colleague, that is, to persistent autonomous AI.
-
Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion
Recipe-controlled decoder audit for knowledge-graph completion
The paper presents a recipe-controlled decoder audit for structural knowledge-graph completion. It standardizes reporting to test whether gains truly come from the encoder.
-
Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It
A sharp phase transition in two-time-scale stochastic approximation
The paper analyzes nonlinear two-time-scale stochastic approximation, revealing a sharp phase transition under contractive assumptions and showing how to beat it.
-
GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge
GitOfThoughts: version-controlled reasoning and agent memory
LLM reasoning is ephemeral: chains of thought vanish, pruned branches leave no record, and memory cannot be diffed or merged. GitOfThoughts makes reasoning and agent memory version-controlled, so it can be replayed, diffed, and merged.
-
The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions
Manipulating audio-model explanations while predictions stay unchanged
The paper investigates the fragility of post-hoc explanations in audio deepfake detection. Introducing a psychoacoustic framework beyond image-style Lp metrics, it shows attributions can be manipulated while predictions remain unchanged.