Safety & Evaluation

  • Hugging Face Blog · EN Safety & Evaluation extract
    olmo-eval: An evaluation workbench for the model development loop
    AllenAI releases olmo-eval, evaluation workbench for model dev loop
    Allen Institute for AI published olmo-eval, an evaluation workbench for the model development loop. The tool appears to support continuous evaluation of models during training, building on the team's OLMo open-model development work.
    Read original (Hugging Face Blog) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model
    SIMMER: benchmarking latent failures in LLM executable planning
    AI Agents · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    LLMs are increasingly deployed as planners for autonomous agents in household environments. Whereas existing benchmarks only check whether generated plans execute, SIMMER uses a world model to benchmark their latent failures.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance
    StreamMemBench: streaming evaluation of agent memory for assistance
    A core role of personal-agent memory is turning stored information and prior interactions into future-oriented assistance. StreamMemBench provides a streaming evaluation of agent memory using cues from what the agent observes and how users interact.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    CANN-EUCLID: unsupervised constitutive artificial neural network model discovery from full-field data
    CANN-EUCLID: unsupervised constitutive model discovery from full-field data
    Neural Network
    CANNs offer interpretable material model discovery but have relied on stress-supervised data. CANN-EUCLID enables unsupervised constitutive model discovery directly from full-field measurement data.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Policy & Regulation extract
    NEST3D: A High-Resolution Multimodal Dataset of Sociable Weaver Tree Nests
    NEST3D: a high-resolution multimodal dataset of weaver bird nests
    Algorithms & Theory · Deep Learning · Neural Network · Reinforcement Learning · Transformer
    Sociable weaver nests are complex ecological structures providing thermoregulatory microhabitats. NEST3D is a high-resolution multimodal dataset of these tree nests to support ecological and structural study.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    ORCA: A Platform for Open-Source Dexterity Research
    ORCA: an open-source platform for dexterity research
    Neural Network · Retrieval-Augmented Generation (RAG) · Robotics
    Two-finger grippers dominate manipulation research but are limited by their form factor. ORCA is an open-source platform to support research on more dexterous robotic manipulation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Rethinking Global Average Pooling: Your Classifier Is Secretly a Multi-Instance Learner
    Rethinking GAP: your classifier is secretly a multi-instance learner
    Retrieval-Augmented Generation (RAG)
    Modern image classifiers widely use global average pooling followed by a linear head. The paper shows this linearity makes GAP-based classifiers behave as multi-instance learners, prompting a rethink of global average pooling.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
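The linearity point in the summary above can be checked numerically. This is a minimal sketch, not the paper's code: it assumes a plain linear head applied to mean-pooled patch features, and the names `gap_then_head` and `head_then_mean` are hypothetical. It shows only that pooling then scoring gives the same logits as scoring each patch (instance) and then averaging, which is one reading of why such a classifier resembles a multi-instance learner.

```python
import random

def linear_head(feature, weights, bias):
    """Apply a linear classifier head: one logit per class."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, bias)]

def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def gap_then_head(patches, weights, bias):
    """Usual order: average the patch features, then classify."""
    return linear_head(mean_vectors(patches), weights, bias)

def head_then_mean(patches, weights, bias):
    """Instance view: classify every patch, then average the logits."""
    return mean_vectors([linear_head(p, weights, bias) for p in patches])

# Toy sizes (assumptions): 6 patches, 4 feature channels, 3 classes.
rng = random.Random(0)
patches = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
bias = [rng.uniform(-1, 1) for _ in range(3)]

pooled_first = gap_then_head(patches, weights, bias)
scored_first = head_then_mean(patches, weights, bias)

# The two orders agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(pooled_first, scored_first))
```

The identity holds because averaging and an affine map commute; it would not hold if a nonlinearity sat between the pooling and the head.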
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Provably Safe, Yet Scalable Reinforcement Learning
    Provably safe yet scalable reinforcement learning
    Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    Safe RL usually relies on soft-constrained policy optimization without hard guarantees. This work proposes an approach that is provably safe while remaining scalable.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    The Risk Shadow of Principal Component Analysis: When 99.9999% Variance Preservation Causes Catastrophic Decision Errors
    PCA's risk shadow: variance preservation can hide catastrophic risk
    Reinforcement Learning
    PCA preserves variance, not the information needed to detect rare catastrophic events. The paper proves the existence of a "risk shadow": even when 99.9999% of the variance is preserved, the reduced representation can still cause catastrophic decision errors.
    Read original (arXiv cs.LG (Machine Learning)) ↗
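A toy illustration of the claim in the summary above, not the paper's proof or data. It assumes a synthetic two-feature dataset in which a high-variance feature `x1` carries no risk signal and a tiny-variance feature `x2` alone flags two rare events; all names and values are made up for the sketch. Keeping a single principal component retains almost all of the variance yet erases the events.

```python
import math

# Synthetic data (an assumption for illustration): a high-variance "bulk"
# feature x1 and a tiny-variance feature x2 that alone flags rare events.
n = 1000
x1 = [2.0 * (i - 499.5) for i in range(n)]
rare = {499, 500}
x2 = [1.0 if i in rare else 0.0 for i in range(n)]

# Entries of the 2x2 covariance matrix [[a, b], [b, c]].
m1, m2 = sum(x1) / n, sum(x2) / n
a = sum((u - m1) ** 2 for u in x1) / n
c = sum((v - m2) ** 2 for v in x2) / n
b = sum((u - m1) * (v - m2) for u, v in zip(x1, x2)) / n

# Leading principal axis of a symmetric 2x2 matrix, in closed form.
theta = 0.5 * math.atan2(2.0 * b, a - c)
pc = (math.cos(theta), math.sin(theta))
top_var = a * pc[0] ** 2 + 2.0 * b * pc[0] * pc[1] + c * pc[1] ** 2
explained = top_var / (a + c)  # fraction of variance kept by one component

# Project onto the single component, then reconstruct x2 from it.
scores = [(u - m1) * pc[0] + (v - m2) * pc[1] for u, v in zip(x1, x2)]
x2_rec = [m2 + s * pc[1] for s in scores]

# A simple threshold detector on x2, before and after the reduction.
events_before = sum(v > 0.5 for v in x2)
events_after = sum(v > 0.5 for v in x2_rec)
```

Here `explained` exceeds 0.999999 while `events_after` drops to zero, because the direction that separates the rare events contributes almost nothing to total variance.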
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
    From shield to target: DoS attacks on LLM-based agent guardrails
    AI Agents · Claude · DeepSeek · Gemini · GPT
    LLM-based guardrails effectively defend autonomous agents against prompt injection and jailbreaks. The paper reveals that the very reasoning and instruction-following abilities enabling this defense can be turned into denial-of-service attacks against the guardrails.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results
    Every Eval Ever: a unifying schema and repository for AI evaluations
    Meta · Neural Network
    AI evaluations are widely used to track progress, but inconsistencies across evaluators hinder analysis and comparison. The paper proposes a unifying schema and a community repository, Every Eval Ever, for AI evaluation results.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias
    Evaluating predictions under distribution shift and selection bias
    Algorithms & Theory · Machine Learning
    Knowing how a model will perform in a new environment before deployment helps prevent harm. The paper evaluates predictions under two common sources of degradation: distribution shift and selection bias.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
    From chatbot to digital colleague: the shift to persistent autonomous AI
    AI Agents · Inference · Neural Network · Retrieval-Augmented Generation (RAG) · Software Engineering
    LLMs are evolving from conversational generators into integrated systems capable of reasoning, action, memory, and self-improvement. The paper frames this as a paradigm shift from chatbot to digital colleague: persistent autonomous AI.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion
    Recipe-controlled decoder audit for knowledge-graph completion
    Machine Learning · Neural Network · Reinforcement Learning · Software Engineering
    The paper presents a recipe-controlled decoder audit for structural knowledge-graph completion. It standardizes reporting to test whether gains truly come from the encoder.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It
    A sharp phase transition in two-time-scale stochastic approximation
    Speech Processing
    The paper analyzes nonlinear two-time-scale stochastic approximation, revealing a sharp phase transition under contractive assumptions and showing how to beat it.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge
    GitOfThoughts: version-controlled reasoning and agent memory
    AI Agents · Neural Network · Reinforcement Learning · Software Engineering
    LLM reasoning is ephemeral: chains of thought vanish, pruned branches leave no record, and memory cannot be diffed or merged. GitOfThoughts makes reasoning and agent memory version-controlled, so it can be replayed, diffed, and merged.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions
    Manipulating audio-model explanations while predictions stay unchanged
    Retrieval-Augmented Generation (RAG)
    The paper investigates the fragility of post-hoc explanations in audio deepfake detection. Introducing a psychoacoustic framework beyond image-style Lp metrics, it shows attributions can be manipulated while predictions remain unchanged.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗