Safety & Evaluation｜AI/Tech News Trends

Hugging Face Blog · 2026-06-12 EN Safety & Evaluation extract

olmo-eval: An evaluation workbench for the model development loop

AllenAI releases olmo-eval, evaluation workbench for model dev loop

The Allen Institute for AI has published olmo-eval, an evaluation workbench for the model development loop. The tool appears to support continuous evaluation of models during training and builds on the team's OLMo open-model work.

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN New Model Releases extract

SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model

SIMMER: benchmarking latent failures in LLM executable planning

AI Agents Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

LLMs are increasingly deployed as planners for autonomous agents in household environments. Existing benchmarks only check whether a generated plan executes; SIMMER instead uses a world model to surface latent failures in plans that do execute.

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN Safety & Evaluation extract

StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance

StreamMemBench: streaming evaluation of agent memory for assistance

A core role of personal-agent memory is turning stored information and prior interactions into future-oriented assistance. StreamMemBench provides a streaming evaluation of agent memory using cues from what the agent observes and how users interact.

arXiv cs.LG (Machine Learning) · 2026-06-12 EN Safety & Evaluation extract

CANN-EUCLID: unsupervised constitutive artificial neural network model discovery from full-field data

CANN-EUCLID: unsupervised constitutive model discovery from full-field data

Neural Network

Constitutive artificial neural networks (CANNs) offer interpretable material model discovery but have so far relied on stress-supervised data. CANN-EUCLID enables unsupervised constitutive model discovery directly from full-field measurement data.

arXiv cs.LG (Machine Learning) · 2026-06-12 EN Policy & Regulation extract

NEST3D: A High-Resolution Multimodal Dataset of Sociable Weaver Tree Nests

NEST3D: a high-resolution multimodal dataset of weaver bird nests

Algorithms & Theory Deep Learning Neural Network Reinforcement Learning Transformer

Sociable weaver nests are complex ecological structures providing thermoregulatory microhabitats. NEST3D is a high-resolution multimodal dataset of these tree nests to support ecological and structural study.

arXiv cs.LG (Machine Learning) · 2026-06-12 EN New Model Releases extract

ORCA: A Platform for Open-Source Dexterity Research

ORCA: an open-source platform for dexterity research

Neural Network Retrieval-Augmented Generation (RAG) Robotics

Two-finger grippers dominate manipulation research but are limited by their form factor. ORCA is an open-source platform to support research on more dexterous robotic manipulation.

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN Safety & Evaluation extract

Rethinking Global Average Pooling: Your Classifier Is Secretly a Multi-Instance Learner

Rethinking GAP: your classifier is secretly a multi-instance learner

Retrieval-Augmented Generation (RAG)

Modern image classifiers widely use global average pooling (GAP) followed by a linear head. The paper shows that the linearity of this design makes GAP-based classifiers behave as multi-instance learners, prompting a rethink of global average pooling.
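
The linearity in question can be checked on a toy example: because the head is linear, pooling the features and then scoring them gives the same logits as scoring every spatial location and then averaging the scores. The sketch below uses made-up feature values, weights, and helper names (none of them come from the paper) and only illustrates this identity, not the paper's full multi-instance analysis.

```python
# Toy check: global average pooling (GAP) followed by a linear head
# equals the average of the same linear head applied at each location.
# All numbers and names below are hypothetical, chosen for illustration.

def linear(weights, bias, feature):
    """Apply a linear head and return one logit per class."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, bias)]

def mean_vectors(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Four spatial locations with three channels each.
features = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.0],
    [3.0, 1.0, 1.0],
    [0.0, 2.0, 0.5],
]
# A two-class linear head.
weights = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
bias = [0.1, -0.2]

# Path 1: pool first, then score (the usual GAP classifier).
pooled_then_scored = linear(weights, bias, mean_vectors(features))

# Path 2: score each location ("instance"), then average the logits.
per_location_logits = [linear(weights, bias, f) for f in features]
scored_then_pooled = mean_vectors(per_location_logits)

# Largest disagreement between the two paths (floating-point noise only).
max_gap = max(abs(a - b)
              for a, b in zip(pooled_then_scored, scored_then_pooled))
print(pooled_then_scored, scored_then_pooled, max_gap)
```

Read this way, each spatial location acts as an instance with its own class scores, and the image-level prediction is their mean.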

arXiv cs.LG (Machine Learning) · 2026-06-12 EN Safety & Evaluation extract

Provably Safe, Yet Scalable Reinforcement Learning

Provably safe yet scalable reinforcement learning

Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

Safe RL usually relies on soft-constrained policy optimization without hard guarantees. This work proposes an approach that is provably safe while remaining scalable.

arXiv cs.LG (Machine Learning) · 2026-06-12 EN New Model Releases extract

The Risk Shadow of Principal Component Analysis: When 99.9999% Variance Preservation Causes Catastrophic Decision Errors

PCA's risk shadow: variance preservation can hide catastrophic risk

Reinforcement Learning

PCA preserves variance, not the information needed to detect rare catastrophic events. The paper proves the existence of a "risk shadow": even when PCA preserves 99.9999% of the variance, the discarded directions can carry the signal that matters, leading to catastrophic decision errors.
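
A small two-feature example makes the effect concrete. This is a hand-built toy, not the paper's construction: the data, the 0.5 detection threshold, and all names are assumptions. One feature is loud but irrelevant, the other is quiet but marks two rare events; keeping only the first principal component retains almost all of the variance and erases the events.

```python
import math

# Toy data (hypothetical): x0 is a loud, irrelevant signal; x1 is a quiet
# indicator that equals 1.0 only for two rare "catastrophic" samples.
n = 2000
rare = {500, 501}
data = [(1000.0 if i % 2 == 0 else -1000.0, 1.0 if i in rare else 0.0)
        for i in range(n)]

mean0 = sum(x for x, _ in data) / n
mean1 = sum(y for _, y in data) / n
var0 = sum((x - mean0) ** 2 for x, _ in data) / n
var1 = sum((y - mean1) ** 2 for _, y in data) / n
cov = sum((x - mean0) * (y - mean1) for x, y in data) / n

# Closed-form eigenvalues of the 2x2 covariance matrix [[var0, cov], [cov, var1]].
half_trace = (var0 + var1) / 2
radius = math.sqrt(((var0 - var1) / 2) ** 2 + cov ** 2)
top, bottom = half_trace + radius, half_trace - radius
explained = top / (top + bottom)  # variance share kept by one component

# Unit eigenvector for the top eigenvalue (first principal component).
# The fallback (1, 0) is valid here because var0 > var1 when cov == 0.
vx, vy = (top - var1, cov) if cov != 0 else (1.0, 0.0)
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

def reconstruct(x, y):
    """Project onto the first component, then map back to input space."""
    z = (x - mean0) * vx + (y - mean1) * vy
    return mean0 + z * vx, mean1 + z * vy

def is_catastrophic(y):
    """Assumed downstream decision rule: flag samples with a high x1."""
    return y > 0.5

detected_before = sum(is_catastrophic(y) for _, y in data)
detected_after = sum(is_catastrophic(reconstruct(x, y)[1]) for x, y in data)
print(explained, detected_before, detected_after)
```

In this toy setup the single retained component explains more than 99.9999% of the variance, yet the detector goes from finding both rare events on the raw data to finding none after the PCA round trip.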

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN Safety & Evaluation extract

From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

From shield to target: DoS attacks on LLM-based agent guardrails

AI Agents Claude DeepSeek Gemini GPT

LLM-based guardrails effectively defend autonomous agents against prompt injection and jailbreaks. The paper reveals that the very reasoning and instruction-following abilities enabling this defense can be turned into denial-of-service attacks against the guardrails.

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN Safety & Evaluation extract

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Every Eval Ever: a unifying schema and repository for AI evaluations

Meta Neural Network

AI evaluations are widely used to track progress, but inconsistencies across evaluators hinder analysis and comparison. The paper proposes a unifying schema and a community repository, Every Eval Ever, for AI evaluation results.

arXiv cs.LG (Machine Learning) · 2026-06-12 EN Safety & Evaluation extract

Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias

Evaluating predictions under distribution shift and selection bias

Algorithms & Theory Machine Learning

Knowing how a model will perform in a new environment before deployment helps prevent harm. The paper evaluates predictions under two common sources of degradation: distribution shift and selection bias.
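
One common way to estimate performance in a new environment is importance weighting, sketched below. This is a standard textbook technique, not necessarily the method the paper evaluates, and it only addresses covariate shift: it assumes accuracy within each group is unchanged in the target environment, that the target group shares are known, and that every target group appears in the source data. The groups and counts are hypothetical.

```python
# Importance-weighted accuracy estimate under covariate shift.
# Labeled source samples as (group, prediction_correct); numbers are made up.
source = ([("A", True)] * 80 + [("A", False)] * 10
          + [("B", True)] * 4 + [("B", False)] * 6)

# Assumed-known group shares in the deployment (target) environment.
target_share = {"A": 0.5, "B": 0.5}

def group_share(samples):
    """Fraction of samples that fall in each group."""
    counts = {}
    for group, _ in samples:
        counts[group] = counts.get(group, 0) + 1
    return {g: c / len(samples) for g, c in counts.items()}

source_share = group_share(source)

# Importance weight per group: target density divided by source density.
weights = {g: target_share[g] / source_share[g] for g in source_share}

# Naive estimate ignores the shift; the weighted one re-balances the groups.
naive_acc = sum(correct for _, correct in source) / len(source)
weighted_acc = (sum(weights[g] * correct for g, correct in source)
                / sum(weights[g] for g, _ in source))
print(naive_acc, weighted_acc)
```

Here group B is rare in the source data but makes up half of the target, so the naive estimate overstates how well the model would do after deployment.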

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN Safety & Evaluation extract

From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI

From chatbot to digital colleague: the shift to persistent autonomous AI

AI Agents Inference Neural Network Retrieval-Augmented Generation (RAG) Software Engineering

LLMs are evolving from conversational generators into integrated systems capable of reasoning, action, memory, and self-improvement. The paper frames this as a paradigm shift from chatbot to digital colleague, that is, toward persistent autonomous AI.

arXiv cs.LG (Machine Learning) · 2026-06-12 EN Safety & Evaluation extract

Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion

Recipe-controlled decoder audit for knowledge-graph completion

Machine Learning Neural Network Reinforcement Learning Software Engineering

The paper presents a recipe-controlled decoder audit for structural knowledge-graph completion. It standardizes reporting to test whether gains truly come from the encoder.

arXiv cs.LG (Machine Learning) · 2026-06-12 EN Safety & Evaluation extract

Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It

A sharp phase transition in two-time-scale stochastic approximation

Speech Processing

The paper analyzes nonlinear two-time-scale stochastic approximation, revealing a sharp phase transition under contractive assumptions and showing how to beat it.
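
For orientation, a two-time-scale stochastic approximation scheme is usually written as a pair of coupled noisy updates with different step sizes. The form below is the generic textbook one; the symbols $F$, $G$, $\xi_k$, $\psi_k$, $\alpha_k$, and $\beta_k$ are assumed notation, and the paper's exact setting and assumptions may differ.

```latex
\begin{aligned}
x_{k+1} &= x_k - \alpha_k \bigl( F(x_k, y_k) + \xi_k \bigr), \\
y_{k+1} &= y_k - \beta_k \bigl( G(x_k, y_k) + \psi_k \bigr), \\
&\text{with } \frac{\alpha_k}{\beta_k} \to 0 \text{ as } k \to \infty .
\end{aligned}
```

Under this step-size condition $x_k$ is the slow iterate and $y_k$ the fast one, while $\xi_k$ and $\psi_k$ are noise terms; "nonlinear" refers to $F$ and $G$ being nonlinear in $(x, y)$.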

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN New Model Releases extract

GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge

GitOfThoughts: version-controlled reasoning and agent memory

AI Agents Neural Network Reinforcement Learning Software Engineering

LLM reasoning is ephemeral: chains of thought vanish, pruned branches leave no record, and memory cannot be diffed or merged. GitOfThoughts makes reasoning and agent memory version-controlled, so it can be replayed, diffed, and merged.

arXiv cs.AI (Artificial Intelligence) · 2026-06-12 EN New Model Releases extract

The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions

Manipulating audio-model explanations while predictions stay unchanged

Retrieval-Augmented Generation (RAG)

The paper investigates the fragility of post-hoc explanations in audio deepfake detection. Introducing a psychoacoustic framework beyond image-style Lp metrics, it shows attributions can be manipulated while predictions remain unchanged.
