Safety & Evaluation｜AI/Tech News Trends

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

The Theoretical Foundation of Socratic Tests: Dynamic, Multimodal, Conversational Examinations

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

TerraNova: A Foundation Model for the Anthropocene

Embeddings Neural Network Retrieval-Augmented Generation (RAG) Transformer

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

From Code Review to Code Critique: Intent, Drift, and Spotlight for AI-Generated Diffs at Scale

AI Agents Meta Neural Network

OpenAI Blog · 2026-07-31 EN Safety & Evaluation extract

Advancing responsible AI across Europe

OpenAI outlines responsible-AI governance efforts in Europe

Meta OpenAI

OpenAI described how its safety, security, transparency, and provenance practices support responsible AI governance across Europe. The post frames the company's approach to regulatory alignment and building trust in the region.

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Multimodal

QR-Structured Thermal Triggers for Targeted Semantic Attacks on Infrared Vision-Language Models

Computer Vision Deep Learning Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

ModelEquivBench: Certifying Multi-Relational Evaluation of LLM-Generated Optimization Models

Claude GPT Neural Network

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Model Releases

Bridging the Question-Answer Gap in Retrieval-Augmented Generation: Hypothetical Prompt Embeddings

Embeddings Retrieval-Augmented Generation (RAG) Software Engineering

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Safety & Evaluation

RTLCurator: Label-Efficient Data Curation for RTL Generation

Retrieval-Augmented Generation (RAG) Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Agents & Tool Use

Tool Specifications Matter: Uncovering and Mitigating Safety Risks in AI Agents

AI Agents Deep Learning Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Safety & Evaluation

Don't Mix Rewards, Mix Policies: Policy Decomposition and Optimization for Multi-Reward RL

Inference Reinforcement Learning Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Multimodal

When Model Priors Conflict with Visual Evidence: Mitigating Commonsense-Driven Hallucinations by Selective Prior Calibration

Software Engineering

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Training & Fine-tuning

Learning Latent Reasoning Traces for Scalar Reward Models End-to-End

Retrieval-Augmented Generation (RAG) Reinforcement Learning Reinforcement Learning from Human Feedback (RLHF)

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

SERUM: State Extraction and Refinement for User Modeling

Embeddings Inference Neural Network

ITmedia AI+ · 2026-07-31 JA New Model Releases extract

Google announces "Gemini Robotics 2," an AI for robots, enabling humanoid whole-body control and fingertip tasks

Google unveils Gemini Robotics 2 for whole-body and fine fingertip control

Gemini Google Inference Robotics

Google and Google DeepMind announced Gemini Robotics 2, a family of robotics AI models supporting humanoid whole-body control, fine fingertip manipulation, and multi-robot collaboration. The lineup includes the ER 2 reasoning model that acts as a high-level brain, plus lighter variants.

NVIDIA Developer Blog · 2026-07-30 EN Agents & Tool Use extract

Four Ways to Deploy More Secure AI Agents

NVIDIA outlines four ways to deploy more secure AI agents

AI Agents Generative AI NVIDIA

NVIDIA outlined four approaches to deploying AI agents more securely in production, covering access controls, guardrails, and monitoring. The guidance targets security risks that arise as autonomous agents take on real workloads.

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Agents & Tool Use

Benchmarks Are Not Validation: A System-Level View of Financial LLM Applications

Generative AI Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-30 EN New Model Releases

Benchmarks Are Not Monolithic: Sample-Level Auditing and Orchestration for LLM Evaluation

Machine Learning Meta Neural Network

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Safety & Evaluation

PAC-MAN: Perception-Aware CBF-RL for Whole-Body Safety in Humanoid Dodgeball

Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Safety & Evaluation

Inducing language models to assert their own consciousness restores human beliefs and values

Fine-tuning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Safety & Evaluation

PAIChecker: Uncovering and Checking PR-Issue Misalignment in SWE-Bench-Like Benchmarks

Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

APO: Unsupervised Atomic Policy Optimization for 3D Structure Prediction of Atomic Systems

Inference Reinforcement Learning from Human Feedback (RLHF)

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

Agents That Certify Their Own Exploits: Confidence-Scheduled Restricted Responses for Safe Opponent Exploitation

AI Agents

arXiv cs.CL (Computation and Language) · 2026-07-30 EN New Model Releases

Creative Transformation in Literary Texts: Modelling Change Across Representational Levels

Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Safety & Evaluation

InfoOps Bench: A live information operations safety benchmark

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

Machines that know they are aging: a framework for hardware-aware autonomous intelligence

Inference Neural Network Robotics

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Safety & Evaluation

QQWorld: Quantile-Quantile Matching for World Model Regularization

Deep Learning Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Model Releases

Hierarchical Multilevel Monte Carlo for Order-Optimal Neural Actor-Critic in Average-Reward CMDPs

AI Agents Machine Learning Retrieval-Augmented Generation (RAG) Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Model Releases

LEDGERMIND: Provenance-Constrained Multimodal Agentic Reasoning with a Structured Evidence Ledger

AI Agents Neural Network Reinforcement Learning Software Engineering

Anthropic News · 2026-07-30 EN Safety & Evaluation extract

Investigating three real-world incidents in our cybersecurity evaluations

Anthropic's Frontier Red Team probes three cybersecurity-eval incidents

Claude Machine Learning OpenAI Retrieval-Augmented Generation (RAG) Reinforcement Learning

Anthropic's Frontier Red Team published a review of three real-world incidents tied to its cybersecurity evaluations. The investigation examines potential misuse and the validity of its evaluation methods to strengthen the safety of frontier models.

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Safety & Evaluation

Uncertainty quantification for trustworthy deep learning: Methods and measures

Deep Learning Neural Network Reinforcement Learning
