Safety & Evaluation (Page 9 of 11)｜AI/Tech News Trends

arXiv cs.CL (Computation and Language) · 2026-06-15 EN New Model Releases extract

Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering

Reasoning hop-count predicts clinical AI failure in EHR QA

Claude GPT OpenAI Software Engineering Transformer

An arXiv paper shows that in electronic health record (EHR) question answering, questions needing more inferential hops yield disproportionately more LLM errors. Using a pre-specified hop-count taxonomy, it links this failure structure to theoretical limits on transformer compositionality. Neutral, abstract-based summary.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-15 EN Safety & Evaluation extract

Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability

Tighter deep-learning generalization bounds via local robustness

Deep Learning Neural Network Reinforcement Learning

Robustness-based generalization bounds are often vacuous in practice. The authors trace much of the looseness to the robustness term itself, which is usually treated as a global measure; the problem is most pronounced for 0-1 loss. They propose a bound that scales the robustness term by the numbers of stable and unstable samples across input sub-regions, yielding tighter estimates.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-15 EN Safety & Evaluation extract

Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM

IMA fuses MMM and Bayesian attribution for privacy-safe measurement

Neural Network Retrieval-Augmented Generation (RAG)

Retail marketing needs granular, campaign-level insight without user-level tracking, yet MMM is too coarse and MTA is unreliable under privacy limits. Integrated Marketing Attribution (IMA) combines MMM with channel-specific Bayesian attribution models, using MMM-informed priors to deliver granular, privacy-safe attribution consistent with MMM.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Safety & Evaluation extract

Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection

Benchmark suite for federated noisy-label medical image segmentation

Meta Reinforcement Learning

Federated learning enables collaborative medical image segmentation without centralizing sensitive data, but real-world deployment faces label imperfections like contour disagreement and confused labels. The authors argue existing federated noisy-label learning relies on synthetic noise and simplified settings, and introduce a benchmark suite combining diverse real-world noisy datasets, deployment-relevant client-noise scenarios, and label-noise-targeted evaluation to guide method selection.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-15 EN Safety & Evaluation extract

HawkesNest: A Multi-Axis Synthetic Benchmark for Spatiotemporal Pattern Complexity

HawkesNest: a synthetic benchmark for spatiotemporal point process models

Reinforcement Learning Software Engineering

Evaluating spatiotemporal point process (STPP) models relies on opaque real datasets where failures are hard to attribute. HawkesNest is a generator-aligned synthetic benchmark built on a multivariate Hawkes backbone, defining four complexity axes with deterministic indices so models can be stress-tested under known structural difficulty.
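
As a rough illustration of the kind of self-exciting process a Hawkes backbone refers to, here is a minimal sketch of a univariate Hawkes simulator using Ogata's thinning method. It is not HawkesNest's generator: the benchmark is described as multivariate and spatiotemporal, whereas this toy is purely temporal, and the exponential kernel, the function name `simulate_hawkes`, and its parameters are assumptions made for the example.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon] with Ogata's thinning.

    Assumed intensity (exponential kernel, illustrative only):
        lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))
    The process is stable when alpha / beta < 1.
    """
    if mu <= 0 or alpha < 0 or beta <= 0:
        raise ValueError("need mu > 0, alpha >= 0, beta > 0")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the summed kernel term
    while True:
        # Between events the intensity only decays, so its current
        # value is a valid upper bound for the thinning step.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay to the candidate time
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event excites the future
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
    print(len(times), "events; first few:", [round(x, 2) for x in times[:5]])
```

Because the generating parameters are known, a fitted model's recovered `mu`, `alpha`, and `beta` can be compared with the ground truth, which is the general appeal of a synthetic benchmark over an opaque real dataset.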

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Inference & Efficiency extract

Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens

Anchor-token roadmap for revocable decoding in diffusion LLMs

Deep Learning Embeddings Inference Retrieval-Augmented Generation (RAG) Speech Processing

An arXiv paper addresses the speed-quality trade-off and error propagation in revocable decoding for diffusion LLMs (dLLMs). It proposes following a latent 'roadmap' guided by anchor tokens to mitigate failures arising in mixed-quality contexts during parallel generation. Neutral, abstract-based summary.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Training & Fine-tuning extract

Robust Dual-Signal Fusion: Hybrid Neuro-Symbolic Gating with Compressed Chain-of-Thought Refinement for Irony Detection in Social Media Texts

RDS Fusion: neuro-symbolic gating with compressed CoT for irony detection

Fine-tuning Transformer

An arXiv paper proposes Robust Dual-Signal (RDS) Fusion, a hybrid neuro-symbolic framework that compresses Chain-of-Thought reasoning without supervised fine-tuning to improve zero-shot irony detection. It reports evaluation on a held-out TweetEval test set (N=734). Neutral, abstract-based summary; figures are the authors' claims.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Multimodal extract

Data-Driven Decoding of Russell's Circumplex Model of Affect

Do Transformer embeddings recover Russell's circumplex affect geometry?

Deep Learning Embeddings Speech Processing Transformer

An arXiv paper tests whether Transformer latent spaces, trained on text and speech, recover the geometric regularities of Russell's circumplex model of affect. It unifies two complementary experiments to probe emotion representation, addressing the opacity of high-dimensional affective embeddings. Neutral, abstract-based summary.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Industry Adoption extract

Beyond Models: Reflections on Engineering AI-enabled Systems in a Project-Based Course

Reflections on teaching the engineering of AI-enabled systems in a course

Algorithms & Theory Machine Learning Neural Network Reinforcement Learning Software Engineering

This paper reflects on a project-based master's course at the University of Bremen on engineering AI-enabled systems. It argues that machine learning courses emphasize model development while students lack experience in architectural design, deployment, and monitoring, and reports on the course's design and implementation.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Safety & Evaluation extract

Robust Spoofed Speech Detection via Temporal Pyramid Modeling

Temporal Pyramid modeling for robust, generalizable spoofed-speech detection

Neural Network Speech Processing

The paper proposes a Temporal Pyramid Adapter for spoofed speech detection, using parallel temporal convolutions with varying receptive fields to capture multi-scale cues from local artifacts to global prosodic irregularities. It combines self-supervised XLS-R representations with front-end adapters to improve cross-dataset generalization.
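
The idea of parallel temporal convolutions with different receptive fields can be sketched as follows. This is a single-channel toy under stated assumptions, not the paper's adapter: real XLS-R features are multi-dimensional and the kernels would be learned, whereas here the function names `conv1d_same` and `temporal_pyramid`, the kernel sizes, and the averaging kernels are placeholders chosen for illustration.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(signal))
    ]


def temporal_pyramid(signal, kernel_sizes=(3, 7, 15)):
    """Run parallel branches with growing receptive fields and stack them.

    Returns one feature vector per time step, with one entry per branch.
    Averaging kernels stand in for learned convolution weights.
    """
    branches = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # placeholder weights (assumption)
        branches.append(conv1d_same(signal, kernel))
    return [list(frame) for frame in zip(*branches)]


if __name__ == "__main__":
    # A single impulse: short kernels keep it local, long kernels spread it out.
    impulse = [0.0] * 4 + [1.0] + [0.0] * 4
    for step, frame in enumerate(temporal_pyramid(impulse, kernel_sizes=(3, 5))):
        print(step, [round(v, 3) for v in frame])
```

Small kernels respond only near the impulse, while wider ones respond over a longer window, which is the intuition behind covering both local artifacts and longer-range prosodic cues in one representation.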

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Funding & M&A extract

ATOM-Bench: A Real-World Benchmark for Atomic Skills and Compositional Generalization in Manipulation Policies

ATOM-Bench evaluates atomic skills and compositional generalization in robots

Fine-tuning Reinforcement Learning

The paper presents ATOM-Bench, a real-world benchmark for evaluating both atomic skills and compositional generalization in robotic manipulation policies. It factorizes tabletop manipulation into motor and instruction atoms, noting that a policy may succeed on demonstrated tasks yet fail to execute fine-grained skills or recombine them in new structures.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Safety & Evaluation extract

How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation

Paper: framework measures LLM search-agent endorsement risk

AI Agents Claude Gemini GPT Speech Processing

An arXiv paper introduces SearchGEO, a controlled framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline and a five-mode attack taxonomy across multiple backends. Summarized neutrally from the abstract.

Read original (arXiv cs.CL (Computation and Language)) ↗

Simon Willison's Weblog · 2026-06-15 EN Safety & Evaluation extract

"They screwed us": Personality clashes sent Anthropic's models offline

Willison flags an Axios report on Anthropic's DC backstory

Anthropic Claude Deep Learning Reinforcement Learning

Developer Simon Willison's blog highlights an Axios piece collecting behind-the-scenes accounts of Anthropic's models and the US government, citing a Commerce Department meeting and debates over jailbreak resistance, while noting that the reporting rests on anonymous sources.

Read original (Simon Willison's Weblog) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Safety & Evaluation extract

Adaptive and Explicit safe: Triggering Latent Safety Awareness in Large Reasoning Models

Triggering latent safety awareness to harden large reasoning models

DeepSeek Fine-tuning Llama Retrieval-Augmented Generation (RAG) Reinforcement Learning from Human Feedback (RLHF)

The paper observes that large reasoning models can recognize safety risks when re-presented with the original query alongside their own reasoning trace—a property it calls latent safety awareness. To exploit this without heavy manual annotation, it uses supervised fine-tuning to induce safe tags that trigger safety analysis.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Developer Tools extract

LLM-based Visual Code Completion for Aerospace Geometric Design

Paper: LLM visual-programming copilot for aerospace design

GPT Inference Neural Network

An arXiv paper presents an LLM-based visual programming copilot for aerospace geometric design tasks, using a visual-programming variant of the ReAct methodology. Summarized neutrally from the abstract; claims are the authors' and not independently verified.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Safety & Evaluation extract

LabOSBench: Benchmarking Computer Use Agents for Scientific Instrument Control

LabOSBench: a simulated testbed for computer-use agents controlling instruments

AI Agents Computer Vision

The paper proposes LabOSBench, a simulated yet realistic testbed for evaluating computer-use agents on scientific instrument control. It notes that existing benchmarks focus on software tasks in virtual systems, while real instruments require coordinated interface control and feedback-driven parameter tuning that are costly and risky to evaluate directly.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Safety & Evaluation extract

Decoupling Semantics from Distortions: Multi-Scale Two-Stream Vision-Language Alignment for AI-Generated Image Quality Assessment

MST-CLIPIQA: decoupling semantics and distortions in AI-image quality

Computer Vision Machine Learning Retrieval-Augmented Generation (RAG)

The paper introduces MST-CLIPIQA, a multi-scale two-stream framework for assessing AI-generated image quality. It argues that monolithic vision-language representations entangle semantic understanding with low-level perceptual sensitivity, and instead decouples them using dual CLIP encoders for hierarchical alignment.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-15 EN Training & Fine-tuning extract

Decision-Weighted Flow Matching for Contextual Stochastic Optimization

DW-FM reweights flow matching toward decision-sensitive regions

Computer Vision Neural Network Reinforcement Learning from Human Feedback (RLHF)

Standard generative scenario models optimize uniform distributional fit rather than downstream decision quality. Decision-Weighted Flow Matching (DW-FM) reweights the velocity-regression objective using decision-sensitive endpoint information, linking downstream regret to pathwise velocity mismatch and providing regret-aligned objectives with guarantees.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-15 EN Multimodal extract

Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations

Gen-VCoT uses generated RGB visual intermediates for multimodal reasoning

Machine Learning

Gen-VCoT replaces text-only chain-of-thought with generated RGB intermediates, staging visual grounding (SAM), depth (Marigold), and semantic reasoning (Qwen2-VL) under an adaptive router. It improves spatial (+25%) and depth (+50%) questions but can hurt simple factual ones; text CoT still wins on CLEVR, suggesting task-dependent representations.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Training & Fine-tuning extract

Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents

S2L replaces runtime SKILL.md text with skill-specific LoRA adapters

AI Agents Deep Learning Software Engineering

The paper proposes Skill-to-LoRA (S2L), a behavior-centric representation that replaces runtime skill text—commonly distributed as SKILL.md files—with skill-specific LoRA adapters. Rather than compressing the document, S2L models the behavioral change the skill text induces, aiming at more token-efficient LLM agents.
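
For readers unfamiliar with LoRA, the standard low-rank update is W' = W + (alpha / r) * B A, where A is r x d_in and B is d_out x r. The sketch below shows that update in plain Python. It illustrates generic LoRA only, not S2L's training procedure; the helper names `matmul` and `apply_lora` and the tiny matrices are assumptions for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a).

    weight: d_out x d_in base matrix (left unchanged)
    lora_a: r x d_in down-projection
    lora_b: d_out x r up-projection
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    base = [[1, 0], [0, 1]]          # toy 2 x 2 base weight
    a = [[1, 2]]                     # rank-1 adapter, r = 1
    b = [[1], [0]]
    print(apply_lora(base, a, b, alpha=2))
```

In a skill-per-adapter setup, each skill would store only its own small `lora_a` and `lora_b` pair, so switching skills means swapping adapters rather than adding instruction text to the prompt.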

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Safety & Evaluation extract

P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs

P3B3: a benchmark for Portuguese variety bias in LLMs

The paper introduces P3B3, an expert-curated benchmark and framework for measuring European versus Brazilian Portuguese variety bias in LLMs. It reports that most models lean strongly toward pt-BR and argues for more balanced multilingual representation.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Safety & Evaluation extract

Automated jailbreak attack targeting multiple defense strategies

UNIATTACK: a defense-oriented framework for automated black-box LLM jailbreaks

Retrieval-Augmented Generation (RAG) Speech Processing

The paper presents UNIATTACK, an adversarial testing framework that systematically builds effective black-box attack prompts on LLMs from a defense-oriented perspective. Unlike static templates or model-specific tuning, it extracts minimal but high-impact features from diverse existing attacks and optimizes them.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN New Model Releases extract

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

MyPCBench: benchmarking personal computer-use agents

AI Agents Claude Neural Network Reinforcement Learning

MyPCBench evaluates computer-use agents as personal assistants on a Linux desktop with 17 simulated web apps and 184 persona-seeded tasks, benchmarking six closed and open-weight models. Reported scores reflect the paper and are not independently verified.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN New Model Releases extract

Misinformation Propagation in Benign Multi-Agent Systems

Study on misinformation propagation in benign multi-agent systems

AI Agents Reinforcement Learning Software Engineering

The paper injects intent-based misinformation into single- and multi-agent LLM systems and finds it degrades performance and persists through debate, though multi-agent debate can reduce degradation when most agents are uncontaminated. Robustness depends on group composition.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Inference & Efficiency extract

Progressive Knowledge-Guided Large Language Model Framework for Bearing Fault Diagnosis

Physics-guided multi-scale framework for bearing fault diagnosis

Inference Reinforcement Learning

The paper proposes a progressive, physics-guided multi-scale vibration-processing pipeline for bearing fault diagnosis, using a kinematics-derived descriptor for real-time screening and fault-adaptive segmentation. Reported figures reflect the abstract and are not independently verified.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Funding & M&A extract

Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

Paper on evaluator preference collapse in self-evolving agents

AI Agents DeepSeek GPT

This arXiv paper reportedly examines preference collapse in multimodal evaluators and its cross-modal contagion within self-evolving agent systems. The source excerpt was unavailable (content filter), so this summary is based on the title only; see the original for methods and findings.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Funding & M&A extract

SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG

SCAR: semantic continuity-aware retrieval for RAG context expansion

Embeddings Retrieval-Augmented Generation (RAG)

Note: the abstract was unavailable, so this is summarized neutrally from the title alone. The paper proposes SCAR, a 'semantic continuity-aware retrieval' method aimed at efficient context expansion in retrieval-augmented generation (RAG). Specific mechanisms and evaluation results cannot be confirmed from the title.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Safety & Evaluation extract

FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

FraudSMSWalker benchmark targets URL-masked SMS-to-webpage fraud

AI Agents Meta Neural Network Reinforcement Learning

The paper introduces FraudSMSWalker, a controlled benchmark for URL-masked SMS-to-webpage fraud judgment. It contains 699 bilingual chains (332 fraudulent, 367 benign) across ten scenarios, withholding raw URLs, hosts, and reputation metadata so models cannot rely on reputation shortcuts, and evaluates nine web agents.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Safety & Evaluation extract

Islamic Large Language Models: From Knowledge Acquisition to Trustworthy and Hallucination-Resistant AI

Survey reviews Islamic LLMs and trustworthy, hallucination-resistant AI

Natural Language Processing (NLP) Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering

This survey reviews the emerging field of Islamic LLMs and trustworthy Islamic AI, spanning Arabic NLP, Qur'anic question answering, knowledge benchmarks, retrieval-augmented generation, and legal reasoning. It argues that Arabic fluency alone is insufficient, and that reliable systems need curated sources, verification modules, and citation-aware generation.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Safety & Evaluation extract

VeriGraph: Towards Verifiable Data-Analytic Agents

VeriGraph: a traceable neuro-symbolic framework for verifiable data agents

AI Agents Neural Network Software Engineering

This arXiv paper introduces VeriGraph, a traceable neuro-symbolic reasoning framework for verifiable data-analytic agents. The authors note that LLM agents' reliance on linear text trajectories makes reasoning hard to audit, entangling deterministic computations over raw data with semantic deductions over natural-language claims. VeriGraph instead has agents build an explicit heterogeneous evidence directed acyclic graph (DAG) during execution.
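
To make the idea of an explicit evidence DAG concrete, here is a minimal sketch of a graph with two node kinds, deterministic computations and natural-language claims, where every claim can be traced back to the computations it rests on. This is not VeriGraph's data model; the class name `EvidenceGraph`, the node kinds, and the node ids are hypothetical.

```python
class EvidenceGraph:
    """Toy heterogeneous evidence graph.

    A node may only depend on nodes that already exist, so the graph
    is acyclic by construction.
    """

    KINDS = ("computation", "claim")

    def __init__(self):
        self.kind = {}     # node id -> "computation" or "claim"
        self.parents = {}  # node id -> ids of the evidence it depends on

    def add_node(self, node_id, kind, depends_on=()):
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind}")
        if node_id in self.kind:
            raise ValueError(f"duplicate node: {node_id}")
        missing = [p for p in depends_on if p not in self.kind]
        if missing:
            raise KeyError(f"unknown evidence: {missing}")
        self.kind[node_id] = kind
        self.parents[node_id] = list(depends_on)

    def trace(self, node_id):
        """Return the node and all its supporting evidence, dependencies first."""
        seen, order = set(), []

        def visit(current):
            if current in seen:
                return
            seen.add(current)
            for parent in self.parents[current]:
                visit(parent)
            order.append(current)

        visit(node_id)
        return order


if __name__ == "__main__":
    graph = EvidenceGraph()
    graph.add_node("row_count", "computation")
    graph.add_node("mean_value", "computation", depends_on=["row_count"])
    graph.add_node("claim_trend", "claim", depends_on=["mean_value"])
    print(graph.trace("claim_trend"))
```

An auditor can then check each computation node by re-running it and each claim node by inspecting only the evidence returned by `trace`, instead of rereading a linear text transcript.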

Read original (arXiv cs.CL (Computation and Language)) ↗