Safety & Evaluation (Page 3 of 4)｜AI/Tech News Trends

arXiv cs.CL (Computation and Language) · 2026-07-29 EN Training & Fine-tuning

Constitutional Midtraining: Content Presence Drives Alignment Gains

Anthropic · Fine-tuning · Machine Learning · Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-29 EN Safety & Evaluation

Prosody-driven Jailbreaks in Audio LLMs: A Controlled Study and Mechanistic Analysis

GPT · Speech Processing

arXiv cs.CL (Computation and Language) · 2026-07-29 EN Training & Fine-tuning

Misalignment Has a Personality: A Big Five Account of Emergent Misalignment

Deep Learning · Fine-tuning · Reinforcement Learning · Software Engineering

arXiv cs.CL (Computation and Language) · 2026-07-29 EN Multimodal

Symphony of Bias: Exploring Gender Associations with Musical Instruments in Multimodal LLMs

Neural Network · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Safety & Evaluation

Aligning LLM-Simulated and Human Examinees for Psychometric Calibration: A Cognitive Diagnostic Profiling Approach

Gemini · Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-28 EN Multimodal

VetClaw: An Edge-Cloud Multimodal Agentic System for Veterinary Disease Screening

Computer Vision · Deep Learning · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN New Model Releases

Falling Behind Drives Unsafe Development in an Idealised AI Race Experiment

Deep Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Multimodal

Evaluating Multi-Turn Multimodal Diagnostic Reasoning on Challenging Real-World Clinical Cases

Machine Learning · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Multimodal

SAM3D-Guided Object-Centric Representation Alignment for Vision-Language-Action Models

Computer Vision · Inference · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Inference & Efficiency

Minimizing Targeted Activations: Input-Only Suppression of Evaluation-Awareness Latents in Large Language Models

Inference · Llama · Machine Learning · Neural Network · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Developer Tools

How Do LLMs Read Bug Reports? An Empirical Study of Attention in LLMs for Automated Program Repair

Deep Learning · Meta · Reinforcement Learning · Software Engineering

arXiv cs.CL (Computation and Language) · 2026-07-28 EN New Model Releases

Shieldstral

Reinforcement Learning · Software Engineering

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Safety & Evaluation

Evaluation of Adversarial Robustness in Arabic Language Models

Natural Language Processing (NLP) · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Infrastructure & Hardware

Rashomon Alignment

Algorithms & Theory · Machine Learning · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Safety & Evaluation

AI's Capability in Assisting Scientific Research in Physics, Astrophysics, and Cosmology I: Literature Review

Gemini · GPT · Meta · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Safety & Evaluation

Construction-Driven Injection: Linguistically-Grounded Edit-Based Code-Mixing Fingerprints for Large Language Models

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Training & Fine-tuning

MemSFT: Mitigating Alignment Tax with an External Parametric Memory

Fine-tuning

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Safety & Evaluation

Evaluation of forced alignment of code-mixed speech: the case of Hindi-English

Speech Processing

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Inference & Efficiency

IRIS: Reusable Identity Representations from Frozen LLMs for Entity Alignment

Inference · Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-28 EN New Model Releases

AMPBench-MT: A Homology-Controlled Benchmark for Antimicrobial Peptide Potency, Spectrum, and Safety Prediction

Embeddings · Neural Network · Reinforcement Learning from Human Feedback (RLHF)

arXiv cs.CL (Computation and Language) · 2026-07-28 EN New Model Releases

Phase Structure in Rotary Attention: A Spectral Framework for Semantic Continuity and Execution-Boundary Governance

Embeddings · Machine Learning · Neural Network · Transformer

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Agents & Tool Use

PatientAgentBench: A Benchmark Framework for Evaluating Patient-Facing Health AI Agents

AI Agents · Neural Network · Software Engineering

arXiv cs.LG (Machine Learning) · 2026-07-28 EN Safety & Evaluation

Data-Dependent Regret and Polyak Corrections for Constrained Online Convex Optimization

Neural Network

arXiv cs.LG (Machine Learning) · 2026-07-28 EN Safety & Evaluation

Emergent Latent-State Computation under Stochastic Volatility

Transformer

arXiv cs.CL (Computation and Language) · 2026-07-28 EN Safety & Evaluation

Inspect India Evals: An Open Benchmarking Framework for Evaluating Large Language Models in the Indian Linguistic and Cultural Context

Machine Learning · Meta

arXiv cs.CL (Computation and Language) · 2026-07-28 EN New Model Releases

MyoCardBench: A Real-World Data Benchmark for Evaluating Large Language Models in Clinically Authentic Cardiovascular Care Scenarios

Gemini · GPT · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

ITmedia AI+ · 2026-07-27 JA Safety & Evaluation extract

NVIDIAやMicrosoftなど30社超、オープンAIの防御ツール共同開発の「Open Secure AI Alliance」設立

30+ firms including NVIDIA, Microsoft form Open Secure AI Alliance

Microsoft · NVIDIA

More than 30 companies, including NVIDIA, Microsoft, and SpaceX AI, launched the Open Secure AI Alliance, an initiative to improve the safety of open AI models and jointly develop cybersecurity tools. The group plans to use open technologies to patch software vulnerabilities and co-build defensive tooling, and it frames open models as a defensive asset against excessive regulation.

arXiv cs.CL (Computation and Language) · 2026-07-27 EN Training & Fine-tuning

Towards Robust Reinforcement Learning for Small-Scale Language Model Agents

AI Agents · Fine-tuning · Neural Network · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-27 EN Multimodal

Evidence Attribution in Visual Document Understanding without Coordinates or Region Labels

Computer Vision · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-27 EN Safety & Evaluation

D-Score: A Spectral Hidden-State Signal for Hallucination Detection in Large Language Models

Neural Network · Retrieval-Augmented Generation (RAG)