Safety & Evaluation A

  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    REDACT: A Systematically Controlled Multilingual Benchmark for Personal Information Detection
    REDACT: a controlled multilingual benchmark for PII detection
    Claude · GPT · Meta · Neural Network · OpenAI
    The paper presents REDACT, a systematically controlled multilingual benchmark for personal information (PII) detection. It addresses limitations of existing corpora—few entity types, ad hoc generation, and little insight into which surface conditions cause detector failures.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI
    Scaling up democratic deliberation and empowering people with AI
    Embeddings · Reinforcement Learning
    The paper discusses options for scaling up democratic deliberation and empowering people with AI as large language models become prominent in public discourse. It weighs these opportunities against persistent concerns, such as linguistic constraints, biases, and the sycophantic tendencies of LLMs, that go beyond what red teaming addresses.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Large Language Models Do Not Always Need Readable Language
    LLMs don't always need human-readable language
    Neural Network
    The paper investigates whether semantic information can be encoded in compact, non-standard text that sacrifices human readability while remaining usable by models. It argues large language models do not always need human-readable language, especially when the intended reader is another model.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives
    Zero-shot agentic LLM workflows for lung pathology extraction
    GPT · Neural Network · Natural Language Processing (NLP)
    The paper presents Prompt, Plan, Extract, a zero-shot agentic LLM workflow for extracting lung pathology information from clinical narrative reports. It targets the labor-intensive, error-prone manual extraction needed for cancer staging and tumor registries, avoiding fully supervised NLP pipelines.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts
    AtomMem: an LLM-agent memory system built on atomic facts
    AI Agents · Neural Network · Retrieval-Augmented Generation (RAG)
    The paper proposes AtomMem, a simple and effective memory system for LLM agents built around atomic facts. It addresses the limits of fixed context windows for accumulating and reusing information across sessions, and the coarse, unstable memory of existing memory-augmented systems.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models
    A control-window law for single-neuron steering in LLMs
    Retrieval-Augmented Generation (RAG)
    The paper develops a budget-normalized control-window framework for single-neuron steering in language models. It seeks to predict when intervening on one neuron coherently controls a behavior—such as refusal or language routing gated by sparse feed-forward neurons—rather than collapsing the output.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines
    JAMER: a project-level code benchmark on game engines
    AI Agents · Deep Learning
    The paper introduces JAMER, a project-level code framework dataset and benchmark for professional game engines. It addresses the lack of large-scale datasets and deterministic evaluation for project-level code engineering, which has remained underexplored despite progress in AI-driven game asset and gameplay generation.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    CREDENCE: Claim Reduction for Decomposition & Enhanced Credibility -- Semantic Metrics and Convergence Analysis
    CREDENCE: semantic metrics for claim decomposition in fact-checking
    Neural Network · Reinforcement Learning
    The paper presents CREDENCE, an approach to decomposing compound sentences into atomic, verifiable claims for automated fact-checking. It introduces semantic metrics that avoid token-overlap measures, which underestimate quality for paraphrastic claims, and adds convergence and termination analysis.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models
    CombEval: evaluating combinatorial counting in LLMs
    Reinforcement Learning · Software Engineering
    The paper presents CombEval, a dynamic benchmark for evaluating combinatorial counting in large language models. Each problem is expressed as a typed Cofola specification over entities, combinatorial objects, dependencies, and constraints, enabling controlled generation of natural-language counting problems.
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA
    AgentFinVQA: an auditable multi-agent pipeline for financial chart QA
    AI Agents · Gemini · Neural Network · Reinforcement Learning · Software Engineering
    The paper presents AgentFinVQA, a deployable multi-agent pipeline for auditable financial chart question answering. It targets regulated settings where practitioners must know which answers to trust and cannot send client data to external model providers, unlike existing accuracy-focused, opaque chart-QA agents.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
    Manifold Bandits: Bayesian curriculum learning for LLM reasoning
    Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    The paper proposes Manifold Bandits, a Bayesian curriculum-learning method that samples training problems over the latent geometry of large language models. It targets reinforcement learning for LLM reasoning, where training efficiency depends heavily on how prompts are selected during optimization.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Benchmarking Agentic Review Systems
    Benchmarking agentic peer-review systems
    GPT · Neural Network · OpenAI
    The paper benchmarks agentic review systems, which are emerging to relieve the pressure AI-assisted research places on peer review. It evaluates two open-source systems, one proprietary system, and a zero-shot baseline, addressing the open question of how such systems should be assessed.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings
    Sequential DPO and forgetting across preference settings
    Llama · Machine Learning · Reinforcement Learning · Reinforcement Learning from Human Feedback (RLHF)
    The paper studies sequential Direct Preference Optimization (DPO) across different preference settings, examining how applying multiple alignment objectives one after another affects earlier ones. It looks beyond uniform forgetting to understand how later training stages interfere with previously learned preferences.
  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    NRITYAM: Language Models Meet Art and Heritage of Dance
    NRITYAM: a benchmark for cultural comprehension of dance traditions
    Neural Network · Reinforcement Learning · Software Engineering
    The paper presents NRITYAM, a benchmark for evaluating how well language models comprehend culture in the context of global dance traditions. It starts from the premise that language models can be effective globally only if they have a nuanced understanding of local socio-cultural contexts.
  • Hugging Face Blog · EN Safety & Evaluation extract
    Is it agentic enough? Benchmarking open models on your own tooling
    Hugging Face benchmarks open models' agentic skill on your own tools
    Hugging Face explores how to judge whether open models are 'agentic enough' by benchmarking them on your own tooling rather than generic suites. The approach evaluates models under realistic, user-specific tool setups to better gauge practical agent capability.
  • Anthropic News · EN Industry Adoption extract
    Anthropic opens Seoul office and announces new partnerships across the Korean AI ecosystem
    Anthropic opens a Seoul office, announces new Korean AI partnerships
    Anthropic · Claude · Neural Network · Reinforcement Learning
    Anthropic opened a Seoul office and announced new partnerships across Korea's AI ecosystem, including enterprises, startups, and researchers building on Claude. It frames Korea as treating innovation and safety as two sides of the same coin. These specifics come from the announcement and have not been independently verified.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Learning User Simulators with Turing Rewards
    User simulators learned with Turing rewards for agent training
    Reinforcement Learning
    Simulating human users in interactive settings could advance training of agent assistants, evaluation of personalization systems, and social-science research. This work learns user simulators using Turing rewards, aiming to reproduce more realistic user behavior.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning
    UBP2: uncertainty-balanced planning for efficient preference-based RL
    Meta · Neural Network · Reinforcement Learning
    Preference-based RL learns reward models from pairwise behavior comparisons, bypassing explicit reward design, but existing methods often rely on passive data collection. UBP2 introduces uncertainty-balanced preference planning to actively select comparisons and learn efficiently from fewer preferences.
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation
    Rubric-conditioned self-distillation rethinks reward supervision
    Neural Network · Reinforcement Learning
    Post-training of reasoning models often combines supervised distillation with reinforcement learning from verifiable rewards, but distillation relies on costly chain-of-thought annotations. This work proposes rubric-conditioned self-distillation to rethink reward supervision while cutting annotation cost.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors
    Reference-driven generation of multi-speaker audio scenes
    Embeddings · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Speech Processing
    Existing multi-speaker dialogue systems bind speakers to utterances through structured supervision such as per-turn tags, multi-stream transcriptions, or learnable speaker embeddings. This work generates multi-speaker audio scenes by drawing on in-the-wild reference priors for more natural synthesis.
  • arXiv cs.AI (Artificial Intelligence) · EN Industry Adoption extract
    Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents
    Data Intelligence Agents query enterprise data autonomously
    AI Agents · Neural Network
    Production data integration is bottlenecked by repeated, lossy handoffs among data owners, engineers, and analysts who must jointly discover, structure, and query enterprise data. The authors present Data Intelligence Agents (DIA), autonomous coding agents that interpret, model, and query that data.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Explaining Attention with Program Synthesis
    Explaining attention via program synthesis for interpretability
    GPT · Llama · Retrieval-Augmented Generation (RAG) · Software Engineering · Transformer
    A longstanding goal of interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. This paper approximates the behavior of attention components with synthesized programs, offering a route to explain attention and improve interpretability.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation
    Confidence is not reliability: rethinking MC dropout in tumour segmentation
    Neural Network · Reinforcement Learning
    Glioma segmentation in multiparametric MRI is critical for treatment planning, and a model that fails silently on treatment-critical sub-regions is a patient-safety risk that overlap metrics miss. This work argues that MC dropout confidence does not equal reliability and rethinks how uncertainty is estimated for this task.
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
    Measuring commonsense and knowledge retention in VLA models
    AI Agents · Computer Vision · Fine-tuning · Robotics · Software Engineering
    Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet how much commonsense and factual knowledge they retain is unclear. This work measures that retention, revealing how much fine-tuning erodes prior world knowledge.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning
    NeSyCat Torch unifies neurosymbolic semantics via category theory
    Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic, and neural systems each define truth by their own rules. Extending ULLER, NeSyCat subsumes them under a single inductive definition of truth, delivered as a differentiable tensor implementation for neurosymbolic learning.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Beyond Algorithms: Conceptual Innovation in Medical Imaging AI
    Beyond algorithms: the case for conceptual innovation in medical imaging AI
    Algorithms & Theory · Deep Learning · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    AI has driven rapid progress in medical imaging, yielding ever more sophisticated algorithms and steady benchmark gains. Yet this algorithm-centric trajectory reveals limits. This work argues for conceptual innovation beyond algorithms to achieve clinically meaningful advances in medical imaging AI.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA
    Trade-offs in medical LLM adaptation, studied on French QA
    Fine-tuning · Reinforcement Learning · Software Engineering
    As LLMs are adapted to specialized domains and languages, the effectiveness of adaptation strategies remains unclear. This empirical study on French medical question answering analyzes the trade-offs of various domain-adaptation methods, clarifying gains and losses in performance and generality.
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Structured Inference with Large Language Gibbs
    Structured probabilistic inference over LLMs via Gibbs sampling
    Inference · Neural Network · Reinforcement Learning
    Knowledge encoded in LLMs can serve as a substrate for structured reasoning over variables describing a complex world, but accessing it probabilistically is hard. This work performs structured inference over LLMs using Gibbs sampling, enabling probabilistic reasoning across interrelated variables.
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    A Multi-Domain Benchmark for Detecting AI-Generated Text-Rich Images from GPT-Image-2
    A multi-domain benchmark to detect GPT-Image-2 text-rich images
    Computer Vision · GPT · OpenAI · Retrieval-Augmented Generation (RAG)
    Text-rich images often hold privacy-sensitive, transactional, or decision-relevant information. As multimodal generators synthesize realistic text and layouts, this work builds a multi-domain benchmark for detecting AI-generated text-rich images from GPT-Image-2, assessing detector reliability.
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models
    DreamReasoner-8B: block-size curriculum learning for diffusion reasoning
    Inference · Machine Learning
    Block diffusion language models speed decoding via parallel block-wise denoising, but reliably scaling them for long chain-of-thought reasoning is unresolved. The authors develop DreamReasoner-8B, using block-size curriculum learning to strengthen long-CoT reasoning in diffusion reasoning models.