Safety & Evaluation

  • ITmedia AI+ · JA Safety & Evaluation extract
    What is "Databricks," adopted by 70% of major US companies? The ¥20 trillion "data platform for AI"
    Databricks, the ~¥20T AI data platform used by 70% of big US firms
    Founded in 2013 by the creators of the open-source big-data engine Apache Spark, Databricks has grown into a data and AI platform valued at around ¥20 trillion and used by roughly 70% of the Fortune 500. The article traces its rise and latest developments.
    Read original (ITmedia AI+) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    How Transparent is DiffusionGemma?
    Probing DiffusionGemma's reasoning transparency in latent space
    Algorithms & Theory
    DiffusionGemma performs much of its computation in a continuous latent space, raising the question of whether this reduces reasoning transparency. The authors decompose transparency into variable transparency (understanding intermediate computational states) and algorithmic transparency (reconstructing the process behind a model's answer).
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Optimal Deterministic Multicalibration and Omniprediction
    A deterministic algorithm achieving optimal multicalibration
    Machine Learning
    The paper gives a minimax-optimal multicalibration algorithm that outputs a deterministic predictor, resolving the open question of whether randomization is needed for optimal sample complexity. The result is extended to deterministic predictors satisfying outcome indistinguishability and omniprediction.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Multi-Task Bayesian In-Context Learning
    Multi-task Bayesian inference via in-context learning
    Inference · Meta · Reinforcement Learning · Transformer
    The paper studies multi-task Bayesian in-context learning, using in-context learning to perform Bayesian predictive inference across tasks. It targets the intractability of exact inference and the cost or restrictiveness of scalable approximations, aiming for uncertainty quantification and data efficiency.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
    StylisticBias: few visual cues drive most social bias in MLLMs
    Machine Learning · Reinforcement Learning
    StylisticBias investigates the visual cues that shape how multimodal large language models judge people. The study finds that a small set of human visual cues drives most of the social biases exhibited by MLLMs, which are increasingly deployed in consequential settings.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    DeepSWIP: Quotient-WMC Counterfactuals for Neural Probabilistic Logic Programs
    DeepSWIP: quotient-WMC counterfactuals for neural probabilistic logic programs
    Inference · Reinforcement Learning
    Neurosymbolic systems such as DeepProbLog combine neural perception with probabilistic logic, but standard inference has limits. DeepSWIP introduces quotient-WMC counterfactuals to enable counterfactual reasoning in neural probabilistic logic programs.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm
    SARLO-80: a worldwide 80 cm slant SAR-optical dataset
    Deep Learning · Reinforcement Learning
    Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable SAR resources are scarce. SARLO-80 provides a worldwide slant-range SAR and optical dataset at 80 cm resolution to fill this gap.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
    Multi-LCB: extending LiveCodeBench to multiple programming languages
    Reinforcement Learning · Software Engineering
    LiveCodeBench has become a widely adopted benchmark for evaluating large language models on code. Multi-LCB extends it to multiple programming languages to assess multilingual code generation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?
    What safety-aligned LLMs learn from mixed compliance demonstrations
    In-context demonstrations can jailbreak language models, but it has been unclear what safety-aligned models learn when demonstrations mix compliant and non-compliant behavior. This work analyzes that learning behavior.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
    FreeStyle: dual-reference style-content control via community LoRA mining
    Retrieval-Augmented Generation (RAG)
    Style-content dual-reference generation aims to synthesize an image that preserves structure while adopting a reference style. FreeStyle leverages community LoRA mining to give free control over style and content.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Entropy Estimation in Multi-Qutrit Systems via Variational and Classical Neural Networks
    Estimating entropy in multi-qutrit systems with VQAs and CNNs
    Algorithms & Theory · Neural Network · Software Engineering
    The paper presents a systematic study of von Neumann entropy estimation in multi-qutrit quantum systems, comparing variational quantum algorithms with classical convolutional neural networks on an ideal noise-free simulator for systems of up to three qutrits.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
    Diagnosing whether fine-tuned LLMs comprehend software vulnerabilities
    Fine-tuning · Neural Network · Reinforcement Learning
    It is unclear whether LLMs that score well on vulnerability benchmarks truly reason about security or merely pattern-match. This work diagnoses the limits of fine-tuning LLMs for vulnerability detection in systems software.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems
    Contagion Networks: evaluator bias propagation in multi-agent LLMs
    AI Agents · DeepSeek · Reinforcement Learning
    When large language models act as evaluators in multi-agent systems, their systematic evaluation biases can spread through the system. This work analyzes how such evaluator bias propagates across agents.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems
    Hierarchical recovery for cross-device agent systems
    AI Agents · Neural Network · Reinforcement Learning
    The paper proposes a hierarchical recovery mechanism for cross-device agent systems, moving beyond coarse-grained global replanning. It targets real-world computer-use tasks that span multiple applications and devices and must coordinate heterogeneous environments under dynamic runtime failures.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
    Aligning LLMs with implicit user feedback from mouse and gaze
    Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Reinforcement Learning from Human Feedback (RLHF)
    The paper proposes aligning large language models using implicit user signals—such as mouse and eye movements—instead of explicit human feedback. It addresses the limitation that users rarely provide explicit ratings, which makes high-quality preference data scarce for reward modeling.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution
    Marginal advantage accumulation for self-evolving memory agents
    The paper proposes marginal advantage accumulation, a cross-batch, operation-level mechanism for memory-driven agent self-evolution. It aims to distinguish reliably effective memory operations from accidental hits, addressing the contradictory feedback that the same operation can receive across different batches during trace distillation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems
    Analyzing defensive misdirection against attacks on agentic AI
    AI Agents · Reinforcement Learning · Speech Processing
    Agentic AI systems increasingly rely on language-model components to interpret instructions, exposing them to attacks. This paper analyzes defensive misdirection as a countermeasure against model-guided automated attacks.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
    Fisher-geometric sharpness and SGD's implicit bias to flat minima
    Deep Learning · Neural Network
    The paper introduces a Fisher-geometric notion of sharpness to study the implicit bias of SGD toward flat minima. It addresses the fact that standard Euclidean flatness measures, such as the trace or maximum eigenvalue of the loss Hessian, are not invariant under reparametrizations that preserve the network function.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks
    Agentic symbolic search for characterizing PDE solutions
    Neural Network
    The paper proposes agentic symbolic search, an approach that characterizes partial differential equation solutions through mathematical structures rather than tables of computed values. It targets the structural understanding, traditionally derived by hand, that neither numerical simulation nor neural networks produce directly.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Data Bias Mitigation under Coverage Constraints & The Price of Fairness
    Data bias mitigation under coverage constraints and fairness cost
    Machine Learning · Meta · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    The paper studies data bias mitigation under coverage constraints and the resulting price of fairness. It addresses discriminatory outcomes for individuals at the intersection of multiple sensitive attributes, including the lack of principled measures for quantifying intersectional bias.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Multi-View Decompilation for LLM-Based Malware Classification
    Multi-view decompilation for LLM-based malware classification
    Neural Network · Retrieval-Augmented Generation (RAG)
    Malware analysts often inspect compiled binaries through decompiled pseudo-C when source code is unavailable. This work uses multi-view decompilation to improve LLM-based malware classification.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems
    Multi-turn red-teaming of LLM agents for safety-critical systems
    AI Agents · Neural Network · Reinforcement Learning
    LLM agents are increasingly proposed as supervisory components for safety-critical systems. This work evaluates their safety via multi-turn red-teaming, jailbreak benchmarks, and adversarial robustness tests.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    DataMagic: Transforming Tabular Data into Data Insight Video
    DataMagic: turning tabular data into data-insight videos
    Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    Data videos combine dynamic charts, voice narration, and synchronized animation to convey insights. DataMagic automatically transforms tabular data into such data-insight videos.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach
    Tackling modality imbalance in federated graph learning via synthesis
    The paper addresses modality imbalance in multimodal federated graph learning with a data-synthesis-based approach. It targets two granularities of imbalance—client-level, where some clients lack entire modalities, and node-level, where individual nodes have missing modalities.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    CRAX: Fast Safe Reinforcement Learning Benchmarking
    CRAX: fast benchmarking for safe reinforcement learning
    AI Agents · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Robotics
    Safety is a core concern when deploying reinforcement learning agents in real-world domains. CRAX provides a framework for fast benchmarking of safe reinforcement learning methods.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning
    AutoPass: evidence-guided LLM agents for compiler performance tuning
    AI Agents · Fine-tuning · Inference
    Large language models show promise for code compilation tasks but struggle with runtime performance tuning. AutoPass uses evidence-guided LLM agents to perform compiler performance tuning.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges
    CATCH-ME: a counterspeech dataset against hate and misinformation
    Neural Network · Natural Language Processing (NLP) · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Speech Processing
    The paper introduces CATCH-ME, a dataset of contextually annotated multi-turn counterspeech against overlapping hate speech and misinformation. It addresses NLP's tendency to treat the two threats in isolation and the tendency of zero-shot LLMs to produce repetitive, vague counterspeech.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation
    Using a de-biased VLM 3D judge to improve single-image 3D generation
    Reinforcement Learning · Software Engineering
    The paper presents a de-biased VLM-as-3D-judge protocol for single-image 3D generation. It builds on a cross-model judge that ranks single-image-to-3D mesh quality where geometry and CLIP proxies fall short, and asks whether the judge's preferences can cheaply specialize a strong open generator, TRELLIS, on a single asset class such as furniture, without human labels.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining
    Automating SKILL.md generation via interaction trajectory mining
    AI Agents · Neural Network · Reinforcement Learning
    Explicit skill libraries make computer-using agents easier to inspect, but building them is costly. This work automates SKILL.md generation by mining agents' interaction trajectories.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act
    Train, retrieve, or both? Statutory citation on Ontario tenancy law
    Deep Learning · Fine-tuning · Neural Network · Retrieval-Augmented Generation (RAG)
    The paper runs a four-arm head-to-head comparison of fine-tuning, retrieval, and their combination for producing correct statutory citations on the Ontario Residential Tenancies Act and its core regulation. It targets the practical need of tenants, landlords, and help-desk staff to be pointed at the governing provision.
    Read original (arXiv cs.LG (Machine Learning)) ↗