Safety & Evaluation A
-
What Is Databricks, Adopted by 70% of Large US Companies? The ¥20 Trillion "Data Platform for AI"
Databricks, the ~¥20T AI data platform used by 70% of big US firms
Founded in 2013 by the creators of the open-source big-data engine Apache Spark, Databricks has grown into a data and AI platform valued around ¥20 trillion and used by roughly 70% of the Fortune 500. The article traces its rise and latest developments.
-
How Transparent is DiffusionGemma?
Probing DiffusionGemma's reasoning transparency in latent space
DiffusionGemma performs much of its computation in a continuous latent space, raising the question of whether this reduces reasoning transparency. The authors decompose transparency into variable transparency (understanding intermediate computational states) and algorithmic transparency (reconstructing the process behind a model's answer).
-
Optimal Deterministic Multicalibration and Omniprediction
A deterministic algorithm achieving optimal multicalibration
A minimax-optimal multicalibration algorithm that outputs a deterministic predictor, resolving the open question of whether randomization is needed for optimal sample complexity. The result is extended to deterministic predictors satisfying outcome indistinguishability and omniprediction.
-
Multi-Task Bayesian In-Context Learning
Multi-task Bayesian inference via in-context learning
The paper studies multi-task Bayesian in-context learning, using in-context learning to perform Bayesian predictive inference across tasks. It targets the intractability of exact inference and the cost or restrictiveness of scalable approximations, aiming for uncertainty quantification and data efficiency.
-
StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
StylisticBias: few visual cues drive most social bias in MLLMs
StylisticBias investigates the visual cues that shape how multimodal large language models judge people. The study finds that a small set of human visual cues drives most of the social biases exhibited by MLLMs, which are increasingly deployed in consequential settings.
-
DeepSWIP: Quotient-WMC Counterfactuals for Neural Probabilistic Logic Programs
DeepSWIP: quotient-WMC counterfactuals for neural probabilistic logic programs
Neurosymbolic systems such as DeepProbLog combine neural perception with probabilistic logic, but standard inference has limits. DeepSWIP introduces quotient-WMC counterfactuals to enable counterfactual reasoning in neural probabilistic logic programs.
-
SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm
SARLO-80: a worldwide 80cm slant SAR-optical dataset
Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable SAR resources are scarce. SARLO-80 provides a worldwide slant-range SAR and optical dataset at 80cm resolution to fill this gap.
-
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
Multi-LCB: extending LiveCodeBench to multiple programming languages
LiveCodeBench has become a widely adopted benchmark for evaluating large language models on code. Multi-LCB extends it to multiple programming languages to assess multilingual code generation.
-
What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?
What safety-aligned LLMs learn from mixed compliance demonstrations
In-context demonstrations can jailbreak language models, but it has been unclear what safety-aligned models learn when demonstrations mix compliant and non-compliant behavior. This work analyzes that learning behavior.
-
FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
FreeStyle: dual-reference style-content control via community LoRA mining
Style-content dual-reference generation aims to synthesize an image that preserves structure while adopting a reference style. FreeStyle leverages community LoRA mining to give free control over style and content.
-
Entropy Estimation in Multi-Qutrit Systems via Variational and Classical Neural Networks
Estimating entropy in multi-qutrit systems with VQAs and CNNs
The paper presents a systematic study of von Neumann entropy estimation in multi-qutrit quantum systems, comparing variational quantum algorithms with classical convolutional neural networks on an ideal noise-free simulator for systems of up to three qutrits.
-
Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
Diagnosing whether fine-tuned LLMs comprehend software vulnerabilities
It is unclear whether LLMs that score well on vulnerability benchmarks truly reason about security or merely pattern-match. This work diagnoses the limits of fine-tuning LLMs for vulnerability detection in systems software.
-
Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems
Contagion Networks: evaluator bias propagation in multi-agent LLMs
When large language models act as evaluators in multi-agent systems, their systematic evaluation biases can spread through the system. This work analyzes how such evaluator bias propagates across agents.
-
Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems
Hierarchical recovery for cross-device agent systems
The paper proposes a hierarchical recovery mechanism for cross-device agent systems, moving beyond coarse-grained global replanning. It targets real-world computer-use tasks that span multiple applications and devices and must coordinate heterogeneous environments under dynamic runtime failures.
-
Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
Aligning LLMs with implicit user feedback from mouse and gaze
The paper proposes aligning large language models using implicit user signals, such as mouse and eye movements, instead of explicit human feedback. It addresses the limitation that users rarely provide explicit ratings, which makes high-quality preference data scarce for reward modeling.
-
Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution
Marginal advantage accumulation for self-evolving memory agents
The paper proposes marginal advantage accumulation, a cross-batch, operation-level mechanism for memory-driven agent self-evolution. It aims to distinguish stably effective memory operations from accidental hits, addressing contradictory feedback that the same operation can receive across different batches in trace distillation.
-
Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems
Analyzing defensive misdirection against attacks on agentic AI
Agentic AI systems increasingly rely on language-model components to interpret instructions, exposing them to attacks. This paper analyzes defensive misdirection as a countermeasure against model-guided automated attacks.
-
Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
Fisher-geometric sharpness and SGD's implicit bias to flat minima
The paper introduces a Fisher-geometric notion of sharpness to study the implicit bias of SGD toward flat minima. It addresses the fact that standard Euclidean flatness measures, such as the trace or maximum eigenvalue of the loss Hessian, are not invariant under reparametrizations that preserve the network function.
-
Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks
Agentic symbolic search for characterizing PDE solutions
The paper proposes agentic symbolic search, an approach to characterize partial differential equation solutions through mathematical structures rather than tables of computed values. It targets the structural understanding that neither numerical simulation nor neural networks produce directly, traditionally derived by hand.
-
Data Bias Mitigation under Coverage Constraints & The Price of Fairness
Data bias mitigation under coverage constraints and fairness cost
The paper studies data bias mitigation under coverage constraints and the resulting price of fairness. It addresses discriminatory outcomes for individuals at the intersection of multiple sensitive attributes, including the lack of principled measures for quantifying intersectional bias.
-
Multi-View Decompilation for LLM-Based Malware Classification
Multi-view decompilation for LLM-based malware classification
Malware analysts often inspect compiled binaries through decompiled pseudo-C when source code is unavailable. This work uses multi-view decompilation to improve LLM-based malware classification.
-
LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems
Multi-turn red-teaming of LLM agents for safety-critical systems
LLM agents are increasingly proposed as supervisory components for safety-critical systems. This work evaluates their safety via multi-turn red-teaming, jailbreak benchmarks, and adversarial robustness tests.
-
DataMagic: Transforming Tabular Data into Data Insight Video
DataMagic: turning tabular data into data-insight videos
Data videos combine dynamic charts, voice narration, and synchronized animation to convey insights. DataMagic automatically transforms tabular data into such data-insight videos.
-
Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach
Tackling modality imbalance in federated graph learning via synthesis
The paper addresses modality imbalance in multimodal federated graph learning with a data-synthesis-based approach. It targets two granularities of imbalance: client-level, where some clients lack entire modalities, and node-level, where individual nodes have missing modalities.
-
CRAX: Fast Safe Reinforcement Learning Benchmarking
CRAX: fast benchmarking for safe reinforcement learning
Safety is a core concern when deploying reinforcement learning agents in real-world domains. CRAX provides a framework for fast benchmarking of safe reinforcement learning methods.
-
AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning
AutoPass: evidence-guided LLM agents for compiler performance tuning
Large language models show promise for code compilation tasks but struggle with runtime performance tuning. AutoPass uses evidence-guided LLM agents to perform compiler performance tuning.
-
CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges
CATCH-ME: a counterspeech dataset against hate and misinformation
The paper introduces CATCH-ME, a dataset of contextually annotated multi-turn counterspeech against overlapping hate speech and misinformation. It addresses NLP's tendency to treat the two threats in isolation and the tendency of zero-shot LLMs to produce repetitive, vague counterspeech.
-
Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation
Using a de-biased VLM 3D judge to improve single-image 3D generation
The paper presents a de-biased VLM-as-3D-judge protocol for single-image 3D generation. Building on a cross-model judge that ranks single-image-to-3D mesh quality where geometry and CLIP proxies fall short, it asks whether the judge's preferences can cheaply specialize a strong open generator, TRELLIS, on one asset class such as furniture without human labels.
-
Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining
Automating SKILL.md generation via interaction trajectory mining
Explicit skill libraries make computer-using agents easier to inspect, but building them is costly. This work automates SKILL.md generation by mining agents' interaction trajectories.
-
Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act
Train, retrieve, or both? Statutory citation on Ontario tenancy law
The paper runs a four-arm head-to-head comparison of fine-tuning, retrieval, and their combination for producing correct statutory citations on the Ontario Residential Tenancies Act and its core regulation. It targets the practical need of tenants, landlords, and help-desk staff to be pointed at the governing provision.