Safety & Evaluation A
-
Quantum ring all-reduce: communication and privacy advantages for distributed learning
Quantum ring all-reduce for efficient, private distributed learning
The paper proposes a quantum ring all-reduce scheme for distributed learning, arguing that quantum communication can make distributed training both more communication-efficient and information-theoretically private. The approach is discussed for both classical and quantum settings.
-
A Model-Driven Approach for Developing Families of Reinforcement Learning Environments
A model-driven approach to building families of RL environments
The paper presents a model-driven approach for developing families of reinforcement learning environments. It treats virtual training environments as software-intensive systems and aims to make building these safe, cost-efficient alternatives to real-world training more systematic.
-
Statistical Properties of Training & Generalization
Physics-informed view of training and generalization in deep learning
The article investigates the key features and surprises of deep learning's training and generalization from a physics-informed perspective. It examines how deep learning departs from classical statistical intuitions to achieve strong real-world performance, justifying these observations where possible.
-
Shifting-based Optimizable Linear Relaxations for General Activation Functions
Shifting-based linear relaxations for general activation functions
The paper proposes a shifting-based method for constructing optimizable linear relaxations of general activation functions, used in formal verification of neural networks. It removes the need for hand-crafted relaxations tailored to each activation, supporting formal guarantees in safety- and security-critical settings.
-
PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback
PsyScore: psychometric essay scoring with scaffolded feedback
The paper presents PsyScore, a psychometrically-aware framework for automated essay scoring that adapts to writing traits and provides ZPD-scaffolded feedback. It aims to unify scoring and feedback, which existing methods treat separately, balancing reliable assessment with interpretable, actionable instruction.
-
Editorial Alignment: A Participatory Approach to Engaging Editorial Expertise in LLM-mediated Knowledge Dissemination
Editorial alignment: engaging editorial expertise in LLM knowledge dissemination
LLM-driven information services are reshaping how public knowledge is produced. This work proposes a participatory approach to engage editorial expertise in LLM-mediated knowledge dissemination.
-
The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse
The Register Gap: a meaning intelligence framework for Nigerian discourse
This work introduces the Meaning Intelligence Framework, a nine-dimension annotation and evaluation scheme, to study the register gap in Nigerian public discourse.
-
Finetuning Vision-Language-Action Models Requires Fewer Layers Than You Think
Finetuning vision-language-action models needs fewer layers than expected
Vision-Language-Action models pre-trained on massive video-robot datasets have transformed robot control. This work shows that finetuning them requires fewer layers than previously assumed.
-
ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments
ScholarQuest: a taxonomy-guided benchmark for agentic paper search
Academic paper search is a core step in research, and LLM-based search agents are emerging. ScholarQuest provides a taxonomy-guided benchmark for agentic academic paper search in open literature environments.
-
QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation
QMFOL: benchmarking LLM reasoning via first-order logic test generation
Large language models have advanced in reasoning, especially deduction. QMFOL benchmarks LLM reasoning through quantifiable monadic first-order logic test-case generation.
-
Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families
Activation directions for detecting emergent misalignment in LLMs
The paper investigates whether emergent misalignment, induced by fine-tuning language models on insecure code, corresponds to a causally actionable, shared direction in activation space. Across four instruction-tuned model families, it studies using such directions to detect and mitigate the misalignment.
-
Learner-based Concept Drift Detection: Analysis and Evaluation
Learner-based concept drift detection: analysis and evaluation
Machine learning deployed in evolving streaming environments must handle non-stationarity. This work analyzes and evaluates learner-based approaches to concept drift detection.
-
CzechDocs: A Multiway Parallel Dataset of Formatted Documents for Minority Languages in Czechia
CzechDocs: a parallel formatted-document MT dataset for Czechia
The paper presents CzechDocs, a multiway parallel dataset of formatted documents in HTML, DOCX, and PDF covering Czech and minority languages used in Czechia, primarily Ukrainian and English, with smaller amounts of Vietnamese, Russian, and others. It is designed to support evaluation of machine translation.
-
Beyond Accuracy: Measuring Logical Compliance of Predictive Models
Beyond accuracy: measuring logical compliance of predictive models
Machine learning models are mostly evaluated through predictive metrics such as accuracy. This work goes beyond accuracy to measure the logical compliance of predictive models.
-
Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random
Off-policy evaluation when rewards are missing not at random
The paper studies off-policy evaluation in finite-horizon MDPs when rewards are missing not at random, as in offline reinforcement learning with sparse, irregular, or censored reward records. It develops missingness-aware policies for settings such as health care and marketing.
-
Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact
LLM psychological profiles are largely a measurement artifact
Psychological instruments designed for humans are increasingly applied to large language models. This work argues that the apparent psychological profiles of LLMs are largely a measurement artifact.
-
Pitch Spelling Jazz Lead Sheets, Solo Transcriptions, Classical Piano and Monophonic Scores
An algorithm for pitch spelling and key estimation from MIDI
The paper presents an algorithm for pitch spelling and key estimation across jazz lead sheets, solo transcriptions, classical piano, and monophonic scores. Given MIDI-like input with note pitches and bar boundaries, it estimates note names, a global key signature, and local scales.
-
HilDA: Hierarchical Distillation with Diffusion for Advancing Self-Supervised LiDAR Pre-training
HilDA: hierarchical distillation with diffusion for self-supervised LiDAR
Using vision foundation models for camera-to-LiDAR knowledge distillation is promising. HilDA advances self-supervised LiDAR pre-training through hierarchical distillation with diffusion.
-
ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion
ReNikud: audio-supervised Hebrew grapheme-to-phoneme conversion
The paper presents ReNikud, an audio-supervised approach to grapheme-to-phoneme conversion for Modern Hebrew. It addresses the ambiguity of Hebrew's abjad script, which leaves vowels largely unwritten, going beyond standard pipelines that first predict vowel diacritics (nikud).
-
NAMESAKES: Probing Identity Memorization in Text-to-Image Models
NAMESAKES: probing identity memorization in text-to-image models
The paper introduces NAMESAKES, a study probing identity memorization in text-to-image models, which can generate realistic likenesses of individuals from their names. It addresses the difficulty of telling whether a generated face is memorized or fabricated without ground-truth photos, training data, or white-box model access.
-
Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring
Adaptive subject-aware prompting for LLM high-school tutoring
The paper develops an adaptive LLM-based high-school tutoring system with subject-aware prompting, built on 14 pedagogical features (such as tutor scaffolding and student understanding) extracted from transcripts. It aims to improve student engagement where static-prompt tutoring struggles to adapt across disciplines.
-
When Does Streaming Tool Use Help? Characterizing Tool-Intent Stabilization in Streaming Retrieval-Augmented Generation
When does streaming tool use help in streaming RAG?
The paper characterizes when streaming tool use helps in streaming retrieval-augmented generation, which issues tool queries in parallel with ongoing user input to cut perceived latency. It argues the benefit is query-intrinsic and studies how tool intent stabilizes before an utterance is complete.
-
Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship
Self-preference is weak in verifiable instruction-following revision
The paper tests whether large language models resist valid corrections to their own writing during verifiable instruction-following revision. Across four models under genuine authorship, it finds that the documented self-preference bias is weak or absent in this revision setting.
-
IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources
IHUBERT: a Persian language model with semantic dedup pretraining
The paper presents IHUBERT, a monolingual Persian pretrained language model trained from scratch on a RoBERTa-base encoder. It uses vector-based semantic deduplication and domain-balanced pretraining to address the scarcity of large, high-quality Persian corpora and limited evaluation.
-
Improving health intelligence in ChatGPT
OpenAI improves ChatGPT health responses with GPT-5.5 Instant
OpenAI says GPT-5.5 Instant strengthens ChatGPT's health and wellness responses through better reasoning, richer context, and clearer communication. The work is backed by physician-informed evaluations aimed at delivering more reliable, trustworthy health guidance.
-
Source-Grounded Data Generation for Text-to-JSON Learning
Source-grounded data generation for text-to-JSON extraction
The paper proposes source-grounded data generation for text-to-JSON learning, where models extract information from long unstructured documents into structured, machine-readable JSON. It targets domains such as financial filings and clinical records that store high-value information in unstructured text.
-
When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
Investigating over-privileged tool selection in LLM agents
The paper investigates over-privileged tool selection in LLM agents, which autonomously choose among tools with different privilege levels. It addresses a gap in prior tool-selection research, which focuses on safety-agnostic metadata preferences, by studying when lower-privilege tools would suffice.
-
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
Connect the Dots: RL training for long-lifecycle LLM agents
The paper presents Connect the Dots (CoD), a reinforcement-learning framework for training large language models as long-lifecycle agents. It targets the meta-capability of solving a long sequence of tasks while continuously exploring an environment, aiming for cross-domain generalization.
-
Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations
Human-model gaps in speech quality assessment under perturbations
The paper investigates discrepancies between human judgments and MOS prediction models in speech quality assessment, using controlled acoustic and prosodic perturbations. It probes whether these models, widely used as proxy metrics in text-to-speech research, capture quality differences beyond acoustic fidelity.
-
Light-weight Pronunciation Assessment via Discrete Speech Token Surprisal
Lightweight pronunciation assessment via speech token surprisal
The paper proposes a lightweight framework for automated pronunciation assessment based on discrete speech token surprisal, trained only on native speech resources. It operates unsupervised or with light calibration from a small set of scored utterances, avoiding costly labeled learner-error or non-native corpora.