Developer Tools B
-
Segment-Level Mandarin Chinese Speech-Based Cognitive Impairment Detection via an Autoencoder with Contrastive Learning
Mandarin speech-based cognitive impairment detection via autoencoder
The paper proposes a segment-level method for detecting cognitive impairment from Mandarin Chinese speech, using an autoencoder with contrastive learning. It targets the challenges of limited labeled data and cross-dataset variability in speech-based screening, where speech serves as a low-cost, non-invasive biomarker.
-
Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations
Human-model gaps in speech quality assessment under perturbations
The paper investigates discrepancies between human judgments and mean opinion score (MOS) prediction models in speech quality assessment, using controlled acoustic and prosodic perturbations. It probes whether these models, widely used as proxy metrics in text-to-speech research, capture quality differences beyond acoustic fidelity.
-
GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs
GEMS: geometric constraints for multi-semantic activation steering
The paper introduces GEMS, which uses geometric constraints to enable superposing multiple semantic directions in LLM activation steering. It addresses the collapse that occurs when existing single-direction steering methods inject several semantic directions at once without constraints.
-
Light-weight Pronunciation Assessment via Discrete Speech Token Surprisal
Lightweight pronunciation assessment via speech token surprisal
The paper proposes a lightweight framework for automated pronunciation assessment based on discrete speech token surprisal, trained only on native speech resources. It operates unsupervised or with light calibration from a small set of scored utterances, avoiding costly labeled learner-error or non-native corpora.
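As a rough illustration of the surprisal idea in this item, the sketch below scores an utterance by its mean per-token surprisal, -log2 p(token | previous token), under a model fit only on reference ("native") token sequences. The bigram model with add-one smoothing, the function names, and the toy token IDs are all assumptions made for illustration; the paper's actual tokenizer, language model, and calibration step are not described in this summary.

```python
import math
from collections import Counter

def fit_bigram(sequences, vocab_size):
    """Count contexts and bigrams over discrete token sequences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab_size

def mean_surprisal(model, seq):
    """Mean surprisal in bits per transition, with add-one smoothing."""
    context_counts, bigram_counts, vocab_size = model
    total = 0.0
    n = 0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total += -math.log2(p)
        n += 1
    return total / n if n else 0.0

# Toy "native" token sequences (hypothetical unit IDs in the range 0..3).
native = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
model = fit_bigram(native, vocab_size=4)

typical = mean_surprisal(model, [0, 1, 2, 3])   # follows the native pattern
atypical = mean_surprisal(model, [3, 2, 1, 0])  # only unseen transitions
```

Under this toy model, the sequence that follows the native pattern receives a lower mean surprisal than the reversed one; a higher score marks token transitions that are less expected given native speech.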
-
Using AI to help physicians diagnose rare genetic diseases affecting children
OpenAI reasoning model aids 18 new rare-disease diagnoses in children
Researchers used an OpenAI reasoning model to help physicians diagnose rare genetic diseases in children, identifying 18 new diagnoses among previously unsolved cases. The work suggests AI can support complex clinical reasoning and improve diagnosis of rare conditions.
-
The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI
Scaling up democratic deliberation and empowering people with AI
The paper discusses options for scaling up democratic deliberation and empowering people with AI as large language models become prominent in public discourse. It weighs these opportunities against persistent concerns that go beyond what red teaming addresses, such as linguistic constraints, biases, and the sycophantic tendencies of LLMs.
-
Large Language Models Do Not Always Need Readable Language
LLMs don't always need human-readable language
The paper investigates whether semantic information can be encoded in compact, non-standard text that sacrifices human readability while remaining usable by models. It argues large language models do not always need human-readable language, especially when the intended reader is another model.
-
Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models
A control-window law for single-neuron steering in LLMs
The paper develops a budget-normalized control-window framework for single-neuron steering in language models. It seeks to predict when intervening on one neuron coherently controls a behavior (such as refusal or language routing gated by sparse feed-forward neurons) rather than collapsing the output.
-
CREDENCE: Claim Reduction for Decomposition & Enhanced Credibility -- Semantic Metrics and Convergence Analysis
CREDENCE: semantic metrics for claim decomposition in fact-checking
The paper presents CREDENCE, an approach to decomposing compound sentences into atomic, verifiable claims for automated fact-checking. It introduces semantic metrics that avoid token-overlap measures, which underestimate quality for paraphrastic claims, and adds convergence and termination analysis.
-
Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning
Selective verification for budget-aware test-time reasoning
The paper studies budget-aware test-time reasoning as a deployment allocation problem, asking whether to 'think again' or 'think longer.' It proposes selective verification because extra reasoning is not uniformly useful: it can repair failures, waste compute on answers that are already correct, or introduce harmful changes.
-
CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models
CombEval: evaluating combinatorial counting in LLMs
The paper presents CombEval, a dynamic benchmark for evaluating combinatorial counting in large language models. Each problem is expressed as a typed Cofola specification over entities, combinatorial objects, dependencies, and constraints, enabling controlled generation of natural-language counting problems.
-
AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA
AgentFinVQA: an auditable multi-agent pipeline for financial chart QA
The paper presents AgentFinVQA, a deployable multi-agent pipeline for auditable financial chart question answering. It targets regulated settings where practitioners must know which answers to trust and cannot send client data to external model providers, needs that existing accuracy-focused, opaque chart-QA agents do not meet.
-
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
Manifold Bandits: Bayesian curriculum learning for LLM reasoning
The paper proposes Manifold Bandits, a Bayesian curriculum-learning method that samples training problems over the latent geometry of large language models. It targets reinforcement learning for LLM reasoning, where training efficiency depends heavily on how prompts are selected during optimization.
-
Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings
Sequential DPO and forgetting across preference settings
The paper studies sequential Direct Preference Optimization (DPO) across different preference settings, examining how applying multiple alignment objectives one after another affects earlier ones. It looks beyond uniform forgetting to understand how later training stages interfere with previously learned preferences.
-
NRITYAM: Language Models Meet Art and Heritage of Dance
NRITYAM: a benchmark for cultural comprehension of dance traditions
The paper presents NRITYAM, a benchmark for evaluating how well language models comprehend culture in the context of global dance traditions. It is motivated by the view that the global effectiveness of language models depends on a nuanced understanding of local socio-cultural contexts.
-
Midjourney Medical
An item on Midjourney Medical, a medical-focused offering from the image-generation company Midjourney. Accompanied by a demo video, it is presented as a new effort to apply generative imaging technology in the medical domain.
-
Anthropic strengthens its design tool "Claude Design," adding two-way integration with Code and export to Canva and other tools
Anthropic beefs up Claude Design with Code links and Canva export
Anthropic substantially expanded the beta features of its design tool Claude Design. It can now ingest multiple design systems and maintain them across projects, adds seamless two-way integration with Claude Code, and broadens export connectors to external tools such as Adobe and Canva.
-
Anthropic opens Seoul office and announces new partnerships across the Korean AI ecosystem
Anthropic opens a Seoul office, announces new Korean AI partnerships
Anthropic opened a Seoul office and announced new partnerships across Korea's AI ecosystem, including enterprises, startups, and researchers building on Claude. It frames Korea as treating innovation and safety as two sides of the same coin. Specifics are taken from the announcement and have not been independently verified.
-
Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States
LOCUS releases a US local-ordinance corpus for legal AI
Progress in legal AI depends on authoritative legal text at scale, yet US local ordinances, a consequential layer of American law, are largely missing from machine-readable corpora. The authors build LOCUS, a corpus of US local ordinances, to broaden the data available for legal-AI research.
-
The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning
ML resolves ambiguous Gaia matches to Chandra X-ray sources
The authors cross-match the Chandra Source Catalog (CSC v2.1) with Gaia Data Release 3 optical sources. Rather than purely spatial matching, they use source properties such as magnitudes, colors, and distances with machine learning to resolve ambiguous counterparts to X-ray sources.
-
Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors
Reference-driven generation of multi-speaker audio scenes
Existing multi-speaker dialogue systems bind speakers to utterances through structured supervision such as per-turn tags, multi-stream transcriptions, or learnable speaker embeddings. This work generates multi-speaker audio scenes by drawing on in-the-wild reference priors for more natural synthesis.
-
Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play
Multi-agent fictitious play boosts LLM decision-making
LLM-based multi-agent systems show promise on complex tasks by distributing subtasks across cooperative agents, but coordination remains hard. This work applies game-theoretic fictitious play so agents iteratively best-respond to one another, improving collective decision-making.
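To make the fictitious-play idea concrete, here is a minimal sketch of classical two-player fictitious play on a matrix game: in each round, every player best-responds to the empirical frequency of the other player's past actions. This is the textbook procedure applied to matching pennies, not the paper's LLM-agent pipeline; the function names, the tie-breaking rule, the shared first move, and the choice of game are assumptions for illustration.

```python
def best_response(payoff_rows, opponent_counts):
    """Index of the action with the highest total payoff against the
    opponent's empirical action counts (ties go to the lowest index)."""
    scores = [
        sum(p * c for p, c in zip(row, opponent_counts))
        for row in payoff_rows
    ]
    return scores.index(max(scores))

def fictitious_play(row_payoff, col_payoff, rounds):
    """Run two-player fictitious play and return empirical action frequencies.

    row_payoff[i][j]: row player's payoff when row plays i and column plays j.
    col_payoff[j][i]: column player's payoff when column plays j and row plays i.
    """
    row_counts = [0] * len(row_payoff)
    col_counts = [0] * len(col_payoff)
    # Arbitrary first move (assumption: both players start with action 0).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(rounds - 1):
        # Both players choose simultaneously, using counts from past rounds only.
        r = best_response(row_payoff, col_counts)
        c = best_response(col_payoff, row_counts)
        row_counts[r] += 1
        col_counts[c] += 1
    return ([n / rounds for n in row_counts],
            [n / rounds for n in col_counts])

# Matching pennies: the row player wants to match, the column player to mismatch.
ROW = [[1, -1], [-1, 1]]
COL = [[-1, 1], [1, -1]]  # indexed [column action][row action]

row_freq, col_freq = fictitious_play(ROW, COL, rounds=5000)
```

In this zero-sum game each individual move is deterministic, yet the empirical frequencies drift toward the mixed equilibrium in which each action is played about half the time.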
-
P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution
P-K-GCN fuses physics and Koopman for spatiotemporal super-resolution
High-fidelity simulation of spatiotemporal dynamics is computationally prohibitive, demanding efficient super-resolution. P-K-GCN integrates physical constraints and Koopman operator theory into a graph convolutional network to reconstruct high-resolution spatiotemporal fields from coarse data.
-
Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation
Confidence is not reliability: rethinking MC dropout in tumour segmentation
Glioma segmentation in multiparametric MRI is critical for treatment planning, and a model that fails silently on treatment-critical sub-regions is a patient-safety risk that overlap metrics miss. This work shows that confidence derived from Monte Carlo (MC) dropout does not equal reliability, and rethinks uncertainty estimation accordingly.
-
Quoting Charity Majors
Simon Willison quotes Charity Majors: AI demands more discipline
Simon Willison quotes Charity Majors, who argues that because AI has made code production effectively free and instant, turning code from curated to disposable, AI work demands more engineering discipline, not less. Presented as commentary; not independently verified.
-
NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning
NeSyCat Torch unifies neurosymbolic semantics via category theory
Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic, and neural systems each define truth by their own rules. Extending ULLER, NeSyCat subsumes them under a single inductive definition of truth, delivered as a differentiable tensor implementation for neurosymbolic learning.
-
Beyond Algorithms: Conceptual Innovation in Medical Imaging AI
Beyond algorithms: the case for conceptual innovation in medical imaging AI
AI has driven rapid progress in medical imaging, yielding ever more sophisticated algorithms and steady benchmark gains. Yet this algorithm-centric trajectory reveals limits. This work argues for conceptual innovation beyond algorithms to achieve clinically meaningful advances in medical imaging AI.
-
Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA
Trade-offs in medical LLM adaptation, studied on French QA
As LLMs are adapted to specialized domains and languages, the effectiveness of adaptation strategies remains unclear. This empirical study on French medical question answering analyzes the trade-offs of various domain-adaptation methods, clarifying gains and losses in performance and generality.
-
Detecting Hidden ML Training With Zero-Overhead Telemetry
Zero-overhead telemetry detects hidden ML training runs
Hardware-enabled monitoring of GPU workloads underpins many AI compute-governance proposals, but if developers can defeat monitoring, such schemes fail. This work evaluates detecting hidden ML training using zero-overhead telemetry, testing how robustly monitoring can support compute governance.
-
X+Slides: Benchmarking Audience-Conditioned Slide Generation
X+Slides benchmarks audience-conditioned slide generation
Automatically generating slide decks from documents is an important LLM application, but existing benchmarks mainly assess completeness and technical depth. X+Slides introduces a benchmark for audience-conditioned slide generation, evaluating how well decks adapt to their intended audience.