Developer Tools B

  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Segment-Level Mandarin Chinese Speech-Based Cognitive Impairment Detection via an Autoencoder with Contrastive Learning
    Mandarin speech-based cognitive impairment detection via autoencoder
    Reinforcement Learning Speech Processing
    The paper proposes a segment-level method for detecting cognitive impairment from Mandarin Chinese speech, using an autoencoder with contrastive learning. It targets the challenges of limited labeled data and cross-dataset variability in speech-based screening, where speech serves as a low-cost, non-invasive biomarker.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations
    Human-model gaps in speech quality assessment under perturbations
    Speech Processing
    The paper investigates discrepancies between human judgments and MOS prediction models in speech quality assessment, using controlled acoustic and prosodic perturbations. It probes whether these models, widely used as proxy metrics in text-to-speech research, capture quality differences beyond acoustic fidelity.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs
    GEMS: geometric constraints for multi-semantic activation steering
    Deep Learning Inference
    The paper introduces GEMS, which uses geometric constraints to enable superposing multiple semantic directions in LLM activation steering. It addresses the collapse that occurs when existing single-direction steering methods inject several semantic directions at once without constraints.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Light-weight Pronunciation Assessment via Discrete Speech Token Surprisal
    Lightweight pronunciation assessment via speech token surprisal
    Inference Speech Processing
    The paper proposes a lightweight framework for automated pronunciation assessment based on discrete speech token surprisal, trained only on native speech resources. It operates unsupervised or with light calibration from a small set of scored utterances, avoiding costly labeled learner-error or non-native corpora.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • OpenAI Blog · EN Developer Tools extract
    Using AI to help physicians diagnose rare genetic diseases affecting children
    OpenAI reasoning model aids 18 new rare-disease diagnoses in children
    OpenAI
    Researchers used an OpenAI reasoning model to help physicians diagnose rare genetic diseases in children, identifying 18 new diagnoses among previously unsolved cases. The work suggests AI can support complex clinical reasoning and improve diagnosis of rare conditions.
    Read original (OpenAI Blog) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI
    Scaling up democratic deliberation and empowering people with AI
    Embeddings Reinforcement Learning
    The paper discusses options for scaling up democratic deliberation and empowering people with AI as large language models become prominent in public discourse. It weighs the opportunities against persistent concerns that red teaming alone does not resolve, such as linguistic constraints, biases, and the sycophantic tendencies of LLMs.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Large Language Models Do Not Always Need Readable Language
    LLMs don't always need human-readable language
    Neural Network
    The paper investigates whether semantic information can be encoded in compact, non-standard text that sacrifices human readability while remaining usable by models. It argues large language models do not always need human-readable language, especially when the intended reader is another model.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models
    A control-window law for single-neuron steering in LLMs
    Retrieval-Augmented Generation (RAG)
    The paper develops a budget-normalized control-window framework for single-neuron steering in language models. It seeks to predict when intervening on one neuron coherently controls a behavior—such as refusal or language routing gated by sparse feed-forward neurons—rather than collapsing the output.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    CREDENCE: Claim Reduction for Decomposition & Enhanced Credibility -- Semantic Metrics and Convergence Analysis
    CREDENCE: semantic metrics for claim decomposition in fact-checking
    Neural Network Reinforcement Learning
    The paper presents CREDENCE, an approach to decomposing compound sentences into atomic, verifiable claims for automated fact-checking. It introduces semantic metrics that avoid token-overlap measures, which underestimate quality for paraphrastic claims, and adds convergence and termination analysis.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning
    Selective verification for budget-aware test-time reasoning
    Machine Learning Software Engineering
    The paper studies budget-aware test-time reasoning as a deployment allocation problem, asking whether to 'think again' or 'think longer.' It proposes selective verification, since extra reasoning is not uniformly useful—it can repair failures, waste compute on correct answers, or introduce harmful changes.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models
    CombEval: evaluating combinatorial counting in LLMs
    Reinforcement Learning Software Engineering
    The paper presents CombEval, a dynamic benchmark for evaluating combinatorial counting in large language models. Each problem is expressed as a typed Cofola specification over entities, combinatorial objects, dependencies, and constraints, enabling controlled generation of natural-language counting problems.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA
    AgentFinVQA: an auditable multi-agent pipeline for financial chart QA
    AI Agents Gemini Neural Network Reinforcement Learning Software Engineering
    The paper presents AgentFinVQA, a deployable multi-agent pipeline for auditable financial chart question answering. Unlike existing chart-QA agents, which focus on accuracy and are opaque, it targets regulated settings where practitioners must know which answers to trust and cannot send client data to external model providers.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
    Manifold Bandits: Bayesian curriculum learning for LLM reasoning
    Retrieval-Augmented Generation (RAG) Reinforcement Learning
    The paper proposes Manifold Bandits, a Bayesian curriculum-learning method that samples training problems over the latent geometry of large language models. It targets reinforcement learning for LLM reasoning, where training efficiency depends heavily on how prompts are selected during optimization.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings
    Sequential DPO and forgetting across preference settings
    Llama Machine Learning Reinforcement Learning Reinforcement Learning from Human Feedback (RLHF)
    The paper studies sequential Direct Preference Optimization (DPO) across different preference settings, examining how applying multiple alignment objectives one after another affects earlier ones. It looks beyond uniform forgetting to understand how later training stages interfere with previously learned preferences.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    NRITYAM: Language Models Meet Art and Heritage of Dance
    NRITYAM: a benchmark for cultural comprehension of dance traditions
    Neural Network Reinforcement Learning Software Engineering
    The paper presents NRITYAM, a benchmark for evaluating how well language models comprehend culture in the context of global dance traditions. It is motivated by the observation that the effectiveness of language models worldwide depends on a nuanced understanding of local socio-cultural contexts.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Hacker News (Front Page) · EN Multimodal extract
    Midjourney Medical
    Midjourney Medical
    An item on Midjourney Medical, a medical-focused offering from the image-generation company Midjourney. Accompanied by a demo video, it is presented as a new effort to apply generative imaging technology in the medical domain.
    Read original (Hacker News (Front Page)) ↗
  • ITmedia AI+ · JA Developer Tools extract
    Anthropic strengthens its design tool "Claude Design", adding two-way integration with Code and export to Canva and other tools
    Anthropic beefs up Claude Design with Code links and Canva export
    Anthropic Claude
    Anthropic substantially expanded the beta features of its design tool Claude Design. It can now ingest multiple design systems and maintain them across projects, adds seamless two-way integration with Claude Code, and broadens export connectors to external tools such as Adobe and Canva.
    Read original (ITmedia AI+) ↗
  • Anthropic News · EN Industry Adoption extract
    Anthropic opens Seoul office and announces new partnerships across the Korean AI ecosystem
    Anthropic opens a Seoul office, announces new Korean AI partnerships
    Anthropic Claude Neural Network Reinforcement Learning
    Anthropic opened a Seoul office and announced new partnerships across Korea's AI ecosystem, including enterprises, startups, and researchers building on Claude. The announcement frames Korea as treating innovation and safety as two sides of the same coin. Specifics come from the announcement and have not been independently verified.
    Read original (Anthropic News) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States
    LOCUS releases a US local-ordinance corpus for legal AI
    Deep Learning Meta Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Progress in legal AI depends on authoritative legal text at scale, yet US local ordinances—a consequential layer of American law—are largely missing from machine-readable corpora. The authors build LOCUS, a corpus of US local ordinances, to broaden legal-AI research data.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning
    ML resolves ambiguous Gaia matches to Chandra X-ray sources
    Machine Learning
    The authors cross-match the Chandra Source Catalog (CSC v2.1) with Gaia Data Release 3 optical sources. Rather than purely spatial matching, they use source properties such as magnitudes, colors, and distances with machine learning to resolve ambiguous counterparts to X-ray sources.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors
    Reference-driven generation of multi-speaker audio scenes
    Embeddings Retrieval-Augmented Generation (RAG) Reinforcement Learning Speech Processing
    Existing multi-speaker dialogue systems bind speakers to utterances through structured supervision such as per-turn tags, multi-stream transcriptions, or learnable speaker embeddings. This work generates multi-speaker audio scenes by drawing on in-the-wild reference priors for more natural synthesis.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play
    Multi-agent fictitious play boosts LLM decision-making
    AI Agents Neural Network Reinforcement Learning
    LLM-based multi-agent systems show promise on complex tasks by distributing subtasks across cooperative agents, but coordination remains hard. This work applies game-theoretic fictitious play so agents iteratively best-respond to one another, improving collective decision-making.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution
    P-K-GCN fuses physics and Koopman for spatiotemporal super-resolution
    High-fidelity simulation of spatiotemporal dynamics is computationally prohibitive, which motivates efficient super-resolution. P-K-GCN integrates physical constraints and Koopman operator theory into a graph convolutional network to reconstruct high-resolution spatiotemporal fields from coarse data.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation
    Confidence is not reliability: rethinking MC dropout in tumour segmentation
    Neural Network Reinforcement Learning
    Glioma segmentation in multiparametric MRI is critical for treatment planning, and a model that fails silently on treatment-critical sub-regions poses a patient-safety risk that overlap metrics miss. This work shows that MC dropout confidence is not the same as reliability and rethinks how uncertainty is estimated for this task.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • Simon Willison's Weblog · EN Developer Tools extract
    Quoting Charity Majors
    Simon Willison quotes Charity Majors: AI demands more discipline
    Simon Willison quotes Charity Majors, who argues that AI has made producing code effectively free and instant, turning code from a curated artifact into a disposable one, and that working with AI therefore demands more engineering discipline, not less. Presented as commentary; not independently verified.
    Read original (Simon Willison's Weblog) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning
    NeSyCat Torch unifies neurosymbolic semantics via category theory
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic, and neural systems each define truth by their own rules. Extending ULLER, NeSyCat subsumes them under a single inductive definition of truth, delivered as a differentiable tensor implementation for neurosymbolic learning.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Beyond Algorithms: Conceptual Innovation in Medical Imaging AI
    Beyond algorithms: the case for conceptual innovation in medical imaging AI
    Algorithms & Theory Deep Learning Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    AI has driven rapid progress in medical imaging, yielding ever more sophisticated algorithms and steady benchmark gains. Yet this algorithm-centric trajectory reveals limits. This work argues for conceptual innovation beyond algorithms to achieve clinically meaningful advances in medical imaging AI.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA
    Trade-offs in medical LLM adaptation, studied on French QA
    Fine-tuning Reinforcement Learning Software Engineering
    As LLMs are adapted to specialized domains and languages, the effectiveness of adaptation strategies remains unclear. This empirical study on French medical question answering analyzes the trade-offs of various domain-adaptation methods, clarifying gains and losses in performance and generality.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Detecting Hidden ML Training With Zero-Overhead Telemetry
    Zero-overhead telemetry detects hidden ML training runs
    Machine Learning Neural Network
    Hardware-enabled monitoring of GPU workloads underpins many AI compute-governance proposals, but if developers can defeat monitoring, such schemes fail. This work evaluates detecting hidden ML training using zero-overhead telemetry, testing how robustly monitoring can support compute governance.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    X+Slides: Benchmarking Audience-Conditioned Slide Generation
    X+Slides benchmarks audience-conditioned slide generation
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Automatically generating slide decks from documents is an important LLM application, but existing benchmarks mainly assess completeness and technical depth. X+Slides introduces a benchmark for audience-conditioned slide generation, evaluating how well decks adapt to their intended audience.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗