New Model Releases

  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills
    RubricsTree: scalable open-ended evaluation of personal health agents
    AI Agents · Gemini · GPT · Meta · Neural Network
    LLM personal health agents using sensor metrics promise to ease healthcare disparities, but an open-ended evaluation bottleneck limits clinical deployment. RubricsTree offers scalable, evolving open-ended evaluation across health memory and medical skills.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Learning from the Self-future: On-policy Self-distillation for dLLMs
    On-policy self-distillation explored for diffusion LLMs
    Deep Learning · Fine-tuning · Reinforcement Learning · Software Engineering
    On-policy self-distillation (OPSD) helps post-training of LLMs but is unexplored for diffusion LLMs (dLLMs). Existing OPSD methods are autoregressive-centric, injecting privileged information via left-to-right prefix conditioning; this work studies self-distillation suited to dLLMs.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    The Stanford EDGAR Filings Dataset: Reconstructing U.S. Corporate and Financial Disclosures into Layout-Faithful and Token-Efficient Pretraining Data
    SEFD: an open, layout-faithful reconstruction of SEC filings for LLMs
    Reinforcement Learning
    The paper introduces the Stanford EDGAR Filings Dataset (SEFD), an open reconstruction of SEC filings into layout-faithful MultiMarkdown, providing audited financial disclosures as token-efficient pretraining and evaluation data for financial language modeling.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction
    DRFLOW: a deep research benchmark for personalized workflow prediction
    AI Agents · Retrieval-Augmented Generation (RAG) · Software Engineering
    The paper introduces DRFLOW, a benchmark for evaluating personalized workflow prediction in deep research systems, focusing on identifying concrete action-step workflows for enterprise tasks rather than generating reports or summaries.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Kolmogorov Regression for Robust Diffusion Policies
    Kolmogorov regression yields robust diffusion policies
    Inference · Neural Network · Reinforcement Learning
    Finite-dimensional diffusion policies suffer temporal drift from discretization that degrades long-horizon performance. The paper introduces a backward Kolmogorov equation that lifts diffusion policies into a Cameron-Martin space to make them more robust.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise
    A diffusion approximation for TD learning under Markovian noise
    The classical continuous-time description of temporal-difference learning with linear features is an ODE capturing asymptotic mean dynamics but neglecting stochasticity. This work provides a diffusion approximation for TD learning under Markovian noise to capture those fluctuations.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ReAge3D: Re-Aging 3D Faces with View Consistency
    ReAge3D: identity-preserving, view-consistent 3D face re-aging
    Retrieval-Augmented Generation (RAG)
    The paper presents ReAge3D, a framework for identity-preserving 3D face re-aging that introduces a 2D diffusion-based re-aging model (DiffReaging) trained on synthetic image pairs and a center-out approach to maintain detail and view consistency.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models
    An agentic benchmark for implicit animal welfare in frontier AI
    AI Agents · Claude · DeepSeek · Gemini · GPT
    AI agents are shifting from advisors to actors that book travel and run procurement. Existing animal-welfare benchmarks grade only text answers, so this work introduces an agentic benchmark testing whether implicit animal-welfare reasoning transfers to agent actions in frontier models.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Descriptor: Certus Caliber Classification Gunshot Dataset (C3GD)
    C3GD: a public field-collected gunshot muzzle-blast sound dataset
    Meta · Reinforcement Learning
    The paper introduces the Certus Caliber Classification Gunshot Dataset (C3GD), a public dataset of firearm muzzle-blast sounds with over 8,000 field-collected data points from 28 firearms across 16 calibers, with detailed metadata.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Knowledge Reutilization in Meta-Reinforcement Learning
    A meta-knowledge reutilization framework for meta-RL across agents
    AI Agents · Inference · Meta · Reinforcement Learning
    The paper proposes a meta-knowledge reutilization framework for meta-reinforcement learning that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents, using a Bayesian non-parametric prior to organize latent task modes.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Towards Understanding and Measuring COGNITIVE ATROPHY in LLM Behaviour
    Formalizing 'cognitive atrophy' as a process-level measure of LLM behaviour
    Neural Network
    The paper formalizes 'cognitive atrophy,' a process-level behavioural measure of AI-mediated mental-health support, capturing whether interactions help users keep reflecting, coping, and deciding, a dimension distinct from safety and static response quality.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Unintended Effects of Geographic Conditioning in Large Language Models
    Unintended regional biases from geographic conditioning in LLMs
    Claude · Llama · Meta · Neural Network · Reinforcement Learning
    Conversational AI localizes responses using user metadata, yet the regional biases this hidden context introduces remain poorly understood. The paper analyzes the unintended effects of geographic conditioning on large language model outputs.
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping
    Structural role injection in Handlebars-templated LLM prompts
    Claude · GPT · Llama · Machine Learning · Microsoft
    LLM apps build prompts from templates, with Handlebars the default in Microsoft Semantic Kernel. While double-brace expressions HTML-escape values, triple-brace interpolation inserts them raw. The paper studies structural role injection and the limits of HTML auto-escaping.
  • Simon Willison's Weblog · EN New Model Releases extract
    datasette-tailscale 0.1a0
    Simon Willison releases datasette-tailscale, an experimental Tailscale plugin
    Neural Network
    Simon Willison released datasette-tailscale 0.1a0, a very experimental alpha plugin that runs a local Datasette server with a Tailscale sidecar so it is reachable inside your Tailnet via a chosen hostname. You launch it with an auth key and hostname. It relies on Python bindings for the experimental tailscale-rs library, and he filed an issue asking for a cleaner way to set up the proxy.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Querying an astronomical database using large language models: the ALeRCE text-to-SQL system
    A text-to-SQL system for querying the ALeRCE astronomical database
    Claude · Gemini · GPT · Inference
    The paper develops an LLM-based text-to-SQL system for the ALeRCE astronomical broker database. It uses in-context learning and step-by-step generation to produce executable SQL from natural language, and is evaluated on a dataset of 110 NL/SQL pairs.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice
    HistoRAG embeds historical methodology into RAG via critical practice
    Embeddings · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering
    RAG grounds model outputs in external evidence, but its dominant evaluations and defaults are oriented toward factual question answering. HistoRAG embeds historical methodology into retrieval-augmented generation through critical technical practice for interpretive historical studies.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Volterra Generative Models
    Volterra generative models add memory to diffusion perturbations
    Deep Learning
    Score-based diffusion models use memoryless Brownian perturbations that yield tractable reverse-time dynamics. Volterra generative models introduce continuous-time perturbations with memory, generalizing diffusion-based generation.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment
    NoiseTilt injects reward gradients via the noise term in diffusion
    Inference
    NoiseTilt (NTRK) is a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the score kernel unchanged and needing only a single sample per step, improving reward alignment of pretrained diffusion models.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond
    Security and privacy prompts in the wild: what users ask LLMs
    GPT · Llama · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    The paper analyzes what users ask large language models about security and privacy in real-world use and how the models respond, characterizing the questions, the response patterns, and the associated concerns.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    When AI Says "I have been in similar situations": Synthetic Lived Experience in Peer-Like Caregiver Support
    Synthetic lived experience in AI peer-like caregiver support
    GPT · Llama · Neural Network
    Caregivers seek informational and emotional support in online communities where peers draw on personal narratives. As LLMs are designed as peer-like supporters, the paper examines the tension introduced when AI claims synthetic lived experience in caregiver support.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose
    Compositional skill routing for LLM agents: decompose, retrieve, compose
    AI Agents · Model Context Protocol (MCP) · Neural Network · Reinforcement Learning
    LLM agents rely on reusable tool specifications (skills), but real tasks require composing multiple skills. The paper formalizes compositional skill routing: decomposing a complex query into atomic sub-tasks, retrieving relevant skills, and composing them.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
    LoopCoder-v2: loop once for efficient test-time compute scaling
    Deep Learning · Software Engineering · Transformer
    Looped transformers scale latent computation by repeating shared blocks, but sequential looping raises latency and KV-cache memory with loop count. Building on parallel loop transformers, LoopCoder-v2 makes loop count a practical knob for efficient test-time computation scaling.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Recursive Scaling in Masked Diffusion Models
    Recursive scaling in masked diffusion models
    Deep Learning · Inference · Transformer
    Masked diffusion models (MDMs) have recently emerged as a generative approach. The paper investigates recursive scaling in MDMs, offering insights into their behavior and efficiency.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews
    Using LLMs to assess dementia and depression from clinical interviews
    Mistral · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Speech Processing
    Dementia and depression are the most prevalent geriatric neuropsychiatric disorders, with overlapping symptoms complicating diagnosis. The study investigates open-weights LLMs for predicting dementia and depression severity from speech collected during clinical interviews.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Fast Nonparametric Conditional Independence Testing via Two-Stage Regression
    Fast nonparametric conditional independence testing via two-stage regression
    Algorithms & Theory · Reinforcement Learning from Human Feedback (RLHF)
    Conditional independence testing is fundamental to statistics and causal inference. The paper proposes a fast nonparametric conditional independence test based on two-stage regression, aiming to improve computational efficiency and power.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    LLM Consumer Behavior Theory: Foundations of a Novel Research Field
    LLM Consumer Behavior Theory: a new field for agentic markets
    AI Agents · Natural Language Processing (NLP) · Retrieval-Augmented Generation (RAG)
    The paper introduces LLM Consumer Behavior Theory, a proposed field analyzing consumer behavior in agentic markets where LLMs make consumption decisions on behalf of users, drawing on classical and behavioral economics alongside NLP.
  • arXiv cs.LG (Machine Learning) · EN Funding & M&A extract
    C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift
    C2FL: clustered continual federated learning under drift
    Machine Learning · Retrieval-Augmented Generation (RAG)
    Collective adaptive systems let nodes learn from locally sensed data, but privacy-sensitive data and node mobility hinder scaling. C2FL proposes clustered continual federated learning that handles spatial and temporal drift.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination
    VoidPadding lets [VOID] handle padding so [EOS] focuses on termination
    Deep Learning · Inference · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    In masked diffusion language models, the roles of padding and semantic termination become entangled. VoidPadding introduces a [VOID] token that handles padding, so that [EOS] can focus on signaling semantic termination, improving generation behavior.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Recover Semantics First, Generate Better: Improved Latent Modeling for 3D MRI Reconstruction and Cross-Contrast Synthesis
    Improved latent modeling for 3D MRI reconstruction and synthesis
    The paper proposes an improved latent modeling approach for 3D MRI reconstruction and cross-contrast synthesis. To address the heavy computational cost of large 3D volumes, it recovers semantics first in order to better infer absent MRI contrasts.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue
    Fine-tuning LLMs for passive depression severity from AI dialogue
    Claude · Fine-tuning · Neural Network · Reinforcement Learning
    The paper fine-tunes LLMs for passive estimation of depression severity from AI mental-health dialogue, exploring how conversational signals can indicate severity. Figures and efficacy are as reported by the source and not independently verified.