Developer Tools B
Showing 151–180 of 292
-
Fixed-Point Reasoners: Stable and Adaptive Deep Looped TransformersFixed-Point Reasoners: stabilizing deep looped Transformers (FPRM)The paper addresses the depth-induced signal propagation problem in looped Transformer architectures using pre-norm layers and residual scaling, and proposes FPRM, a looped Transformer model built on these architectural modifications.
-
Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0Encoding the Al-Mawrid Arabic-English dictionary with LMF and TEI Lex-0The paper presents a methodology to systematically digitize and encode the legacy print Al-Mawrid Arabic-English dictionary using the ISO Language Markup Framework and TEI Lex-0, addressing a gap in Arabic lexical infrastructure by producing a standardized computational lexicon.
-
DRFLOW: A Deep Research Benchmark for Personalized Workflow PredictionDRFLOW: a deep research benchmark for personalized workflow predictionThe paper introduces DRFLOW, a benchmark for evaluating personalized workflow prediction in deep research systems, focusing on identifying concrete action-step workflows for enterprise tasks rather than generating reports or summaries.
-
Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM EvaluationATT&CK-labeled multi-source security log dataset with SLM evaluationThe work builds a dataset of multi-source cybersecurity logs labeled with MITRE ATT&CK and evaluates small language models (SLMs) on it. Summary is title-based and neutral; details and figures are as presented by the source and not independently verified.
-
IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information ExtractionIUU+DB: LLM-driven extraction to track illegal fishing and related crimesThe paper proposes the IUU+ concept extending illegal, unreported, and unregulated fishing to broader fisheries-related crimes, and IUU+DB, an LLM-driven information extraction system to quantify the frequency, geography, and actors of such incidents.
-
All Smoke, No Alarm: Oracle Signals in Agent-Authored Test CodeStudy finds agent-authored test code often lacks real verification logicThe paper examines test code generated by AI coding agents in open-source pull requests, arguing that test files lacking explicit assertions verify no behavior, so presence-based quality gates overestimate verification strength.
-
Build On-Device AI Companions with the NVIDIA ACE Game Agent SDK and Unreal Engine 5 PluginsNVIDIA unveils ACE Game Agent SDK and UE5 plugins for on-device AINVIDIA announced the ACE Game Agent SDK and Unreal Engine 5 plugins for developers to build on-device AI companions—AI agents that run locally on the device rather than in the cloud—for in-game characters. The export raw_excerpt was blocked (cookie/query string data), so this is summarized neutrally from the title and the NVIDIA developer blog framing; specific figures and performance claims are unverified.
-
ReAge3D: Re-Aging 3D Faces with View ConsistencyReAge3D: identity-preserving, view-consistent 3D face re-agingThe paper presents ReAge3D, a framework for identity-preserving 3D face re-aging that introduces a 2D diffusion-based re-aging model (DiffReaging) trained on synthetic image pairs and a center-out approach to maintain detail and view consistency.
-
Learning Cardiac Electrophysiology Digital Twins Through Agentic Discovery of Hybrid StructureAgentic discovery of hybrid structure for cardiac EP digital twinsThe paper proposes an agentic discovery method that identifies hybrid physics-neural model structures for personalized cardiac electrophysiology digital twins, aiming to reduce reliance on expert-prescribed architectures and improve cross-patient transfer.
-
Memory as a Wasting Asset: Pricing Flash Endurance for Embodied Agents, and the Limits of Doing SoPricing flash endurance as a wasting asset for embodied agentsA robot's flash endurance is a non-renewable stock: each persisted write spends one of a few thousand program/erase cycles and never refills. The paper frames flash endurance as a wasting asset, proposes pricing it for embodied agents, and examines the limits of doing so.
-
Descriptor: Certus Caliber Classification Gunshot Dataset (C3GD)C3GD: a public field-collected gunshot muzzle-blast sound datasetThe paper introduces the Certus Caliber Classification Gunshot Dataset (C3GD), a public dataset of firearm muzzle-blast sounds with over 8,000 field-collected data points from 28 firearms across 16 calibers, with detailed metadata.
-
Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-EscapingStructural role injection in Handlebars-templated LLM promptsLLM apps build prompts from templates, with Handlebars the default in Microsoft Semantic Kernel. While double-brace expressions HTML-escape values, triple-brace interpolation inserts them raw. The paper studies structural role injection and the limits of HTML auto-escaping.
-
First Proof Second BatchTesting AI systems on ten research-level mathematics problemsThis document reports testing several AI systems on ten research-level mathematics problems spanning broad fields that arose in the contributors' research, providing the problems, methodology, results, and links to human and AI solutions plus referee reports.
-
datasette-tailscale 0.1a0Simon Willison releases datasette-tailscale, an experimental Tailscale pluginSimon Willison released datasette-tailscale 0.1a0, a very experimental alpha plugin that runs a local Datasette server with a Tailscale sidecar so it is reachable inside your Tailnet via a chosen hostname. You launch it with an auth key and hostname. It relies on Python bindings for the experimental tailscale-rs library, and he filed an issue asking for a cleaner way to set up the proxy.
-
Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement LearningLearning fair Pareto-optimal policies in multi-objective RLIn multi-objective reinforcement learning, policies must balance optimality and equity across potentially conflicting objectives. The paper proposes learning fair, Pareto-optimal policies using generalized welfare functions.
-
Querying an astronomical database using large language models: the ALeRCE text-to-SQL systemA text-to-SQL system for querying the ALeRCE astronomical databaseThe paper develops an LLM-based text-to-SQL system using in-context learning, applied to the ALeRCE astronomical broker database, generating executable SQL from natural language and evaluated on a dataset of 110 NL/SQL pairs via step-by-step generation.
-
Deep Reinforcement Learning for Minimum Zero-Forcing SetsDeep reinforcement learning for minimum zero-forcing setsThe paper tackles the minimum zero-forcing set problem on undirected graphs, a coloring problem where an initial set's color propagates through the network, and proposes an adapted deep reinforcement learning framework to solve it.
-
Trust the Right Teacher: Quality-Aware Self-Distillation for GUI GroundingQuality-aware self-distillation for GUI grounding in VLMsThe paper proposes a quality-aware self-distillation method for GUI grounding, where vision-language models predict precise screen coordinates, addressing how naive on-policy self-distillation can degrade coordinate-token teacher signals.
-
EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph ConditioningEAGG: embodiment-aligned grasp generation via graph conditioningThe paper presents EAGG, an embodiment-aligned grasp generator that represents each end-effector with a topology-aware graph and embodiment-specific conditioning, aiming to generalize grasp generation across objects and diverse robot embodiments.
-
From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model ReasoningFrom reasoning traces to reusable modules for compositional reasoningPost-training pipelines combining supervised fine-tuning with reinforcement learning are key to turning LLMs into robust reasoners. The paper studies compositional generalization in LM reasoning by converting reasoning traces into reusable modules.
-
Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of StabilityEdge Flow: a tractable continuous-time model for GD at the edge of stabilityGradient descent in deep learning can operate at the edge of stability, where the loss Hessian's top eigenvalue hovers near the stability threshold. Classical tools fail there, so Edge Flow offers a tractable, predictive continuous-time model of this regime.
-
Tensor-based second-order causal discoveryTensor-based second-order causal discovery (TSCD)To uncover causal dependencies among variables, the paper proposes TSCD, a tensor-based second-order causal discovery algorithm whose input is a tensor formed from covariance matrices of observational and interventional data, assuming linear structural equations.
-
Volterra Generative ModelsVolterra generative models add memory to diffusion perturbationsScore-based diffusion models use memoryless Brownian perturbations that yield tractable reverse-time dynamics. Volterra generative models introduce continuous-time perturbations with memory, generalizing diffusion-based generation.
-
Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare ApplicationsA multi-agent framework against premature handoff and silent hallucinationThe paper proposes a multi-agent framework for healthcare that mitigates premature diagnostic handoff and silent clinical hallucinations, replacing LLM-as-a-judge routing with deterministic orchestration constraints and adding two safety mechanisms.
-
Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs RespondSecurity and privacy prompts in the wild: what users ask LLMsThe paper analyzes, in the wild, what users ask large language models about security and privacy and how the models respond, characterizing the questions, response patterns and associated concerns.
-
PseudoBench: Measuring How Agentic Auto-Research Fuels PseudosciencePseudoBench measures how agentic auto-research fuels pseudoscienceAs LLM-based agents enter autonomous scientific research, resisting pseudoscience matters. PseudoBench is an adversarial benchmark measuring how such agents may rapidly generate plausible yet misleading studies that contaminate academic literature.
-
When AI Says "I have been in similar situations": Synthetic Lived Experience in Peer-Like Caregiver SupportSynthetic lived experience in AI peer-like caregiver supportCaregivers seek informational and emotional support in online communities where peers draw on personal narratives. As LLMs are designed as peer-like supporters, the paper examines the tension introduced when AI claims synthetic lived experience in caregiver support.
-
ConTex: Reformulating Counterfactual Generation For Time Series ForecastingConTex reformulates counterfactual generation for time-series forecastingDecision-making with deep time-series forecasting needs not just accurate predictions but actionable insight, which current architectures lack. ConTex reformulates counterfactual generation to indicate how present conditions must change to shift a predicted outcome toward a desired future.
-
ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM AgentsProvenanceGuard: source-aware factuality verification for MCP agentsTool-using LLM agents use the Model Context Protocol to answer from heterogeneous sources like search, APIs, databases and clinical records. ProvenanceGuard provides source-aware factuality verification to catch provenance-sensitive failure modes that standard metrics miss.
-
When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context LearningSource-language effects in cross-lingual in-context learningCross-lingual transfer is well studied under supervised fine-tuning, where data and linguistic similarity drive quality. As the field shifts to few-shot in-context learning, this paper examines source-language effects and shows English is not always the best teacher.