Developer Tools B

Showing 151–180 of 292
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers
    Fixed-Point Reasoners: stabilizing deep looped Transformers (FPRM)
    Transformer
    The paper addresses the depth-induced signal propagation problem in looped Transformer architectures using pre-norm layers and residual scaling, and proposes FPRM, a looped Transformer model built on these architectural modifications.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0
    Encoding the Al-Mawrid Arabic-English dictionary with LMF and TEI Lex-0
    The paper presents a methodology to systematically digitize and encode the legacy print Al-Mawrid Arabic-English dictionary using the ISO Language Markup Framework and TEI Lex-0, addressing a gap in Arabic lexical infrastructure by producing a standardized computational lexicon.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction
    DRFLOW: a deep research benchmark for personalized workflow prediction
    AI Agents Retrieval-Augmented Generation (RAG) Software Engineering
    The paper introduces DRFLOW, a benchmark for evaluating personalized workflow prediction in deep research systems, focusing on identifying concrete action-step workflows for enterprise tasks rather than generating reports or summaries.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation
    ATT&CK-labeled multi-source security log dataset with SLM evaluation
    Fine-tuning Llama Machine Learning Neural Network Reinforcement Learning from Human Feedback (RLHF)
    The work builds a dataset of multi-source cybersecurity logs labeled with MITRE ATT&CK and evaluates small language models (SLMs) on it. Summary is title-based and neutral; details and figures are as presented by the source and not independently verified.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction
    IUU+DB: LLM-driven extraction to track illegal fishing and related crimes
    Retrieval-Augmented Generation (RAG)
    The paper proposes the IUU+ concept extending illegal, unreported, and unregulated fishing to broader fisheries-related crimes, and IUU+DB, an LLM-driven information extraction system to quantify the frequency, geography, and actors of such incidents.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code
    Study finds agent-authored test code often lacks real verification logic
    AI Agents Claude OpenAI
    The paper examines test code generated by AI coding agents in open-source pull requests, arguing that test files lacking explicit assertions verify no behavior, so presence-based quality gates overestimate verification strength.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • NVIDIA Developer Blog · EN Infrastructure & Hardware extract
    Build On-Device AI Companions with the NVIDIA ACE Game Agent SDK and Unreal Engine 5 Plugins
    NVIDIA unveils ACE Game Agent SDK and UE5 plugins for on-device AI
    Deep Learning NVIDIA
    NVIDIA announced the ACE Game Agent SDK and Unreal Engine 5 plugins for developers to build on-device AI companions—AI agents that run locally on the device rather than in the cloud—for in-game characters. The export raw_excerpt was blocked (cookie/query string data), so this is summarized neutrally from the title and the NVIDIA developer blog framing; specific figures and performance claims are unverified.
    Read original (NVIDIA Developer Blog) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ReAge3D: Re-Aging 3D Faces with View Consistency
    ReAge3D: identity-preserving, view-consistent 3D face re-aging
    Retrieval-Augmented Generation (RAG)
    The paper presents ReAge3D, a framework for identity-preserving 3D face re-aging that introduces a 2D diffusion-based re-aging model (DiffReaging) trained on synthetic image pairs and a center-out approach to maintain detail and view consistency.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    Learning Cardiac Electrophysiology Digital Twins Through Agentic Discovery of Hybrid Structure
    Agentic discovery of hybrid structure for cardiac EP digital twins
    Deep Learning
    The paper proposes an agentic discovery method that identifies hybrid physics-neural model structures for personalized cardiac electrophysiology digital twins, aiming to reduce reliance on expert-prescribed architectures and improve cross-patient transfer.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Memory as a Wasting Asset: Pricing Flash Endurance for Embodied Agents, and the Limits of Doing So
    Pricing flash endurance as a wasting asset for embodied agents
    AI Agents
    A robot's flash endurance is a non-renewable stock: each persisted write spends one of a few thousand program/erase cycles and never refills. The paper frames flash endurance as a wasting asset, proposes pricing it for embodied agents, and examines the limits of doing so.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Descriptor: Certus Caliber Classification Gunshot Dataset (C3GD)
    C3GD: a public field-collected gunshot muzzle-blast sound dataset
    Meta Reinforcement Learning
    The paper introduces the Certus Caliber Classification Gunshot Dataset (C3GD), a public dataset of firearm muzzle-blast sounds with over 8,000 field-collected data points from 28 firearms across 16 calibers, with detailed metadata.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping
    Structural role injection in Handlebars-templated LLM prompts
    Claude GPT Llama Machine Learning Microsoft
    LLM apps build prompts from templates, with Handlebars the default in Microsoft Semantic Kernel. While double-brace expressions HTML-escape values, triple-brace interpolation inserts them raw. The paper studies structural role injection and the limits of HTML auto-escaping.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    First Proof Second Batch
    Testing AI systems on ten research-level mathematics problems
    Neural Network
    This document reports testing several AI systems on ten research-level mathematics problems spanning broad fields that arose in the contributors' research, providing the problems, methodology, results, and links to human and AI solutions plus referee reports.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • Simon Willison's Weblog · EN New Model Releases extract
    datasette-tailscale 0.1a0
    Simon Willison releases datasette-tailscale, an experimental Tailscale plugin
    Neural Network
    Simon Willison released datasette-tailscale 0.1a0, a very experimental alpha plugin that runs a local Datasette server with a Tailscale sidecar so it is reachable inside your Tailnet via a chosen hostname. You launch it with an auth key and hostname. It relies on Python bindings for the experimental tailscale-rs library, and he filed an issue asking for a cleaner way to set up the proxy.
    Read original (Simon Willison's Weblog) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
    Learning fair Pareto-optimal policies in multi-objective RL
    Algorithms & Theory Meta Retrieval-Augmented Generation (RAG) Reinforcement Learning
    In multi-objective reinforcement learning, policies must balance optimality and equity across potentially conflicting objectives. The paper proposes learning fair, Pareto-optimal policies using generalized welfare functions.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Querying an astronomical database using large language models: the ALeRCE text-to-SQL system
    A text-to-SQL system for querying the ALeRCE astronomical database
    Claude Gemini GPT Inference
    The paper develops an LLM-based text-to-SQL system using in-context learning, applied to the ALeRCE astronomical broker database, generating executable SQL from natural language and evaluated on a dataset of 110 NL/SQL pairs via step-by-step generation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Deep Reinforcement Learning for Minimum Zero-Forcing Sets
    Deep reinforcement learning for minimum zero-forcing sets
    Reinforcement Learning
    The paper tackles the minimum zero-forcing set problem on undirected graphs, a coloring problem where an initial set's color propagates through the network, and proposes an adapted deep reinforcement learning framework to solve it.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
    Quality-aware self-distillation for GUI grounding in VLMs
    Computer Vision
    The paper proposes a quality-aware self-distillation method for GUI grounding, where vision-language models predict precise screen coordinates, addressing how naive on-policy self-distillation can degrade coordinate-token teacher signals.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning
    EAGG: embodiment-aligned grasp generation via graph conditioning
    Fine-tuning Retrieval-Augmented Generation (RAG)
    The paper presents EAGG, an embodiment-aligned grasp generator that represents each end-effector with a topology-aware graph and embodiment-specific conditioning, aiming to generalize grasp generation across objects and diverse robot embodiments.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning
    From reasoning traces to reusable modules for compositional reasoning
    Fine-tuning Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Post-training pipelines combining supervised fine-tuning with reinforcement learning are key to turning LLMs into robust reasoners. The paper studies compositional generalization in LM reasoning by converting reasoning traces into reusable modules.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability
    Edge Flow: a tractable continuous-time model for GD at the edge of stability
    Deep Learning
    Gradient descent in deep learning can operate at the edge of stability, where the loss Hessian's top eigenvalue hovers near the stability threshold. Classical tools fail there, so Edge Flow offers a tractable, predictive continuous-time model of this regime.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Tensor-based second-order causal discovery
    Tensor-based second-order causal discovery (TSCD)
    Deep Learning
    To uncover causal dependencies among variables, the paper proposes TSCD, a tensor-based second-order causal discovery algorithm whose input is a tensor formed from covariance matrices of observational and interventional data, assuming linear structural equations.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Volterra Generative Models
    Volterra generative models add memory to diffusion perturbations
    Deep Learning
    Score-based diffusion models use memoryless Brownian perturbations that yield tractable reverse-time dynamics. Volterra generative models introduce continuous-time perturbations with memory, generalizing diffusion-based generation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare Applications
    A multi-agent framework against premature handoff and silent hallucination
    AI Agents Llama
    The paper proposes a multi-agent framework for healthcare that mitigates premature diagnostic handoff and silent clinical hallucinations, replacing LLM-as-a-judge routing with deterministic orchestration constraints and adding two safety mechanisms.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond
    Security and privacy prompts in the wild: what users ask LLMs
    GPT Llama Retrieval-Augmented Generation (RAG) Reinforcement Learning
    The paper analyzes, in the wild, what users ask large language models about security and privacy and how the models respond, characterizing the questions, response patterns and associated concerns.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience
    PseudoBench measures how agentic auto-research fuels pseudoscience
    AI Agents Deep Learning
    As LLM-based agents enter autonomous scientific research, resisting pseudoscience matters. PseudoBench is an adversarial benchmark measuring how such agents may rapidly generate plausible yet misleading studies that contaminate academic literature.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    When AI Says "I have been in similar situations": Synthetic Lived Experience in Peer-Like Caregiver Support
    Synthetic lived experience in AI peer-like caregiver support
    GPT Llama Neural Network
    Caregivers seek informational and emotional support in online communities where peers draw on personal narratives. As LLMs are designed as peer-like supporters, the paper examines the tension introduced when AI claims synthetic lived experience in caregiver support.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    ConTex: Reformulating Counterfactual Generation For Time Series Forecasting
    ConTex reformulates counterfactual generation for time-series forecasting
    Deep Learning
    Decision-making with deep time-series forecasting needs not just accurate predictions but actionable insight, which current architectures lack. ConTex reformulates counterfactual generation to indicate how present conditions must change to shift a predicted outcome toward a desired future.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Agents & Tool Use extract
    ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents
    ProvenanceGuard: source-aware factuality verification for MCP agents
    AI Agents Model Context Protocol (MCP) Software Engineering
    Tool-using LLM agents use the Model Context Protocol to answer from heterogeneous sources like search, APIs, databases and clinical records. ProvenanceGuard provides source-aware factuality verification to catch provenance-sensitive failure modes that standard metrics miss.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning
    Source-language effects in cross-lingual in-context learning
    Fine-tuning Neural Network Natural Language Processing (NLP)
    Cross-lingual transfer is well studied under supervised fine-tuning, where data and linguistic similarity drive quality. As the field shifts to few-shot in-context learning, this paper examines source-language effects and shows English is not always the best teacher.
    Read original (arXiv cs.CL (Computation and Language)) ↗