Developer Tools B

Showing 1–30 of 315
  • ITmedia AI+ · JA New Model Releases extract
    "Record" your on-screen actions and AI does the work for you: Codex gets new "Record & Replay" feature
    OpenAI adds 'Record & Replay' to Codex to automate recorded UI steps
    OpenAI
    OpenAI has added a new 'Record & Replay' feature to its Codex coding agent. Users record on-screen operations, and the AI then reproduces those steps to carry out the task automatically, according to ITmedia.
  • Simon Willison's Weblog · EN New Model Releases extract
    Datasette Apps: Host custom HTML applications inside Datasette
    Datasette Apps lets you host custom HTML apps inside Datasette
    Machine Learning Neural Network
    Simon Willison introduced Datasette Apps, letting developers host custom HTML/JS applications inside a Datasette instance. The apps can read Datasette's databases, enabling lightweight, data-backed web apps served directly from the data exploration tool itself.
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Optimal Deterministic Multicalibration and Omniprediction
    A deterministic algorithm achieving optimal multicalibration
    Machine Learning
    The paper gives a minimax-optimal multicalibration algorithm that outputs a deterministic predictor, resolving the open question of whether randomization is needed to achieve optimal sample complexity. The result is extended to deterministic predictors that satisfy outcome indistinguishability and omniprediction.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Predictability as a Fine-Grained Measure for Privacy
    Privacy via predictability, a fine-grained privacy measure
    The paper introduces 'privacy via predictability,' a fine-grained privacy framework that explicitly incorporates an attacker's core prior knowledge. It aims to ease the costly privacy-accuracy tradeoff imposed by the worst-case guarantees of differential privacy.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents
    LedgerAgent: structured state for policy-adherent tool-calling agents
    AI Agents Inference Retrieval-Augmented Generation (RAG)
    Policy-adherent tool-calling agents in customer-service domains must track task state across turns while following rules. LedgerAgent introduces structured state to help such agents stay consistent and policy-compliant.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm
    SARLO-80: a worldwide 80cm slant SAR-optical dataset
    Deep Learning Reinforcement Learning
    Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable SAR resources are scarce. SARLO-80 provides a worldwide slant-range SAR and optical dataset at 80cm resolution to fill this gap.
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes
    Sovereign Execution Brokers for agentic control planes
    AI Agents Neural Network
    Autonomous agents are increasingly wired into cloud, deployment, and data-control workflows, creating new security challenges in production. This work proposes sovereign execution brokers that enforce certificate-bound authority within agentic control planes.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
    Multi-LCB: extending LiveCodeBench to multiple programming languages
    Reinforcement Learning Software Engineering
    LiveCodeBench has become a widely adopted benchmark for evaluating large language models on code. Multi-LCB extends it to multiple programming languages to assess multilingual code generation.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?
    What safety-aligned LLMs learn from mixed compliance demonstrations
    In-context demonstrations can jailbreak language models, but it has been unclear what safety-aligned models learn when demonstrations mix compliant and non-compliant behavior. This work analyzes that learning behavior.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Entropy Estimation in Multi-Qutrit Systems via Variational and Classical Neural Networks
    Estimating entropy in multi-qutrit systems with VQAs and CNNs
    Algorithms & Theory Neural Network Software Engineering
    The paper presents a systematic study of von Neumann entropy estimation in multi-qutrit quantum systems, comparing variational quantum algorithms with classical convolutional neural networks on an ideal noise-free simulator for systems of up to three qutrits.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems
    Contagion Networks: evaluator bias propagation in multi-agent LLMs
    AI Agents DeepSeek Reinforcement Learning
    When large language models act as evaluators in multi-agent systems, their systematic evaluation biases can spread through the system. This work analyzes how such evaluator bias propagates across agents.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems
    Hierarchical recovery for cross-device agent systems
    AI Agents Neural Network Reinforcement Learning
    The paper proposes a hierarchical recovery mechanism for cross-device agent systems, moving beyond coarse-grained global replanning. It targets real-world computer-use tasks that span multiple applications and devices and must coordinate heterogeneous environments under dynamic runtime failures.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Optimal Order of Multi-Agent and General Many-Body Systems
    Optimal order of multi-agent and general many-body systems
    AI Agents Retrieval-Augmented Generation (RAG)
    This paper develops a general framework for analyzing multi-agent systems with feedback loops between agents, as well as general many-body systems, and characterizes their optimal order.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
    Aligning LLMs with implicit user feedback from mouse and gaze
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning Reinforcement Learning from Human Feedback (RLHF)
    The paper proposes aligning large language models using implicit user signals—such as mouse and eye movements—instead of explicit human feedback. It addresses the limitation that users rarely provide explicit ratings, which makes high-quality preference data scarce for reward modeling.
  • OpenAI Blog · EN New Model Releases extract
    New usage analytics and updated spend controls for enterprises
    OpenAI adds usage analytics and spend controls to ChatGPT Enterprise
    GPT OpenAI
    OpenAI introduced new usage analytics and updated spend controls for ChatGPT Enterprise, helping organizations track and manage AI costs while scaling with confidence. Admins gain visibility into per-team consumption and can set limits to optimize spend.
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution
    Marginal advantage accumulation for self-evolving memory agents
    The paper proposes marginal advantage accumulation, a cross-batch, operation-level mechanism for memory-driven agent self-evolution. It aims to distinguish stably effective memory operations from accidental hits, addressing contradictory feedback that the same operation can receive across different batches in trace distillation.
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    UltraQuant: 4-bit KV Caching for Context-Heavy Agents
    UltraQuant: 4-bit KV caching for context-heavy agents
    AI Agents Inference Quantization
    Context-heavy agents put unusual pressure on the key-value cache as long prefixes are reused across calls. UltraQuant applies 4-bit quantization to compress the KV cache while preserving quality.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems
    Analyzing defensive misdirection against attacks on agentic AI
    AI Agents Reinforcement Learning Speech Processing
    Agentic AI systems increasingly rely on language-model components to interpret instructions, exposing them to attacks. This paper analyzes defensive misdirection as a countermeasure against model-guided automated attacks.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
    Fisher-geometric sharpness and SGD's implicit bias to flat minima
    Deep Learning Neural Network
    The paper introduces a Fisher-geometric notion of sharpness to study the implicit bias of SGD toward flat minima. It addresses the fact that standard Euclidean flatness measures, such as the trace or maximum eigenvalue of the loss Hessian, are not invariant under reparametrizations that preserve the network function.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks
    Agentic symbolic search for characterizing PDE solutions
    Neural Network
    The paper proposes agentic symbolic search, an approach that characterizes partial differential equation solutions through mathematical structures rather than tables of computed values. It targets the kind of structural understanding, traditionally derived by hand, that neither numerical simulation nor neural networks produce directly.
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    Repurposing a Speech Classifier for Guided Diffusion-Based Speech Generation
    Repurposing a speech classifier for guided diffusion speech generation
    Speech Processing
    Classifier guidance controls diffusion generation using a noise-conditioned classifier. This work repurposes an existing speech classifier to guide diffusion-based speech generation.
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    SSH-Net: A Deep Neural Network for Predicting Failure Time Distribution Functions under Competing Risks with Application to GPU Data
    SSH-Net: predicting failure-time distributions under competing risks
    Data Mining Neural Network
    The paper proposes SSH-Net, a deep neural network for predicting failure-time distribution functions under competing risks. It targets time-to-event modeling in complex engineering settings and is demonstrated on GPU failure data, building on the flexibility of neural networks for competing-risk prediction.
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks
    Evolutionary two-stage hyperparameter optimization for PINNs
    Algorithms & Theory Deep Learning Embeddings Neural Network
    The paper proposes evolutionary two-stage hyperparameter optimization strategies for physics-informed neural networks (PINNs). It targets PINNs' unstable convergence, training plateaus, and strong sensitivity to architectural and optimization hyperparameters arising from their highly non-convex training.
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    Interpretable Sperm Morphology Classification via Attention-Guided Deep Learning
    Interpretable sperm morphology classification via attention-guided deep learning
    Deep Learning Neural Network
    Male infertility is a major cause of couple infertility and is often linked to abnormal sperm morphology. This work uses attention-guided deep learning for interpretable sperm morphology classification.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Multi-View Decompilation for LLM-Based Malware Classification
    Multi-view decompilation for LLM-based malware classification
    Neural Network Retrieval-Augmented Generation (RAG)
    Malware analysts often inspect compiled binaries through decompiled pseudo-C when source code is unavailable. This work uses multi-view decompilation to improve LLM-based malware classification.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Neural network surrogates with uncertainty quantification for inverse problems in partial differential equations
    NN surrogates with uncertainty quantification for PDE inverse problems
    Inference Neural Network Reinforcement Learning
    The paper develops neural network surrogates with uncertainty quantification for inverse problems in partial differential equations. It targets the inference of unknown model parameters from noisy or incomplete observations, where traditional numerical methods are costly, particularly in Bayesian settings.
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    On the Redundancy of Timestep Embeddings in Diffusion Models
    Are timestep embeddings redundant in diffusion models?
    Embeddings Transformer
    The paper challenges the necessity of explicit timestep embeddings in diffusion models, which are typically used to modulate denoising across noise scales. Through empirical analysis of U-Net and Diffusion Transformer architectures, together with theoretical arguments, it examines whether these temporal signals are redundant.
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach
    Tackling modality imbalance in federated graph learning via synthesis
    The paper addresses modality imbalance in multimodal federated graph learning with a data-synthesis-based approach. It targets two granularities of imbalance—client-level, where some clients lack entire modalities, and node-level, where individual nodes have missing modalities.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    CRAX: Fast Safe Reinforcement Learning Benchmarking
    CRAX: fast benchmarking for safe reinforcement learning
    AI Agents Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning Robotics
    Safety is a core concern when deploying reinforcement learning agents in real-world domains. CRAX provides a framework for fast benchmarking of safe reinforcement learning methods.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation
    Using a de-biased VLM 3D judge to improve single-image 3D generation
    Reinforcement Learning Software Engineering
    The paper presents a de-biased VLM-as-3D-judge protocol for single-image 3D generation. It builds on a cross-model judge that ranks single-image-to-3D mesh quality in cases where geometry and CLIP proxies fall short. The paper then asks whether the judge's preferences alone, without human labels, can cheaply specialize a strong open generator, TRELLIS, to a single asset class such as furniture.