Infrastructure & Hardware B

  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Uncertainty Is Not a Safety Net for Clinical VQA, but Can It Anticipate Model Failure?
    Uncertainty estimation fails as a safety net for clinical VQA
    Computer Vision Retrieval-Augmented Generation (RAG) Software Engineering
    This arXiv paper tests whether uncertainty estimation (UE) gives clinical vision-language models a reliable trust-or-escalate signal. Benchmarking 8 methods across 12 VLMs on clinical visual question answering, the authors find that UE quality is not intrinsic to the method but tracks model accuracy, degrading exactly where performance is weakest and reliability is most needed. Under perturbations that hide the correct option, accuracy collapses while uncertainty barely changes.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Can LLM Coding Agents Reason About Time Series?
    Can LLM coding agents reason about time series? A benchmark study
    AI Agents Software Engineering
    This arXiv study tests whether LLM agents can analyze time series data, which is ubiquitous in finance, healthcare, and environmental monitoring. Comparing three approaches (giving the model raw numerical data, using the LLM as a coding agent, and combining the two), the authors find that agents with code access can outperform models processing raw data by up to 10%, though even the best agent still answers roughly 22–34% of questions incorrectly.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Infrastructure & Hardware extract
    daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization
    daVinci-kernel: an RL framework co-evolving skills for GPU kernel tuning
    AI Agents Fine-tuning Reinforcement Learning
    GPU kernel optimization treats correctness as a prerequisite and targets execution efficiency. The authors present daVinci-kernel, an RL framework that couples skill discovery and exploitation through a dynamically evolving skill library. Three agents share one LLM backbone: a Selection Agent that retrieves techniques via BM25 and LLM reranking, a Policy Agent that generates CUDA/Triton kernels, and a Summary Agent that distills rollouts into reusable skills. Skills are added to the library only after execution verification confirms a speedup.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Simon Willison's Weblog · EN Developer Tools extract
    Mapping SQLite result columns back to their source `table.column`
    Mapping SQLite result columns back to their source table.column
    Claude Machine Learning Neural Network
    A technical note exploring how to map columns in arbitrary SQLite query results back to their originating table.column, so that Datasette could render queries with richer, source-aware metadata.
    Read original (Simon Willison's Weblog) ↗
  • Simon Willison's Weblog · EN New Model Releases extract
    OpenAI WebRTC Audio Session, now with document context
    Simon Willison adds document context to his OpenAI WebRTC audio tool
    GPT OpenAI
    Simon Willison updated his browser tool for OpenAI's WebRTC realtime audio API. It now supports OpenAI's newer realtime voice model, which is billed as offering GPT-5-class reasoning, and lets users paste document text as context for spoken conversations about that document.
    Read original (Simon Willison's Weblog) ↗
  • NVIDIA Developer Blog · EN Agents & Tool Use extract
    NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark
    NVIDIA tops first agentic AI benchmark for agentic coding performance
    AI Agents Generative AI Inference NVIDIA
    In a post on its developer blog, NVIDIA reports leading agentic coding performance on the first benchmark dedicated to agentic AI. It presents the result as a showcase for its inference stack and GPU infrastructure as a platform for autonomous coding agents.
    Read original (NVIDIA Developer Blog) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization
    AdaSR enables adaptive streaming reasoning for reasoning models
    Machine Learning Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering Speech Processing
    AdaSR moves beyond the read-then-think paradigm by letting reasoning models reason incrementally as input streams in. It trains this streaming behavior with a hierarchical relative policy optimization scheme.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit
    Why generating 'trivia' is provably necessary for valuable mathematics
    Retrieval-Augmented Generation (RAG)
    As AI coupled to proof assistants generates formal mathematics at scale, a gap opens between what a checker verifies and what mathematicians value. Through the lens of language generation in the limit, the paper argues that producing trivial, peripheral statements is provably necessary to generate valuable mathematics.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Compressed Computation is (probably) not Computation in Superposition
    Compressed Computation is probably not computation in superposition
    The paper examines whether the Compressed Computation toy model is an instance of computation in superposition and concludes that it probably is not.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows
    Direct latent-space synthesis for parallel branches in LLM-agent workflows
    AI Agents Neural Network
    LLMs serve as execution engines for agentic systems, yet they still consume context through a sequential text interface, a poor fit for modern structured workflows whose branches are independent and can run in parallel. The paper explores synthesizing such parallel branches directly in latent space.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    Moonlight in Latent Space: Chirality and Structural Correspondence Between Beethoven's Op. 27 No. 2 and Machine Learning Mechanisms
    Structural correspondence between Beethoven's Moonlight Sonata and ML
    Embeddings Machine Learning Neural Network Natural Language Processing (NLP) Reinforcement Learning
    Through computational analysis, this paper argues that the three movements of Beethoven's Moonlight Sonata (Op. 27 No. 2) instantiate three distinct machine learning architectures by structural correspondence rather than mere analogy.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    A Statistical and Machine Learning Framework for Operational Threshold Detection and Deployable Dispatch Controller Development in Hydrogen Multi-Energy Systems
    ML framework for threshold detection in hydrogen multi-energy systems
    Machine Learning Reinforcement Learning
    The study presents a statistical and machine learning framework characterizing a hydrogen-based multi-energy system. It targets operational threshold detection and deployable dispatch controller development.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0
    A fused INT8 GEMM kernel speeds diffusion transformers on consumer GPUs
    Neural Network Quantization Transformer
    Post-training INT8 quantization of diffusion transformers is often slower than FP8/NF4 on consumer Ampere GPUs. The paper presents a fused INT8 GEMM kernel for Ideogram 4.0 that realizes native INT8 speedups.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Cluster LOCO: Feature Importance For Interpreting Clusters
    Cluster LOCO gives feature importance to interpret clusters
    Algorithms & Theory
    Clustering is widely used, but its outputs are hard to interpret and audit. Cluster LOCO provides feature-importance scores that explain what distinguishes each cluster.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    VISTA: View-Consistent Self-Verified Training for GUI Grounding
    VISTA: view-consistent self-verified training for GUI grounding
    Reinforcement Learning Software Engineering
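    Applying GRPO to GUI grounding samples every rollout in a group from the same single screenshot, so groups are often all-failure or all-success. Because GRPO scores each rollout relative to the rest of its group, such uniform groups yield little or no learning signal. VISTA introduces view-consistent, self-verified training to stabilize GUI grounding.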
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Regional Climate Model Emulation with Diffusion Approaches: What is the Added Value of Generative Machine Learning?
    Added value of diffusion-based generative ML for climate model emulation
    Deep Learning Machine Learning Neural Network Reinforcement Learning
    Emulators cheaply reproduce the downscaling performed by regional climate models, mapping global-model predictors to high-resolution fields. The paper assesses what diffusion-based generative machine learning adds to regional climate model emulation.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results
    Every Eval Ever: a unifying schema and repository for AI evaluations
    Meta Neural Network
    AI evaluations are widely used to track progress, but inconsistencies across evaluators hinder analysis and comparison. The paper proposes a unifying schema and a community repository, Every Eval Ever, for AI evaluation results.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • NVIDIA Developer Blog · EN Industry Adoption extract
    Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure
    NVIDIA details deploying MiniMax M3 for long-context agentic workflows
    Generative AI NVIDIA Retrieval-Augmented Generation (RAG)
    NVIDIA's developer blog explains how to deploy MiniMax M3 on NVIDIA accelerated infrastructure for long-context reasoning and agentic workflows, addressing fragmented enterprise AI pipelines spanning text, vision, and other modalities.
    Read original (NVIDIA Developer Blog) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Dense Coordinate-List Fine-Tuning Induces a Controllable Interference Surface in Vision-Language Models
    Dense coordinate-list fine-tuning induces a controllable interference surface
    Computer Vision Fine-tuning Reinforcement Learning from Human Feedback (RLHF) Software Engineering
    Fine-tuning vision-language models to emit dense coordinate lists improves grounding but alters how they serialize, repeat, and terminate structured output. The paper shows this induces a controllable interference surface in VLMs.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Infrastructure & Hardware extract
    When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More
    LLM agents defer blindly to GNN tools — stronger backbones defer more
    AI Agents Deep Learning Neural Network Software Engineering
    A growing line of work equips LLM agents with graph neural networks as callable tools. The paper finds that agents defer blindly to these GNN tools, and that stronger backbones tend to defer even more.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
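For the clinical VQA item, one common way to ask whether an uncertainty score anticipates model failure is to measure how well it separates wrong answers from right ones (an AUROC for failure detection). This is a minimal sketch of that kind of check, not the paper's evaluation protocol: `failure_auroc` and `trust_or_escalate` are hypothetical helpers, and any escalation threshold is an arbitrary assumption.

```python
from typing import Sequence


def failure_auroc(uncertainty: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a randomly chosen wrong answer receives higher
    uncertainty than a randomly chosen right answer (ties count half).
    1.0 = perfect failure detection, 0.5 = no signal. Hypothetical helper."""
    wrong = [u for u, ok in zip(uncertainty, correct) if not ok]
    right = [u for u, ok in zip(uncertainty, correct) if ok]
    if not wrong or not right:
        raise ValueError("need at least one correct and one incorrect answer")
    score = 0.0
    for u_wrong in wrong:
        for u_right in right:
            if u_wrong > u_right:
                score += 1.0
            elif u_wrong == u_right:
                score += 0.5
    return score / (len(wrong) * len(right))


def trust_or_escalate(uncertainty: float, threshold: float) -> str:
    """Route an answer to a human when its uncertainty exceeds an assumed threshold."""
    return "escalate" if uncertainty > threshold else "trust"
```

A score near 0.5 means the uncertainty signal would not help decide which answers to escalate, whatever threshold is chosen.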
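For the daVinci-kernel item, the Selection Agent's BM25 retrieval and the rule that skills enter the library only after verified speedups can be sketched in plain Python. This is a sketch under stated assumptions, not the authors' code: `SkillLibrary`, `try_add`, and `retrieve` are hypothetical names, skills are plain text strings, the standard Okapi BM25 formula (k1 = 1.5, b = 0.75, smoothed IDF) stands in for whatever variant the paper uses, and the LLM reranking step is omitted.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    return text.lower().split()


@dataclass
class SkillLibrary:
    """Hypothetical skill store: verification-gated insertion plus BM25 retrieval."""
    k1: float = 1.5
    b: float = 0.75
    skills: list[str] = field(default_factory=list)

    def try_add(self, skill: str, baseline_ms: float, candidate_ms: float,
                outputs_match: bool) -> bool:
        # Admit a skill only if the kernel it produced is correct AND faster.
        if outputs_match and candidate_ms < baseline_ms:
            self.skills.append(skill)
            return True
        return False

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        docs = [tokenize(s) for s in self.skills]
        if not docs:
            return []
        n_docs = len(docs)
        avgdl = sum(len(d) for d in docs) / n_docs
        df = Counter(term for d in docs for term in set(d))  # document frequency
        scored = []
        for idx, doc in enumerate(docs):
            tf = Counter(doc)
            score = 0.0
            for term in set(tokenize(query)):
                if term not in tf:
                    continue
                idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
                norm = tf[term] + self.k1 * (1 - self.b + self.b * len(doc) / avgdl)
                score += idf * tf[term] * (self.k1 + 1) / norm
            scored.append((score, idx))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Drop skills that share no terms with the query.
        return [self.skills[i] for score, i in scored[:top_k] if score > 0]
```

In the pipeline the item describes, the top BM25 hits would then be reranked by the shared LLM before reaching the Policy Agent.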
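For the INT8 GEMM item, the arithmetic that a fused INT8 kernel performs can be illustrated without a GPU: quantize both operands to INT8, accumulate integer products, then rescale by the product of the two scales. The summary does not describe the Ideogram 4.0 kernel's internals, so this pure-Python sketch assumes symmetric per-tensor scales; `quantize_symmetric` and `int8_gemm` are hypothetical helpers, and a real fused kernel keeps the accumulation in INT32 registers and applies dequantization in its epilogue rather than in Python loops.

```python
def quantize_symmetric(matrix: list[list[float]]) -> tuple[list[list[int]], float]:
    """Symmetric per-tensor INT8 quantization: x is approximated by q * scale, q in [-127, 127]."""
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_gemm(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Approximate a @ b via INT8 operands, integer accumulation, and one dequantize step."""
    qa, scale_a = quantize_symmetric(a)
    qb, scale_b = quantize_symmetric(b)
    if len(qa[0]) != len(qb):
        raise ValueError("inner dimensions must match")
    out = []
    for i in range(len(qa)):
        row = []
        for j in range(len(qb[0])):
            # Integer dot product; a GPU kernel would hold this in an INT32 accumulator.
            acc = sum(qa[i][p] * qb[p][j] for p in range(len(qb)))
            # Dequantize once per output element (the epilogue of a fused kernel).
            row.append(acc * scale_a * scale_b)
        out.append(row)
    return out
```

The point of fusing is that the integer accumulate and the rescale happen in one pass, so the INT8 speedup is not lost to separate quantize and dequantize kernels.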
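For the VISTA item, the weak-signal problem follows from GRPO's group-relative advantage, which standardizes each rollout's reward against the mean and standard deviation of its group. This sketch assumes a binary click-inside-the-box reward and uses the population standard deviation; `click_reward` and `group_advantages` are hypothetical helpers, not VISTA's code.

```python
import statistics


def click_reward(x: float, y: float, box: tuple[float, float, float, float]) -> float:
    """Assumed binary grounding reward: 1.0 if the predicted click lands inside the target box."""
    left, top, right, bottom = box
    return 1.0 if left <= x <= right and top <= y <= bottom else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward standardized within its rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


box = (100.0, 50.0, 180.0, 90.0)
all_hit = [click_reward(x, 70.0, box) for x in (120.0, 140.0, 150.0, 170.0)]
mixed = [click_reward(x, 70.0, box) for x in (120.0, 40.0, 150.0, 300.0)]
```

Any group in which every rollout hits (or every rollout misses) gets all-zero advantages and therefore no reward-driven gradient, while a mixed group such as `mixed` yields advantages of roughly +1 and -1.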