Infrastructure & Hardware B
-
NVIDIA Blackwell Tops MLPerf Training 6.0 with Industry-Leading Scale and Performance
NVIDIA says Blackwell tops MLPerf Training 6.0 benchmark
NVIDIA announced that its Blackwell GPU architecture topped the MLPerf Training 6.0 benchmark with what it calls industry-leading scale and performance. Summarized neutrally from the title; the export excerpt was blocked (cookie/query data), so figures are vendor claims, not independently verified.
-
Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation
Catastrophic forgetting is low-rank: a function-space theory
Catastrophic forgetting in continual adaptation is usually viewed via parameter drift or replay, neither of which reveals which output directions are vulnerable. The paper gives a function-space account in the NTK regime, showing that the drift new-task training induces in old-task predictions is low-rank and is transmitted through the cross-task kernel.
-
Recursive Scaling in Masked Diffusion Models
Recursive scaling in masked diffusion models
Masked diffusion models (MDMs) have recently emerged as a generative approach. The paper investigates recursive scaling in MDMs, offering insights into their behavior and efficiency.
-
LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI
LegalHalluLens audits typed legal-AI hallucinations with calibrated debate
Legal-AI systems hallucinate at aggregate rates near 52%, but averages hide where and how errors concentrate. LegalHalluLens is an auditing framework pairing typed hallucination auditing with calibrated multi-agent debate to give compliance officers actionable signals for trustworthy legal AI.
-
Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting
Multiple cyclicity and wavelet decomposition for long-term forecasting
Cyclicity and trend are key components of time series, but prior work often neglects real-world inter-channel correlations. The paper combines multiple cyclicity with wavelet decomposition and channel correlation to improve long-term time series forecasting.
-
SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs
SoftMoE: soft differentiable routing for mixture-of-experts in LLMs
Sparse mixture-of-experts architectures scale LLM parameter counts, but their discrete routing complicates training. SoftMoE introduces soft, differentiable routing for mixture-of-experts in LLMs to enable more stable and efficient expert selection.
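As a rough illustration of what "soft, differentiable routing" can mean in general, the sketch below mixes the outputs of all experts with softmax gate weights instead of picking a discrete top-k subset. This is a generic textbook formulation, not the SoftMoE paper's method; the function names `softmax` and `soft_route`, the linear gate, and the toy experts are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_route(x, gate_weights, experts):
    """Mix every expert's output by its gate probability.

    x            : input vector (list of floats)
    gate_weights : one weight vector per expert (hypothetical linear gate)
    experts      : callables mapping a vector to a vector of fixed size
    """
    # One gate logit per expert: dot product of its gate vector with x.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    # Convex combination of expert outputs; smooth in the gate weights.
    mixed = [sum(p * out[d] for p, out in zip(probs, outputs)) for d in range(dim)]
    return mixed, probs

# Toy usage: two experts (identity and doubling) with a uniform gate.
toy_experts = [lambda v: list(v), lambda v: [2.0 * a for a in v]]
mixed, probs = soft_route([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], toy_experts)
print(mixed, probs)
```

Because every expert is evaluated, this plain form gives up the compute savings of sparse routing; it is shown only to make the contrast with discrete selection concrete.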
-
Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model
Hybrid Ret-DNN with XGBoost for e-commerce behavior forecasting
E-commerce platforms struggle to understand customer behavior and predict future purchases. The study proposes predictive analytics using a hybrid Ret-DNN combined with an XGBoost model to forecast customer behavior.
-
Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias
Monotonic KANs: monotonicity as an inductive bias, studied theoretically
Monotonicity is a useful architectural inductive bias in tabular, scientific and economic settings. The paper proposes monotonic Kolmogorov-Arnold Networks with per-edge functional transparency and studies monotonicity as an inductive bias both theoretically and empirically.
-
Meta-classification of one-class classification models using ranking correlation and nearest neighbor
Meta-classification of one-class models via ranking correlation and kNN
Machine learning has been applied widely, but applying it to ML models themselves is underexplored. Treating models as approximable by one-class classification (OCC), the paper proposes meta-classification of OCC models using ranking correlation and nearest-neighbor methods.
-
Perceptual compensation for tonal context in self-supervised speech models
The study examines the extent to which self-supervised speech models exhibit perceptual compensation for tonal context, analyzing how context effects seen in human speech perception are reflected in the models' learned representations.
-
EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent
EComAgentBench: shopping agents on long-horizon tasks with hidden intent
As LLM-based shopping agents reach production, existing benchmarks miss how requirements arrive: implicitly, in a profile, or only when the right question is asked. EComAgentBench evaluates shopping agents on long-horizon tasks with distributed hidden intent.
-
The Fable 5 Export Controls Harm US Cyber Defense
Willison: Fable 5 export controls harm US cyber defense
Willison cites Kate Moussouris as saying that the 'jailbreak' behind Claude Fable 5's export-control ban was merely asking it to 'fix this code' containing known CVEs and planted bugs. Since fixing security bugs is core to coding models, he argues the controls weaken US cyber defense.
-
Rapidly Expanding Power Demand from AI Infrastructure... Is "Watt-Bit Coupling" the Bright Spot? Sakura President Tanaka and TEPCO in Dialogue
Sakura's Tanaka and TEPCO discuss 'watt-bit' coupling for AI power demand
As AI infrastructure drives surging electricity demand, how should data centers and the power grid adapt? In a keynote at Interop Tokyo 2026 in Makuhari, TEPCO Holdings senior fellow Hiroshi Okamoto and Sakura Internet president Kunihiro Tanaka held a dialogue, exploring the potential of 'watt-bit' coupling that links computing resources with power supply.
-
Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes
NVIDIA details LoRA fine-tuning of biological foundation models via BioNeMo
An NVIDIA developer blog post explains how to efficiently fine-tune biological foundation models (pretrained on large protein or genomic sequence corpora, such as the ESM2 protein language model) using LoRA, illustrated with the company's BioNeMo Recipes. A technical piece on applying foundation models in computational biology.
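For readers unfamiliar with LoRA, the sketch below shows the underlying arithmetic in plain Python: the pretrained weight matrix stays frozen, and only a low-rank pair of matrices is trained, with the effective weight being W + (alpha / r) * B A. It does not use or reproduce the BioNeMo Recipes API; the class `LoRALinear`, the helper `matvec`, and the deterministic initialisation of `A` are assumptions chosen to keep the example self-contained.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LoRALinear:
    """Linear layer with a frozen weight and a trainable low-rank update."""

    def __init__(self, weight, rank, alpha=1.0):
        self.weight = weight                      # frozen, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A: rank x d_in. Small deterministic values stand in for random init.
        self.A = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(rank)]
        # B: d_out x rank, zero-initialised so the update starts at zero.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# Toy usage: a 4x4 identity weight adapted with rank 1.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
print(layer.forward([1.0, 1.0, 1.0, 1.0]), layer.trainable_params())
```

With rank r, the adapter holds r * (d_in + d_out) trainable values instead of d_in * d_out, which is where the efficiency comes from when the layer is large and r is small.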
-
Exact Posterior Score Estimation for Solving Linear Inverse Problems
Exact closed-form posterior score for linear inverse problems
The paper derives the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, showing that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot with anisotropic noise. It turns this into a training objective, Exact Posterior Score (EPS), that preserves standard denoising structure and can be trained from scratch or fine-tuned.
-
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
HABC: hierarchical advantage weighting for RL fine-tuning of VLAs
Online RL fine-tuning of pretrained VLA policies yields only one binary outcome per episode, yet actor updates need per-transition signals. The authors argue a single scalar conflates viability and efficiency and that mixing autonomous and intervention segments misassigns credit. Their method, Hierarchical Advantage-Weighted Behavior Cloning (HABC), trains separate critic heads for the two objectives on different data subsets.
-
Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models
Measurement study of post-hoc falsification operators for code models
Per its title, this paper presents a measurement study of post-hoc 'falsification operators' applied to frozen (non-retrained) small code models, framed around selection without signal and recovery through expression. The raw excerpt was blocked by a content filter, so this summary is based on the title alone and stays deliberately neutral.
-
Stable Menus of Public Goods: AI-Enabled Progress
Study tests AI research workflows on an open economics problem
Using an open problem from an EC 2025 paper as a testbed, the paper studies AI-for-economics research workflows. It reports that prompting with human intuition and multi-turn interaction can help, but finds an LLM slightly less effective than a first-year PhD student on the task.
-
Decoupling Inference from State Updates in Low-Latency Feature Engines via Probabilistic Thinning
Probabilistic thinning decouples inference from state updates in streams
Streaming data systems increasingly underpin ML workflows that maintain many continuously updated aggregations. In production, each event triggers read-modify-write operations to storage, making high-frequency state updates a dominant source of latency, contention, and cost. This work decouples inference from persistence via probabilistic thinning: every event is scored, but durable updates fire only for informative events, using approximate disk-backed statistics with no in-memory control plane.
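The idea that every event is scored while only some events trigger a durable write can be illustrated with a small sketch. The write rule below (a floor probability plus a "surprise" term, with inverse-probability weighting of the stored sum and count) is an assumption made for illustration and is not taken from the paper; the class `ThinnedAggregate` and its parameters `floor_prob` and `scale` are hypothetical, and plain attributes stand in for the disk-backed store.

```python
import random

class ThinnedAggregate:
    """Running mean whose persisted state is updated for a sampled subset of events."""

    def __init__(self, floor_prob=0.1, scale=10.0, seed=0):
        self.floor_prob = floor_prob      # minimum chance that any event is written
        self.scale = scale                # deviation at which a write becomes certain
        self.rng = random.Random(seed)
        self.weighted_sum = 0.0           # stand-in for durable state
        self.weighted_count = 0.0
        self.writes = 0

    def mean(self):
        return self.weighted_sum / self.weighted_count if self.weighted_count else 0.0

    def write_prob(self, value):
        # Events far from the stored mean are treated as more informative.
        surprise = abs(value - self.mean()) / self.scale
        return min(1.0, self.floor_prob + surprise)

    def process(self, value):
        # Inference path: always runs and only reads the stored state.
        score = value - self.mean()
        # Persistence path: fires with probability p, reweighted by 1/p.
        p = self.write_prob(value)
        if self.rng.random() < p:
            self.weighted_sum += value / p
            self.weighted_count += 1.0 / p
            self.writes += 1
        return score

# Toy usage: a constant stream needs few writes once the mean has settled.
agg = ThinnedAggregate(floor_prob=0.05, scale=100.0)
scores = [agg.process(5.0) for _ in range(1000)]
print(len(scores), agg.writes)
```

Dividing each written value by its write probability keeps the stored sum and count unbiased in expectation, at the cost of higher variance; a production design would need to weigh that trade-off against the write savings.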
-
Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data
A causal auditing framework to detect synthetic-data privacy disclosures
Generative AI and LLMs have made synthetic data a popular privacy-preserving substitute for sensitive datasets, yet it can memorize and reproduce private training data. The authors propose a customizable empirical framework distinguishing "true disclosures" (direct reproduction of user data) from "phantom disclosures" (incidental generation). Using training/holdout partitioning and statistical hypothesis testing, it checks whether disclosures match strict privacy baselines like zero-learning.
-
Boosting MoE Training Throughput with Advanced Fusion Kernels
NVIDIA details advanced fusion kernels to boost MoE training throughput
On its developer blog, NVIDIA explains advanced fusion-kernel techniques aimed at boosting training throughput for Mixture-of-Experts (MoE) models. Noting that MoE has rapidly become a foundational component of modern large-scale AI systems, the post outlines kernel-level optimizations for more efficient training.
-
Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability
Tighter deep-learning generalization bounds via local robustness
Robustness-based generalization bounds are often vacuous in practice. The authors trace much of the looseness to the robustness term itself, especially for 0-1 loss, which is usually treated as a global measure. They propose a bound that scales the robustness term by the number of stable and unstable samples across input sub-regions, yielding tighter estimates.
-
Revisiting the Systematicity in Negation in the Era of In-Context Learning
Revisiting LLM systematicity in negation via in-context learning
An arXiv paper analyzes how large language models understand negation from two angles: behavioral and representational systematicity. It reports that, via demonstrations and in-context learning, LLMs can handle negation to some degree, and examines the limits of that systematicity. Neutral, abstract-based summary.
-
Deep Q-Learning on Hölder Spaces
Bellman-target regularity analysis motivates a tensor-product DeepONet
This work studies the operator-theoretic core of Q-learning in continuous-time stochastic control with continuous states and actions. Under uniform ellipticity and Hölder-regular coefficients, a Bellman update smooths the dependence on the state while leaving only Lipschitz dependence on the action, motivating a tensor-product DeepONet and yielding approximation and resource bounds with a stiffness-complexity trade-off.
-
How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation
Paper: framework measures LLM search-agent endorsement risk
An arXiv paper introduces SearchGEO, a controlled framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline and a five-mode attack taxonomy across multiple backends. Summarized neutrally from the abstract.
-
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
Paper: semi-supervised LLM reasoning from minimal labels
An arXiv paper presents a semi-supervised framework that scales LLM reasoning from minimal supervision, using a lightweight reasoning-correctness classifier to turn verification into a data-creation mechanism. Summarized neutrally from the abstract.
-
Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models
Reflective Masking elicits iterative reasoning in mask diffusion models
The paper introduces Reflective Masking, a lightweight post-training method that lets mask diffusion models iteratively revisit and revise prior outputs via multi-turn masking, plus a History Reference component. Claims reflect the abstract and are not independently verified.
-
FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection
FraudSMSWalker benchmark targets URL-masked SMS-to-webpage fraud
The paper introduces FraudSMSWalker, a controlled benchmark for URL-masked SMS-to-webpage fraud judgment. It contains 699 bilingual chains (332 fraudulent, 367 benign) across ten scenarios, withholding raw URLs, hosts, and reputation metadata so models cannot rely on reputation shortcuts, and evaluates nine web agents.
-
Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models
NVIDIA explains the rise of World-Action Models for robotics
NVIDIA's technical blog surveys World-Action Models (WAMs): robot policies pretrained to "imagine" via world modeling, then fine-tuned to act. It relates them to Vision-Language-Action (VLA) models built on pretrained VLM backbones for robotics.
-
VeriGraph: Towards Verifiable Data-Analytic Agents
VeriGraph: a traceable neuro-symbolic framework for verifiable data agents
This arXiv paper introduces VeriGraph, a traceable neuro-symbolic reasoning framework for verifiable data-analytic agents. The authors note that LLM agents' reliance on linear text trajectories makes reasoning hard to audit, because it entangles deterministic computations over raw data with semantic deductions over natural-language claims. VeriGraph instead has agents build an explicit, heterogeneous evidence graph, structured as a directed acyclic graph (DAG), during execution.