Inference & Efficiency A
-
Beyond the Smile: A Hybrid Convolutional VAE for Crypto Volatility Surfaces
A convolutional VAE for completing crypto implied-volatility surfaces
The authors present a convolutional variational autoencoder for crypto implied-volatility surfaces, paired with a hybrid predictor that combines it with a quadratic smile re-fit through a deterministic per-tenor routing rule. Trained on 6,034 hourly Binance BTC and ETH option surfaces (May-October 2023), it achieves hidden-cell completion RMSE of 0.94-1.56 vol points. At 50% masking the hybrid reaches 0.83 vol points versus 7.00 for the smile re-fit alone, a roughly eightfold reduction at no extra inference cost.
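For readers unfamiliar with the baseline, here is a minimal sketch of what a quadratic smile re-fit could look like for a single tenor: fit a parabola in moneyness to the observed cells and read the hidden cells off the curve. The function names, the `None` convention for hidden cells, and the use of raw moneyness as the regressor are assumptions made for illustration; the paper's parameterization, its VAE, and its routing rule are not reproduced here.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the 3x3 normal equations."""
    if len(set(xs)) < 3:
        raise ValueError("need at least three distinct observed points")
    s = [sum(x ** k for x in xs) for k in range(5)]            # power sums s0..s4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]

    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coeffs = []
    for j in range(3):                                          # Cramer's rule
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = t[i]
        coeffs.append(det3(mj) / d)
    return coeffs                                               # [a, b, c]


def complete_tenor(moneyness, vols):
    """Fill hidden cells (None) of one tenor from a parabola fit to the observed cells."""
    observed = [(x, v) for x, v in zip(moneyness, vols) if v is not None]
    a, b, c = fit_quadratic([x for x, _ in observed], [v for _, v in observed])
    return [v if v is not None else a + b * x + c * x * x
            for x, v in zip(moneyness, vols)]
```

A call such as `complete_tenor([-0.2, -0.1, 0.0, 0.1, 0.2], [0.56, 0.51, None, 0.53, None])` returns the same list with the two hidden cells replaced by the fitted values.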
-
Boosting MoE Training Throughput with Advanced Fusion Kernels
NVIDIA details advanced fusion kernels to boost MoE training throughput
On its developer blog, NVIDIA explains advanced fusion-kernel techniques aimed at boosting training throughput for Mixture-of-Experts (MoE) models. Noting that MoE has rapidly become a foundational component of modern large-scale AI systems, the post outlines kernel-level optimizations for more efficient training.
-
A Causal Model of Theory of Mind in Conflict for Artificial Intelligence
A structural causal model for when AI should engage theory of mind in conflict
Theory of mind (ToM), ascribing mental states to others for prediction and inference, is widely assumed essential for human-machine integration. Existing AI-ToM models address how to mentalize but leave when largely unaddressed. The paper asks under what situational and agent-level conditions ToM engagement is causally warranted in conflict, presenting a structural causal model as a directed acyclic graph that treats ToM as a mechanism activated by conditions rather than an always-on capacity.
-
Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter
Study probes extrinsic and intrinsic traits of code-interpreter reasoning
This paper studies reasoning with a Code Interpreter (CI) in LLMs from two angles: extrinsic properties (crucial tokens) and intrinsic properties (code-specific cognitive behaviors). It reports that stronger CI reasoning models show more crucial tokens and behaviors (especially verification, backtracking, and backward chaining) and explores leveraging these at inference and training time. Summarized neutrally from the abstract.
-
RAID: Semantic Graph Diffusion for True Cold-Start and Cross-Lingual Forecasting
RAID: retrieval-augmented diffusion for cold-start, cross-lingual forecasting
Time-series foundation models transfer well given a history window, but true cold-start items with no prior observations violate that. The authors propose RAID (Retrieval-Augmented Iterative Diffusion), replacing history-based correlation with metadata-driven semantic retrieval and graph-conditioned diffusion. It maps metadata into a shared semantic space via a frozen multilingual embedding model, builds an inductive retrieval graph for unseen items, and refines a forecast from neighbors.
-
MA-SBI: Misspecification-Aware Simulation-Based Inference via Side-Channel Guidance
MA-SBI: misspecification-aware inference via side-channel guidance
Simulation-based inference (SBI) is often hindered by simulator misspecification, the mismatch between simulated and real observations. The recent robust method RoPE uses optimal transport between learned representations but needs ground-truth calibration pairs unavailable where SBI is needed. Practitioners instead have unstructured side-information such as regime labels, instruction text, and policy bulletins. The authors propose Misspecification-Aware SBI (MA-SBI) to exploit this guidance.
-
LESS Is More: Mutual-Stability Sampling for Diffusion Language Models
LESS: a training-free adaptive sampler for diffusion language models
The paper presents LESS, a training-free, model-agnostic adaptive sampler for diffusion LLMs that frames token commitment as an online stopping problem. Its mutual-stability rule unmasks a position only when its top-1 prediction is confident, persists across recent steps, and is distributionally stable (top-K inter-step JS divergence). It is evaluated on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B. Summarized neutrally from the abstract.
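The three conditions in the mutual-stability rule can be sketched as a single predicate over a position's recent predictions. This is only an illustration of the rule as summarized above, not the authors' implementation: the thresholds, the `patience` window, the choice to take the top-K support from the latest step, and all function names are assumptions.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def restrict_and_renormalize(probs, indices):
    """Keep only the given token indices and rescale them to sum to 1."""
    mass = [probs[i] for i in indices]
    total = sum(mass) or 1.0            # avoid division by zero on empty support
    return [x / total for x in mass]


def should_unmask(history, conf_threshold=0.9, patience=2, k=3, js_threshold=0.01):
    """history: per-step probability vectors for one masked position, oldest first."""
    if len(history) < patience:
        return False
    recent = history[-patience:]
    top1 = [max(range(len(p)), key=p.__getitem__) for p in recent]
    # 1) Confident: the latest top-1 probability clears the threshold.
    if recent[-1][top1[-1]] < conf_threshold:
        return False
    # 2) Persistent: the same token is top-1 across the whole window.
    if len(set(top1)) != 1:
        return False
    # 3) Distributionally stable: small JS divergence between consecutive
    #    steps, measured on the top-K tokens of the latest step.
    support = sorted(range(len(recent[-1])), key=recent[-1].__getitem__, reverse=True)[:k]
    for prev, cur in zip(recent, recent[1:]):
        p = restrict_and_renormalize(prev, support)
        q = restrict_and_renormalize(cur, support)
        if js_divergence(p, q) > js_threshold:
            return False
    return True
```

In a sampler loop, a position that fails the predicate would simply stay masked for another denoising step, which is what makes the scheme adaptive rather than fixed-schedule.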
-
Binary Tracking for Spatial QA and Navigation with Open Vision-Language Models
Binary Tracking: open vision-language models for spatial QA and navigation
The paper addresses spatial question answering for service robots traversing long egocentric routes, returning metric coordinates that downstream navigation can act on for queries like 'where can I find a dry cleaner on the way back home?' Prior approaches rely on closed-source models such as GPT-4o, which robots cannot reliably depend on due to network instability, latency, and deployment cost. The authors propose Binary Tracking, an open-source vision-language approach that can run onboard.
-
Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens
Anchor-token roadmap for revocable decoding in diffusion LLMs
An arXiv paper addresses the speed-quality trade-off and error propagation in revocable decoding for diffusion LLMs (dLLMs). It proposes following a latent 'roadmap' guided by anchor tokens to mitigate failures arising in mixed-quality contexts during parallel generation. Neutral, abstract-based summary.
-
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
Paper: Expert Tying shares MoE expert params across layers
An arXiv paper introduces Expert Tying, an architectural change that shares expert parameters across consecutive transformer layers while keeping independent layer-wise routing and attention, aiming to cut Mixture-of-Experts memory cost. Summarized neutrally from the abstract.
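The memory argument can be illustrated with a toy stack in which consecutive layers point at the same expert bank but each layer keeps its own router. This is a minimal sketch under stated assumptions, not the paper's architecture: the `tie_span` grouping, top-1 routing, plain linear experts, and all names are hypothetical, and attention is omitted entirely.

```python
import random


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_moe_stack(num_layers, num_experts, dim, tie_span, seed=0):
    """Layers in each group of `tie_span` consecutive layers share one expert bank."""
    rng = random.Random(seed)
    layers, bank = [], None
    for i in range(num_layers):
        if i % tie_span == 0:
            # Start of a tied group: allocate a fresh bank of expert weights.
            bank = [make_matrix(dim, dim, rng) for _ in range(num_experts)]
        router = make_matrix(num_experts, dim, rng)   # always per-layer
        layers.append({"router": router, "experts": bank})
    return layers


def count_expert_params(layers):
    """Count expert weights once per distinct bank object."""
    banks = {id(layer["experts"]): layer["experts"] for layer in layers}
    return sum(len(e) * len(e[0]) for bank in banks.values() for e in bank)


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def forward(layers, x):
    for layer in layers:
        scores = matvec(layer["router"], x)                 # layer-specific routing
        choice = max(range(len(scores)), key=scores.__getitem__)
        y = matvec(layer["experts"][choice], x)             # possibly shared expert
        x = [a + b for a, b in zip(x, y)]                   # residual connection
    return x
```

With four layers, two experts, and `dim=3`, `tie_span=2` stores half as many expert weights as `tie_span=1`, while every layer can still route the same token to a different expert.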
-
GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents
GIST-CMTF adds goal-state inference to causal minimal tool filtering
The paper introduces GIST-CMTF, which augments Causal Minimal Tool Filtering with goal-state inference for tool-augmented LLM agents. It addresses wrong-goal execution, where ambiguous requests such as "handle my appointment" map to multiple goals and an agent may follow a valid causal tool path toward an unintended objective.
-
LLM-based Visual Code Completion for Aerospace Geometric Design
Paper: LLM visual-programming copilot for aerospace design
An arXiv paper presents an LLM-based visual programming copilot for aerospace geometric design tasks, using a visual-programming variant of the ReAct methodology. Summarized neutrally from the abstract; claims are the authors' and not independently verified.
-
Progressive Knowledge-Guided Large Language Model Framework for Bearing Fault Diagnosis
Physics-guided multi-scale framework for bearing fault diagnosis
The paper proposes a progressive, physics-guided multi-scale vibration-processing pipeline for bearing fault diagnosis, using a kinematics-derived descriptor for real-time screening and fault-adaptive segmentation. Reported figures reflect the abstract and are not independently verified.
-
DoubtProbe: Black-Box Jailbreak Defense via Structural Verification and Semantic Auditing
DoubtProbe: a dual-branch inference-time defense against LLM jailbreaks
This arXiv paper proposes DoubtProbe, a dual-branch inference-time framework for black-box jailbreak defense in LLMs. The authors observe that many jailbreaks do not remove the harmful goal but reorganize the information needed to express it, evading safety alignment while remaining recoverable during generation. DoubtProbe combines structural verification and semantic auditing to counter this.
-
Sakana AI Launches "Sakana Marlin," Its First Commercial Product
Sakana AI launches Marlin, its first commercial autonomous research assistant
Sakana AI has launched Sakana Marlin, its first commercial product: an autonomous research assistant for business. Given a research theme, it works autonomously for up to about eight hours (forming hypotheses, gathering and verifying information), then outputs structured summary slides and a report spanning dozens of pages. Built on the firm's long-horizon reasoning technology, it aims to act as a 'virtual CSO,' is self-serve, and is available same day, with plans ranging from free pay-per-use to Enterprise.
-
The future of Siri, or: why private inference isn’t private enough
The future of Siri: why private inference isn't private enough
An essay on the future of voice assistants like Siri, arguing that on-device or 'private' inference alone does not fully protect user privacy and that stronger guarantees are needed beyond encryption and local processing.
-
NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark
NVIDIA tops first agentic AI benchmark for agentic coding performance
NVIDIA reports leading agentic coding performance on the first benchmark dedicated to agentic AI, per its developer blog. The result highlights its inference stack and GPU infrastructure as a platform for autonomous coding agents.
-
AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization
AdaSR enables adaptive streaming reasoning for reasoning models
AdaSR moves beyond the read-then-think paradigm by letting reasoning models reason incrementally as input streams in. It uses a hierarchical relative policy optimization scheme to train streaming reasoning.
-
HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification
HumP-KD: uncertainty-aware distillation for efficient fire classification
HumP-KD is a hybrid, uncertainty-aware multi-stage progressive knowledge distillation framework for fire classification. It targets models that are simultaneously accurate and efficient for real-time use.
-
Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows
Direct latent-space synthesis for parallel branches in LLM-agent workflows
LLMs serve as execution engines for agentic systems yet still consume context through a sequential text interface, mismatching modern structured workflows with independent parallel branches. The paper explores synthesizing such parallel branches directly in latent space.
-
When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing
Route-specialized dual adapters for memory-assisted knowledge editing
This work targets knowledge editing that updates selected facts while preserving nearby behavior in a memory-assisted setting. It proposes route-specialized dual adapters that decide when to write and when to suppress edits.
-
Abstracting Cross-Domain Action Sequences into Interpretable Workflows
Time-stamped interaction logs objectively record digital app usage, but their granularity and noise obscure meaningful insights into work. The paper proposes abstracting cross-domain action sequences into interpretable workflows.
-
Moonlight in Latent Space: Chirality and Structural Correspondence Between Beethoven's Op. 27 No. 2 and Machine Learning Mechanisms
Structural correspondence between Beethoven's Moonlight Sonata and ML
Through computational analysis, this paper argues that the three movements of Beethoven's Moonlight Sonata (Op. 27 No. 2) instantiate three distinct machine learning architectures by structural correspondence rather than mere analogy.
-
Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0
A fused INT8 GEMM kernel speeds diffusion transformers on consumer GPUs
Post-training INT8 quantization of diffusion transformers is often slower than FP8/NF4 on consumer Ampere GPUs. The paper presents a fused INT8 GEMM kernel for Ideogram 4.0 that realizes native INT8 speedups.
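As background on the arithmetic involved, an INT8 GEMM quantizes both operands to 8-bit integers, accumulates integer products, and rescales the result back to floating point. The sketch below shows that arithmetic on the CPU in plain Python, with the rescale folded into the same pass as the accumulation. It is not the paper's kernel and says nothing about GPU performance; symmetric per-tensor scaling and every name here are assumptions chosen for brevity.

```python
def quantize_int8(matrix):
    """Symmetric per-tensor quantization: float values -> integers in [-127, 127]."""
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_gemm_dequant(a_q, a_scale, b_q, b_scale):
    """Integer matmul with the dequantization applied as each output is produced."""
    cols = list(zip(*b_q))                      # columns of B
    rescale = a_scale * b_scale
    out = []
    for row in a_q:
        # Integer accumulation, then a single float multiply per output element.
        out.append([sum(x * y for x, y in zip(row, col)) * rescale for col in cols])
    return out


def quantized_matmul(a, b):
    a_q, a_scale = quantize_int8(a)
    b_q, b_scale = quantize_int8(b)
    return int8_gemm_dequant(a_q, a_scale, b_q, b_scale)
```

Calling `quantized_matmul(a, b)` on small float matrices gives a result close to the exact product, with the gap coming from rounding each operand to 8 bits.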
-
Zero-shot generalization of transformer neural operators to larger domains
The paper studies whether transformer-based neural operators for PDE solution operators can generalize zero-shot to larger spatial domains than seen in training.
-
CARE: Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation
CARE: auditable evidence review to control LLM-generated policies
Giving LLMs direct control over costly, irreversible experiments invites unsafe exploration, while discarding their creativity sacrifices optimization. CARE controls LLM-generated policies through auditable review of evidence in scientific experimentation.