Inference & Efficiency (Page 3 of 6)｜AI/Tech News Trends

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

WIDE: Boosting Adaptive LLM Inference via Token-level Dynamic Width Pruning

Deep Learning Inference Neural Network Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Industry Adoption

QuantWAMs: Calibrating at the Right Granularity for World Action Models

Quantization Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

Why Are GUI Agents Correct but Late? Decode on the Decision-Time Critical Path, Tested with Pre-Compiled Policy Trees

AI Agents Deep Learning Neural Network Reinforcement Learning from Human Feedback (RLHF) Software Engineering

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Multimodal

Correcting What You Cannot See: Credit Assignment for Perception Distillation in Multimodal Reasoners

Neural Network Retrieval-Augmented Generation (RAG) Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Funding & M&A

Fairness Pruning: Locating Demographic Bias in GLU-MLP Layers via Differential Activations

Inference Llama Machine Learning Meta

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

Fully Inductive Cardinality Estimation

Embeddings Neural Network Reinforcement Learning

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

Semi-Supervised Learning for Molecular Graphs via Ensemble Consensus

Machine Learning Neural Network

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

CACHE-UK: A Stability-Aware Memory Editor for Sequentially Updated Quantized LLMs in Finance

Llama Quantization

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

Tycho: Active Abstraction with Programmatic World Models for ARC-AGI-3

Claude GPT Inference Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Multimodal

Theia: Large-Scale Multimodal Captioning and Automated Validation of the Incidents1M Dataset for Data-Free Distillation

Computer Vision Mixture of Experts (MoE) Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Inference & Efficiency

Understanding Is Done Early: A Depth Division of Labor in Large Language Models and Its Use for Unbounded-Context Memory

Deep Learning Machine Learning NVIDIA Software Engineering Transformer

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

Operationally Guided Placement-Aware Learning for Industrial Online 3D Bin Packing

Embeddings Inference

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

IEEE Spectrum (AI section) · 2026-07-30 EN Inference & Efficiency

Are AI Models Working Harder Than They Need to?

Deep Learning Google Inference Neural Network Software Engineering

IEEE Spectrum examines how heavily modern AI relies on massive amounts of multiplication. The piece asks whether the neural networks behind everything from generated answers to photo organization really need all that computation, and explores how AI inference could be made far more efficient.

Read original (IEEE Spectrum (AI section)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

AgenticASR: Refining Speech Recognition in Real-World Scenarios via an Agentic Approach

Deep Learning Inference Neural Network Reinforcement Learning Speech Processing

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

OPLD: On-Policy Latent Distillation for Multimodal Reasoning

Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

Information Bottleneck Learning for Faithful Time Series Forecasting Explanations

Inference Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

MIND: Lightweight and Effective Memory Injection Defense for LLM Agents via Intent-Aware Information Bottleneck

AI Agents Inference Retrieval-Augmented Generation (RAG) Speech Processing

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

From Expert Reduction to Behavioral Divergence: Tracing Numerical State through Sparse MoE Inference

DeepSeek Inference Mixture of Experts (MoE) Reinforcement Learning from Human Feedback (RLHF)

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

Distilling Answer Set Programming Theories from Large Language Models

Claude DeepSeek GPT Neural Network Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Infrastructure & Hardware

GGC: Selective Query Correction for Reliable Text-to-SPARQL Generation

Inference Neural Network

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

Group-Reflective Self-Distillation for Agentic Reinforcement Learning

AI Agents Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

SemPIC: Learning Semantic Position-Independent KV Caches

Deep Learning Neural Network

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

Stimulus-Evoked Network Dynamics in Human Cortical Organoids: From a Graph-Computational Framework to Repeated-Stimulation Depression

Neural Network Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

A Query-Efficient Stochastic Volume Rendering Framework for Time-Varying Implicit Neural Volumes

Inference

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Model Releases

Contrastive Reinforced Policy Optimization via Privileged Self-Distillation

Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

Flux-OPD: On-Policy Distillation with Evolving Contexts

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Model Releases

Driving up Inference Energy on SNNs: Per-Sample and Universal Sponge Attacks

Inference Neural Network

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

Generalization Bounds on Optimal Control for Transformer Training and Wasserstein Distributional Robustness

Neural Network Quantization Transformer

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

TAPO: Transition-Aware Policy Optimization for LLM Agents

AI Agents Algorithms & Theory Inference Reinforcement Learning

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Infrastructure & Hardware

Gradient-free Task-Conditioned Retrieval for On-Device In-Context Learning

Inference Llama

Read original (arXiv cs.CL (Computation and Language)) ↗