Inference & Efficiency｜AI/Tech News Trends

Sakana AI Blog (ja) · 2026-08-02 JA New Model Releases extract

Sakana AI、日本語特化のLLM API「Sakana Namazu」を提供開始

Sakana AI launches Namazu, a Japanese-focused OpenAI-compatible LLM API

AI Agents Inference Machine Learning Meta OpenAI

Sakana AI released Namazu, an LLM API tuned for Japanese and local business use. Built on Moonshot AI's open Kimi K2.6 and refined with in-house data, it adds built-in web search and code execution. Being OpenAI-compatible, existing code works by swapping the base_url, filling the gap between costly frontier models and raw open ones.

Read original (Sakana AI Blog (ja)) ↗

ITmedia AI+ · 2026-08-01 JA New Model Releases extract

OpenAI、アクティブユーザー10億人超に　導入企業は200万社超

OpenAI passes 1 billion active users and 2 million business customers

GPT Inference OpenAI

OpenAI said it surpassed one billion active users and two million business customers. It cited efficiency gains from retained reasoning, better context management, and production optimization that cut costs and improved token throughput, alongside price cuts on some GPT-5.6 models.

Read original (ITmedia AI+) ↗

NVIDIA Developer Blog · 2026-07-31 EN Infrastructure & Hardware extract

Co-Designing AI Model Attention for Fast, Interactive Long-Context Inference

NVIDIA details co-designed attention for fast long-context inference

Generative AI Inference NVIDIA

NVIDIA describes co-designing model attention with hardware to speed up interactive long-context inference. As agentic and long-context workloads grow, attention takes a larger share of inference time, and the approach targets that bottleneck for faster serving.

Read original (NVIDIA Developer Blog) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Inference & Efficiency

GQ-FSL: Green Quantized Federated Split Learning

Neural Network Quantization

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

When Does On-Policy Interaction Help? Representational Tradeoffs in Value-Based Imitation Learning

Neural Network Reinforcement Learning Robotics

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN New Model Releases

QASP: Query-Adaptive Robust Vector Search Policy

Inference Retrieval-Augmented Generation (RAG)

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

FriendBench: Benchmarking Dyadic Familiarity Inference in Humans and Multimodal Large Language Models

Inference Neural Network Software Engineering Speech Processing

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-tuning

The Parts Are Greater Than the Sum: Automated Task Sequencing for Efficient Training of Multi-Policy LLMs

Fine-tuning Quantization

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Inference & Efficiency

ResKV: Reconstructing Omitted Attention Contributions for Fixed-Budget KV Cache Compression

Inference

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Inference & Efficiency

Adaptive FastOPD: Progress-Aware Rollout Horizon Expansion for Efficient On-Policy Distillation

Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Model Releases

Evidence-Type Competition: When Can Interventional Data Teach Language Models Causal Direction?

Inference Reinforcement Learning

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

Self-Play Meets Skill Evolution: Self-Evolving Search Agents that Pose, Solve, and Remember

AI Agents Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

TFGformer: Multivariate Time Series Forecasting via Time-Frequency Graph Learning and Covariate Fusion

Inference Retrieval-Augmented Generation (RAG)

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Inference & Efficiency

Analytical and Bootstrap Confidence Intervals of Double Machine Learning: Simulation studies and an application to rural-urban difference in obesity prevalence

Algorithms & Theory Inference Machine Learning Neural Network Retrieval-Augmented Generation (RAG)

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

Beyond Retrieval: Analytic Memory for Multimodal Agents

AI Agents Inference Meta Neural Network Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Model Releases

Bridging the Question-Answer Gap in Retrieval-Augmented Generation: Hypothetical Prompt Embeddings

Embeddings Retrieval-Augmented Generation (RAG) Software Engineering

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Inference & Efficiency

OnlineCache: Learning Dynamic Caching Policies with Error Correction for Efficient Diffusion Inference

Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Inference & Efficiency

Studying quantization trade-offs for efficient inference deployment in machine translation

Deep Learning Inference Quantization

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

Versatile On-device Adaptation at the Edge by Unifying Few-shot, Zero-shot, Continual, and In-context Learning

Algorithms & Theory Inference Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

MAGA: Multi-Platform Self-Fusion of GUI Agents via Structured Action Distillation

AI Agents Neural Network Retrieval-Augmented Generation (RAG)

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

Translation with Thought: Difficulty-Adaptive Reasoning via Reinforcement Learning for Multi-Domain Machine Translation

DeepSeek Fine-tuning GPT Inference Neural Network

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN New Model Releases

OsteoCAD: A Human-in-the-Loop Cloud-Edge Framework for Bone Tumor Segmentation

Deep Learning Inference Neural Network Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Agents & Tool Use

Tool Specifications Matter: Uncovering and Mitigating Safety Risks in AI Agents

AI Agents Deep Learning Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Agents & Tool Use

Data Turnstile: A Scalable Open Framework for Function-Calling Data Generation

Neural Network

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Safety & Evaluation

Don't Mix Rewards, Mix Policies: Policy Decomposition and Optimization for Multi-Reward RL

Inference Reinforcement Learning Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Infrastructure & Hardware

Small Is Enough: Per-User Style Rewriting of AI-Edited Text via LoRA Adapters

Inference

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

FBFM: A Training-Free Asynchronous Feedback Mechanism for Flow-Matching in World-Action Models Execution

Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

MOSAIC: Masked Outsourcing of Secure AI Computations

Inference Quantization Transformer

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

SAF-OPD: Stable Advantage Fusion for On-Policy Distillation

Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

SERUM: State Extraction and Refinement for User Modeling

Embeddings Inference Neural Network

Read original (arXiv cs.AI (Artificial Intelligence)) ↗