Inference & Efficiency (Page 2 of 6)｜AI/Tech News Trends

Lobste.rs (AI tagged) · 2026-07-31 EN Inference & Efficiency extract

vLLM for Baidu Kunlun

Baidu open-sources a vLLM port for its Kunlun AI chips

Inference

Baidu published vLLM-Kunlun on GitHub, a port of the high-throughput vLLM inference engine targeting its in-house Kunlun AI accelerators. The project expands the options for running LLM inference on non-NVIDIA hardware.

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Inference & Efficiency

Can Zero-Shot LLMs Predict Child Malnutrition? A Fairness and Temporal Robustness Study

Deep Learning · GPT · Inference · Meta · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Inference & Efficiency

TransMem: Transforming Hidden States into Memory for Large Language Models

AI Agents · Deep Learning · Inference · Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Model Releases

GoldenRetriever: Non-Interactive Homomorphic Encrypted Retrieval for Privacy-Preserving RAG

Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Inference & Efficiency

Adjudicated Captioning: Multi-Agent Alignment Scoring and Consensus-Distilled Beam Arbitration for Strict Zero-Shot Image Captioning

Deep Learning · Inference · Transformer

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Model Releases

Mixture-of-Translators: Translating KV Caches Across Heterogeneous Large Language Models

Deep Learning · GPT · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Inference & Efficiency

BLADE: Boundary-Expanded and Layer-Adaptive Dynamic Exit for Efficient LLM Reasoning

Inference · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering

ITmedia AI+ · 2026-07-31 JA Training & Fine-tuning extract

Thinking Machines officially releases lightweight model "Inkling-Small", matching "Inkling" performance at a quarter of the size

Thinking Machines releases Inkling-Small, matching Inkling at 1/4 the size

Reinforcement Learning

Thinking Machines Lab released the final version of Inkling-Small, an open-weight AI model. The model is a quarter the size of its predecessor, and the company says improved data and reinforcement learning let it match the larger Inkling on tasks such as code generation.

ITmedia AI+ · 2026-07-31 JA Inference & Efficiency extract

AI finds vulnerability that lurked in Chrome for over 13 years; last two updates fixed more bugs than the previous 23

Google's Gemini agent finds 13-year-old Chrome flaw; tests twice-weekly updates

AI Agents · Gemini · Google

Google detailed its use of AI for Chrome security, saying a Gemini-based agent uncovered a vulnerability hidden for over 13 years and that its last two updates fixed more bugs than the previous 23 combined. To counter faster AI-driven attacks, Google is trialing twice-weekly security updates.

ITmedia AI+ · 2026-07-31 JA New Model Releases extract

Google announces robotics AI "Gemini Robotics 2", enabling humanoid whole-body control and fingertip tasks

Google unveils Gemini Robotics 2 for whole-body and fine fingertip control

Gemini · Google · Inference · Robotics

Google and Google DeepMind announced Gemini Robotics 2, a family of robotics AI models supporting humanoid whole-body control, fine fingertip manipulation, and multi-robot collaboration. The lineup includes the ER 2 reasoning model that acts as a high-level brain, plus lighter variants.

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Model Releases

Token-Level Diagnosis of Sycophancy in LLMs with Attribution-Guided Steering

Inference · Neural Network

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

ReToken: One Token to Improve Vision-Language Models for Visual Retrieval

Computer Vision · Embeddings · Inference

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Multimodal

VAD: Attributing Visual Evidence for Target Reconstruction in Multimodal On-Policy Distillation

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

MixFrag: Fragility-Guided Mixed-Precision Post-Training Quantization for Vision Transformers

Computer Vision · Quantization · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Transformer

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Model Releases

$β$-OPSD: Deriving with Policy Optimization, Training with Self-Distillation

Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

Rethinking Inference-Time Scaling in Local Computer-Use Agents: Failure Modes and Compute Tradeoffs

AI Agents · Inference · Neural Network · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-30 EN New Model Releases

Frontis-MA1: Training an AI4AI Model towards Recursive Self-Improvement in Machine Learning Engineering

Fine-tuning · GPT · Machine Learning · Meta · Retrieval-Augmented Generation (RAG)

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Inference & Efficiency

Doubly Robust Functional Representation Learning for Longitudinal Causal Inference with Irregular Histories

Inference · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

APO: Unsupervised Atomic Policy Optimization for 3D Structure Prediction of Atomic Systems

Inference · Reinforcement Learning from Human Feedback (RLHF)

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

MANTA: Multi-Agent Network Topology Adaptation for Self-Evolving Multi-Agent Systems

Inference · Neural Network · Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Inference & Efficiency

Stage-Replay Divergence Follows the KV Cache: Fixed-Prefix Precision Controls and Bidirectional Cache Transplantation

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

A Fuzzy Rule-based Neuro-Symbolic Approach for Pipe Severity Prediction in Sewer Networks

Inference · Neural Network · Transformer

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Inference & Efficiency

Would You Walk to the Car Wash? Revealing the Salience Bias of Large Language Models in Commonsense Reasoning

Inference · Neural Network

arXiv cs.CL (Computation and Language) · 2026-07-30 EN New Model Releases

Improving Mental Health Screening and Early Risk Detection in Spanish

Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

Towards Autonomous Aircraft Surveillance from Nanosatellites through On-Board Inference and Generative Data Augmentation

Inference · Neural Network · Retrieval-Augmented Generation (RAG)

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

SVR: Self-Verifying Refinement via Joint Verdict-Confidence Reinforcement Learning for Adaptive Test-Time Compute

Inference · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

Machines that know they are aging: a framework for hardware-aware autonomous intelligence

Inference · Neural Network · Robotics

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Training & Fine-tuning

Lightning OPD 2.0: Mitigating Style Bias in Cross-Teacher On-Policy Distillation for Large Reasoning Models

Fine-tuning

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Developer Tools

QAdapt: A Noise-Adaptive Neural Pre-Decoding Framework for Quantum Error Correction

Deep Learning · Fine-tuning · Google

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

When Derived Measurements Mislead: Quantifying and Mitigating LLM Over-Trust with Privileged-Modality Reliability Evidence

Inference · Neural Network · Reinforcement Learning from Human Feedback (RLHF)
