Multimodal | AI/Tech Trends Roundup

arXiv cs.LG (Machine Learning) · 2026-07-31 · EN · Multimodal

Differentially Private Nonparametric Modal Learning with Applications to Regression and Clustering

Retrieval-Augmented Generation (RAG)

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · New Models & Releases

The Theoretical Foundation of Socratic Tests: Dynamic, Multimodal, Conversational Examinations

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · Multimodal

WCM: A World Critic Model for Vision-Language-Action Reinforcement Learning

Computer Vision · Machine Learning · Neural Networks · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · New Models & Releases

FriendBench: Benchmarking Dyadic Familiarity Inference in Humans and Multimodal Large Language Models

Inference · Neural Networks · Software Engineering · Speech Processing

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · Infrastructure & Hardware

TraceViT: Grounded Trace Supervision for Visual Abstract Reasoning

Neural Networks

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · Multimodal

Sycophancy Undermines Epistemic Vigilance in Cooperative Vision-Language Tasks

Computer Vision · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · New Models & Releases

AMTFV: Agentic Mathematical Tool-Flow Verification for LLM Self-Correction

DeepSeek · Gemini · GPT · Software Engineering

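NVIDIA Developer Blog · 2026-07-31 · EN · Developer Tools · Excerpt

NVIDIA Video Codec SDK 13.1: Zero-Copy Transcode, AV1 B-Frames, and Frame-Accurate Seek

NVIDIA releases Video Codec SDK 13.1 with zero-copy transcoding and AV1 support

Computer Vision · NVIDIA

NVIDIA has released Video Codec SDK 13.1. The update adds zero-copy transcoding, B-frame support for AV1, and frame-accurate seeking, among other features, to meet growing demand for high-quality video processing.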

arXiv cs.LG (Machine Learning) · 2026-07-31 · EN · Inference & Efficiency

Adaptive FastOPD: Progress-Aware Rollout Horizon Expansion for Efficient On-Policy Distillation

Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · Multimodal

QR-Structured Thermal Triggers for Targeted Semantic Attacks on Infrared Vision-Language Models

Computer Vision · Deep Learning · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · Inference & Efficiency

Beyond Retrieval: Analytic Memory for Multimodal Agents

AI Agents · Inference · Meta · Neural Networks · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · Multimodal

Stable Autoregressive Speech Generation with Low-Frame-Rate High-Dimensional Continuous Tokens

Speech Processing

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · New Models & Releases

SeekBrain: An Autonomous Multi-Agent System for Accelerating Neuroscience Discovery

Neural Networks · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · New Models & Releases

MAGA: Multi-Platform Self-Fusion of GUI Agents via Structured Action Distillation

AI Agents · Neural Networks · Retrieval-Augmented Generation (RAG)

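Data Center Dynamics · 2026-07-31 · EN · Multimodal · Excerpt

Veolia to operate 350MW gas-powered microgrid for Ohio data center campus

Veolia to operate 350MW gas-fired microgrid for Ohio data center

According to Data Center Dynamics, Veolia will operate a 350MW microgrid for a data center campus in Ohio, USA. The system is built around natural gas generation and supplemented by a battery energy storage system (BESS), providing a stable power supply for the large-scale data center.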

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · Agents & Tool Use

Data Turnstile: A Scalable Open Framework for Function-Calling Data Generation

Neural Networks

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · Multimodal

When Model Priors Conflict with Visual Evidence: Mitigating Commonsense-Driven Hallucinations by Selective Prior Calibration

Software Engineering

arXiv cs.LG (Machine Learning) · 2026-07-31 · EN · Training & Fine-tuning

GALA: Generative Aligned Learning for Adaptive Multimodal Representation in the Taobao Shangou Recommender System

Embeddings · Fine-tuning · Neural Networks · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · New Models & Releases

Knowing When to Quit: Diagnosing and Training LLMs to Abort Futile Reasoning

Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · New Models & Releases

Hy-MultiTurn: A Six-Dimensional Benchmark for Deep Multi-Turn Dialogue Understanding

AI Agents · Deep Learning · GPT · Neural Networks · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 · EN · Multimodal

MoRAE: Flow-Friendly Self-Supervised Latents for Text-to-Motion Generation

Deep Learning

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · Multimodal

Faster but Different: Diagnosing and Controlling Content Drift in Accelerated Multimodal Diffusion Language Models

Deep Learning · Machine Learning · Neural Networks

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · Inference & Efficiency

Adjudicated Captioning: Multi-Agent Alignment Scoring and Consensus-Distilled Beam Arbitration for Strict Zero-Shot Image Captioning

Deep Learning · Inference · Transformer

arXiv cs.CL (Computation and Language) · 2026-07-31 · EN · Inference & Efficiency

BLADE: Boundary-Expanded and Layer-Adaptive Dynamic Exit for Efficient LLM Reasoning

Inference · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · Developer Tools

TORUS: A Test of Rendering-Understanding Self-Coherence for Unified Audio Models

Deep Learning · Neural Networks · Software Engineering · Speech Processing

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Inference & Efficiency

ReToken: One Token to Improve Vision-Language Models for Visual Retrieval

Computer Vision · Embeddings · Inference

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Multimodal

OSReward: Instituting Standardized Evaluation for Cross-Platform Computer-Use Reward Models

AI Agents · Computer Vision · Deep Learning · Neural Networks · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · Multimodal

Change2Task: From Repository Changes to Executable Coding Agent Tasks and Environments

AI Agents · Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · Multimodal

VAD: Attributing Visual Evidence for Target Reconstruction in Multimodal On-Policy Distillation

arXiv cs.LG (Machine Learning) · 2026-07-30 · EN · Inference & Efficiency

MixFrag: Fragility-Guided Mixed-Precision Post-Training Quantization for Vision Transformers

Computer Vision · Quantization · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Transformer
