Multimodal (page 2 of 5) | AI/Tech Trends Roundup

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Developer Tools

DualG-MRAG: Decoupling Macro-Reasoning and Micro-Matching for Multimodal Retrieval-Augmented Generation

Neural Networks · Retrieval-Augmented Generation (RAG)

arXiv cs.LG (Machine Learning) · 2026-07-30 · EN · Multimodal

ScaFE: Data-Efficient Scar Classification with LLM-Generated Clinical Feature Programs

Computer Vision

arXiv cs.LG (Machine Learning) · 2026-07-30 · EN · New Models & Releases

Same Graph Cross-Task Transfer in GNNs: Protocols and Predictors

Neural Networks · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Multimodal

A report-grounded vision-language foundation model for colonoscopy from 280000 routine reports

Computer Vision · Neural Networks

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Inference & Efficiency

When Derived Measurements Mislead: Quantifying and Mitigating LLM Over-Trust with Privileged-Modality Reliability Evidence

Inference · Neural Networks · Reinforcement Learning from Human Feedback (RLHF)

arXiv cs.LG (Machine Learning) · 2026-07-30 · EN · Inference & Efficiency

Why Are GUI Agents Correct but Late? Decode on the Decision-Time Critical Path, Tested with Pre-Compiled Policy Trees

AI Agents · Deep Learning · Neural Networks · Reinforcement Learning from Human Feedback (RLHF) · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Multimodal

HyperClaim: Fine-Grained Cross-Modal Hypergraph Reasoning for Video Misinformation Detection

arXiv cs.LG (Machine Learning) · 2026-07-30 · EN · New Models & Releases

LEDGERMIND: Provenance-Constrained Multimodal Agentic Reasoning with a Structured Evidence Ledger

AI Agents · Neural Networks · Reinforcement Learning · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Multimodal

Correcting What You Cannot See: Credit Assignment for Perception Distillation in Multimodal Reasoners

Neural Networks · Retrieval-Augmented Generation (RAG) · Software Engineering

Google DeepMind Blog · 2026-07-30 · EN · Multimodal · Excerpt

Gemini Robotics ER 2: powering robotics with video understanding, task orchestration, and multi-robot collaboration

DeepMind announces Gemini Robotics ER 2 with video understanding and multi-robot collaboration

Gemini · Reinforcement Learning · Robotics

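DeepMind has announced Gemini Robotics ER 2, a model for robots. It strengthens video understanding, task decomposition and coordination, and multi-robot collaboration. DeepMind positions it as an incremental step toward robots that reason about real-world problems and cooperate to solve them.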

Sakana AI Blog (ja) · 2026-07-30 · EN · Developer Tools · Excerpt

From Japan, Products the World Will Use: An Interview with Sakana AI's Head of Product Development

Sakana AI's head of product development on products from Japan that the world will use

Neural Networks · Reinforcement Learning

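An interview with Sakana AI's head of product development. It covers the aim of building products in Japan that are used worldwide and the company's approach to product development, and it illustrates the product strategy of a Japanese AI startup.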

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Multimodal

PathView-Bench: Can Multimodal Large Language Models Achieve Fine-grained Multiscale Understanding of Pathology Images?

Machine Learning · Neural Networks · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · New Models & Releases

ObjectStream: Latent Objects as Memory Anchors for Streaming Video Understanding

Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Multimodal

Theia: Large-Scale Multimodal Captioning and Automated Validation of the Incidents1M Dataset for Data-Free Distillation

Computer Vision · Mixture of Experts (MoE) · Neural Networks · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · Inference & Efficiency

Understanding Is Done Early: A Depth Division of Labor in Large Language Models and Its Use for Unbounded-Context Memory

Deep Learning · Machine Learning · NVIDIA · Software Engineering · Transformer

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · New Models & Releases

Qwen-UI-Agent Technical Report: Toward Next-Generation Real-World Centric Foundation GUI Agents

AI Agents · Gemini · GPT · Neural Networks · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · New Models & Releases

Old Tricks, New Models: How Simple Image Transformations Break Modern AI-based Content Moderation

Retrieval-Augmented Generation (RAG)

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Inference & Efficiency

AgenticASR: Refining Speech Recognition in Real-World Scenarios via an Agentic Approach

Deep Learning · Inference · Neural Networks · Reinforcement Learning · Speech Processing

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · New Models & Releases

Where and When to Commit: Candidate-Aware Decoding for Diffusion Language Models

Computer Vision · Deep Learning · Reinforcement Learning · Software Engineering

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · New Models & Releases

RRM: Experience-Driven Reflective Retrieval Memory for Long-Horizon Multimodal Reasoning

AI Agents · Deep Learning · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Inference & Efficiency

OPLD: On-Policy Latent Distillation for Multimodal Reasoning

Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 · EN · Inference & Efficiency

Group-Reflective Self-Distillation for Agentic Reinforcement Learning

AI Agents · Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-30 · EN · Inference & Efficiency

Flux-OPD: On-Policy Distillation with Evolving Contexts

arXiv cs.LG (Machine Learning) · 2026-07-30 · EN · Inference & Efficiency

TAPO: Transition-Aware Policy Optimization for LLM Agents

AI Agents · Algorithms & Theory · Inference · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · New Models & Releases

AutoSupervision: Closing the Feedback Loop in Scientific Workflows with Grounded Revision Verification

GPT · Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · New Models & Releases

Semantic-Aligned Structural Abstraction for Multimodal Sentiment Analysis

Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · Infrastructure & Hardware

Gradient-free Task-Conditioned Retrieval for On-Device In-Context Learning

Inference · Llama

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · Multimodal

Can LVLMs Uncover the Truth Behind Visual Illusions? An Analysis of Perceptual and Reasoning Capabilities

Neural Networks · Reinforcement Learning · Software Engineering

arXiv cs.CL (Computation and Language) · 2026-07-30 · EN · Multimodal

DualAnchor: Preserving Language Priors and Improving Lexical Fidelity in Gloss-Free Sign Language Translation

Hacker News (Front Page) · 2026-07-29 · EN · Multimodal · Excerpt

The coolest use for the Vision Pro

iOS developer C. Selig describes the best use for the Vision Pro on his personal blog

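A personal blog post in which iOS developer Christian Selig, known for Apollo (a Reddit client), presents what he considers the best use of the Apple Vision Pro. The URL slug (vision-pro-house) suggests a use case involving visualization of a house or space, but because raw_excerpt is empty the article body was not retrieved, so the specific use, steps, and features could not be confirmed. The item was exported as a personal-interest topic with attention 0.5 and categories=[multimodal].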
