Multimodal (Page 3 of 5) | AI/Tech News Trends

arXiv cs.CL (Computation and Language) · 2026-07-29 EN Training & Fine-tuning

DenseOn with the LateOn: Fully Open Dense and Late-Interaction Models for Multilingual, Long-Context, and Code Search

Fine-tuning · Machine Learning · Retrieval-Augmented Generation (RAG)

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Multimodal

Anatomy Contextualized Adaption of CT Foundation Models

Computer Vision · Embeddings · Reinforcement Learning · Transformer

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Multimodal

DLAM: Distributional Latent Actions with Temporal Constraints

Computer Vision · Deep Learning

arXiv cs.LG (Machine Learning) · 2026-07-29 EN New Model Releases

Equilibrium Training of Energy-Based Models with Parallel Trajectory Tempering

Neural Network · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Multimodal

Visual Credit Audit for Multimodal Spatial Reasoning

Machine Learning · Neural Network · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN New Model Releases

SciFigAlign: Scoring Scientific Figures by Fine-tuned Alignment of Visuals with Manuscript Evidence

Machine Learning · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-29 EN Multimodal

What Can Latent World Models Know? Physical Parameter Identifiability in Multimodal Predictive Representations

Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering

arXiv cs.LG (Machine Learning) · 2026-07-29 EN Multimodal

Foundation Models for Face Presentation Attack Detection: A Unified Linear-Probing Benchmark

Computer Vision · Neural Network · Transformer

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN New Model Releases

Progressive Multimodal Alignment for Continual Instruction Tuning

Deep Learning · Embeddings · Machine Learning · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-29 EN New Model Releases

Dual-Path LLM Reasoning for Multimodal Few-Shot Knowledge Graph Completion

Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN New Model Releases

Hearsay: Vision-Language Medical Diagnoses Without an Image

Claude · Computer Vision · Gemini · GPT · Retrieval-Augmented Generation (RAG)

arXiv cs.LG (Machine Learning) · 2026-07-29 EN Multimodal

Amortized Moment Matching for Visual Generation

Neural Network

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN New Model Releases

See2Think: Do Multimodal Models Really Use Intermediate Visual States?

Inference · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Multimodal

Multimodal fusion of visual and morphometric features for avian bone classification

Neural Network · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Multimodal

Zero-Shot Face-to-Speech Synthesis via Latent Space Adaptation of a Style-Diffusion TTS Model

Speech Processing

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Multimodal

Dual Inversion for Text-to-Image Diffusion Models: From Both Prompt and Noise Perspectives

Computer Vision

arXiv cs.CL (Computation and Language) · 2026-07-29 EN Inference & Efficiency

Where Detectors Fail: Closing the Tail-Domain Gap with Expert-Guided Mutual Distillation

Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-29 EN New Model Releases

Learning Dynamic User Personas from Implicit Interaction Streams via Iterative Refinement

Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-29 EN New Model Releases

Diagnosing Fine-Grained Inconsistency Classification in Financial Disclosure Text

Embeddings · GPT

arXiv cs.CL (Computation and Language) · 2026-07-29 EN Multimodal

Symphony of Bias: Exploring Gender Associations with Musical Instruments in Multimodal LLMs

Neural Network · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Inference & Efficiency

Pass the Baton: Trajectory-Relayed On-Policy Distillation

Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Multimodal

πR²: Reactive Real-time Flow Policies

Computer Vision · Neural Network

arXiv cs.LG (Machine Learning) · 2026-07-28 EN New Model Releases

Re-thinking Mammography Transfer Learning: The Dataset-Informed Transfer Learning (DITL) Framework for Breast Cancer Screening and Lesion Diagnosis

arXiv cs.LG (Machine Learning) · 2026-07-28 EN Multimodal

VetClaw: An Edge-Cloud Multimodal Agentic System for Veterinary Disease Screening

Computer Vision · Deep Learning · Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-28 EN Multimodal

Reinformed Dreamer: An Asymmetric World Model Efficiently Trained through Latent Guidance

Algorithms & Theory · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Multimodal

CHARM: A Multimodal Graph Foundation Model with Hierarchical Context Modeling for Zero-Shot Transfer

Fine-tuning · Neural Network · Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Inference & Efficiency

MDTransformer: A Hardware-Software Co-Design of Mode-Division Photonic Transformer Accelerator with Inverse-Designed Coherent Crossbar

Inference · Quantization · Retrieval-Augmented Generation (RAG) · Transformer

arXiv cs.LG (Machine Learning) · 2026-07-28 EN Inference & Efficiency

Parallel Decoding Distillation for Fast Image and Video Generation

Inference

arXiv cs.LG (Machine Learning) · 2026-07-28 EN New Model Releases

Untangling Co-Drift: Proactive Multi-Intent Failure Prediction and Root-Cause Disambiguation for Self-Driving Networks

Mixture of Experts (MoE)

arXiv cs.AI (Artificial Intelligence) · 2026-07-28 EN Multimodal

Knowledge-Guided Multimodal Reasoning over Interacting Streams for Video-Level Ambivalence and Hesitancy Recognition

Meta · Neural Network
