Training & Fine-Tuning | AI/Tech Trends Roundup

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-Tuning

The Parts Are Greater Than the Sum: Automated Task Sequencing for Efficient Training of Multi-Policy LLMs

Fine-tuning · Quantization

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Training & Fine-Tuning

LEMUR: Learning to Align with Multi-Objective Reinforcement Learning from Preference Feedback

AI Agents · Reinforcement Learning

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Developer Tools

Ordered-to-disordered transfer learning with graph neural networks for formation-energy and HOMO-LUMO gap prediction in high-entropy perovskite oxides

Neural Networks

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-Tuning

Leveraging Transfer Learning with Class-Specific Decoders for Laparoscopic Segmentation

Deep Learning · Retrieval-Augmented Generation (RAG)

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Models & Releases

Evidence-Type Competition: When Can Interventional Data Teach Language Models Causal Direction?

Inference · Reinforcement Learning

Read the original article (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-Tuning

MoPET: Parameter-Efficient Mixture-of-Experts for Unified Medical Image Classification

Deep Learning · Fine-tuning · Mixture of Experts (MoE) · Retrieval-Augmented Generation (RAG)

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN New Models & Releases

Parameter-Free Heavy-Tailed Bandits

Algorithms & Theory · Reinforcement Learning

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Training & Fine-Tuning

Explore Beyond the Boundary Using Entropic Information

AI Agents · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-Tuning

ALIVE: Warnings Before Exclusion in Budgeted Multi-Source Learning

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Training & Fine-Tuning

PTP: Previous-Token Prediction based LLM Inversion for Near-Exact Prompt Reconstruction

Fine-tuning

Read the original article (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-Tuning

The Greedy Advantage in Finite-Horizon Bandits

Algorithms & Theory

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

Translation with Thought: Difficulty-Adaptive Reasoning via Reinforcement Learning for Multi-Domain Machine Translation

DeepSeek · Fine-tuning · GPT · Inference · Neural Networks

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Training & Fine-Tuning

RecHarness: A Bandit-Routed Agentic Harness for Self-Evolving Recommender Systems

AI Agents

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Infrastructure & Hardware

Small Is Enough: Per-User Style Rewriting of AI-Edited Text via LoRA Adapters

Inference

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-Tuning

GALA: Generative Aligned Learning for Adaptive Multimodal Representation in the Taobao Shangou Recommender System

Embeddings · Fine-tuning · Neural Networks · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

SAF-OPD: Stable Advantage Fusion for On-Policy Distillation

Neural Networks · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Training & Fine-Tuning

Learning Latent Reasoning Traces for Scalar Reward Models End-to-End

Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Reinforcement Learning from Human Feedback (RLHF)

Read the original article (arXiv cs.CL (Computation and Language)) ↗

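ITmedia AI+ · 2026-07-31 JA Training & Fine-Tuning · Excerpt

Thinking Machines officially releases lightweight model "Inkling-Small": performance rivaling "Inkling" at a quarter of the size

Thinking Machines releases lightweight model "Inkling-Small," with equivalent performance at 1/4 the size

Reinforcement Learning

Thinking Machines Lab has released the official version of its open-weight AI model "Inkling-Small." Although it is one quarter the size of the previous model, the company says that improved data and reinforcement learning let it rival "Inkling" in areas such as code generation.

Read the original article (ITmedia AI+) ↗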

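Simon Willison's Weblog · 2026-07-30 EN New Models & Releases · Excerpt

llm 0.32rc2

Simon Willison releases llm 0.32rc2, making GPT-5.6 Luna the default model

GPT · Machine Learning · Neural Networks · OpenAI · Reinforcement Learning from Human Feedback (RLHF)

Simon Willison has released version 0.32rc2 of his CLI tool llm. The release fixes a dependency issue and, for users who have not set a default model, changes the default from GPT-4o mini to the newer, more capable GPT-5.6 Luna. Luna is somewhat more expensive but is described as a major improvement.

Read the original article (Simon Willison's Weblog) ↗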

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Safety & Evaluation

Inducing language models to assert their own consciousness restores human beliefs and values

Fine-tuning

Read the original article (arXiv cs.CL (Computation and Language)) ↗

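Publickey · 2026-07-30 JA New Models & Releases · Excerpt

JetBrains announces "JetBrains Context," which helps AI obtain context with fewer tokens and enables better code generation

JetBrains announces "JetBrains Context" for AI agents, providing context with fewer tokens

AI Agents · Machine Learning

JetBrains has announced "JetBrains Context," a new service that builds an intelligence layer on top of code repositories. By supplying AI agents with the appropriate code context in fewer tokens, it is said to enable better code generation.

Read the original article (Publickey) ↗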

arXiv cs.CL (Computation and Language) · 2026-07-30 EN New Models & Releases

Frontis-MA1: Training an AI4AI Model towards Recursive Self-Improvement in Machine Learning Engineering

Fine-tuning · GPT · Machine Learning · Meta · Retrieval-Augmented Generation (RAG)

Read the original article (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

APO: Unsupervised Atomic Policy Optimization for 3D Structure Prediction of Atomic Systems

Inference · Reinforcement Learning from Human Feedback (RLHF)

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Models & Releases

Same Graph Cross-Task Transfer in GNNs: Protocols and Predictors

Neural Networks · Retrieval-Augmented Generation (RAG) · Reinforcement Learning

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN New Models & Releases

Improving Mental Health Screening and Early Risk Detection in Spanish

Reinforcement Learning

Read the original article (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Training & Fine-Tuning

Cybersecurity Detection Classification with Reasoning-enabled Language Models

Reinforcement Learning · Reinforcement Learning from Human Feedback (RLHF)

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Training & Fine-Tuning

Lightning OPD 2.0: Mitigating Style Bias in Cross-Teacher On-Policy Distillation for Large Reasoning Models

Fine-tuning

Read the original article (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Models & Releases

Oracle-Budgeted Molecular Optimization with Short-Term Graph Memory

Deep Learning · Retrieval-Augmented Generation (RAG)

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Developer Tools

QAdapt: A Noise-Adaptive Neural Pre-Decoding Framework for Quantum Error Correction

Deep Learning · Fine-tuning · Google

Read the original article (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

When Derived Measurements Mislead: Quantifying and Mitigating LLM Over-Trust with Privileged-Modality Reliability Evidence

Inference · Neural Networks · Reinforcement Learning from Human Feedback (RLHF)

Read the original article (arXiv cs.AI (Artificial Intelligence)) ↗