Training & Fine-tuning | AI/Tech News Trends

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-tuning

The Parts Are Greater Than the Sum: Automated Task Sequencing for Efficient Training of Multi-Policy LLMs

Fine-tuning Quantization

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Training & Fine-tuning

LEMUR: Learning to Align with Multi-Objective Reinforcement Learning from Preference Feedback

AI Agents Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Developer Tools

Ordered-to-disordered transfer learning with graph neural networks for formation-energy and HOMO-LUMO gap prediction in high-entropy perovskite oxides

Neural Network

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-tuning

Leveraging Transfer Learning with Class-Specific Decoders for Laparoscopic Segmentation

Deep Learning Retrieval-Augmented Generation (RAG)

arXiv cs.CL (Computation and Language) · 2026-07-31 EN New Model Releases

Evidence-Type Competition: When Can Interventional Data Teach Language Models Causal Direction?

Inference Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-tuning

MoPET: Parameter-Efficient Mixture-of-Experts for Unified Medical Image Classification

Deep Learning Fine-tuning Mixture of Experts (MoE) Retrieval-Augmented Generation (RAG)

arXiv cs.LG (Machine Learning) · 2026-07-31 EN New Model Releases

Parameter-Free Heavy-Tailed Bandits

Algorithms & Theory Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Training & Fine-tuning

Explore Beyond the Boundary Using Entropic Information

AI Agents Retrieval-Augmented Generation (RAG) Reinforcement Learning

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-tuning

ALIVE: Warnings Before Exclusion in Budgeted Multi-Source Learning

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Training & Fine-tuning

PTP: Previous-Token Prediction based LLM Inversion for Near-Exact Prompt Reconstruction

Fine-tuning

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-tuning

The Greedy Advantage in Finite-Horizon Bandits

Algorithms & Theory

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

Translation with Thought: Difficulty-Adaptive Reasoning via Reinforcement Learning for Multi-Domain Machine Translation

DeepSeek Fine-tuning GPT Inference Neural Network

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Training & Fine-tuning

RecHarness: A Bandit-Routed Agentic Harness for Self-Evolving Recommender Systems

AI Agents

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Infrastructure & Hardware

Small Is Enough: Per-User Style Rewriting of AI-Edited Text via LoRA Adapters

Inference

arXiv cs.LG (Machine Learning) · 2026-07-31 EN Training & Fine-tuning

GALA: Generative Aligned Learning for Adaptive Multimodal Representation in the Taobao Shangou Recommender System

Embeddings Fine-tuning Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Inference & Efficiency

SAF-OPD: Stable Advantage Fusion for On-Policy Distillation

Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Training & Fine-tuning

Learning Latent Reasoning Traces for Scalar Reward Models End-to-End

Retrieval-Augmented Generation (RAG) Reinforcement Learning Reinforcement Learning from Human Feedback (RLHF)

ITmedia AI+ · 2026-07-31 JA Training & Fine-tuning extract

Thinking Machines officially releases lightweight model 'Inkling-Small', rivaling 'Inkling' at a quarter of the size

Thinking Machines releases Inkling-Small, matching Inkling at 1/4 the size

Reinforcement Learning

Thinking Machines Lab released the final version of Inkling-Small, an open-weight AI model. Although it is a quarter the size of Inkling, the company says improvements to its training data and the use of reinforcement learning let it match the larger model on tasks such as code generation.

Simon Willison's Weblog · 2026-07-30 EN New Model Releases extract

llm 0.32rc2

llm 0.32rc2 switches its default model to GPT-5.6 Luna

GPT Machine Learning Neural Network OpenAI Reinforcement Learning from Human Feedback (RLHF)

Simon Willison released llm 0.32rc2, which fixes a dependency issue and changes the default model (used when no default has been set) from GPT-4o mini to the newer, more capable GPT-5.6 Luna. Luna is slightly more expensive but a notable upgrade.

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Safety & Evaluation

Inducing language models to assert their own consciousness restores human beliefs and values

Fine-tuning

Publickey · 2026-07-30 JA New Model Releases extract

JetBrains announces 'JetBrains Context', helping AI retrieve context with fewer tokens to enable better code generation

JetBrains unveils 'JetBrains Context' to feed AI agents code context efficiently

AI Agents Machine Learning

JetBrains announced JetBrains Context, a service that builds an intelligence layer over code repositories. By supplying AI agents with the right code context using fewer tokens, it aims to enable better code generation from agentic coding tools.

arXiv cs.CL (Computation and Language) · 2026-07-30 EN New Model Releases

Frontis-MA1: Training an AI4AI Model towards Recursive Self-Improvement in Machine Learning Engineering

Fine-tuning GPT Machine Learning Meta Retrieval-Augmented Generation (RAG)

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Inference & Efficiency

APO: Unsupervised Atomic Policy Optimization for 3D Structure Prediction of Atomic Systems

Inference Reinforcement Learning from Human Feedback (RLHF)

arXiv cs.LG (Machine Learning) · 2026-07-30 EN New Model Releases

Same Graph Cross-Task Transfer in GNNs: Protocols and Predictors

Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning