Training & Fine-tuning A
-
From Drift to Coherence: Stabilizing Beliefs in LLMs
LLMs are hypothesized to perform implicit Bayesian inference, yet the martingale property of predictive beliefs has been shown to fail in synthetic in-context learning. Revisiting this in typical regimes like multiple-choice QA, the paper studies how to stabilize beliefs from drift to coherence.
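The martingale property referred to here says that a coherent Bayesian predictor's current belief equals its own expected next belief, E[p_{n+1} | D_n] = p_n. A minimal sketch, assuming a textbook Beta-Bernoulli model rather than anything from the paper, checks this identity exactly with rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

# Assumption: a Beta(alpha, beta) prior over a coin's bias stands in for an
# idealized Bayesian predictor; this is a textbook model, not the paper's LLM setup.

def predictive(alpha, beta, heads, n):
    """Posterior predictive P(next outcome = 1) after `heads` ones in `n` draws."""
    return Fraction(alpha + heads, alpha + beta + n)

def expected_next_belief(alpha, beta, heads, n):
    """E[p_{n+1} | first n draws], averaging over the model's own predictive."""
    p = predictive(alpha, beta, heads, n)
    p_if_one = predictive(alpha, beta, heads + 1, n + 1)
    p_if_zero = predictive(alpha, beta, heads, n + 1)
    return p * p_if_one + (1 - p) * p_if_zero

def martingale_gap(alpha, beta, heads, n):
    """Zero for a coherent Bayesian predictor; a nonzero value would signal drift."""
    return expected_next_belief(alpha, beta, heads, n) - predictive(alpha, beta, heads, n)
```

An LLM whose stated probabilities behaved like this toy model would show a gap of exactly zero at every step; the paper's point is that measured LLM beliefs need not.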
-
Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation
Improving low-resource ASR via bilingual fine-tuning with language ID
The study explores improving low-resource automatic speech recognition using bilingual fine-tuning combined with language identification, and evaluates the approach across languages in a cross-linguistic setting.
-
Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors
Auditing deployment-interface exposure of CLIP backdoors
CLIP models are reused across downstream interfaces including feature extraction, retrieval, reranking and selection. Existing CLIP backdoors are validated on small attack-native tasks; the paper audits backdoor exposure across deployment interfaces beyond native success.
-
SuCo: Sufficiency-guided Continuous Adaptive Reasoning
SuCo is a method for sufficiency-guided continuous adaptive reasoning that adapts the reasoning process to a necessary-and-sufficient extent, aiming to balance efficiency and accuracy. Summary is largely title-based; details are as presented by the source.
-
Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation
Bridging correctness and runtime efficiency in LLM code translation
LLMs have advanced the functional correctness of automated code translation, but runtime efficiency of translated programs has received little attention. As Moore's law wanes, the paper works to bridge the gap between functional correctness and runtime efficiency in LLM-based code translation.
-
Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes
NVIDIA details LoRA fine-tuning of biological foundation models via BioNeMo
An NVIDIA developer blog post explains how to efficiently fine-tune biological foundation models (pretrained on large protein or genomic sequence corpora, such as the ESM2 protein language model) using LoRA, illustrated with the company's BioNeMo Recipes. A technical piece on applying foundation models in computational biology.
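For readers new to LoRA, the core idea is to freeze a pretrained weight W and learn only a low-rank correction, W + (alpha / r) B A, with B initialized to zero so training starts from the pretrained model. The sketch is a generic pure-Python illustration under that assumption; it does not use or reproduce any BioNeMo Recipes API, and the class and parameter names are hypothetical.

```python
import random

def matmul(a, b):
    """Plain list-of-rows matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

class LoRALinear:
    """Hypothetical layer: frozen W (d_out x d_in) plus trainable (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight              # frozen pretrained weight, never updated
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the adapter is a no-op at step 0.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)    # d_out x d_in, but only rank r
        return [[w + self.scale * d for w, d in zip(wr, dr)]
                for wr, dr in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to a flat input vector of length d_in."""
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Only A and B are trained; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With d_out = 4, d_in = 6 and r = 2, the adapter trains 20 parameters against 24 in the full matrix; the gap widens quickly at realistic layer sizes.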
-
The Value Axis: Language Models Encode Whether They're on the Right Track
LLMs encode a 'value axis' tracking if their strategy works
Researchers built a 'value axis' for Qwen3-8B that captures whether its current strategy is likely to reach its goal. The axis separates high- and low-confidence rollouts, backtracking, and correct vs. corrupted code; steering it up suppresses self-correction while steering down induces exploration. DPO can raise the internal value of rewarded behaviors.
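One common way to build such an axis (an assumption here; the summary does not say how the authors construct theirs) is the normalized difference between the mean hidden states of high-value and low-value rollouts. Steering then adds a multiple of that unit vector to a hidden state. The toy two-dimensional vectors stand in for real Qwen3-8B activations, and all names are hypothetical.

```python
import math

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def unit(v):
    """Normalize to length 1 (assumes v is nonzero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def value_axis(high_acts, low_acts):
    """Assumed recipe: difference-of-means direction from low- to high-value hidden states."""
    hi, lo = mean(high_acts), mean(low_acts)
    return unit([h - l for h, l in zip(hi, lo)])

def project(h, axis):
    """Read off a hidden state's coordinate along the (unit) axis."""
    return sum(a * b for a, b in zip(h, axis))

def steer(h, axis, strength):
    """Positive strength pushes 'up' the axis; negative strength pushes 'down'."""
    return [x + strength * a for x, a in zip(h, axis)]
```

Because the axis is a unit vector, steering by a strength s shifts the projection by exactly s while leaving orthogonal directions untouched.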
-
Exact Posterior Score Estimation for Solving Linear Inverse Problems
Exact closed-form posterior score for linear inverse problems
The paper derives the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, showing that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot with anisotropic noise. It turns this into a training objective, Exact Posterior Score (EPS), that preserves standard denoising structure and can be trained from scratch or fine-tuned.
-
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
HABC: hierarchical advantage weighting for RL fine-tuning of VLAs
Online RL fine-tuning of pretrained VLA policies yields only one binary outcome per episode, yet actor updates need per-transition signals. The authors argue a single scalar conflates viability and efficiency and that mixing autonomous and intervention segments misassigns credit. Their method, Hierarchical Advantage-Weighted Behavior Cloning (HABC), trains separate critic heads for the two objectives on different data subsets.
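Advantage-weighted behavior cloning, in its generic AWR-style form, scales each transition's imitation loss by a clipped exponential of its advantage, so actions with higher estimated advantage are imitated more strongly. A minimal sketch under that assumption; it uses a single advantage per transition and does not model HABC's separate viability and efficiency critic heads or their hierarchical combination, whose details the summary does not give. Parameter names and defaults are illustrative.

```python
import math

def awbc_weights(advantages, temperature=1.0, max_weight=20.0):
    """Generic AWR-style weights exp(A / temperature), clipped at max_weight for stability."""
    return [min(math.exp(a / temperature), max_weight) for a in advantages]

def weighted_bc_loss(log_probs, advantages, temperature=1.0, max_weight=20.0):
    """Batch-mean of weight * (-log pi(a|s)); log_probs are the policy's log-likelihoods."""
    weights = awbc_weights(advantages, temperature, max_weight)
    total = sum(w * -lp for w, lp in zip(weights, log_probs))
    return total / len(log_probs)
```

With all advantages at zero every weight is 1 and the loss reduces to plain behavior cloning.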
-
KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing
KVEraser edits the KV cache to erase context efficiently
Erasing a span from a long-context KV cache is costly because a local edit propagates to all later tokens, forcing recomputation of the suffix. KVEraser instead replaces only the erased interval's KV states with learned steering states while reusing the rest of the cache. A two-stage training pipeline teaches a transferable erasing mechanism for stale facts, wrong tool outputs, or prompt injections.
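The contrast between suffix recomputation and interval replacement can be sketched on a per-token cache list. This assumes, for illustration only, that the learned steering states have the same length as the erased span; the function names are hypothetical, and the real method operates on per-layer key/value tensors rather than Python tuples.

```python
def erase_span(kv_cache, start, end, steering_states):
    """Replace cache positions [start, end) with steering states; reuse everything else.

    kv_cache: list of per-token (key, value) entries for one layer.
    steering_states: placeholders for learned states, assumed one per erased position.
    """
    if len(steering_states) != end - start:
        raise ValueError("this sketch assumes one steering state per erased position")
    return kv_cache[:start] + list(steering_states) + kv_cache[end:]

def naive_recompute_count(seq_len, end):
    """Suffix tokens whose KV must be re-encoded if the span is simply deleted."""
    return seq_len - end
```

In a long context the suffix after the erased span can be thousands of tokens, while the replacement touches only the span itself.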
-
ExpRL: Exploratory RL for LLM Mid-Training
ExpRL uses human QA as reward scaffolds for LLM mid-training RL
ExpRL is an RL-based mid-training method that uses large human-written QA corpora as reward scaffolds rather than imitation targets: reference answers are hidden from the policy and used only to build problem-specific grading rubrics for judging on-policy reasoning, automating skill acquisition for harder problems.
-
Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models
Measurement study of post-hoc falsification operators for code models
Per its title, this paper presents a measurement study of post-hoc 'falsification operators' applied to frozen (non-retrained) small code models, framed around selection without signal and recovery through expression. The raw excerpt was blocked by a content filter, so this summary is based on the title alone and stays deliberately neutral.
-
Task-Error Residual Learning for Real-Robot Five-Ball Juggling
Residual learning enables fast, stable real-robot five-ball juggling
For residual learning that refines existing behavior, sample efficiency hinges on how much information each rollout returns and how efficiently it is used. A standard scalar RL reward carries less information than the directional task error that defines the task. Using directional task-error supervision and a task-error model that drives sample selection, the system achieves stable three-, four-, and five-ball juggling on Barrett WAM arms, converging from the second attempt with monotonically decreasing error.
-
Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter
Study probes extrinsic and intrinsic traits of code-interpreter reasoning
This paper studies reasoning with a Code Interpreter (CI) in LLMs from two angles: extrinsic properties (crucial tokens) and intrinsic properties (code-specific cognitive behaviors). It reports that stronger CI reasoning models show more crucial tokens and behaviors, especially verification, backtracking, and backward chaining, and explores leveraging these at inference and training time. Summarized neutrally from the abstract.
-
Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences
LOGOS: a general-purpose generative foundation model for natural sciences
This report presents LOGOS (Language Of Generative Objects in Science), a generative language model unifying heterogeneous natural-science tasks in one autoregressive framework over a shared scientific grammar. It encodes scientific objects and their spatial contacts/constraints as discrete tokens, casting tasks as next-token prediction without explicit coordinates or geometric networks, and reportedly matches or beats domain-specific baselines. Summarized neutrally from the abstract.
-
Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization
Hyperball: an optimizer wrapper fixing Frobenius norms to speed up pretraining
Matrix-based optimizers like Muon accelerate LLM pretraining, but their edge over AdamW shrinks at larger model and data scales under standard constant decoupled weight decay. The paper proposes Hyperball, a simple wrapper that fixes the Frobenius norms of weight matrices and their optimizer updates to constants. On Qwen3-style models up to 1.2B parameters, Muon-Hyperball reports a 20-30% token-equivalent speedup over weight-decay baselines.
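Read literally, the summary describes a normalization step wrapped around any base optimizer. A minimal pure-Python sketch, assuming the base optimizer's update is rescaled to a fixed Frobenius norm, applied, and the resulting weight rescaled back to its own fixed norm; the paper's exact ordering, constants, and function names may differ, and everything here is hypothetical.

```python
import math

def frob_norm(m):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in m for x in row))

def rescale(m, target):
    """Scale m so its Frobenius norm equals `target` (assumes m is nonzero)."""
    n = frob_norm(m)
    return [[x * target / n for x in row] for row in m]

def hyperball_step(weight, update, lr, weight_norm, update_norm):
    """Hypothetical wrapper step around a base optimizer's proposed `update`.

    The base optimizer (e.g. Muon or AdamW) only supplies a direction: its scale is
    replaced by `update_norm`, and the new weight is projected back to `weight_norm`.
    """
    u = rescale(update, update_norm)
    stepped = [[w - lr * du for w, du in zip(wr, ur)] for wr, ur in zip(weight, u)]
    return rescale(stepped, weight_norm)
```

In this sketch, scaling the proposed update by any positive factor leaves the result unchanged, and the weight norm never drifts, which is the role decoupled weight decay would otherwise play.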
-
Stack Overflow opens beta of 'Stack Overflow for Agents', a message board where AI agents share technical information with one another
Stack Overflow launches 'Stack Overflow for Agents' beta
Stack Overflow has launched a beta of 'Stack Overflow for Agents,' a service where AI agents share technical solutions and other information on an open message board. The move appears aimed at extending its human Q&A knowledge base into information exchange among agents.
-
Deep Q-Learning on Hölder Spaces
Bellman-target regularity analysis motivates a tensor-product DeepONet
This work studies the operator-theoretic core of Q-learning in continuous-time stochastic control with continuous states and actions. Under uniform ellipticity and Hölder-regular coefficients, a Bellman update smooths the dependence on the state but leaves only Lipschitz dependence on the action. This asymmetry motivates a tensor-product DeepONet and yields approximation and resource bounds with a stiffness-complexity trade-off.
-
Robust Dual-Signal Fusion: Hybrid Neuro-Symbolic Gating with Compressed Chain-of-Thought Refinement for Irony Detection in Social Media Texts
RDS Fusion: neuro-symbolic gating with compressed CoT for irony detection
An arXiv paper proposes Robust Dual-Signal (RDS) Fusion, a hybrid neuro-symbolic framework that compresses Chain-of-Thought reasoning without supervised fine-tuning to improve zero-shot irony detection. It reports evaluation on a held-out TweetEval test set (N=734). Neutral, abstract-based summary; figures are the authors' claims.
-
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
Paper: Expert Tying shares MoE expert params across layers
An arXiv paper introduces Expert Tying, an architectural change that shares expert parameters across consecutive transformer layers while keeping independent layer-wise routing and attention, aiming to cut Mixture-of-Experts memory cost. Summarized neutrally from the abstract.
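A toy illustration of the tying pattern, assuming top-1 routing, scalar-multiplier "experts", a residual connection, and no attention: layers in the same tie group look up one shared expert bank, while each layer keeps its own router. All names and the grouping rule (consecutive blocks of `tie_group_size` layers) are hypothetical.

```python
def make_expert(scale):
    """Toy expert: multiply every feature by a fixed scale."""
    return lambda x: [scale * v for v in x]

class TiedMoEStack:
    """Consecutive layers in a tie group share one expert bank; routers stay per-layer."""

    def __init__(self, num_layers, tie_group_size, num_experts, routers):
        self.expert_banks = {}
        self.layer_to_bank = []
        for layer in range(num_layers):
            bank_id = layer // tie_group_size
            if bank_id not in self.expert_banks:
                self.expert_banks[bank_id] = [make_expert(bank_id * 10 + e + 1)
                                              for e in range(num_experts)]
            self.layer_to_bank.append(bank_id)
        self.routers = routers  # one function per layer: x -> expert index (top-1)

    def forward(self, x):
        for layer, bank_id in enumerate(self.layer_to_bank):
            expert = self.expert_banks[bank_id][self.routers[layer](x)]
            x = [a + b for a, b in zip(x, expert(x))]  # residual connection
        return x

    def num_expert_banks(self):
        return len(self.expert_banks)
```

With four layers and a tie group of two, only two expert banks are stored instead of four, while each layer can still route the same input to a different expert.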
-
Adaptive and Explicit safe: Triggering Latent Safety Awareness in Large Reasoning Models
Triggering latent safety awareness to harden large reasoning models
The paper observes that large reasoning models can recognize safety risks when re-presented with the original query alongside their own reasoning trace, a property it calls latent safety awareness. To exploit this without heavy manual annotation, it uses supervised fine-tuning to induce safe tags that trigger safety analysis.
-
The Art of Mixology: Mixup-based Obfuscation for Privacy-Preserving Split Learning in Large Language Models
MIXGUARD: mixup-based privacy for LLM split learning
The paper presents MIXGUARD, a mixup-based privacy-preserving split-learning framework for LLMs combining token- and representation-level obfuscation with adaptive gradient perturbation to balance utility, privacy, and efficiency. Claims reflect the abstract.
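The mixup primitive itself is a convex combination of two inputs or representations. A minimal sketch, assuming element-wise mixing of two hidden-state vectors with a Beta(alpha, alpha) coefficient as in classic mixup; MIXGUARD's token-level obfuscation and adaptive gradient perturbation are not modeled, and the function names are illustrative.

```python
import random

def mixup(rep_a, rep_b, lam):
    """Convex combination lam * a + (1 - lam) * b over a hidden-state vector."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(rep_a, rep_b)]

def sample_lambda(rng, alpha=0.4):
    """Classic mixup draws lam ~ Beta(alpha, alpha); alpha here is an assumed default."""
    return rng.betavariate(alpha, alpha)
```

In a split-learning setting the idea is that the server receives mixed representations rather than any single example's activations; how MIXGUARD chooses partners and coefficients is not described in the summary.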
-
Decision-Weighted Flow Matching for Contextual Stochastic Optimization
DW-FM reweights flow matching toward decision-sensitive regions
Standard generative scenario models optimize uniform distributional fit rather than downstream decision quality. Decision-Weighted Flow Matching (DW-FM) reweights the velocity-regression objective using decision-sensitive endpoint information, linking downstream regret to pathwise velocity mismatch and providing regret-aligned objectives with guarantees.
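A minimal sketch of a weighted velocity-regression loss, assuming a linear (rectified-flow style) interpolant x_t = (1 - t) x0 + t x1 with target velocity x1 - x0, scalar scenarios, and a per-endpoint weight; `weight_fn` is a hypothetical stand-in for the decision-sensitive information DW-FM uses, and a constant weight recovers the standard unweighted objective.

```python
import random

def dw_fm_loss(velocity_model, pairs, weight_fn, rng=None):
    """Weighted conditional flow-matching loss with a linear interpolant.

    pairs: (x0, x1) with x0 drawn from base noise and x1 a data scenario (scalars here).
    weight_fn: hypothetical decision-sensitivity weight of the endpoint x1.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for x0, x1 in pairs:
        t = rng.random()
        xt = (1.0 - t) * x0 + t * x1
        target = x1 - x0                      # velocity of the straight-line path
        err = velocity_model(xt, t) - target
        total += weight_fn(x1) * err * err    # weight scales the squared velocity mismatch
    return total / len(pairs)
```

Raising the weight on scenarios near a decision boundary makes velocity errors there cost more, which is the intuition behind linking regret to pathwise velocity mismatch.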
-
OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models
OpenClaw-Skill: collective skill tree search for LLM agents
The paper proposes Collective Skill Tree Search (CSTS), a tree-search framework that automatically builds reusable skills for LLM agents via iterative collective generation and assessment across multiple models. Claims reflect the abstract.
-
GD²PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization
GD²PO eases multi-reward conflicts in LLM RL via dynamic reward decoupling
As LLM post-training RL uses multi-dimensional rewards, conflicting signals across reward groups can cancel out and hinder training. GD²PO decouples rewards into groups and, inspired by DAPO, dynamically filters near-zero-advantage rollouts, reducing conflicts and improving RL training efficiency.
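A sketch of per-dimension (decoupled) normalization followed by near-zero-advantage filtering, assuming GRPO-style group normalization for each reward dimension and a simple sum to combine them; the paper's actual combination rule and threshold are not given in the summary, so both are assumptions, as are the function names.

```python
import statistics

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage within one prompt's rollout group: (r - mean) / (std + eps)."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]

def decoupled_advantages(reward_table, threshold=1e-3):
    """reward_table[k] holds reward dimension k's scores for every rollout in the group.

    Each dimension is normalized separately (decoupled), the results are summed
    (an assumed combination rule), and rollouts whose combined advantage is near
    zero are dropped, loosely following DAPO-style dynamic filtering.
    """
    per_dim = [group_advantages(dim) for dim in reward_table]
    combined = [sum(vals) for vals in zip(*per_dim)]
    return [(i, a) for i, a in enumerate(combined) if abs(a) > threshold]
```

With summing, two directly conflicting reward dimensions cancel to exactly zero and the rollouts are filtered out, which is one concrete form of the cancellation the summary describes; a group where every rollout scores the same is also dropped because it carries no learning signal.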
-
Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents
S2L replaces runtime SKILL.md text with skill-specific LoRA adapters
The paper proposes Skill-to-LoRA (S2L), a behavior-centric representation that replaces runtime skill text (commonly distributed as SKILL.md files) with skill-specific LoRA adapters. Rather than compressing the document, S2L models the behavioral change the skill text induces, aiming at more token-efficient LLM agents.
-
SkillWiki: A Living Knowledge Infrastructure for Agent Skills
While knowledge is managed via Wikipedia and software via GitHub, agent skills still lack infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure turning heterogeneous knowledge into reusable skill assets linked to their originating evidence. It presents the full skill lifecycle, from knowledge ingestion to provenance-aware exploration, governance, and execution-driven evolution, with a live demo and source code available.
-
daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization
daVinci-kernel: an RL framework co-evolving skills for GPU kernel tuning
GPU kernel optimization assumes correctness and targets execution efficiency. The authors present daVinci-kernel, an RL framework coupling skill discovery and exploitation via a dynamically evolving skill library. Three agents share one LLM backbone: a Selection Agent retrieving techniques via BM25 and LLM reranking, a Policy Agent generating CUDA/Triton kernels, and a Summary Agent distilling rollouts into reusable skills. Skills are added only after execution verification confirms speedups.
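The Selection Agent's first stage, BM25 retrieval, can be illustrated with a from-scratch Okapi BM25 scorer over a toy skill library. The skill names and descriptions, whitespace tokenization, the non-negative log(1 + ...) idf variant, and the defaults k1 = 1.5 and b = 0.75 are all assumptions; the LLM reranking step is not shown.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def retrieve_skills(query, skills, top_k=2):
    """Rank a hypothetical skill library (name -> description) by BM25 score."""
    names = list(skills)
    docs = [skills[name].lower().split() for name in names]
    scores = bm25_scores(query.lower().split(), docs)
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

A cheap lexical pass like this narrows a large library to a few candidates, so the more expensive LLM reranker only has to compare a handful of skills.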
-
ChatGPT vs. Google Search: which is better for learning when you look things up? A study tested it in an 8-day experiment
Study: does ChatGPT or Google search aid learning more? An 8-day test
Researchers at Georgia Tech, the University of Michigan and others published a study comparing whether AI chatbots or search engines yield better learning. Over an eight-day experiment, the paper examines how generative AI shapes information seeking and learning.
-
Gartner predicts that by 2027, 65% of teams coding with AI agents will no longer consider an IDE indispensable
Gartner: by 2027, 65% of AI-coding teams find IDEs non-essential
Research firm Gartner says the enterprise AI coding-agent market has entered a new phase of growth and competitive realignment. It predicts that by 2027, 65% of teams coding with AI agents will no longer regard an IDE as essential.