Training & Fine-tuning

  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Probe-and-Refine Tuning of Repository Guidance for Coding Agents
    Probe-and-Refine: tuning repository guidance for coding agents
    AI Agents Fine-tuning Retrieval-Augmented Generation (RAG) Software Engineering
    The paper presents Probe-and-Refine, a method for tuning the repository guidance (such as AGENTS.md files) that LLM-based coding agents rely on. It targets the higher-level operational knowledge—file layout, test workflows, and error-prone patterns—that is not contained in the code itself.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
    FreeStyle: dual-reference style-content control via community LoRA mining
    Retrieval-Augmented Generation (RAG)
    Style-content dual-reference generation aims to synthesize an image that preserves structure while adopting a reference style. FreeStyle leverages community LoRA mining to give free control over style and content.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
    Diagnosing whether fine-tuned LLMs comprehend software vulnerabilities
    Fine-tuning Neural Network Reinforcement Learning
    It is unclear whether LLMs that score well on vulnerability benchmarks truly reason about security or merely pattern-match. This work diagnoses the limits of fine-tuning LLMs for vulnerability detection in systems software.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
    Aligning LLMs with implicit user feedback from mouse and gaze
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning Reinforcement Learning from Human Feedback (RLHF)
    The paper proposes aligning large language models using implicit user signals—such as mouse and eye movements—instead of explicit human feedback. It addresses the limitation that users rarely provide explicit ratings, which makes high-quality preference data scarce for reward modeling.
  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology
    RefRad2D: training spatially grounded radiology VLMs at scale
    Computer Vision Fine-tuning Neural Network Software Engineering
    The paper studies how to train spatially grounded vision-language models for radiology without manual spatial annotations. It introduces RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with VQA and spatial grounding subsets.
  • arXiv cs.LG (Machine Learning) · EN Developer Tools extract
    Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks
    Evolutionary two-stage hyperparameter optimization for PINNs
    Algorithms & Theory Deep Learning Embeddings Neural Network
    The paper proposes evolutionary two-stage hyperparameter optimization strategies for physics-informed neural networks (PINNs). It targets three problems that stem from PINNs' highly non-convex training objective: unstable convergence, training plateaus, and strong sensitivity to architectural and optimization hyperparameters.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    DataMagic: Transforming Tabular Data into Data Insight Video
    DataMagic: turning tabular data into data-insight videos
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Data videos combine dynamic charts, voice narration, and synchronized animation to convey insights. DataMagic automatically transforms tabular data into such data-insight videos.
  • arXiv cs.AI (Artificial Intelligence) · EN Infrastructure & Hardware extract
    Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe
    Rethinking shrinkage bias in LLM FP4 pretraining with a UFP4 recipe
    Mixture of Experts (MoE) NVIDIA Quantization
    FP4 training promises large memory and compute savings for LLM pretraining but suffers from shrinkage bias. This paper analyzes its geometric origin and systemic impact and proposes a UFP4 recipe to address it.
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning
    AutoPass: evidence-guided LLM agents for compiler performance tuning
    AI Agents Fine-tuning Inference
    Large language models show promise for code compilation tasks but struggle with runtime performance tuning. AutoPass uses evidence-guided LLM agents to perform compiler performance tuning.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining
    Automating SKILL.md generation via interaction trajectory mining
    AI Agents Neural Network Reinforcement Learning
    Explicit skill libraries make computer-using agents easier to inspect, but building them is costly. This work automates SKILL.md generation by mining agents' interaction trajectories.
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act
    Train, retrieve, or both? Statutory citation on Ontario tenancy law
    Deep Learning Fine-tuning Neural Network Retrieval-Augmented Generation (RAG)
    The paper runs a four-arm head-to-head comparison of fine-tuning, retrieval, and their combination for producing correct statutory citations on the Ontario Residential Tenancies Act and its core regulation. It targets the practical need of tenants, landlords, and help-desk staff to be pointed at the governing provision.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval
    ELVA: ranking-driven universal multimodal retrieval
    Deep Learning Machine Learning Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Leveraging multimodal large language models through contrastive learning has become mainstream for retrieval. ELVA explores a ranking-driven approach to universal multimodal retrieval.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Finetuning Vision-Language-Action Models Requires Fewer Layers Than You Think
    Finetuning vision-language-action models needs fewer layers than expected
    Computer Vision Fine-tuning Inference Machine Learning Reinforcement Learning
    Vision-Language-Action models pre-trained on massive video-robot datasets have transformed robot control. This work shows that finetuning them requires fewer layers than previously assumed.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments
    ScholarQuest: a taxonomy-guided benchmark for agentic paper search
    AI Agents Software Engineering
    Academic paper search is a core step in research, and LLM-based search agents are emerging. ScholarQuest provides a taxonomy-guided benchmark for agentic academic paper search in open literature environments.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families
    Activation directions for detecting emergent misalignment in LLMs
    Fine-tuning Llama Reinforcement Learning
    The paper investigates whether emergent misalignment—induced by fine-tuning language models on insecure code—corresponds to a causally actionable, shared direction in activation space. Across four instruction-tuned model families, it studies using such directions to detect and mitigate the misalignment.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    HilDA: Hierarchical Distillation with Diffusion for Advancing Self-Supervised LiDAR Pre-training
    HilDA: hierarchical distillation with diffusion for self-supervised LiDAR
    Computer Vision Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Using vision foundation models for camera-to-LiDAR knowledge distillation is promising. HilDA advances self-supervised LiDAR pre-training through hierarchical distillation with diffusion.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources
    IHUBERT: a Persian language model with semantic dedup pretraining
    Reinforcement Learning Software Engineering
    The paper presents IHUBERT, a monolingual Persian language model pretrained from scratch with a RoBERTa-base encoder architecture. It uses vector-based semantic deduplication and domain-balanced pretraining to address two gaps: the scarcity of large, high-quality Persian corpora and the limited evaluation available for Persian models.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability
    Pre-training the Tsetlin Machine with LM semantic clusters
    Embeddings
    The paper pre-trains the Tsetlin Machine using semantic clusters drawn from language models to improve interpretability in text classification. It aims to combine the transparency of the Tsetlin Machine's clause-based reasoning with the semantic information that models like BERT capture but do not expose transparently.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings
    Sequential DPO and forgetting across preference settings
    Llama Machine Learning Reinforcement Learning Reinforcement Learning from Human Feedback (RLHF)
    The paper studies sequential Direct Preference Optimization (DPO) across different preference settings, examining how applying multiple alignment objectives one after another affects earlier ones. It looks beyond uniform forgetting to understand how later training stages interfere with previously learned preferences.
  • Hugging Face Blog · EN Training & Fine-tuning extract
    Beyond LoRA: Can you beat the most popular fine-tuning technique?
    Hugging Face asks if you can beat LoRA, the top fine-tuning method
    Fine-tuning
    Hugging Face examines whether any approach can beat LoRA, the most popular fine-tuning technique. It compares alternative parameter-efficient methods experimentally on performance and cost, and offers guidance for practitioners.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Native Active Perception as Reasoning for Omni-Modal Understanding
    Active perception as reasoning for efficient omni-modal understanding
    Deep Learning Fine-tuning Machine Learning Neural Network Retrieval-Augmented Generation (RAG)
    Passive long-video models 'watch it all,' processing frames uniformly so cost grows with duration regardless of query difficulty. This work treats perception as reasoning, with native active perception that selectively attends to relevant frames for efficient omni-modal understanding.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning
    UBP2: uncertainty-balanced planning for efficient preference-based RL
    Meta Neural Network Reinforcement Learning
    Preference-based RL learns reward models from pairwise behavior comparisons, bypassing explicit reward design, but existing methods often rely on passive data collection. UBP2 introduces uncertainty-balanced preference planning to actively select comparisons and learn efficiently from fewer preferences.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA
    Trade-offs in medical LLM adaptation, studied on French QA
    Fine-tuning Reinforcement Learning Software Engineering
    As LLMs are adapted to specialized domains and languages, the effectiveness of adaptation strategies remains unclear. This empirical study on French medical question answering analyzes the trade-offs of various domain-adaptation methods, clarifying gains and losses in performance and generality.
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    OneCanvas: 3D Scene Understanding via Panoramic Reprojection
    OneCanvas enables VLM 3D scene understanding via panoramic reprojection
    Computer Vision Embeddings Neural Network Robotics Software Engineering
    Existing 3D scene understanding in VLMs relies on complex, model-specific geometry encoders or large training budgets for spatial reasoning. OneCanvas instead uses panoramic reprojection, letting VLMs reason about 3D scenes efficiently without dedicated geometry encoders or heavy training.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology
    TxBench-PP evaluates AI agents on preclinical pharmacology
    AI Agents Claude GPT Reinforcement Learning from Human Feedback (RLHF) Software Engineering
    AI agents promise to accelerate drug discovery by compressing interpretation and decision loops, but deployment needs trusted evaluation on realistic tasks. TxBench-PP is a benchmark analyzing AI agent performance on small-molecule preclinical pharmacology, assessing their practical reliability.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
    STARE reweights token advantages to stabilize policy entropy
    Algorithms & Theory Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Reinforcement learning with verifiable rewards (RLVR), typically run with algorithms such as GRPO, dominates post-training for complex LLM reasoning but often suffers from policy entropy collapse. STARE introduces surprisal-guided token-level advantage reweighting to stabilize policy entropy and preserve exploration during training.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning
    MAST selectively unlearns RLVR-induced reasoning with less damage
    Fine-tuning Reinforcement Learning
    The authors propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning. It removes the targeted reasoning while preserving other capabilities, causing substantially less collateral damage than standard full-parameter updates.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    User as Engram: Internalizing Per-User Memory as Local Parametric Edits
    User as Engram: per-user memory as local parametric edits
    Retrieval-Augmented Generation (RAG) Software Engineering
    Personal memory in a language model involves two separate problems, content and reasoning skill. The brain keeps these apart: each episode is stored as a sparse, local hippocampal engram, while skill is learned slowly in the neocortex. Inspired by this, the work internalizes per-user memory as local parametric edits to the model.
  • arXiv cs.CL (Computation and Language) · EN Funding & M&A extract
    Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition
    Dango: an L1-only 1.8B LLM for studying second-language acquisition
    The authors introduce Dango, a 1.8B-parameter language model designed for controlled studies of L1-to-L2 (Japanese-to-English) transfer in second language acquisition (SLA). Because it is trained strictly on L1 data, Dango enables controlled experiments on transfer phenomena that prior SLA model studies could not run.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection
    Pretraining-stage alignment via regular safety reflection
    Fine-tuning Inference Reinforcement Learning
    To achieve deeper safety alignment for LLMs, recent work pushes safety interventions earlier into pretraining, mainly by filtering unsafe data or rewriting it into safe forms. Going beyond safe data, this work embeds regular safety reflection during pretraining to instill more fundamental alignment.