Training & Fine-tuning
-
Probe-and-Refine Tuning of Repository Guidance for Coding Agents
Probe-and-Refine: tuning repository guidance for coding agents
The paper presents Probe-and-Refine, a method for tuning the repository guidance (such as AGENTS.md files) that LLM-based coding agents rely on. It targets higher-level operational knowledge, such as file layout, test workflows, and error-prone patterns, that is not contained in the code itself.
-
FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
FreeStyle: dual-reference style-content control via community LoRA mining
Style-content dual-reference generation aims to synthesize an image that preserves the structure of a content reference while adopting the style of a style reference. FreeStyle mines community LoRAs to give free control over both style and content.
-
Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
Diagnosing whether fine-tuned LLMs comprehend software vulnerabilities
It is unclear whether LLMs that score well on vulnerability benchmarks truly reason about security or merely pattern-match. This work diagnoses the limits of fine-tuning LLMs for vulnerability detection in systems software.
-
Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
Aligning LLMs with implicit user feedback from mouse and gaze
The paper proposes aligning large language models using implicit user signals, such as mouse and eye movements, instead of explicit human feedback. It addresses the limitation that users rarely provide explicit ratings, which makes high-quality preference data scarce for reward modeling.
-
Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology
RefRad2D: training spatially grounded radiology VLMs at scale
The paper studies how to train spatially grounded vision-language models for radiology without manual spatial annotations. It introduces RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with VQA and spatial grounding subsets.
-
Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks
Evolutionary two-stage hyperparameter optimization for PINNs
The paper proposes evolutionary two-stage hyperparameter optimization strategies for physics-informed neural networks (PINNs). It targets PINNs' unstable convergence, training plateaus, and strong sensitivity to architectural and optimization hyperparameters, all of which arise from their highly non-convex training.
-
DataMagic: Transforming Tabular Data into Data Insight Video
DataMagic: turning tabular data into data-insight videos
Data videos combine dynamic charts, voice narration, and synchronized animation to convey insights. DataMagic automatically transforms tabular data into such data-insight videos.
-
Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe
Rethinking shrinkage bias in LLM FP4 pretraining with a UFP4 recipe
FP4 training promises large memory and compute savings for LLM pretraining but suffers from shrinkage bias. This paper analyzes its geometric origin and systemic impact and proposes a UFP4 recipe to address it.
-
AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning
AutoPass: evidence-guided LLM agents for compiler performance tuning
Large language models show promise for code compilation tasks but struggle with runtime performance tuning. AutoPass uses evidence-guided LLM agents to perform compiler performance tuning.
-
Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining
Automating SKILL.md generation via interaction trajectory mining
Explicit skill libraries make computer-using agents easier to inspect, but building them is costly. This work automates SKILL.md generation by mining agents' interaction trajectories.
-
Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act
Train, retrieve, or both? Statutory citation on Ontario tenancy law
The paper runs a four-arm head-to-head comparison of fine-tuning, retrieval, and their combination for producing correct statutory citations on the Ontario Residential Tenancies Act and its core regulation. It targets the practical need of tenants, landlords, and help-desk staff to be pointed to the governing provision.
-
ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval
ELVA: ranking-driven universal multimodal retrieval
Training multimodal large language models with contrastive learning has become the mainstream approach to retrieval. ELVA explores a ranking-driven approach to universal multimodal retrieval instead.
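The summary does not describe ELVA's ranking objective. For reference, the contrastive baseline it departs from is commonly an InfoNCE loss over in-batch negatives: each query's paired document is the positive and every other document in the batch is a negative. The sketch below is a minimal illustration under assumed choices (cosine similarity, temperature 0.05, hypothetical function names) and is not ELVA's method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def info_nce(queries: List[Vector], docs: List[Vector], temperature: float = 0.05) -> float:
    """Mean InfoNCE loss where docs[i] is the positive for queries[i]
    and every other document in the batch serves as a negative."""
    if len(queries) != len(docs):
        raise ValueError("queries and docs must be paired one-to-one")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        # Cross-entropy with the matching document as the target class.
        total += log_z - logits[i]
    return total / len(queries)


# Toy batch: correctly paired documents versus the same documents in swapped order.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

The loss is near zero when each query is most similar to its own document and grows when the pairing is wrong; a ranking-driven method would instead optimize the ordering of candidates directly.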
-
Finetuning Vision-Language-Action Models Requires Fewer Layers Than You Think
Finetuning vision-language-action models needs fewer layers than expected
Vision-Language-Action models pre-trained on massive video-robot datasets have transformed robot control. This work shows that finetuning them requires fewer layers than previously assumed.
-
ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments
ScholarQuest: a taxonomy-guided benchmark for agentic paper search
Academic paper search is a core step in research, and LLM-based search agents are emerging. ScholarQuest provides a taxonomy-guided benchmark for agentic academic paper search in open literature environments.
-
Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families
Activation directions for detecting emergent misalignment in LLMs
The paper investigates whether emergent misalignment, induced by fine-tuning language models on insecure code, corresponds to a causally actionable, shared direction in activation space. Across four instruction-tuned model families, it studies using such directions to detect and mitigate the misalignment.
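The summary does not say how the directions are found. One common way to obtain such a direction, assumed here purely for illustration, is the difference of mean activations between misaligned and aligned examples: projecting an activation onto the unit direction gives a detection score, and subtracting that projection ablates it. The toy 2-D vectors and all function names below are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def unit(v: Sequence[float]) -> Vector:
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def diff_of_means_direction(misaligned, aligned) -> Vector:
    """Unit vector from the mean 'aligned' activation to the mean 'misaligned' one."""
    mu_bad = mean_vector(misaligned)
    mu_good = mean_vector(aligned)
    return unit([b - g for b, g in zip(mu_bad, mu_good)])


def misalignment_score(h: Sequence[float], direction: Sequence[float]) -> float:
    """Detection: scalar projection of an activation onto the unit direction."""
    return sum(x * d for x, d in zip(h, direction))


def ablate_direction(h: Sequence[float], direction: Sequence[float]) -> Vector:
    """Mitigation: remove the component of h along the unit direction."""
    s = misalignment_score(h, direction)
    return [x - s * d for x, d in zip(h, direction)]


# Toy 2-D "activations"; real ones would come from a chosen layer of a model.
d = diff_of_means_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
print(d, misalignment_score([5.0, 1.0], d), ablate_direction([5.0, 1.0], d))
```

A threshold on the score would act as a detector; whether ablation actually changes behavior is the causal question the paper examines.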
-
HilDA: Hierarchical Distillation with Diffusion for Advancing Self-Supervised LiDAR Pre-training
HilDA: hierarchical distillation with diffusion for self-supervised LiDAR
Using vision foundation models for camera-to-LiDAR knowledge distillation is a promising direction. HilDA advances self-supervised LiDAR pre-training through hierarchical distillation with diffusion.
-
IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources
IHUBERT: a Persian language model with semantic dedup pretraining
The paper presents IHUBERT, a monolingual Persian pretrained language model trained from scratch with a RoBERTa-base encoder architecture. It uses vector-based semantic deduplication and domain-balanced pretraining to address the scarcity of large, high-quality Persian corpora and the limited evaluation available for Persian.
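The summary does not give IHUBERT's deduplication details. A minimal sketch of vector-based semantic deduplication, assuming documents are already embedded and that near-duplicates are those with cosine similarity at or above a threshold (0.95 here, an arbitrary choice), is a greedy filter that keeps a document only if it is not too similar to anything already kept. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def semantic_dedup(embeddings: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Greedy near-duplicate filter: keep a document only if its cosine
    similarity to every already-kept document is below `threshold`.
    Returns the indices of kept documents in their original order."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: the second document is a near-paraphrase of the first.
docs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
print(semantic_dedup(docs))
```

This pairwise scan is quadratic in corpus size; at pretraining scale the same idea is usually paired with an approximate nearest-neighbor index or clustering so each document is compared only against likely neighbors.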
-
Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability
Pre-training the Tsetlin Machine with LM semantic clusters
The paper pre-trains the Tsetlin Machine using semantic clusters drawn from language models to improve interpretability in text classification. It aims to combine the transparency of the Tsetlin Machine's clause-based reasoning with the semantic information that models like BERT capture but do not expose transparently.
-
Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings
Sequential DPO and forgetting across preference settings
The paper studies sequential Direct Preference Optimization (DPO) across different preference settings, examining how applying multiple alignment objectives one after another affects earlier ones. It looks beyond uniform forgetting to understand how later training stages interfere with previously learned preferences.
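For reference, the per-pair objective that each DPO stage optimizes is the standard DPO loss, -log σ(β[(log π_θ(y_w) - log π_ref(y_w)) - (log π_θ(y_l) - log π_ref(y_l))]). The minimal sketch below computes it from summed response log-probabilities; β = 0.1 is an assumed default and the function names are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each *_logp is the summed log-probability of a full response under the
    trainable policy or the frozen reference model."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen response: the loss drops.
print(dpo_loss(-9.0, -14.0, -10.0, -10.0))
```

In a sequential setting this loss would be minimized stage by stage on different preference datasets; which model serves as the reference in later stages is a design choice the summary does not specify.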
-
Beyond LoRA: Can you beat the most popular fine-tuning technique?
Hugging Face asks if you can beat LoRA, the top fine-tuning method
Hugging Face examines whether any approach can beat LoRA, the most popular fine-tuning technique. It compares alternative parameter-efficient methods on performance and cost, tests experimentally whether they can rival LoRA, and offers guidance for practitioners.
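The post's comparisons are not reproduced here. As background, standard LoRA freezes a weight matrix W and learns a low-rank update, so a layer computes h = W x + (α / r) B A x with A of shape r × d_in and B of shape d_out × r. The pure-Python sketch below shows that forward pass and the parameter saving; the toy matrices and function names are illustrative only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) are trained.
    B is typically initialized to zeros, so training starts from the base model."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def trainable_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA trains r * (d_in + d_out) parameters instead of d_in * d_out."""
    return r * (d_in + d_out)


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # rank r = 1
B_init = [[0.0], [0.0]]     # zero-initialized: output equals the frozen layer
B_trained = [[1.0], [0.0]]  # after some hypothetical training
x = [2.0, 3.0]
print(lora_forward(W, A, B_init, x, alpha=2.0, r=1))
print(lora_forward(W, A, B_trained, x, alpha=2.0, r=1))
print(trainable_params(4096, 4096, 8), 4096 * 4096)
```

For a 4096 × 4096 layer at rank 8, the adapter has 65,536 trainable parameters against roughly 16.8M in the full matrix, which is the cost baseline any competing method has to match.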
-
Native Active Perception as Reasoning for Omni-Modal Understanding
Active perception as reasoning for efficient omni-modal understanding
Passive long-video models 'watch it all,' processing frames uniformly so cost grows with duration regardless of query difficulty. This work treats perception as reasoning, with native active perception that selectively attends to relevant frames for efficient omni-modal understanding.
-
UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning
UBP2: uncertainty-balanced planning for efficient preference-based RL
Preference-based RL learns reward models from pairwise behavior comparisons, bypassing explicit reward design, but existing methods often rely on passive data collection. UBP2 introduces uncertainty-balanced preference planning to actively select comparisons and learn efficiently from fewer preferences.
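UBP2's planning and uncertainty estimates are not described in the summary. The pairwise-comparison reward learning it builds on is commonly modeled with the Bradley-Terry model, where the probability that segment A is preferred is σ(R(A) - R(B)) and R sums predicted per-step rewards. A minimal sketch of that probability and its cross-entropy loss, with hypothetical function names:

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def preference_prob(rewards_a: Sequence[float], rewards_b: Sequence[float]) -> float:
    """Bradley-Terry probability that segment A is preferred over segment B,
    using the sum of predicted per-step rewards as each segment's score."""
    diff = sum(rewards_a) - sum(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)  # avoids overflow for large negative diff
    return e / (1.0 + e)


def preference_loss(rewards_a: Sequence[float], rewards_b: Sequence[float],
                    a_preferred: bool) -> float:
    """Cross-entropy against a human label, computed stably:
    -log sigmoid(d) == softplus(-d) and -log(1 - sigmoid(d)) == softplus(d)."""
    diff = sum(rewards_a) - sum(rewards_b)
    return softplus(-diff) if a_preferred else softplus(diff)


# Segment A has a higher predicted return, so labeling A as preferred costs less.
print(preference_prob([1.0, 1.0], [0.0, 0.0]))
print(preference_loss([1.0, 1.0], [0.0, 0.0], True),
      preference_loss([1.0, 1.0], [0.0, 0.0], False))
```

An active method would use the reward model's uncertainty to decide which pairs to query next rather than labeling pairs collected passively.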
-
Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA
Trade-offs in medical LLM adaptation, studied on French QA
As LLMs are adapted to specialized domains and languages, the effectiveness of adaptation strategies remains unclear. This empirical study on French medical question answering analyzes the trade-offs of various domain-adaptation methods, clarifying gains and losses in performance and generality.
-
OneCanvas: 3D Scene Understanding via Panoramic Reprojection
OneCanvas enables VLM 3D scene understanding via panoramic reprojection
Existing 3D scene understanding in VLMs relies on complex, model-specific geometry encoders or large training budgets for spatial reasoning. OneCanvas instead uses panoramic reprojection, letting VLMs reason about 3D scenes efficiently without dedicated geometry encoders or heavy training.
-
TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology
TxBench-PP evaluates AI agents on preclinical pharmacology
AI agents promise to accelerate drug discovery by compressing interpretation and decision loops, but deployment needs trusted evaluation on realistic tasks. TxBench-PP is a benchmark analyzing AI agent performance on small-molecule preclinical pharmacology, assessing their practical reliability.
-
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
STARE reweights token advantages to stabilize policy entropy
Reinforcement learning with verifiable rewards, such as GRPO, dominates post-training for complex LLM reasoning but often suffers from policy entropy collapse. STARE introduces surprisal-guided token-level advantage reweighting to stabilize policy entropy and preserve exploration during training.
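The summary does not give STARE's reweighting rule, so the sketch below shows only the standard ingredients it names: a GRPO-style group-relative advantage, per-token surprisal (-log p), and the policy entropy whose collapse STARE aims to prevent. How STARE combines surprisal with the advantage is deliberately not shown, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each sampled response's reward by the
    mean and standard deviation of its group (responses to the same prompt).
    Population std is used here; implementations differ on this detail."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def token_surprisal(token_probs: Sequence[float]) -> List[float]:
    """Surprisal -log p of each sampled token under the current policy."""
    return [-math.log(p) for p in token_probs]


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution; values near 0
    indicate the near-deterministic policy associated with entropy collapse."""
    return -sum(p * math.log(p) for p in dist if p > 0)


# Four sampled responses with binary verifiable rewards.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
print(token_surprisal([0.5, 0.25]))
print(entropy([0.25] * 4), entropy([0.97, 0.01, 0.01, 0.01]))
```

In plain GRPO every token in a response shares that response's advantage; a token-level reweighting scheme changes this per token, which is where a surprisal signal could enter.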
-
Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning
MAST selectively unlearns RLVR-induced reasoning with less damage
The authors propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially less collateral damage than standard full-parameter updates, removing targeted reasoning while preserving other capabilities.
-
User as Engram: Internalizing Per-User Memory as Local Parametric Edits
User as Engram: per-user memory as local parametric edits
Personal memory in a language model involves two separate problems, content and reasoning skill, which the brain keeps apart: a sparse, local hippocampal engram for each episode and slowly acquired neocortical skill. Inspired by this division, the work internalizes per-user memory as local parametric edits to the model.
-
Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition
Dango: an L1-only 1.8B LLM for studying second-language acquisition
The authors introduce Dango, a 1.8B-parameter language model designed for controlled studies of L1-to-L2 (Japanese-to-English) transfer in second language acquisition. By training strictly on L1 only, Dango enables controlled experiments on transfer phenomena that prior SLA model studies could not.
-
Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection
Pretraining-stage alignment via regular safety reflection
To achieve deeper safety alignment for LLMs, recent work pushes safety interventions earlier into pretraining, mainly by filtering unsafe data or rewriting it into safe forms. Going beyond safe data, this work embeds regular safety reflection during pretraining to instill more fundamental alignment.