Training & Fine-tuning A

  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages
    IndicContextEval: audio-LLM context use across 8 Indic languages
    Meta · Neural Network · Software Engineering · Speech Processing
    Audio LLMs can condition speech recognition on textual prompts such as domain descriptions or entity lists, but whether they truly use this context is unclear. IndicContextEval is a benchmark evaluating context utilisation in audio large language models across eight Indic languages.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    AdsMind: A Physics-Grounded Multi-Agent System for Self-Correcting Discovery of Adsorption Configurations on Heterogeneous Catalyst Surfaces
    AdsMind: physics-grounded multi-agent search for adsorption configs
    AI Agents · Machine Intelligence · Machine Learning
    Identifying the lowest-energy surface-adsorbate configuration is critical for modeling heterogeneous catalysis, but exhaustive ab initio exploration is prohibitive. AdsMind is a physics-grounded multi-agent system that self-corrects to efficiently discover adsorption configurations.
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    On Local Population-Risk Certificates
    Local population-risk certificates for model updates
    Reinforcement Learning from Human Feedback (RLHF)
    This paper develops local certificates for population-risk increments around a current model. For a local candidate set, the certificate provides a two-sided confidence bound on the change in population risk, giving theoretical guarantees on the risk impact of local model updates.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Leadership as Coordination Control: Behavioral Signatures and the Recovery-Advantage Boundary in Multi-Agent LLM Teams
    Leadership as coordination control in multi-agent LLM teams
    Llama
    Team science holds that leadership is contingent: helpful only under specific conditions, and unnecessary for capable autonomous teams. Asking the analogous question for multi-agent LLMs, this work frames leadership as coordination control and characterizes its behavioral signatures and the recovery-advantage boundary.
  • arXiv cs.LG (Machine Learning) · EN Industry Adoption extract
    JourneyFormer: Encoding Airbnb Guest Journey with Sequence Modeling
    JourneyFormer encodes the Airbnb guest journey via sequence modeling
    Algorithms & Theory · Embeddings · Inference
    Sequence modeling is increasingly popular in recommendation and ranking for its ability to model users' historical behaviors and infer intentions. This work proposes JourneyFormer, which encodes the Airbnb guest journey with sequence modeling to better understand behavior and improve recommendations.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL
    ProductConsistency preserves product identity in instruction-based editing
    Fine-tuning · Machine Learning · Reinforcement Learning
    Instruction-based image editing enables complex edits from natural language, but in product-centric scenarios preserving product features and branding is hard. ProductConsistency uses supervised fine-tuning and reinforcement learning to improve product identity preservation during instruction-based editing.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection
    ARIADNE: agnostic routing for inference-time adapter selection
    Embeddings · Fine-tuning · Inference · Llama · Retrieval-Augmented Generation (RAG)
    Widespread parameter-efficient fine-tuning yields ecosystems where one backbone pairs with many task-specialized adapters. ARIADNE provides agnostic routing for inference-time dynamic adapter selection, choosing the right adapter per input without model-specific assumptions.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Where Did the Variability Go? From Vibe Coding to Product Lines by Regeneration
    From vibe coding to product lines via regeneration
    Software Engineering
    In vibe coding, an emerging AI-driven paradigm, an LLM generates an entire program from a natural-language prompt—but where does the variability that traditional software engineering manages go? This work uses regeneration to move from vibe coding toward software product lines.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training
    Spotlight cuts DiT RL post-training cost with spot GPUs
    Deep Learning · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Transformer
    Reinforcement learning post-training of Diffusion Transformers is prohibitively expensive, needing thousands of high-end GPUs. Spotlight synergizes seed exploration with cheap, preemptible spot GPUs to substantially reduce the cost of DiT RL post-training.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    GraphPO: Graph-based Policy Optimization for Reasoning Models
    GraphPO: graph-based policy optimization for reasoning models
    Neural Network · Reinforcement Learning · Software Engineering
    Reinforcement learning with verifiable rewards has become standard for reasoning models. GraphPO introduces a graph-based policy optimization method that exploits structure across reasoning steps to improve reasoning performance.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration
    SAGE: stochastic prompt optimization via agent-guided exploration
    Context engineering has become a primary lever for improving AI systems. SAGE is a stochastic prompt optimization method that uses agent-guided exploration to automatically discover effective prompts and improve task performance.
  • arXiv cs.CL (Computation and Language) · EN Policy & Regulation extract
    Output Vector Editing for Memorization Mitigation in Large Language Models
    Output vector editing for memorization mitigation in LLMs
    Llama · Machine Learning
    Large language models memorize and reproduce sequences from their training data. This work edits output vectors to mitigate such memorization, reducing the risk of leaking copyrighted or private content.
  • Preferred Networks Tech Blog · JA Training & Fine-tuning extract
    Using PLaMo-3.0-Prime-β in Real-World LLM Development
    Preferred Networks shows PLaMo-3.0-Prime-β in real LLM development
    Deep Learning
    Preferred Networks continues developing its large language model PLaMo and shares how to use the latest PLaMo-3.0-Prime-β in real development work. Beyond training large models, it covers the many surrounding tasks involved in building high-performance LLMs in practice.
  • Publickey · JA New Model Releases extract
    GitLab Announces "Project Switch," a Next-Generation Git-Compatible Source Code Management Service for AI Agents: Up to 50x Faster and Usable with Half the Tokens
    GitLab unveils 'Project Switch,' a Git-compatible SCM service for AI agents
    AI Agents · Machine Learning
    GitLab announced Project Switch, a next-generation Git-compatible source code management service aimed at AI agents, at its GitLab Transcend event in London. The announcement cites up to 50x faster operation and roughly half the token usage; these figures come from the announcement and remain unverified.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation
    EvolveNav: a self-evolving framework for zero-shot object-goal navigation
    AI Agents · Neural Network · Retrieval-Augmented Generation (RAG)
    The paper proposes a self-evolving zero-shot object-goal navigation framework that builds an agentic rule memory by extracting actionable knowledge from past trajectories and uses a retrieval strategy to enable continuous test-time improvement.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Darshana Graph: A Parallel Commentary Corpus for Comparative Indian Philosophy, with Stylometric and Exploratory Graph Analyses
    Darshana Graph: a parallel commentary corpus for Indian philosophy
    Machine Learning · Neural Network
    Darshana Graph is a corpus of over 125,000 text records spanning classical Hindu, Buddhist and Jain philosophical traditions, drawn from public-domain and openly licensed translations. It supports comparative Indian philosophy through stylometric and exploratory graph analyses.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Learning from the Self-future: On-policy Self-distillation for dLLMs
    On-policy self-distillation explored for diffusion LLMs
    Deep Learning · Fine-tuning · Reinforcement Learning · Software Engineering
    On-policy self-distillation (OPSD) helps post-training of LLMs but is unexplored for diffusion LLMs (dLLMs). Existing OPSD methods are autoregressive-centric, injecting privileged information via left-to-right prefix conditioning; this work studies self-distillation suited to dLLMs.
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation
    ATT&CK-labeled multi-source security log dataset with SLM evaluation
    Fine-tuning · Llama · Machine Learning · Neural Network · Reinforcement Learning from Human Feedback (RLHF)
    The work builds a dataset of multi-source cybersecurity logs labeled with MITRE ATT&CK and evaluates small language models (SLMs) on it. Summary is title-based and neutral; details and figures are as presented by the source and not independently verified.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning
    WEQA: query-adaptive agentic reasoning for wearable health QA
    Deep Learning · Neural Network · Software Engineering
    The paper proposes WEQA, a framework for question answering over wearable health sensor data using query-adaptive agentic reasoning, arguing that diverse sensor modalities and user intents cannot be handled by a fixed reasoning workflow.
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices
    S4oP prunes structured state space models at the operator level
    Fine-tuning · Inference · Reinforcement Learning
    Structured state space models such as S4 and S4D capture long-range dependencies but are hard to deploy on constrained devices. S4oP introduces operator-level pruning to enable efficient deployment of SSMs on time- and resource-constrained hardware.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning
    EAGG: embodiment-aligned grasp generation via graph conditioning
    Fine-tuning · Retrieval-Augmented Generation (RAG)
    The paper presents EAGG, an embodiment-aligned grasp generator that represents each end-effector with a topology-aware graph and embodiment-specific conditioning, aiming to generalize grasp generation across objects and diverse robot embodiments.
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning
    From reasoning traces to reusable modules for compositional reasoning
    Fine-tuning · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    Post-training pipelines combining supervised fine-tuning with reinforcement learning are key to turning LLMs into robust reasoners. The paper studies compositional generalization in LM reasoning by converting reasoning traces into reusable modules.
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Uncertainty Quantification for Flow-Based Vision-Language-Action Models
    Uncertainty quantification for flow-based vision-language-action models
    Computer Vision · Fine-tuning · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    Vision-language-action models combine vision-language backbones with expressive generative action heads trained via flow matching on large robotic datasets. These models perform strongly; the paper studies how to quantify uncertainty in flow-based VLA models.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning
    Source-language effects in cross-lingual in-context learning
    Fine-tuning · Neural Network · Natural Language Processing (NLP)
    Cross-lingual transfer is well studied under supervised fine-tuning, where data and linguistic similarity drive quality. As the field shifts to few-shot in-context learning, this paper examines source-language effects and shows English is not always the best teacher.
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation
    Catastrophic forgetting is low-rank: a function-space theory
    Fine-tuning · Reinforcement Learning
    Catastrophic forgetting in continual adaptation is usually viewed through parameter drift or replay, neither of which reveals which output directions are vulnerable. The paper gives a function-space account in the NTK regime, showing that new-task training induces a low-rank drift in old-task predictions through the cross-task kernel.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Fast Nonparametric Conditional Independence Testing via Two-Stage Regression
    Fast nonparametric conditional independence testing via two-stage regression
    Algorithms & Theory · Reinforcement Learning from Human Feedback (RLHF)
    Conditional independence testing is fundamental to statistics and causal inference. The paper proposes a fast nonparametric conditional independence test based on two-stage regression, aiming to improve computational efficiency and power.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue
    Fine-tuning LLMs for passive depression severity from AI dialogue
    Claude · Fine-tuning · Neural Network · Reinforcement Learning
    The paper fine-tunes LLMs for passive estimation of depression severity from AI mental-health dialogue, exploring how conversational signals can indicate severity. Figures and efficacy are as reported by the source and not independently verified.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    KANLib -- An Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation
    KANLib: a modular, extensible and fast KAN implementation
    Kolmogorov-Arnold Networks (KANs) replace linear weights with learnable univariate functions, but their high computational cost hampers practical research. KANLib provides a modular, extensible and fast implementation of KANs to ease experimentation.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Environment-Grounded Automated Prompt Optimization for LLM Game Agents
    Environment-grounded automated prompt optimization for LLM game agents
    AI Agents · Fine-tuning · Reinforcement Learning
    LLM agents in interactive environments are sensitive to prompts, yet prompt engineering stays manual and task-specific. The paper decomposes the observation-to-action pipeline and proposes an environment-grounded automated prompt optimization framework for LLM game agents.
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    Perceptual compensation for tonal context in self-supervised speech models
    Embeddings · Retrieval-Augmented Generation (RAG) · Speech Processing
    The study examines the extent to which self-supervised speech models exhibit perceptual compensation for tonal context, analyzing how context effects seen in human speech perception are reflected in the models' learned representations.