Training & Fine-tuning A
-
IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages
IndicContextEval: audio-LLM context use across 8 Indic languages
Audio LLMs can condition speech recognition on textual prompts such as domain descriptions or entity lists, but whether they truly use this context is unclear. IndicContextEval is a benchmark evaluating context utilisation in audio large language models across eight Indic languages.
-
AdsMind: A Physics-Grounded Multi-Agent System for Self-Correcting Discovery of Adsorption Configurations on Heterogeneous Catalyst Surfaces
AdsMind: physics-grounded multi-agent search for adsorption configs
Identifying the lowest-energy surface-adsorbate configuration is critical for modeling heterogeneous catalysis, but exhaustive ab initio exploration is prohibitive. AdsMind is a physics-grounded multi-agent system that self-corrects to efficiently discover adsorption configurations.
-
On Local Population-Risk Certificates
Local population-risk certificates for model updates
This paper develops local certificates for population-risk increments around a current model. For a local candidate set, the certificate provides a two-sided confidence bound on the change in population risk, giving theoretical guarantees on the risk impact of local model updates.
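
To make the idea of a two-sided bound on a risk change concrete, here is a minimal sketch that is not the paper's certificate. It assumes per-example losses bounded in [0, 1], a single fixed candidate model (no uniformity over a candidate set), and i.i.d. held-out data, and it uses the classical Hoeffding inequality on paired loss differences. The function name `risk_change_interval` and its parameters are hypothetical.

```python
import math


def risk_change_interval(old_losses, new_losses, delta=0.05):
    """Two-sided Hoeffding interval for E[new loss] - E[old loss].

    Assumptions (illustrative, not from the source): losses lie in [0, 1],
    both lists are computed on the same i.i.d. held-out examples, and the
    candidate model was fixed before looking at this data.
    """
    if not old_losses or len(old_losses) != len(new_losses):
        raise ValueError("need two non-empty loss lists of equal length")
    n = len(old_losses)
    # Paired differences lie in [-1, 1], so their range is 2.
    diffs = [new - old for old, new in zip(old_losses, new_losses)]
    mean_diff = sum(diffs) / n
    # Hoeffding: |mean - expectation| <= range * sqrt(ln(2/delta) / (2n))
    # with probability at least 1 - delta.
    half_width = 2.0 * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return mean_diff - half_width, mean_diff + half_width


if __name__ == "__main__":
    old = [0.5] * 1000
    new = [0.4] * 1000
    low, high = risk_change_interval(old, new)
    print(f"risk change lies in [{low:.4f}, {high:.4f}] with prob. >= 0.95")
```

If the whole interval is below zero, the update reduces population risk at the chosen confidence level under these assumptions; covering a set of candidates at once would need a union bound or a uniform argument, which this sketch omits.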
-
Leadership as Coordination Control: Behavioral Signatures and the Recovery-Advantage Boundary in Multi-Agent LLM Teams
Leadership as coordination control in multi-agent LLM teams
Team science holds that leadership is contingent: helpful only under specific conditions and unnecessary for capable autonomous teams. Asking the analogous question for multi-agent LLMs, this work frames leadership as coordination control, characterizing its behavioral signatures and the recovery-advantage boundary.
-
JourneyFormer: Encoding Airbnb Guest Journey with Sequence Modeling
JourneyFormer encodes the Airbnb guest journey via sequence modeling
Sequence modeling is increasingly popular in recommendation and ranking for its ability to model users' historical behaviors and infer intentions. This work proposes JourneyFormer, which encodes the Airbnb guest journey with sequence modeling to better understand behavior and improve recommendations.
-
ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL
ProductConsistency preserves product identity in instruction-based editing
Instruction-based image editing enables complex edits from natural language, but in product-centric scenarios preserving product features and branding is hard. ProductConsistency uses supervised fine-tuning and reinforcement learning to improve product identity preservation during instruction-based editing.
-
ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection
ARIADNE: agnostic routing for inference-time adapter selection
Widespread parameter-efficient fine-tuning yields ecosystems where one backbone pairs with many task-specialized adapters. ARIADNE provides agnostic routing for inference-time dynamic adapter selection, choosing the right adapter per input without model-specific assumptions.
-
Where Did the Variability Go? From Vibe Coding to Product Lines by Regeneration
From vibe coding to product lines via regeneration
In vibe coding, an emerging AI-driven paradigm, an LLM generates an entire program from a natural-language prompt. But where does the variability that traditional software engineering manages go? This work uses regeneration to move from vibe coding toward software product lines.
-
Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training
Spotlight cuts DiT RL post-training cost with spot GPUs
Reinforcement learning post-training of Diffusion Transformers is prohibitively expensive, needing thousands of high-end GPUs. Spotlight synergizes seed exploration with cheap, preemptible spot GPUs to substantially reduce the cost of DiT RL post-training.
-
GraphPO: Graph-based Policy Optimization for Reasoning Models
Reinforcement learning with verifiable rewards has become standard for reasoning models. GraphPO introduces a graph-based policy optimization method that exploits structure across reasoning steps to improve reasoning performance.
-
SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration
Context engineering has become a primary lever for improving AI systems. SAGE is a stochastic prompt optimization method that uses agent-guided exploration to automatically discover effective prompts and improve task performance.
-
Output Vector Editing for Memorization Mitigation in Large Language Models
Output vector editing for memorization mitigation in LLMs
Large language models memorize and reproduce sequences from their training data. This work edits output vectors to mitigate such memorization, reducing the risk of leaking copyrighted or private content.
-
Using PLaMo-3.0-Prime-β in Real-World LLM Development
Preferred Networks shows PLaMo-3.0-Prime-β in real LLM development
Preferred Networks continues developing its large language model PLaMo and shares how to use the latest PLaMo-3.0-Prime-β in real development work. Beyond training large models, it covers the many surrounding tasks involved in building high-performance LLMs in practice.
-
GitLab Announces "Project Switch", a Next-Generation Git-Compatible Source Code Management Service for AI Agents: Up to 50x Faster and Usable with Half the Tokens
GitLab unveils 'Project Switch,' a Git-compatible SCM service for AI agents
GitLab announced Project Switch, a next-generation Git-compatible source code management service aimed at AI agents, at its GitLab Transcend event in London. Reports cite up to 50x speed and roughly half the token usage; figures reflect the announcement and remain unverified.
-
EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation
EvolveNav: a self-evolving framework for zero-shot object-goal navigation
The paper proposes a self-evolving zero-shot object-goal navigation framework that builds an agentic rule memory by extracting actionable knowledge from past trajectories and uses a retrieval strategy to enable continuous test-time improvement.
-
Darshana Graph: A Parallel Commentary Corpus for Comparative Indian Philosophy, with Stylometric and Exploratory Graph Analyses
Darshana Graph: a parallel commentary corpus for Indian philosophy
Darshana Graph is a corpus of over 125,000 text records spanning classical Hindu, Buddhist and Jain philosophical traditions, drawn from public-domain and openly licensed translations. It supports comparative Indian philosophy through stylometric and exploratory graph analyses.
-
Learning from the Self-future: On-policy Self-distillation for dLLMs
On-policy self-distillation explored for diffusion LLMs
On-policy self-distillation (OPSD) helps post-training of LLMs but is unexplored for diffusion LLMs (dLLMs). Existing OPSD methods are autoregressive-centric, injecting privileged information via left-to-right prefix conditioning; this work studies self-distillation suited to dLLMs.
-
Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation
ATT&CK-labeled multi-source security log dataset with SLM evaluation
The work builds a dataset of multi-source cybersecurity logs labeled with MITRE ATT&CK and evaluates small language models (SLMs) on it. Summary is title-based and neutral; details and figures are as presented by the source and not independently verified.
-
WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning
WEQA: query-adaptive agentic reasoning for wearable health QA
The paper proposes WEQA, a framework for question answering over wearable health sensor data using query-adaptive agentic reasoning, arguing that diverse sensor modalities and user intents cannot be handled by a fixed reasoning workflow.
-
S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices
S4oP prunes structured state space models at the operator level
Structured state space models such as S4 and S4D capture long-range dependencies but are hard to deploy on constrained devices. S4oP introduces operator-level pruning to enable efficient deployment of SSMs on time- and resource-constrained hardware.
-
EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning
EAGG: embodiment-aligned grasp generation via graph conditioning
The paper presents EAGG, an embodiment-aligned grasp generator that represents each end-effector with a topology-aware graph and embodiment-specific conditioning, aiming to generalize grasp generation across objects and diverse robot embodiments.
-
From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning
From reasoning traces to reusable modules for compositional reasoning
Post-training pipelines combining supervised fine-tuning with reinforcement learning are key to turning LLMs into robust reasoners. The paper studies compositional generalization in LM reasoning by converting reasoning traces into reusable modules.
-
Uncertainty Quantification for Flow-Based Vision-Language-Action Models
Vision-language-action models combine vision-language backbones with expressive generative action heads trained via flow matching on large robotic datasets. Although these models perform strongly, the paper focuses on quantifying the uncertainty of flow-based VLA models.
-
When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning
Source-language effects in cross-lingual in-context learning
Cross-lingual transfer is well studied under supervised fine-tuning, where data and linguistic similarity drive quality. As the field shifts to few-shot in-context learning, this paper examines source-language effects and shows English is not always the best teacher.
-
Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation
Catastrophic forgetting is low-rank: a function-space theory
Catastrophic forgetting in continual adaptation is usually viewed via parameter drift or replay, which do not reveal which output directions are vulnerable. The paper gives a function-space account in the NTK regime, showing that new-task training induces a low-rank drift in old-task predictions through the cross-task kernel.
-
Fast Nonparametric Conditional Independence Testing via Two-Stage Regression
Conditional independence testing is fundamental to statistics and causal inference. The paper proposes a fast nonparametric conditional independence test based on two-stage regression, aiming to improve computational efficiency and power.
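
For readers unfamiliar with regression-based conditional independence testing, the sketch below shows the classical linear analogue, not the paper's nonparametric procedure: first regress X and Y on Z, then test whether the two residual series are correlated (a partial-correlation test with a Fisher z statistic). It assumes a single scalar, non-constant conditioning variable and roughly Gaussian residuals; the names `residuals` and `partial_corr_ci_test` are hypothetical.

```python
import math
import random


def residuals(target, z):
    """Residuals of a simple least-squares regression of target on z."""
    n = len(z)
    mean_z = sum(z) / n
    mean_t = sum(target) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szt = sum((zi - mean_z) * (ti - mean_t) for zi, ti in zip(z, target))
    slope = szt / szz  # assumes z is not constant
    return [(ti - mean_t) - slope * (zi - mean_z) for zi, ti in zip(z, target)]


def partial_corr_ci_test(x, y, z):
    """Test X independent of Y given Z via the correlation of residuals.

    Returns (partial correlation, two-sided p-value). The p-value uses the
    Fisher z approximation with one conditioning variable.
    """
    n = len(z)
    rx, ry = residuals(x, z), residuals(y, z)   # stage 1: remove Z
    sxy = sum(a * b for a, b in zip(rx, ry))
    sxx = sum(a * a for a in rx)
    syy = sum(b * b for b in ry)
    r = sxy / math.sqrt(sxx * syy)              # stage 2: residual association
    r = max(-0.999999, min(0.999999, r))        # keep atanh finite
    stat = math.sqrt(n - 1 - 3) * math.atanh(r)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return r, p_value


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0, 1) for _ in range(200)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [xi + zi + 0.1 * rng.gauss(0, 1) for xi, zi in zip(x, z)]
    print(partial_corr_ci_test(x, y, z))  # y depends on x even given z
```

A linear first stage only removes linear dependence on Z; a nonparametric test would replace it with a flexible regressor, which is where the cost and power trade-offs mentioned in the summary arise.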
-
Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue
Fine-tuning LLMs for passive depression severity from AI dialogue
The paper fine-tunes LLMs for passive estimation of depression severity from AI mental-health dialogue, exploring how conversational signals can indicate severity. Figures and efficacy are as reported by the source and not independently verified.
-
KANLib -- An Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation
KANLib: a modular, extensible and fast KAN implementation
Kolmogorov-Arnold Networks replace linear weights with learnable univariate functions, but their high computational cost hampers practical research. KANLib provides a modular, extensible and fast implementation of KANs to ease experimentation.
-
Environment-Grounded Automated Prompt Optimization for LLM Game Agents
LLM agents in interactive environments are sensitive to prompts, yet prompt engineering stays manual and task-specific. The paper decomposes the observation-to-action pipeline and proposes an environment-grounded automated prompt optimization framework for LLM game agents.
-
Perceptual compensation for tonal context in self-supervised speech models
The study examines the extent to which self-supervised speech models exhibit perceptual compensation for tonal context, analyzing how context effects seen in human speech perception are reflected in the models' learned representations.