Inference & Efficiency A
-
Equivariant Graph Neural Networks Improve Optical Spectra Prediction for Materials Screening
Equivariant GNNs improve optical spectra prediction for materials
Scalable prediction of optical spectra is critical for high-throughput materials screening in optoelectronics such as solar cells, yet existing surrogates train on spectra from lower-level methods. This work uses equivariant graph neural networks to improve optical spectra prediction for materials screening.
-
Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
Decoupling perception and reasoning for shortcut-resilient self-distillation
On-policy self-distillation trains a model on its own rollouts, using a frozen copy to give dense token-level targets conditioned on a reference. This work decouples perception from reasoning (seeing before reasoning) to make multimodal on-policy self-distillation resilient to shortcut learning.
-
Wasserstein Policy Learning for Distributional Outcomes
Wasserstein policy learning for distributional outcomes
Offline policy learning is gaining attention in causal inference, aiming to learn an individualized treatment rule mapping covariates to treatments that maximizes empirical outcomes. This work proposes Wasserstein policy learning for distributional outcomes, accounting for the full outcome distribution.
-
JourneyFormer: Encoding Airbnb Guest Journey with Sequence Modeling
JourneyFormer encodes the Airbnb guest journey via sequence modeling
Sequence modeling is increasingly popular in recommendation and ranking for its ability to model users' historical behaviors and infer intentions. This work proposes JourneyFormer, which encodes the Airbnb guest journey with sequence modeling to better understand behavior and improve recommendations.
-
ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection
ARIADNE: agnostic routing for inference-time adapter selection
Widespread parameter-efficient fine-tuning yields ecosystems where one backbone pairs with many task-specialized adapters. ARIADNE provides agnostic routing for inference-time dynamic adapter selection, choosing the right adapter per input without model-specific assumptions.
-
RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents
RODS: reward-driven online data synthesis for tool-use agents
Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. Observing that GRPO's gradient signal concentrates on certain tasks, RODS performs reward-driven online data synthesis to continually supply informative samples for multi-turn tool-use agents.
-
FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs
FoMoE breaks the full-replica barrier with a federation of MoEs
Pretraining LLMs typically demands large-scale infrastructure with tightly coupled accelerators. As model and data scale grow, FoMoE proposes a federation of Mixture-of-Experts that avoids replicating the full model across devices, breaking the full-replica barrier and easing infrastructure constraints.
-
Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents
Decoupling search from reasoning: a vendor-agnostic grounding architecture
Production LLM agents increasingly depend on real-time search but get locked into vendor-specific grounding. This work decouples search from reasoning with a vendor-agnostic grounding architecture, letting search backends be swapped while preserving reasoning quality.
-
Graph-ESBMC-PLC: Formal Verification of Graphical PLCopen XML Ladder Diagram Programs Using SMT-Based Model Checking
Graph-ESBMC-PLC: SMT-based verification of PLCopen ladder diagrams
PLCopen XML defines encodings for IEC 61131-3 Ladder Diagrams. Graph-ESBMC-PLC applies SMT-based model checking to formally verify graphical PLCopen XML Ladder Diagram programs, supporting correctness checking of industrial control software.
-
REVES: REvision and VErification--Augmented Training for Test-Time Scaling
REVES: revision- and verification-augmented training for test-time scaling
Test-time scaling via sequential revision has become a powerful paradigm. REVES proposes revision- and verification-augmented training that strengthens a model's ability to revise and verify its own outputs, making extra test-time compute more effective.
-
Learning Robust Pair Confidence for Multimodal Emotion-Cause Pair Extraction
Learning robust pair confidence for multimodal emotion-cause extraction
Multimodal emotion-cause pair extraction requires reliable pairing of emotions and their causes. This work learns robust pair confidence, yielding emotion-cause extraction that is more resilient to noise and ambiguity.
-
Improving Medical Communication using Rubric-Guided Counterfactual Recommendations
Rubric-guided counterfactual recommendations for medical communication
Text-based telemedicine increasingly relies on lightweight patient feedback. This work improves medical communication using rubric-guided counterfactual recommendations, enhancing the quality of patient-clinician interactions.
-
Efficient Financial Language Understanding via Distillation with Synthetic Data
Efficient financial language understanding via distillation with synthetic data
Large instruction-following models are powerful but costly to deploy, especially in finance. This work distills capabilities using synthetic data to build lightweight models that understand financial language efficiently.
-
Approximate Structured Diffusion for Sequence Labelling
Approximate structured diffusion for sequence labelling
Sequence labelling is a core NLP task. This work proposes an approximate structured diffusion approach that models label dependencies while keeping sequence labelling efficient.
-
Beyond Scalar Scores: Exploring LLM-based Metrics for Clinical Significance Evaluation in Radiology Reports
Beyond scalar scores: LLM-based metrics for radiology report significance
Reliable evaluation of generated radiology reports requires strict clinical validity. Going beyond scalar scores, this work explores LLM-based metrics for clinical significance evaluation, assessing report quality in clinically meaningful terms.
-
Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation
Improving long-document retrieval with chunk evidence aggregation
Dense retrieval matches one query vector against one document vector, but long documents get lost in a single vector. This work splits documents into chunks and aggregates per-chunk evidence to improve long-document retrieval.
-
Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish
Morpheus: a morphology-aware neural tokenizer and embedder for Turkish
Turkish is agglutinative, with meaning carried by morphemes that subword tokenizers fail to capture. Morpheus is a morphology-aware neural tokenizer and word embedder designed to improve Turkish language processing.
-
LLM Serving Fairness: No more noisy neighbours
Cohere ensures fair compute sharing across LLM serving tenants
Cohere details how it ensures every tenant gets a fair share of compute in LLM serving, tackling the 'noisy neighbour' problem where one user monopolizes resources. The design allocates capacity fairly across tenants to deliver stable, predictable multi-tenant performance.
-
Toshiba's Quantum-Inspired Technology for Embedded Use Advances, Achieving Both Speed and Stability
Toshiba builds embedded quantum-inspired combinatorial optimizer
Toshiba announced a 'quantum-inspired optimization framework' that solves combinatorial optimization problems quickly and stably even under constantly changing real-world conditions. Aimed at embedded use, it targets both speed and stability; the claims are Toshiba's and not independently verified.
-
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
VERITAS steers and self-improves robot policies at inference time
The paper proposes VERITAS, a generator-verifier framework pairing a pre-trained generalist robot policy with a gradient-free visual verifier that evaluates actions at inference time, improving performance without extra training and enabling self-improvement.
-
Variable-Width Transformers
Variable-width transformer cuts FLOPs ~22% via x-shaped layer widths
The paper proposes an x-shaped transformer that keeps early and late layers wide while narrowing the middle, using a parameter-free residual resizing mechanism. Across dense 200M-2B and 3B MoE decoder-only models it outperforms parameter-matched uniform baselines and reduces FLOPs by about 22% under loss-matched scaling, with smaller KV cache.
-
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
Zone of Proximal PPO puts the teacher in prompts, not gradients
Knowledge distillation is brittle for small students, as imitating a large teacher's logits concentrates on its sharpest modes and hurts generalization. The proposed Zone of Proximal Policy Optimization places the teacher in prompts rather than gradients to improve small-student generalization.
-
Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?
Do distilled sets beat coresets? Rethinking dataset distillation
Dataset distillation synthesizes compact training sets for data-centric machine learning. This paper rethinks distillation for classification, asking whether distilled sets actually outperform coresets (real-data subsets) and under what conditions.
-
RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills
RubricsTree: scalable open-ended evaluation of personal health agents
LLM personal health agents using sensor metrics promise to ease healthcare disparities, but an open-ended evaluation bottleneck limits clinical deployment. RubricsTree offers scalable, evolving open-ended evaluation across health memory and medical skills.
-
Learning from the Self-future: On-policy Self-distillation for dLLMs
On-policy self-distillation explored for diffusion LLMs
On-policy self-distillation (OPSD) helps post-training of LLMs but is unexplored for diffusion LLMs (dLLMs). Existing OPSD methods are autoregressive-centric, injecting privileged information via left-to-right prefix conditioning; this work studies self-distillation suited to dLLMs.
-
Kolmogorov Regression for Robust Diffusion Policies
Kolmogorov regression yields robust diffusion policies
Finite-dimensional diffusion policies suffer temporal drift from discretization that degrades long-horizon performance. The paper introduces a backward Kolmogorov equation that lifts diffusion policies into a Cameron-Martin space to make them more robust.
-
Knowledge Reutilization in Meta-Reinforcement Learning
A meta-knowledge reutilization framework for meta-RL across agents
The paper proposes a meta-knowledge reutilization framework for meta-reinforcement learning that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents, using a Bayesian non-parametric prior to organize latent task modes.
-
Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines
A pipeline survey of embedded ML for microcontroller-class devices
Embedded machine learning moves inference from the cloud to resource-constrained devices. This practice-oriented synthesis lays out data, feature, evaluation and deployment pipelines for an embedded ML workflow on microcontroller-class platforms.
-
Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models
Ternary Mamba: grouped QAT for W1.58A16 state space models
Ternary Mamba applies grouped quantization-aware training to Mamba state space models with ternary (W1.58) weights and 16-bit activations, targeting efficient low-bit training and inference of sequence models while preserving accuracy.
-
Querying an astronomical database using large language models: the ALeRCE text-to-SQL system
A text-to-SQL system for querying the ALeRCE astronomical database
The paper develops an LLM-based text-to-SQL system using in-context learning, applied to the ALeRCE astronomical broker database, generating executable SQL from natural language and evaluated on a dataset of 110 NL/SQL pairs via step-by-step generation.