Inference & Efficiency

  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    Equivariant Graph Neural Networks Improve Optical Spectra Prediction for Materials Screening
    Equivariant GNNs improve optical spectra prediction for materials
    Neural Network
    Scalable prediction of optical spectra is critical for high-throughput materials screening in optoelectronics such as solar cells, yet existing surrogates train on spectra from lower-level methods. This work uses equivariant graph neural networks to improve optical spectra prediction for materials screening.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
    Decoupling perception and reasoning for shortcut-resilient self-distillation
    Computer Vision · Machine Learning · Software Engineering
    On-policy self-distillation trains a model on its own rollouts, using a frozen copy to give dense token-level targets conditioned on a reference. This work decouples perception from reasoning (seeing before reasoning) to make multimodal on-policy self-distillation resilient to shortcut learning.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Wasserstein Policy Learning for Distributional Outcomes
    Wasserstein policy learning for distributional outcomes
    Deep Learning · Inference
    Offline policy learning is gaining attention in causal inference, aiming to learn an individualized treatment rule mapping covariates to treatments that maximizes empirical outcomes. This work proposes Wasserstein policy learning for distributional outcomes, accounting for the full outcome distribution.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Industry Adoption extract
    JourneyFormer: Encoding Airbnb Guest Journey with Sequence Modeling
    JourneyFormer encodes the Airbnb guest journey via sequence modeling
    Algorithms & Theory · Embeddings · Inference
    Sequence modeling is increasingly popular in recommendation and ranking for its ability to model users' historical behaviors and infer intentions. This work proposes JourneyFormer, which encodes the Airbnb guest journey with sequence modeling to better understand behavior and improve recommendations.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection
    ARIADNE: agnostic routing for inference-time adapter selection
    Embeddings · Fine-tuning · Inference · Llama · Retrieval-Augmented Generation (RAG)
    Widespread parameter-efficient fine-tuning yields ecosystems where one backbone pairs with many task-specialized adapters. ARIADNE provides agnostic routing for inference-time dynamic adapter selection, choosing the right adapter per input without model-specific assumptions.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents
    RODS: reward-driven online data synthesis for tool-use agents
    AI Agents · Inference · Reinforcement Learning
    Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. Observing that GRPO's gradient signal concentrates on certain tasks, RODS performs reward-driven online data synthesis to continually supply informative samples for multi-turn tool-use agents.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Infrastructure & Hardware extract
    FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs
    FoMoE breaks the full-replica barrier with a federation of MoEs
    Mixture of Experts (MoE) · Neural Network
    Pretraining LLMs typically demands large-scale infrastructure with tightly coupled accelerators, a requirement that tightens as model and data scale grow. FoMoE proposes a federation of Mixture-of-Experts models that avoids replicating the full model across devices, breaking the full-replica barrier and easing infrastructure constraints.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents
    Decoupling search from reasoning: a vendor-agnostic grounding architecture
    AI Agents · Deep Learning · Model Context Protocol (MCP) · Reinforcement Learning · Software Engineering
    Production LLM agents increasingly depend on real-time search but get locked into vendor-specific grounding. This work decouples search from reasoning with a vendor-agnostic grounding architecture, letting search backends be swapped while preserving reasoning quality.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Graph-ESBMC-PLC: Formal Verification of Graphical PLCopen XML Ladder Diagram Programs Using SMT-Based Model Checking
    Graph-ESBMC-PLC: SMT-based verification of PLCopen ladder diagrams
    Inference · Machine Learning · Neural Network
    PLCopen XML defines encodings for IEC 61131-3 Ladder Diagrams. Graph-ESBMC-PLC applies SMT-based model checking to formally verify graphical PLCopen XML Ladder Diagram programs, supporting correctness checking of industrial control software.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    REVES: REvision and VErification--Augmented Training for Test-Time Scaling
    REVES: revision- and verification-augmented training for test-time scaling
    Inference · Reinforcement Learning · Software Engineering
    Test-time scaling via sequential revision has become a powerful paradigm. REVES proposes revision- and verification-augmented training that strengthens a model's ability to revise and verify its own outputs, making extra test-time compute more effective.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Learning Robust Pair Confidence for Multimodal Emotion-Cause Pair Extraction
    Learning robust pair confidence for multimodal emotion-cause extraction
    Inference · Retrieval-Augmented Generation (RAG)
    Multimodal emotion-cause pair extraction requires reliable pairing of emotions and their causes. This work learns robust pair confidence, yielding emotion-cause extraction that is more resilient to noise and ambiguity.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Improving Medical Communication using Rubric-Guided Counterfactual Recommendations
    Rubric-guided counterfactual recommendations for medical communication
    Inference · Meta
    Text-based telemedicine increasingly relies on lightweight patient feedback. This work improves medical communication using rubric-guided counterfactual recommendations, enhancing the quality of patient-clinician interactions.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Efficient Financial Language Understanding via Distillation with Synthetic Data
    Efficient financial language understanding via distillation with synthetic data
    Neural Network · Natural Language Processing (NLP) · Reinforcement Learning
    Large instruction-following models are powerful but costly to deploy, especially in finance. This work distills capabilities using synthetic data to build lightweight models that understand financial language efficiently.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Approximate Structured Diffusion for Sequence Labelling
    Approximate structured diffusion for sequence labelling
    Inference · Machine Learning · Neural Network · Natural Language Processing (NLP) · Retrieval-Augmented Generation (RAG)
    Sequence labelling is a core NLP task. This work proposes an approximate structured diffusion approach that models label dependencies while keeping sequence labelling efficient.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Beyond Scalar Scores: Exploring LLM-based Metrics for Clinical Significance Evaluation in Radiology Reports
    Beyond scalar scores: LLM-based metrics for radiology report significance
    Inference · Machine Learning
    Reliable evaluation of generated radiology reports requires strict clinical validity. Going beyond scalar scores, this work explores LLM-based metrics for clinical significance evaluation, assessing report quality in clinically meaningful terms.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation
    Improving long-document retrieval with chunk evidence aggregation
    Deep Learning · Inference · Reinforcement Learning
    Dense retrieval matches one query vector against one document vector, but long documents get lost in a single vector. This work splits documents into chunks and aggregates per-chunk evidence to improve long-document retrieval.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Infrastructure & Hardware extract
    Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish
    Morpheus: a morphology-aware neural tokenizer and embedder for Turkish
    Embeddings · Inference · Retrieval-Augmented Generation (RAG)
    Turkish is agglutinative, with meaning carried by morphemes that subword tokenizers fail to capture. Morpheus is a morphology-aware neural tokenizer and word embedder designed to improve Turkish language processing.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Cohere Blog · EN Inference & Efficiency extract
    LLM Serving Fairness: No more noisy neighbours
    Cohere ensures fair compute sharing across LLM serving tenants
    Deep Learning · Inference · Meta · Neural Network · Reinforcement Learning
    Cohere details how it ensures every tenant gets a fair share of compute in LLM serving, tackling the 'noisy neighbour' problem where one user monopolizes resources. The design allocates capacity fairly across tenants to deliver stable, predictable multi-tenant performance.
    Read original (Cohere Blog) ↗
  • ITmedia AI+ · JA Inference & Efficiency extract
    Toshiba's Quantum-Inspired Technology for Embedded Systems Advances, Achieving Both Speed and Stability
    Toshiba builds embedded quantum-inspired combinatorial optimizer
    Toshiba announced a 'quantum-inspired optimization framework' that solves combinatorial optimization problems quickly and stably even under constantly changing real-world conditions. Aimed at embedded use, it targets both speed and stability; the claims are Toshiba's and not independently verified.
    Read original (ITmedia AI+) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Industry Adoption extract
    Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
    VERITAS steers and self-improves robot policies at inference time
    Inference · Reinforcement Learning
    The paper proposes VERITAS, a generator-verifier framework pairing a pre-trained generalist robot policy with a gradient-free visual verifier that evaluates actions at inference time, improving performance without extra training and enabling self-improvement.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Variable-Width Transformers
    Variable-width transformer cuts FLOPs ~22% via x-shaped layer widths
    Deep Learning · Mixture of Experts (MoE) · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Transformer
    The paper proposes an x-shaped transformer that keeps early and late layers wide while narrowing the middle, using a parameter-free residual resizing mechanism. Across dense 200M-2B and 3B MoE decoder-only models it outperforms parameter-matched uniform baselines and reduces FLOPs by about 22% under loss-matched scaling, with smaller KV cache.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
    Zone of Proximal Policy Optimization puts the teacher in prompts, not gradients
    Reinforcement Learning
    Knowledge distillation is brittle for small students, as imitating a large teacher's logits concentrates on its sharpest modes and hurts generalization. The proposed Zone of Proximal Policy Optimization places the teacher in prompts rather than gradients to improve small-student generalization.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?
    Do distilled sets beat coresets? Rethinking dataset distillation
    Machine Learning · Retrieval-Augmented Generation (RAG)
    Dataset distillation synthesizes compact training sets for data-centric machine learning. This paper rethinks distillation for classification, asking whether distilled sets actually outperform coresets (real-data subsets) and under what conditions.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills
    RubricsTree: scalable open-ended evaluation of personal health agents
    AI Agents · Gemini · GPT · Meta · Neural Network
    LLM personal health agents using sensor metrics promise to ease healthcare disparities, but an open-ended evaluation bottleneck limits clinical deployment. RubricsTree offers scalable, evolving open-ended evaluation across health memory and medical skills.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Learning from the Self-future: On-policy Self-distillation for dLLMs
    On-policy self-distillation explored for diffusion LLMs
    Deep Learning · Fine-tuning · Reinforcement Learning · Software Engineering
    On-policy self-distillation (OPSD) helps post-training of LLMs but is unexplored for diffusion LLMs (dLLMs). Existing OPSD methods are autoregressive-centric, injecting privileged information via left-to-right prefix conditioning; this work studies self-distillation suited to dLLMs.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Kolmogorov Regression for Robust Diffusion Policies
    Kolmogorov regression yields robust diffusion policies
    Inference · Neural Network · Reinforcement Learning
    Finite-dimensional diffusion policies suffer temporal drift from discretization that degrades long-horizon performance. The paper introduces a backward Kolmogorov equation that lifts diffusion policies into a Cameron-Martin space to make them more robust.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Knowledge Reutilization in Meta-Reinforcement Learning
    A meta-knowledge reutilization framework for meta-RL across agents
    AI Agents · Inference · Meta · Reinforcement Learning
    The paper proposes a meta-knowledge reutilization framework for meta-reinforcement learning that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents, using a Bayesian non-parametric prior to organize latent task modes.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines
    A pipeline survey of embedded ML for microcontroller-class devices
    Inference · Machine Learning · Quantization
    Embedded machine learning moves inference from the cloud to resource-constrained devices. This practice-oriented synthesis lays out data, feature, evaluation and deployment pipelines for an embedded ML workflow on microcontroller-class platforms.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models
    Ternary Mamba: grouped QAT for W1.58A16 state space models
    Inference · Quantization · Retrieval-Augmented Generation (RAG) · Transformer
    Ternary Mamba applies grouped quantization-aware training to Mamba state space models with ternary (W1.58) weights and 16-bit activations, targeting efficient low-bit training and inference of sequence models while preserving accuracy.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Querying an astronomical database using large language models: the ALeRCE text-to-SQL system
    A text-to-SQL system for querying the ALeRCE astronomical database
    Claude · Gemini · GPT · Inference
    The paper develops an LLM-based text-to-SQL system for the ALeRCE astronomical broker database. It uses in-context learning and step-by-step generation to produce executable SQL from natural language, and is evaluated on a dataset of 110 NL/SQL pairs.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗