New Model Releases A
-
Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States
LOCUS releases a US local-ordinance corpus for legal AI
Progress in legal AI depends on authoritative legal text at scale, yet US local ordinances, a consequential layer of American law, are largely missing from machine-readable corpora. The authors build LOCUS, a corpus of US local ordinances, to broaden legal-AI research data.
-
UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning
UBP2: uncertainty-balanced planning for efficient preference-based RL
Preference-based RL learns reward models from pairwise behavior comparisons, bypassing explicit reward design, but existing methods often rely on passive data collection. UBP2 introduces uncertainty-balanced preference planning to actively select comparisons and learn efficiently from fewer preferences.
-
Optimal scenario design for climate emulation
Optimal scenario design improves climate emulation surrogates
As deep learning for physical systems grows, efforts to improve generalizability have focused on architectures embedding physical constraints. This work instead studies optimal scenario design for machine-learning surrogate models of climate, improving generalization and predictive accuracy.
-
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
Measuring commonsense and knowledge retention in VLA models
Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet how much commonsense and factual knowledge they retain is unclear. This work measures that retention, revealing how much fine-tuning erodes prior world knowledge.
-
A Multi-Domain Benchmark for Detecting AI-Generated Text-Rich Images from GPT-Image-2
A multi-domain benchmark to detect GPT-Image-2 text-rich images
Text-rich images often hold privacy-sensitive, transactional, or decision-relevant information. As multimodal generators synthesize realistic text and layouts, this work builds a multi-domain benchmark for detecting AI-generated text-rich images from GPT-Image-2, assessing detector reliability.
-
X+Slides: Benchmarking Audience-Conditioned Slide Generation
X+Slides benchmarks audience-conditioned slide generation
Automatically generating slide decks from documents is an important LLM application, but existing benchmarks mainly assess completeness and technical depth. X+Slides introduces a benchmark for audience-conditioned slide generation, evaluating how well decks adapt to their intended audience.
-
SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering
SCAN boosts time-series anomaly detection via neighborhood clustering
Time-series anomaly detection is crucial across applications, and reconstruction-based methods dominate but suffer from over-generalization that reconstructs anomalies too well. SCAN uses multi-scale neighborhood-centered clustering to curb this over-generalization and improve detection.
-
OneCanvas: 3D Scene Understanding via Panoramic Reprojection
OneCanvas enables VLM 3D scene understanding via panoramic reprojection
Existing 3D scene understanding in VLMs relies on complex, model-specific geometry encoders or large training budgets for spatial reasoning. OneCanvas instead uses panoramic reprojection, letting VLMs reason about 3D scenes efficiently without dedicated geometry encoders or heavy training.
-
Acceleration of an algebraic multigrid pressure solver using graph neural networks
Graph neural networks accelerate an algebraic multigrid pressure solver
Solving the pressure-Poisson equation is the main bottleneck in incompressible unstructured flow solvers, as traditional linear solvers are sensitive to mesh irregularities. This work uses graph neural networks to accelerate an algebraic multigrid pressure solver, improving solve efficiency.
-
Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory
TGO-I: a spectral geometry observatory for Vision Transformers
Despite the wide adoption and success of Vision Transformers, understanding of their dimensional and representational geometry remains limited. The Transformer Geometry Observatory (TGO-I) studies ViTs through spectral geometry, observing and analyzing the structure of their representation spaces.
-
A Taxonomy of Mental Health and Technology Needs for Alzheimer's and Dementia Caregivers
A taxonomy of mental-health and tech needs for dementia caregivers
Family members caring for people with Alzheimer's and related dementias form the foundation of long-term care worldwide; in 2023 over 11 million U.S. relatives provided unpaid care. This work presents a taxonomy of caregivers' mental-health and technology needs to guide supportive design.
-
TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology
TxBench-PP evaluates AI agents on preclinical pharmacology
AI agents promise to accelerate drug discovery by compressing interpretation and decision loops, but deployment needs trusted evaluation on realistic tasks. TxBench-PP is a benchmark analyzing AI agent performance on small-molecule preclinical pharmacology, assessing their practical reliability.
-
Machine Unlearning for the XGBoost Model with Network Intrusion Datasets
Machine unlearning for XGBoost on network intrusion data
Machine unlearning removes specific data points from trained models without full retraining, but most research targets neural networks. This work studies machine unlearning for the XGBoost gradient-boosted tree model using network intrusion datasets, extending unlearning beyond deep models.
-
RECOM: A Validity Discrimination Tradeoff in Automatic Metrics for Open Ended Reddit Question Answering
RECOM analyzes validity vs discrimination in automatic metrics
Automatic metrics are the default for evaluating LLM-generated text, yet a metric is quietly asked to do two jobs: tell genuine content alignment from surface coincidence (validity) and discriminate quality. Using open-ended Reddit QA, RECOM analyzes this validity–discrimination trade-off.
-
AI Tells You Where Your AWS Cost Problems Are: Public Preview of 'AWS FinOps Agent' Begins
AWS launches a public preview of 'AWS FinOps Agent' for cost analysis
Amazon Web Services has begun a public preview of the 'AWS FinOps Agent,' an AI agent that answers questions about AWS costs and investigates the causes of cost anomalies. It targets FinOps operations support. The specific feature scope and accuracy are as stated in the article and announcement and have not been independently verified.
-
The More the Merrier: Combining Properties for ABox Abduction under Repair Semantics for ELbot
Combining properties for ABox abduction under repair semantics
Abduction explains missing entailments from a knowledge base by proposing a hypothesis that would make them hold. This work studies ABox abduction under repair semantics for the EL description logic, combining multiple properties to produce stronger explanatory hypotheses.
-
When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift
Polarization-aware evaluation of deepfake detectors under domain shift
Advances in diffusion models and face-swapping enable highly realistic deepfakes and real-world harm. This work shows AUC can mislead when evaluating detectors under domain shift, and proposes a polarization-aware evaluation that better reflects deepfake detector performance across domains.
-
Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition
Dango: an L1-only 1.8B LLM for studying second-language acquisition
The authors introduce Dango, a 1.8B-parameter language model designed for controlled studies of L1-to-L2 (Japanese-to-English) transfer in second language acquisition. By training strictly on L1 only, Dango enables controlled experiments on transfer phenomena that prior SLA model studies could not.
-
Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection
Pretraining-stage alignment via regular safety reflection
To achieve deeper safety alignment for LLMs, recent work pushes safety interventions earlier into pretraining, mainly by filtering unsafe data or rewriting it into safe forms. Going beyond safe data, this work embeds regular safety reflection during pretraining to instill more fundamental alignment.
-
Essential Subspace Merging for Multi-Task Learning
Essential subspace merging for multi-task model merging
Model merging integrates the capabilities of several models fine-tuned from the same pretrained checkpoint into one, enabling multi-task learning. This work proposes Essential Subspace Merging, which extracts and merges each task's essential subspace to reduce interference and preserve multi-task performance.
-
IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages
IndicContextEval: audio-LLM context use across 8 Indic languages
Audio LLMs can condition speech recognition on textual prompts such as domain descriptions or entity lists, but whether they truly use this context is unclear. IndicContextEval is a benchmark evaluating context utilisation in audio large language models across eight Indic languages.
-
Complementary Attention Head Pruning for Efficient Transformers
Complementary attention-head pruning for efficient Transformers
Transformers' success stems from architectural scaling, which inflates parameter counts and hinders deployment in resource-constrained settings. This work proposes complementary attention head pruning, removing heads so that retained ones stay complementary, preserving accuracy while improving efficiency.
-
OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing
OpenAnt: LLM-powered vulnerability discovery via code decomposition
Automated vulnerability discovery in large codebases is hard: static analysis yields high false positives while dynamic methods like fuzzing lack coverage. OpenAnt is an LLM-powered approach combining code decomposition, adversarial verification, and dynamic testing to surface real vulnerabilities.
-
OrthoReg: Orthogonal Regularization for Hybrid Symbolic-Neural Dynamical Systems
OrthoReg: orthogonal regularization for symbolic-neural dynamical systems
Dynamical systems are fundamental to modeling the natural world, but modeling them trades off interpretable hand-specified mechanistic models against flexible yet opaque neural ones. OrthoReg introduces orthogonal regularization to disentangle symbolic and neural components in hybrid dynamical systems.
-
Human-AI Coevolution Dynamics: A Formal Theory of Social Intelligence Emergence Through Long-Term Interaction
A formal theory of human-AI coevolution and social intelligence
Conversational AI has advanced in language generation, personalization, and long-context interaction, but most methods model social behavior through isolated components. This work offers a formal theory of human-AI coevolution dynamics, explaining how social intelligence emerges through long-term interaction.
-
INDEQS: Informed Neural controlled Differential EQuationS
INDEQS: informed neural controlled differential equations for forecasting
Neural Controlled Differential Equations provide a powerful continuous-time framework for time-series forecasting, but standard graph-based extensions struggle to learn spatial structure. INDEQS introduces informed neural controlled differential equations to better capture structure and improve forecasting.
-
ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL
ProductConsistency preserves product identity in instruction-based editing
Instruction-based image editing enables complex edits from natural language, but in product-centric scenarios preserving product features and branding is hard. ProductConsistency uses supervised fine-tuning and reinforcement learning to improve product identity preservation during instruction-based editing.
-
Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning
Explicit interaction architectures for dynamical learning
Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often needing high complexity to capture structured behavior. Favoring structure over nonlinearity, this work proposes explicit interaction architectures that model variable interactions directly for efficient dynamical learning.
-
Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes
Optimizing type-2 diabetes follow-up intervals with MDPs
Chronic disease management relies on regular patient-provider interactions to track progression and control. For Type 2 Diabetes, guidelines prescribe fixed follow-up intervals. This work uses Markov decision processes to optimize follow-up intervals in a context-aware way, tailoring scheduling to each patient.
-
ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection
ARIADNE: agnostic routing for inference-time adapter selection
Widespread parameter-efficient fine-tuning yields ecosystems where one backbone pairs with many task-specialized adapters. ARIADNE provides agnostic routing for inference-time dynamic adapter selection, choosing the right adapter per input without model-specific assumptions.