New Model Releases A

  • arXiv cs.AI (Artificial Intelligence) · EN Infrastructure & Hardware extract
    FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs
    FoMoE breaks the full-replica barrier with a federation of MoEs
    Mixture of Experts (MoE) · Neural Network
    Pretraining LLMs typically demands large-scale infrastructure with tightly coupled accelerators, a requirement that grows with model and data scale. FoMoE proposes a federation of Mixture-of-Experts models that avoids replicating the full model across devices, easing these infrastructure constraints.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Sumi: Open Uniform Diffusion Language Model from Scratch
    Sumi: an open uniform diffusion language model from scratch
    Deep Learning · Reinforcement Learning
    Diffusion models are a promising alternative to autoregressive ones, and uniform diffusion language models (UDLMs) let any token be updated at any step. This work releases Sumi, an open uniform diffusion language model built from scratch, supporting research and reproducibility in diffusion LMs.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training
    Spotlight cuts DiT RL post-training cost with spot GPUs
    Deep Learning · Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Transformer
    Reinforcement learning post-training of Diffusion Transformers is prohibitively expensive, needing thousands of high-end GPUs. Spotlight synergizes seed exploration with cheap, preemptible spot GPUs to substantially reduce the cost of DiT RL post-training.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Enhancing Multilingual Reasoning via Steerable Model Merging
    Enhancing multilingual reasoning via steerable model merging
    Neural Network
    Model merging effectively composes the capabilities of a multilingual model and a reasoning model, achieving promising generalization on multilingual reasoning by aligning their feature spaces. This work introduces steerable model merging to control the composition and further boost multilingual reasoning.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction
    TRAP benchmarks agents on task completion and privacy resistance
    AI Agents · Neural Network
    Agents are increasingly deployed in document-intensive workflows where sensitive private information is routine input; booking a flight, for example, requires a passport number. TRAP is a benchmark that evaluates agents on both task completion and resistance to active privacy-extraction attempts.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ThinkDeception: A Progressive Reinforcement Learning Framework for Interpretable Multimodal Deception Detection
    ThinkDeception: interpretable multimodal deception detection via RL
    Machine Learning · Neural Network · Reinforcement Learning
    Existing multimodal deception detection relies on end-to-end black boxes that offer no transparent reasoning. ThinkDeception is a progressive reinforcement learning framework that explicitly captures subtle cross-modal cues and produces interpretable reasoning trajectories for deception detection.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering
    Direct timestep embedding and contrastive alignment for time-series QA
    Embeddings · Machine Learning · Retrieval-Augmented Generation (RAG) · Software Engineering
    Time-series question answering casts analysis as natural-language QA. Instead of tokenizing the series, this work embeds timesteps directly and uses contrastive alignment to match language representations, avoiding the information loss of tokenization.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    CAPRA: Scaling Feedback on Software Architecture Deliverables with a Multi-Agent LLM System
    CAPRA: a multi-agent LLM system for software architecture feedback
    AI Agents · GPT · Machine Learning · Software Engineering
    Automated assessment in software engineering education has advanced, but giving quality feedback on architecture deliverables remains hard. CAPRA is a multi-agent LLM system that scales detailed feedback on software architecture deliverables.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    SenFlow: Inter-Sentence Flow Modeling for AI-Generated Text Detection in Hybrid Documents
    SenFlow: inter-sentence flow modeling for AI-text detection
    DeepSeek · Retrieval-Augmented Generation (RAG)
    Sentence-level AI-generated text detection is hard in hybrid human-AI documents. SenFlow models inter-sentence flow to capture discontinuities, improving detection of AI-generated sentences within mixed documents.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety
    SciRisk-Bench: a risk-dimension-aware benchmark for AI4Science safety
    Neural Network · Reinforcement Learning · Software Engineering
    As LLMs become embedded in scientific research, evaluating their safety matters. SciRisk-Bench is a risk-dimension-aware benchmark that assesses the safety of LLMs in AI-for-science settings across multiple risk categories.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration
    SAGE: stochastic prompt optimization via agent-guided exploration
    Context engineering has become a primary lever for improving AI systems. SAGE is a stochastic prompt optimization method that uses agent-guided exploration to automatically discover effective prompts and improve task performance.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Improving Medical Communication using Rubric-Guided Counterfactual Recommendations
    Rubric-guided counterfactual recommendations for medical communication
    Inference · Meta
    Text-based telemedicine increasingly relies on lightweight patient feedback. This work improves medical communication using rubric-guided counterfactual recommendations, enhancing the quality of patient-clinician interactions.
  • OpenAI Blog · EN New Model Releases extract
    A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry
    OpenAI's near-autonomous AI chemist improves a key medicinal reaction
    GPT · OpenAI
    OpenAI and Molecule.one describe a near-autonomous AI chemist, built on GPT-5.4, that improved a challenging reaction in medicinal chemistry. The work is framed as advancing drug-discovery research; specific performance figures come from the article and have not been independently verified.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
    GateMem: benchmarking memory governance in shared-memory agents
    AI Agents · Neural Network
    Memory benchmarks for LLM agents largely assume single-user settings, leaving shared-memory governance untested. GateMem benchmarks memory governance, such as access control and management, in multi-principal shared-memory agents.
  • ITmedia AI+ · JA New Model Releases extract
    Cursor announces Git hosting service "Origin", right after the announcement of its acquisition by SpaceX
    Cursor unveils 'Origin' Git hosting, seen as a GitHub rival
    Cursor, the AI coding tool, announced 'Origin', a Git hosting service that the article frames as a rival to GitHub. The reveal reportedly came right after news that SpaceX is acquiring Cursor. The acquisition terms and Origin's features are as reported in the article and have not been independently verified.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    HandwritingAgent: Language-Driven Handwriting Synthesis in Scalable Vector Space
    HandwritingAgent: language-driven handwriting synthesis in vector space
    Deep Learning · Neural Network · Retrieval-Augmented Generation (RAG)
    Emulating natural handwriting styles remains an open problem. HandwritingAgent synthesizes handwriting in a scalable vector space from language-driven instructions, enabling generation of diverse, resolution-independent handwriting styles.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    RedactionBench
    RedactionBench: a benchmark for redacting sensitive information
    Neural Network · Reinforcement Learning
    Large language models are increasingly applied to sensitive domains. RedactionBench evaluates how well models redact sensitive information in such settings, with the aim of supporting safer deployment.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation
    Improving long-document retrieval with chunk evidence aggregation
    Deep Learning · Inference · Reinforcement Learning
    Dense retrieval matches one query vector against one document vector, but long documents get lost in a single vector. This work splits documents into chunks and aggregates per-chunk evidence to improve long-document retrieval.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    SAMA: Semantic Anchor-aligned Augmentation for Unified Low-Resource Multimodal Information Extraction
    SAMA: semantic anchor-aligned augmentation for low-resource multimodal IE
    Machine Learning · Retrieval-Augmented Generation (RAG)
    Multimodal information extraction spans many tasks but suffers from scarce data in low-resource settings. SAMA proposes semantic anchor-aligned augmentation to unify and improve multimodal information extraction under low-resource conditions.
  • arXiv cs.CL (Computation and Language) · EN Policy & Regulation extract
    Output Vector Editing for Memorization Mitigation in Large Language Models
    Output vector editing for memorization mitigation in LLMs
    Llama · Machine Learning
    Large language models memorize and reproduce sequences from their training data. This work edits output vectors to mitigate such memorization, reducing the risk of leaking copyrighted or private content.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Attention as Frustrated Synchronization
    Attention as frustrated synchronization
    Transformer
    A network of oscillators that synchronizes perfectly computes nothing. This work frames attention as frustrated synchronization, offering a physics-inspired view that interprets the workings of attention through partial, non-trivial synchronization.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    ForecastBench-Sim: A Simulated-World Forecasting Benchmark
    ForecastBench-Sim: a simulated-world forecasting benchmark
    Reinforcement Learning · Software Engineering
    Forecasting benchmarks for general-purpose AI usually inherit real-world events, making evaluation hard to control. ForecastBench-Sim introduces a simulated-world forecasting benchmark, enabling controlled assessment of AI forecasting ability.
  • OpenAI Blog · EN New Model Releases extract
    Introducing LifeSciBench
    OpenAI launches LifeSciBench for life-science research tasks
    Deep Learning · Reinforcement Learning
    OpenAI introduced LifeSciBench, an expert-authored and expert-reviewed benchmark for evaluating how AI systems handle real-world life science research tasks and decisions. It aims to rigorously assess AI's practical usefulness in life-science research.
  • Simon Willison's Weblog · EN New Model Releases extract
    datasette 1.0a34
    Datasette 1.0a34 released with in-interface row editing
    Neural Network
    Datasette 1.0a34 has been released. The headline feature is tooling to insert, edit, and delete rows directly within the Datasette interface, available on table pages so users can modify data without leaving the app.
  • Publickey · JA New Model Releases extract
    GitLab announces "Project Switch", a next-generation Git-compatible source code management service for AI agents: up to 50x faster and usable with half the tokens
    GitLab unveils 'Project Switch,' a Git-compatible SCM service for AI agents
    AI Agents · Machine Learning
    GitLab announced Project Switch, a next-generation Git-compatible source code management service aimed at AI agents, at its GitLab Transcend event in London. Reports cite up to 50x speed and roughly half the token usage; figures reflect the announcement and remain unverified.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues
    ReproRepo scales reproducibility audits using GitHub repo issues
    AI Agents · GPT · Machine Learning · Retrieval-Augmented Generation (RAG) · Reinforcement Learning
    Reproducing results from papers and code is central to science, but existing reproducibility benchmarks depend on manual effort and are hard to scale. ReproRepo uses GitHub repository issues to evaluate, at scale, how well LLM agents can assist with reproducibility tasks.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation
    EvolveNav: a self-evolving framework for zero-shot object-goal navigation
    AI Agents · Neural Network · Retrieval-Augmented Generation (RAG)
    The paper proposes a self-evolving zero-shot object-goal navigation framework that builds an agentic rule memory by extracting actionable knowledge from past trajectories and uses a retrieval strategy to enable continuous test-time improvement.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Darshana Graph: A Parallel Commentary Corpus for Comparative Indian Philosophy, with Stylometric and Exploratory Graph Analyses
    Darshana Graph: a parallel commentary corpus for Indian philosophy
    Machine Learning · Neural Network
    Darshana Graph is a corpus of over 125,000 text records spanning classical Hindu, Buddhist and Jain philosophical traditions, drawn from public-domain and openly licensed translations. It supports comparative Indian philosophy through stylometric and exploratory graph analyses.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
    Zone of Proximal Policy Optimization puts the teacher in prompts, not gradients
    Reinforcement Learning
    Knowledge distillation is brittle for small students, as imitating a large teacher's logits concentrates on its sharpest modes and hurts generalization. The proposed Zone of Proximal Policy Optimization places the teacher in prompts rather than gradients to improve small-student generalization.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Looped World Models
    Looped World Models refine latents iteratively for efficient simulation
    Reinforcement Learning · Transformer
    World models need deep computation for faithful long-horizon simulation, but deep models are costly and accumulate errors. LoopWM introduces the first looped architectures for world modelling, iteratively refining latent states to resolve this tension.