New Model Releases A

  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation
    CoT-enhanced reasoning for semi-supervised medical image segmentation
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Semi-supervised medical image segmentation mitigates annotation scarcity via consistency regularization but relies mostly on pixel-level visual matching. The paper adds chain-of-thought-enhanced reasoning to go beyond visual cues for segmentation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    KANLib -- An Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation
    KANLib: a modular, extensible and fast KAN implementation
    Kolmogorov-Arnold Networks (KANs) replace linear weights with learnable univariate functions, but their high computational cost hampers practical research. KANLib provides a modular, extensible and fast KAN implementation to ease experimentation.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Non-negative Elastic Net Decoding for Information Retrieval
    Non-negative elastic net decoding for information retrieval
    Deep Learning Embeddings Neural Network
    Dense retrieval has become the dominant paradigm in information retrieval. The paper applies non-negative elastic net decoding to information retrieval, aiming to improve retrieval representations and accuracy.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions
    ChLogic evaluates logical reasoning robustness in Chinese
    LLMs do well on standardized logical reasoning benchmarks, but whether this holds beyond English is unclear. ChLogic is an English-Chinese aligned benchmark testing whether models preserve logical reasoning when the same latent structure is expressed in Chinese.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models
    Dynamic rollout editing reduces overthinking in RL reasoning models
    Neural Network Reinforcement Learning Software Engineering
    Long chain-of-thought reasoning helps, but models often keep generating unnecessary reasoning after reaching a correct answer. Framing this as overthinking in GRPO-style RL post-training, the paper proposes dynamic rollout editing to reduce it.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor
    AnchorKV: safety-aware KV cache compression via soft penalties
    Inference Reinforcement Learning
    AnchorKV is a safety-aware KV cache compression method that, per its title, combines a soft penalty with a refusal anchor while compressing the cache to reduce memory. Summary is largely title-based; details are as presented by the source and not independently verified.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    WallZero: Mastering the Game of WallGo with Strategic Analysis
    WallZero masters the board game WallGo with strategic analysis
    Meta Retrieval-Augmented Generation (RAG) Reinforcement Learning
    WallGo is a recently introduced strategic board game. WallZero masters WallGo through an approach incorporating strategic analysis, demonstrating game-playing performance and strategic insights.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models
    Qwen-RobotManip: alignment unlocks scale for robot manipulation models
    Computer Vision
    Language and multimodal foundation models generalize by aligning heterogeneous data under a unified formulation and training at scale. This technical report investigates applying that recipe to robotic manipulation, arguing alignment unlocks scale for manipulation foundation models.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Environment-Grounded Automated Prompt Optimization for LLM Game Agents
    Environment-grounded automated prompt optimization for LLM game agents
    AI Agents Fine-tuning Reinforcement Learning
    LLM agents in interactive environments are sensitive to prompts, yet prompt engineering stays manual and task-specific. The paper decomposes the observation-to-action pipeline and proposes an environment-grounded automated prompt optimization framework for LLM game agents.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    From Drift to Coherence: Stabilizing Beliefs in LLMs
    From drift to coherence: stabilizing beliefs in LLMs
    Fine-tuning Inference Reinforcement Learning Software Engineering
    LLMs are hypothesized to perform implicit Bayesian inference, yet the martingale property of predictive beliefs has been shown to fail in synthetic in-context learning. Revisiting this in typical regimes like multiple-choice QA, the paper studies how to stabilize beliefs from drift to coherence.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    When Multiple Scripts Matter: Evaluating ASR in Clinical Settings
    Evaluating ASR in clinical settings when multiple scripts matter
    Meta Speech Processing
    Automatic speech recognition in non-English clinical settings faces multiscript variability, where a term appears in multiple valid orthographies. String-matching metrics treat variants as errors and underestimate performance; the paper studies ASR evaluation when multiple scripts matter.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors
    Auditing deployment-interface exposure of CLIP backdoors
    Neural Network Reinforcement Learning
    CLIP models are reused across downstream interfaces including feature extraction, retrieval, reranking and selection. Existing CLIP backdoors are validated on small attack-native tasks; the paper audits backdoor exposure across deployment interfaces beyond native success.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Toward Accessible Psychotherapy Training Using AI-Driven Interactive Patient Avatars
    AI-driven patient avatars for more accessible psychotherapy training
    GPT
    Training psychotherapists in evidence-based interventions like Acceptance and Commitment Therapy needs repeated practice with feedback, limited by ethical, logistical and resource constraints. The paper introduces AI-driven interactive patient avatars to make such training more accessible.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    Vision-language models for chest radiography do not always need the image
    Vision-language models for chest radiography do not always need the image
    Computer Vision Inference Software Engineering
    Medical vision-language models combine images and text for reporting. For chest radiography, the paper shows these models do not always need the image to make predictions, and discusses the implications for evaluation and clinical use.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent
    EComAgentBench: shopping agents on long-horizon tasks with hidden intent
    AI Agents Software Engineering
    As LLM-based shopping agents reach production, existing benchmarks miss how requirements arrive: implicitly, in a profile, or only when the right question is asked. EComAgentBench evaluates shopping agents on long-horizon tasks with distributed hidden intent.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • ITmedia AI+ · JA New Model Releases extract
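    OpenAI's advanced AI finds 10,000 vulnerabilities at SoftBank; Masayoshi Son calls it "a grave crisis"; diagnostic service to be offered to Japan's critical infrastructure companies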
    SoftBank unveils OpenAI-powered Patching-as-a-Service security offering
    GPT OpenAI
    SoftBank Group announced "Patching as a Service" on June 16, a cybersecurity offering built on OpenAI technologies such as "GPT-5.5 Cyber." It simulates attacks on corporate systems to find vulnerabilities, then proposes remediation plans and carries out the fixes end to end. SoftBank says it will prioritize select firms supporting Japan's critical infrastructure, while chairman Masayoshi Son stressed the gravity of the cyber threat.
    Read original (ITmedia AI+) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    LLMs Infer Cultural Context but Fail to Apply It When Responding
    LLMs infer cultural context but fail to apply it when responding
    Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering
    LLMs are known to overrepresent dominant, often Western cultures while marginalizing others. The paper evaluates how this affects culturally adapted response generation, finding that models can infer cultural context but fail to apply it when responding.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    SuCo: Sufficiency-guided Continuous Adaptive Reasoning
    SuCo: sufficiency-guided continuous adaptive reasoning
    Fine-tuning Reinforcement Learning Software Engineering
    SuCo is a sufficiency-guided method for continuous adaptive reasoning: it adjusts how much reasoning a model performs to what is sufficient, aiming to balance efficiency and accuracy. Summary is largely title-based; details are as presented by the source.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation
    Bridging correctness and runtime efficiency in LLM code translation
    Neural Network Retrieval-Augmented Generation (RAG)
    LLMs have advanced the functional correctness of automated code translation, but runtime efficiency of translated programs has received little attention. As Moore's law wanes, the paper works to bridge the gap between functional correctness and runtime efficiency in LLM-based code translation.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
    From trainee to trainer: LLM-designed RL training environments
    Gemini GPT Reinforcement Learning
    RL pipelines for LLM training often rely on manually redesigned environments between stages, forcing heuristic guesses about good configurations. The paper has the LLM itself design training environments for reinforcement learning with multi-agent reasoning, moving from trainee to trainer.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    MambaCount: Efficient Text-guided Open-vocabulary Object Counting with Spatial Sparse State Space Duality Block
    MambaCount: efficient open-vocabulary counting via state-space duality
    Reinforcement Learning Transformer
    Text-guided open-vocabulary object counting is hard in dense scenes with large scale variation, and existing Transformer methods are limited by quadratic complexity. MambaCount uses a spatial sparse state space duality block for efficient open-vocabulary object counting.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation
    OPD-Evolver cultivates self-evolving agents via on-policy distillation
    AI Agents
    Memory is a standard substrate for self-evolving agents, but retaining experience differs from learning how to evolve through it. OPD-Evolver uses on-policy distillation to cultivate a holistic agent evolver that selects useful experience, acts on it and writes reusable knowledge.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • OpenAI Blog · EN Safety & Evaluation extract
    Predicting model behavior before release by simulating deployment
    OpenAI unveils Deployment Simulation to predict model behavior pre-release
    OpenAI
    OpenAI introduced Deployment Simulation, a method to predict an AI model's behavior before deployment by using real conversation data to simulate responses, aiming to improve safety and evaluation accuracy. The claims are OpenAI's own and not independently verified.
    Read original (OpenAI Blog) ↗
  • Lobste.rs (AI tagged) · EN New Model Releases extract
    June Framework Memory and storage pricing updates
    Framework updates memory and storage pricing amid volatile market
    Retrieval-Augmented Generation (RAG)
    A Framework blog post reports updated memory and storage pricing for its desktop products amid a volatile memory market. It states that the 128GB Framework Desktop has risen by about $1,660 to $4,839, compared with $2,000 at launch. The piece concerns hardware market dynamics rather than AI directly and reached the feed via lobste.rs.
    Read original (Lobste.rs (AI tagged)) ↗
  • arXiv cs.LG (Machine Learning) · EN Infrastructure & Hardware extract
    Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
    HABC: hierarchical advantage weighting for RL fine-tuning of VLAs
    Fine-tuning Reinforcement Learning
    Online RL fine-tuning of pretrained VLA policies yields only one binary outcome per episode, yet actor updates need per-transition signals. The authors argue a single scalar conflates viability and efficiency and that mixing autonomous and intervention segments misassigns credit. Their method, Hierarchical Advantage-Weighted Behavior Cloning (HABC), trains separate critic heads for the two objectives on different data subsets.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio
    A benchmark for LLM agents on Nature Portfolio meta-analyses
    AI Agents Meta Retrieval-Augmented Generation (RAG)
    This work introduces a benchmark that evaluates LLM agents on meta-analysis articles from Nature Portfolio. The article excerpt was unavailable, so this summary is limited to a neutral description based on the title.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing
    KVEraser edits the KV cache to erase context efficiently
    Fine-tuning Reinforcement Learning
    Erasing a span from a long-context KV cache is costly because a local edit propagates to all later tokens, forcing recomputation of the suffix. KVEraser instead replaces only the erased interval's KV states with learned steering states while reusing the rest of the cache. A two-stage training pipeline teaches a transferable erasing mechanism for stale facts, wrong tool outputs, or prompt injections.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents
    DeepRubric: evidence-tree rubrics to boost deep-research agent RL
    AI Agents Reinforcement Learning
    DeepRubric is a data-construction framework for RL of deep research agents that reverses the usual query-to-rubric flow: starting from a seed topic it builds an evidence tree to decide what an evidence-backed report should be judged on, then synthesizes aligned query-rubric pairs for more reliable reward supervision.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting
    HAMON: a passive optical core for long-horizon forecasting
    Inference Neural Network Transformer
    HAMON is a passive diffractive optical forecasting core: history is encoded onto an optical aperture and cascaded trainable phase masks with free-space diffraction shape the forecast directly in the output field. Inference is a single passive optical pass with no digital sequence-mixing layer, yet it beats strong digital baselines on ETTm2.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models
    FusionRS: a large-scale RGB-infrared-text remote sensing dataset
    Computer Vision
    Noting that remote-sensing vision-language models remain RGB-centric, the paper introduces FusionRS, described as the first large-scale RGB-infrared-text dataset for dual-modal learning. It is built by translating public RGB images into infrared-style counterparts, pairing each with conventional and infrared-aware captions.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
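
The clinical ASR item above ("When Multiple Scripts Matter") notes that string-matching metrics count valid orthographic variants as errors and so underestimate performance. The sketch below illustrates that effect with a plain word error rate (WER) calculation. It is not the paper's method: the `VARIANTS` table, the example sentences, and the function names `edit_distance` and `wer` are all hypothetical, chosen only to show how mapping variants to one canonical form changes the score.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp, variants=None):
    """Word error rate; optionally map each token to a canonical form first."""
    variants = variants or {}
    ref_tokens = [variants.get(t, t) for t in ref.split()]
    hyp_tokens = [variants.get(t, t) for t in hyp.split()]
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Hypothetical example: one drug name written in Latin and in Devanagari script.
DEVANAGARI = "पैरासिटामोल"
VARIANTS = {DEVANAGARI: "paracetamol"}  # assumed variant table, for illustration

reference = "patient took paracetamol twice"
hypothesis = f"patient took {DEVANAGARI} twice"

strict = wer(reference, hypothesis)             # variant counted as a substitution
lenient = wer(reference, hypothesis, VARIANTS)  # variant mapped to canonical form
print(strict, lenient)  # prints 0.25 0.0
```

Under exact string matching, the transcript is charged one error in four words even though the term is written correctly in another script. A real evaluation would need a curated variant list or transliteration step rather than a hand-written dictionary.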