New Model Releases (Page 6 of 9)｜AI/Tech News Trends

arXiv cs.LG (Machine Learning) · 2026-06-16 EN New Model Releases extract

Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation

CoT-enhanced reasoning for semi-supervised medical image segmentation

Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning

Semi-supervised medical image segmentation mitigates annotation scarcity via consistency regularization but relies mostly on pixel-level visual matching. The paper adds chain-of-thought-enhanced reasoning to go beyond visual cues for segmentation.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-16 EN Safety & Evaluation extract

KANLib -- A Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation

KANLib: a modular, extensible and fast KAN implementation

Kolmogorov-Arnold Networks replace linear weights with learnable univariate functions but their high computational cost hampers practical research. KANLib provides a modular, extensible and fast implementation of KANs to ease experimentation.
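To make "learnable univariate functions" concrete, here is a minimal sketch of a single KAN-style layer in plain Python. It is not KANLib's API: the class name `TinyKANLayer`, the Gaussian basis (original KANs use B-splines), and the grid and initialization choices are all assumptions made for illustration.

```python
import math
import random


class TinyKANLayer:
    """One KAN-style layer: out[j] = sum_i phi_ij(x[i]).

    Each edge function phi_ij is a learnable 1-D function, written here as a
    weighted sum of fixed Gaussian bumps (an illustrative choice, not KANLib's).
    """

    def __init__(self, n_in, n_out, n_basis=5, lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        step = (hi - lo) / (n_basis - 1)
        # Evenly spaced bump centres shared by every edge.
        self.centres = [lo + k * step for k in range(n_basis)]
        self.width = step
        # coef[j][i][k]: weight of basis function k on the edge i -> j.
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(n_basis)]
                      for _ in range(n_in)]
                     for _ in range(n_out)]

    def basis(self, x):
        """Evaluate every Gaussian bump at the scalar x."""
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centres]

    def forward(self, x):
        feats = [self.basis(xi) for xi in x]  # one basis vector per input
        return [sum(sum(w * b for w, b in zip(edge, feats[i]))
                    for i, edge in enumerate(row))
                for row in self.coef]


layer = TinyKANLayer(n_in=3, n_out=2)
y = layer.forward([0.2, -0.5, 0.9])  # list of 2 floats
```

In this sketch each edge costs `n_basis` multiply-adds instead of the single multiply of a linear weight, which gives a rough sense of where the extra computational cost comes from.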

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

Non-negative Elastic Net Decoding for Information Retrieval

Non-negative elastic net decoding for information retrieval

Deep Learning Embeddings Neural Network

Dense retrieval has become the dominant paradigm in information retrieval. The paper applies non-negative elastic net decoding to information retrieval, aiming to improve retrieval representations and accuracy.
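The summary does not give the paper's formulation, so the following is only a sketch of the generic optimization problem behind the name: least squares with L1 and L2 penalties and weights constrained to be non-negative, solved by projected gradient descent. The function name, the step size, and the reading of `X` columns as candidate embeddings and `y` as a query embedding are assumptions, not the authors' method.

```python
def nonneg_elastic_net(X, y, l1=0.1, l2=0.1, lr=0.01, steps=2000):
    """Minimise 0.5*||Xw - y||^2 + l1*sum(w) + 0.5*l2*||w||^2 subject to w >= 0.

    X is a list of rows, y a list of targets. Uses projected gradient descent;
    lr must be small enough for the problem at hand (assumed, not tuned).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Residual Xw - y at the current weights.
        resid = [sum(X[r][c] * w[c] for c in range(d)) - y[r] for r in range(n)]
        for c in range(d):
            grad = sum(X[r][c] * resid[r] for r in range(n)) + l1 + l2 * w[c]
            # Gradient step followed by projection onto w >= 0.
            w[c] = max(0.0, w[c] - lr * grad)
    return w


# Toy problem: the second weight "wants" to be negative and is clamped to zero.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 0.0]
w = nonneg_elastic_net(X, y)
```

For this toy problem the constrained optimum can be checked by hand: the second weight stays at 0 and the first settles at (1 - l1) / (2 + l2), about 0.4286.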

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

ChLogic evaluates logical reasoning robustness in Chinese

LLMs do well on standardized logical reasoning benchmarks, but whether this holds beyond English is unclear. ChLogic is an English-Chinese aligned benchmark testing whether models preserve logical reasoning when the same latent structure is expressed in Chinese.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

Dynamic rollout editing reduces overthinking in RL reasoning models

Neural Network Reinforcement Learning Software Engineering

Long chain-of-thought reasoning helps, but models often keep generating unnecessary reasoning after reaching a correct answer. Framing this as overthinking in GRPO-style RL post-training, the paper proposes dynamic rollout editing to reduce it.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-16 EN Safety & Evaluation extract

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

AnchorKV: safety-aware KV cache compression via soft penalties

Inference Reinforcement Learning

AnchorKV is a safety-aware KV cache compression method that, per its title, applies a soft penalty with a refusal anchor to retain important key-value entries while reducing memory. Summary is largely title-based; details are as presented by the source and not independently verified.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-16 EN New Model Releases extract

WallZero: Mastering the Game of WallGo with Strategic Analysis

WallZero masters the board game WallGo with strategic analysis

Meta Retrieval-Augmented Generation (RAG) Reinforcement Learning

WallGo is a recently introduced strategic board game. The paper presents WallZero, an agent that masters WallGo, and pairs its game-playing results with strategic analysis of the game.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-16 EN Multimodal extract

Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models

Qwen-RobotManip: alignment unlocks scale for robot manipulation models

Computer Vision

Language and multimodal foundation models generalize by aligning heterogeneous data under a unified formulation and training at scale. This technical report investigates applying that recipe to robotic manipulation, arguing alignment unlocks scale for manipulation foundation models.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

Environment-Grounded Automated Prompt Optimization for LLM Game Agents

Environment-grounded automated prompt optimization for LLM game agents

AI Agents Fine-tuning Reinforcement Learning

LLM agents in interactive environments are sensitive to prompts, yet prompt engineering stays manual and task-specific. The paper decomposes the observation-to-action pipeline and proposes an environment-grounded automated prompt optimization framework for LLM game agents.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-16 EN New Model Releases extract

From Drift to Coherence: Stabilizing Beliefs in LLMs

From drift to coherence: stabilizing beliefs in LLMs

Fine-tuning Inference Reinforcement Learning Software Engineering

LLMs are hypothesized to perform implicit Bayesian inference, yet the martingale property of predictive beliefs has been shown to fail in synthetic in-context learning. Revisiting this in typical regimes such as multiple-choice QA, the paper studies how to stabilize drifting beliefs so that they become coherent.
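For readers unfamiliar with the term, the martingale property can be stated as follows. The notation is an assumption for illustration and may differ from the paper's: p_n(y) is the model's predictive belief about the next answer after n in-context examples x_{1:n}.

```latex
p_n(y) = P\left(Y_{n+1} = y \mid x_{1:n}\right),
\qquad
\mathbb{E}\left[\, p_{n+1}(y) \mid x_{1:n} \,\right] = p_n(y)
```

The expectation is over the next example x_{n+1} drawn from the model's own predictive distribution. An exchangeable Bayesian learner satisfies this identity, so a systematic gap between the two sides is one way to read "drift".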

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN Safety & Evaluation extract

When Multiple Scripts Matter: Evaluating ASR in Clinical Settings

Evaluating ASR in clinical settings when multiple scripts matter

Meta Speech Processing

Automatic speech recognition in non-English clinical settings faces multiscript variability, where a term appears in multiple valid orthographies. String-matching metrics treat variants as errors and underestimate performance; the paper studies ASR evaluation when multiple scripts matter.
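The effect on string-matching metrics can be sketched with a word error rate that optionally accepts alternative spellings. This is a generic illustration, not the paper's metric: the `variants` mapping, the function name, and the Latin/Devanagari example pair are assumptions, and the reference is assumed to be non-empty.

```python
def wer(ref, hyp, variants=None):
    """Word error rate via edit distance over whitespace tokens.

    variants maps a reference word to other spellings that count as correct.
    """
    variants = variants or {}
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distance from empty reference
    for i, rw in enumerate(r, 1):
        cur = [i]
        accepted = {rw} | set(variants.get(rw, ()))
        for j, hw in enumerate(h, 1):
            sub = prev[j - 1] + (0 if hw in accepted else 1)
            cur.append(min(sub, prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1] / len(r)


deva = "पैरासिटामोल"  # illustrative Devanagari rendering of the drug name
ref = "patient took paracetamol today"
hyp = f"patient took {deva} today"

strict = wer(ref, hyp)                              # variant counted as an error
lenient = wer(ref, hyp, {"paracetamol": [deva]})    # variant accepted
```

Here the strict score is 0.25 (one substitution in four words) while the variant-aware score is 0.0, even though the transcript is arguably correct in both cases.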

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN Developer Tools extract

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

Auditing deployment-interface exposure of CLIP backdoors

Neural Network Reinforcement Learning

CLIP models are reused across downstream interfaces including feature extraction, retrieval, reranking and selection. Existing CLIP backdoors are validated on small attack-native tasks; the paper audits backdoor exposure across deployment interfaces beyond native success.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN Safety & Evaluation extract

Toward Accessible Psychotherapy Training Using AI-Driven Interactive Patient Avatars

AI-driven patient avatars for more accessible psychotherapy training

GPT

Training psychotherapists in evidence-based interventions like Acceptance and Commitment Therapy needs repeated practice with feedback, limited by ethical, logistical and resource constraints. The paper introduces AI-driven interactive patient avatars to make such training more accessible.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN Multimodal extract

Vision-language models for chest radiography do not always need the image

Computer Vision Inference Software Engineering

Medical vision-language models combine images and text for reporting. For chest radiography, the paper shows these models do not always need the image to make predictions, and discusses the implications for evaluation and clinical use.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN Safety & Evaluation extract

EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent

EComAgentBench: shopping agents on long-horizon tasks with hidden intent

AI Agents Software Engineering

As LLM-based shopping agents reach production, existing benchmarks miss how requirements arrive: implicitly, in a profile, or only when the right question is asked. EComAgentBench evaluates shopping agents on long-horizon tasks with distributed hidden intent.

Read original (arXiv cs.CL (Computation and Language)) ↗

ITmedia AI+ · 2026-06-16 JA New Model Releases extract

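OpenAI's advanced AI finds 10,000 vulnerabilities at SoftBank; Masayoshi Son calls it "a grave crisis"; diagnostic service to be offered to Japan's critical infrastructure companies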

SoftBank unveils OpenAI-powered Patching-as-a-Service security offering

GPT OpenAI

SoftBank Group announced "Patching as a Service" on June 16, a cybersecurity offering built on OpenAI technologies such as "GPT-5.5 Cyber." It simulates attacks on corporate systems to find vulnerabilities, then handles remediation end-to-end, from proposing fixes to implementing them. SoftBank says it will prioritize select firms supporting Japan's critical infrastructure, while chairman Masayoshi Son stressed the gravity of the cyber threat.

Read original (ITmedia AI+) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

LLMs Infer Cultural Context but Fail to Apply It When Responding

LLMs infer cultural context but fail to apply it when responding

Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering

LLMs are known to overrepresent dominant, often Western cultures while marginalizing others. The paper evaluates how this affects culturally adapted response generation, finding that models can infer cultural context but fail to apply it when responding.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

SuCo: Sufficiency-guided Continuous Adaptive Reasoning

SuCo: sufficiency-guided continuous adaptive reasoning

Fine-tuning Reinforcement Learning Software Engineering

SuCo is a method for sufficiency-guided continuous adaptive reasoning: it adapts how much reasoning a model performs to what is necessary and sufficient, aiming to balance efficiency and accuracy. Summary is largely title-based; details are as presented by the source.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN Safety & Evaluation extract

Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation

Bridging correctness and runtime efficiency in LLM code translation

Neural Network Retrieval-Augmented Generation (RAG)

LLMs have advanced the functional correctness of automated code translation, but runtime efficiency of translated programs has received little attention. As Moore's law wanes, the paper works to bridge the gap between functional correctness and runtime efficiency in LLM-based code translation.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

From trainee to trainer: LLM-designed RL training environments

Gemini GPT Reinforcement Learning

RL pipelines for LLM training often rely on manually redesigned environments between stages, forcing heuristic guesses about good configurations. The paper has the LLM itself design training environments for reinforcement learning with multi-agent reasoning, moving from trainee to trainer.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

MambaCount: Efficient Text-guided Open-vocabulary Object Counting with Spatial Sparse State Space Duality Block

MambaCount: efficient open-vocabulary counting via state-space duality

Reinforcement Learning Transformer

Text-guided open-vocabulary object counting is hard in dense scenes with large scale variation, and existing Transformer methods are limited by quadratic complexity. MambaCount uses a spatial sparse state space duality block for efficient open-vocabulary object counting.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-16 EN New Model Releases extract

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

OPD-Evolver cultivates self-evolving agents via on-policy distillation

AI Agents

Memory is a standard substrate for self-evolving agents, but retaining experience differs from learning how to evolve through it. OPD-Evolver uses on-policy distillation to cultivate a holistic agent evolver that selects useful experience, acts on it and writes reusable knowledge.

Read original (arXiv cs.CL (Computation and Language)) ↗

OpenAI Blog · 2026-06-16 EN Safety & Evaluation extract

Predicting model behavior before release by simulating deployment

OpenAI unveils Deployment Simulation to predict model behavior pre-release

OpenAI

OpenAI introduced Deployment Simulation, a method to predict an AI model's behavior before deployment by using real conversation data to simulate responses, aiming to improve safety and evaluation accuracy. The claims are OpenAI's own and not independently verified.

Read original (OpenAI Blog) ↗

Lobste.rs (AI tagged) · 2026-06-15 EN New Model Releases extract

June Framework Memory and storage pricing updates

Framework updates memory and storage pricing amid volatile market

Retrieval-Augmented Generation (RAG)

A Framework blog post reports updated memory and storage pricing for its desktop products amid a volatile memory market. It states that the 128GB Framework Desktop has risen by about $1,660 to $4,839, compared with $2,000 at launch. The piece concerns hardware market dynamics rather than AI directly and reached the feed via lobste.rs.
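The three figures are easier to relate with the arithmetic spelled out. This assumes the roughly $1,660 is the most recent increase rather than the total rise since launch, which the summary does not state explicitly; the variable names are illustrative.

```python
current_price = 4839      # USD, 128GB Framework Desktop as reported
latest_increase = 1660    # USD, approximate (assumed to be the latest step)
launch_price = 2000       # USD at launch, as reported

price_before_increase = current_price - latest_increase   # 3179
rise_since_launch = current_price - launch_price          # 2839
multiple_of_launch = current_price / launch_price         # about 2.42x
```

Under that reading, the price had already climbed to about $3,179 before the latest change, and the current price is roughly 2.4 times the launch price.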

Read original (Lobste.rs (AI tagged)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-15 EN Infrastructure & Hardware extract

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

HABC: hierarchical advantage weighting for RL fine-tuning of VLAs

Fine-tuning Reinforcement Learning

Online RL fine-tuning of pretrained VLA policies yields only one binary outcome per episode, yet actor updates need per-transition signals. The authors argue a single scalar conflates viability and efficiency and that mixing autonomous and intervention segments misassigns credit. Their method, Hierarchical Advantage-Weighted Behavior Cloning (HABC), trains separate critic heads for the two objectives on different data subsets.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN New Model Releases extract

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

A benchmark for LLM agents on Nature Portfolio meta-analyses

AI Agents Meta Retrieval-Augmented Generation (RAG)

This work introduces a benchmark that evaluates LLM agents on meta-analysis articles from Nature Portfolio. The article excerpt was unavailable, so this summary is limited to a neutral description based on the title.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN Inference & Efficiency extract

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

KVEraser edits the KV cache to erase context efficiently

Fine-tuning Reinforcement Learning

Erasing a span from a long-context KV cache is costly because a local edit propagates to all later tokens, forcing recomputation of the suffix. KVEraser instead replaces only the erased interval's KV states with learned steering states while reusing the rest of the cache. A two-stage training pipeline teaches a transferable erasing mechanism for stale facts, wrong tool outputs, or prompt injections.
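The cache-level operation described above can be sketched as a simple splice. Everything here is a stand-in: real KV caches are per-layer, per-head tensors, and the steering states are learned in the paper, whereas this sketch uses placeholder tuples and a hypothetical `erase_span` helper only to show which entries are reused.

```python
def erase_span(cache, start, end, steering_states):
    """Return a new cache with positions [start, end) swapped for steering states.

    Entries outside the span are reused as-is; nothing after the span is recomputed.
    """
    if not 0 <= start <= end <= len(cache):
        raise IndexError("span outside cache")
    if len(steering_states) != end - start:
        raise ValueError("need one steering state per erased position")
    return cache[:start] + list(steering_states) + cache[end:]


# Toy cache of six positions; strings stand in for key/value tensors.
cache = [(f"k{t}", f"v{t}") for t in range(6)]
steer = [("k_steer", "v_steer")] * 2  # placeholder for learned steering states

edited = erase_span(cache, 2, 4, steer)
reused = sum(a is b for a, b in zip(cache, edited))  # entries shared with the old cache
```

In this toy case four of the six entries are carried over untouched, including both positions after the erased span, which is the part a naive edit would have to recompute.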

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.CL (Computation and Language) · 2026-06-15 EN New Model Releases extract

DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

DeepRubric: evidence-tree rubrics to boost deep-research agent RL

AI Agents Reinforcement Learning

DeepRubric is a data-construction framework for RL of deep research agents that reverses the usual query-to-rubric flow: starting from a seed topic it builds an evidence tree to decide what an evidence-backed report should be judged on, then synthesizes aligned query-rubric pairs for more reliable reward supervision.

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.LG (Machine Learning) · 2026-06-15 EN New Model Releases extract

HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting

HAMON: a passive optical core for long-horizon forecasting

Inference Neural Network Transformer

HAMON is a passive diffractive optical forecasting core: history is encoded onto an optical aperture and cascaded trainable phase masks with free-space diffraction shape the forecast directly in the output field. Inference is a single passive optical pass with no digital sequence-mixing layer, yet it beats strong digital baselines on ETTm2.

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-06-15 EN Multimodal extract

FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models

FusionRS: a large-scale RGB-infrared-text remote sensing dataset

Computer Vision

Noting that remote-sensing vision-language models remain RGB-centric, the paper introduces FusionRS, described as the first large-scale RGB-infrared-text dataset for dual-modal learning. It is built by translating public RGB images into infrared-style counterparts, pairing each with conventional and infrared-aware captions.

Read original (arXiv cs.AI (Artificial Intelligence)) ↗