New Model Releases

  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges
    CATCH-ME: a counterspeech dataset against hate and misinformation
    Neural Network Natural Language Processing (NLP) Retrieval-Augmented Generation (RAG) Reinforcement Learning Speech Processing
    The paper introduces CATCH-ME, a dataset of contextually annotated multi-turn counterspeech against overlapping hate speech and misinformation. It addresses NLP's tendency to treat the two threats in isolation and the tendency of zero-shot LLMs to produce repetitive, vague counterspeech.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Critical Percolation as a Synthetic Data Model for Interpretability
    Critical percolation as a synthetic data model for interpretability
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    The paper introduces critical percolation as a synthetic data model for interpretability research. It builds a family of synthetic datasets with the hierarchical, multi-scale structure of natural data, which typical interpretability toy datasets lack.
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision
    Wall-to-wall forest structure mapping from inventory, lidar, imagery
    Computer Vision Neural Network
    The paper integrates national forest inventory data, airborne lidar, and satellite imagery with computer vision to produce wall-to-wall maps of forest structure. It targets the persistent need for annually updated, large-landscape maps to support forest and wildfire risk management.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval
    ELVA: ranking-driven universal multimodal retrieval
    Deep Learning Machine Learning Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Contrastive learning on multimodal large language models has become the mainstream approach to retrieval. ELVA explores a ranking-driven approach to universal multimodal retrieval.
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    Lagrange: An Open-Vocabulary, Energy-Based Sparse Framework for Generalized End-to-End Driving
    Lagrange: open-vocabulary energy-based framework for end-to-end driving
    Computer Vision Machine Learning Neural Network Reinforcement Learning
    Scaling end-to-end autonomous driving to complex open-world settings demands strong perception. Lagrange offers an open-vocabulary, energy-based sparse framework for generalized end-to-end driving.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Editorial Alignment: A Participatory Approach to Engaging Editorial Expertise in LLM-mediated Knowledge Dissemination
    Editorial alignment: engaging editorial expertise in LLM knowledge dissemination
    LLM-driven information services are reshaping how public knowledge is produced. This work proposes a participatory approach to engage editorial expertise in LLM-mediated knowledge dissemination.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse
    The Register Gap: a meaning intelligence framework for Nigerian discourse
    Deep Learning Gemini Neural Network Retrieval-Augmented Generation (RAG)
    This work introduces the Meaning Intelligence Framework, a nine-dimension annotation and evaluation scheme, to study the register gap in Nigerian public discourse.
  • arXiv cs.AI (Artificial Intelligence) · EN Infrastructure & Hardware extract
    Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference
    Explicit knowledge conflict resolution for LLM inference
    Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering
    Large language models perform strongly across language tasks but can hold conflicting parametric and contextual knowledge. This work proposes explicit knowledge conflict resolution to navigate unreliable knowledge during inference.
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs
    SPOT-E: test-time entropy shaping with visual spotlights for frozen VLMs
    Computer Vision Inference Reinforcement Learning Software Engineering
    Vision-language models often underperform on evidence-intensive tasks by missing decisive visual cues. SPOT-E applies test-time entropy shaping with visual spotlights to improve frozen VLMs.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching
    FlowMaps: long-term multimodal object dynamics with flow matching
    AI Agents Reinforcement Learning
    Joint spatial and temporal understanding of 3D scenes is essential for deployed robots. FlowMaps models long-term multimodal object dynamics using flow matching.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Beyond Accuracy: Measuring Logical Compliance of Predictive Models
    Beyond accuracy: measuring logical compliance of predictive models
    Embeddings Machine Learning Reinforcement Learning
    Machine learning models are mostly evaluated through predictive metrics such as accuracy. This work goes beyond accuracy to measure the logical compliance of predictive models.
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random
    Off-policy evaluation when rewards are missing not at random
    Reinforcement Learning
    The paper studies off-policy evaluation in finite-horizon MDPs when rewards are missing not at random, as in offline reinforcement learning with sparse, irregular, or censored reward records. It develops off-policy evaluation for missingness-aware policies, motivated by settings such as health care and marketing.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization
    MedRLM: recursive multimodal AI for long-context clinical reasoning
    AI Agents Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering
    The paper introduces MedRLM, a recursive multimodal health-intelligence system for long-context clinical reasoning, sensor-guided screening, evidence-grounded decision support, and community-to-tertiary referral optimization. It targets reasoning over heterogeneous, longitudinal patient data, beyond the single-step prompting or retrieval of current medical LLMs.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    NAMESAKES: Probing Identity Memorization in Text-to-Image Models
    NAMESAKES: probing identity memorization in text-to-image models
    Neural Network
    The paper introduces NAMESAKES, a study probing identity memorization in text-to-image models, which can generate realistic likenesses of individuals from their names. It addresses the difficulty of telling whether a generated face is memorized or fabricated without ground-truth photos, training data, or white-box model access.
  • arXiv cs.CL (Computation and Language) · EN Infrastructure & Hardware extract
    HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization
    HydraHead: head-level hybridization of linear and full attention
    Neural Network Retrieval-Augmented Generation (RAG)
    The paper proposes HydraHead, a hybrid attention design that exploits head-level functional heterogeneity to combine linear and full attention. It moves beyond the common layer-wise hybridization strategy, addressing the difficulty of integrating linear attention with full attention for efficient long-context processing.
  • OpenAI Blog · EN New Model Releases extract
    Improving health intelligence in ChatGPT
    OpenAI improves ChatGPT health responses with GPT-5.5 Instant
    GPT
    OpenAI says GPT-5.5 Instant strengthens ChatGPT's health and wellness responses through better reasoning, richer context, and clearer communication. The work is backed by physician-informed evaluations aimed at delivering more reliable, trustworthy health guidance.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
    An information-theoretic look at supervising latent chain-of-thought
    The paper gives an information-theoretic analysis of what makes supervision effective in latent chain-of-thought reasoning, which internalizes reasoning in continuous hidden states. It examines why outcome supervision provides weak learning signals, making robust latent reasoning difficult.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
    Investigating over-privileged tool selection in LLM agents
    AI Agents Meta Neural Network
    The paper investigates over-privileged tool selection in LLM agents, which autonomously choose among tools with different privilege levels. It addresses a gap in prior tool-selection research, which focuses on safety-agnostic metadata preferences, by studying when lower-privilege tools would suffice.
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    REDACT: A Systematically Controlled Multilingual Benchmark for Personal Information Detection
    REDACT: a controlled multilingual benchmark for PII detection
    Claude GPT Meta Neural Network OpenAI
    The paper presents REDACT, a systematically controlled multilingual benchmark for personal information (PII) detection. It addresses limitations of existing corpora—few entity types, ad hoc generation, and little insight into which surface conditions cause detector failures.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts
    AtomMem: an LLM-agent memory system built on atomic facts
    AI Agents Neural Network Retrieval-Augmented Generation (RAG)
    The paper proposes AtomMem, a simple and effective memory system for LLM agents built around atomic facts. It addresses the limits of fixed context windows for accumulating and reusing information across sessions, and the coarse, unstable memory of existing memory-augmented systems.
  • Hacker News (Front Page) · EN New Model Releases extract
    DeepSeek Introduces Vision
    DeepSeek introduces vision capabilities
    DeepSeek
    An item reporting that DeepSeek has introduced vision capabilities, adding image understanding to its previously text-focused models. The multimodal upgrade broadens the range of tasks the models can handle.
  • Lobste.rs (AI tagged) · EN New Model Releases extract
    Announcing Stack Overflow for Agents
    AI Agents Neural Network
    An announcement of Stack Overflow for Agents, a counterpart to the Q&A site human developers use. It seeks to let AI agents reference and share knowledge and code examples for solving problems.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning
    Selective verification for budget-aware test-time reasoning
    Machine Learning Software Engineering
    The paper studies budget-aware test-time reasoning as a deployment allocation problem, asking whether to 'think again' or 'think longer.' It proposes selective verification, since extra reasoning is not uniformly useful—it can repair failures, waste compute on correct answers, or introduce harmful changes.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
    Manifold Bandits: Bayesian curriculum learning for LLM reasoning
    Retrieval-Augmented Generation (RAG) Reinforcement Learning
    The paper proposes Manifold Bandits, a Bayesian curriculum-learning method that samples training problems over the latent geometry of large language models. It targets reinforcement learning for LLM reasoning, where training efficiency depends heavily on how prompts are selected during optimization.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Benchmarking Agentic Review Systems
    Benchmarking agentic peer-review systems
    GPT Neural Network OpenAI
    The paper benchmarks agentic review systems, which are emerging to relieve the pressure AI-assisted research places on peer review. It evaluates two open-source systems, one proprietary system, and a zero-shot baseline, addressing the open question of how such systems should be assessed.
  • ITmedia AI+ · JA New Model Releases extract
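    "Shadow AI": countermeasures at over 70% of companies fail to keep up; has "using only the AI the company chose" already reached its limit? (Gartner)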
    Gartner: 73% of Japanese firms cannot keep up with shadow AI
    Gartner reports that 73% of Japanese companies have not kept pace with shadow AI, where employees use unsanctioned AI tools at work. Restricting staff to company-approved AI alone is nearing its limits, making governance and enablement a shared challenge.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Closing the Calibration Gap in Semantic Caching
    Closing the calibration gap in semantic caching
    Inference
    The paper addresses the calibration gap in semantic caching, which cuts LLM inference costs by serving cached responses to semantically similar queries. It shows that evaluating with PR-AUC—which only measures ranking, not usability at a fixed threshold—leads to systematically poor deployment choices.
  • Simon Willison's Weblog · EN Infrastructure & Hardware extract
    GLM-5.2 is probably the most powerful text-only open weights LLM
    GLM-5.2 may be the most powerful text-only open weights LLM
    DeepSeek Mixture of Experts (MoE)
    Chinese AI lab Z.ai released GLM-5.2 to coding-plan subscribers on June 13 and then published full open weights under an MIT license on June 16. Similar in size to GLM-5 and GLM-5.1, it may be the most powerful text-only open weights LLM, per Simon Willison.
  • ITmedia AI+ · JA New Model Releases extract
    "Students who use AI" vs. "students who don't": whose essays are more creative? A U.S. university ran an empirical experiment in 2025
    AI-using vs non-using students: whose essays are more creative?
    GPT
    Georgetown University researchers published a study on the homogenizing effect of LLMs on creative diversity, empirically comparing human and ChatGPT writing. The article reports how using AI affects the creativity and diversity of students' essays.
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Native Active Perception as Reasoning for Omni-Modal Understanding
    Active perception as reasoning for efficient omni-modal understanding
    Deep Learning Fine-tuning Machine Learning Neural Network Retrieval-Augmented Generation (RAG)
    Passive long-video models 'watch it all,' processing frames uniformly so cost grows with duration regardless of query difficulty. This work treats perception as reasoning, with native active perception that selectively attends to relevant frames for efficient omni-modal understanding.
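The critical-percolation item in the listing proposes percolation at criticality as a source of synthetic data with hierarchical, multi-scale structure. As a minimal sketch, assuming 2D site percolation on a square lattice (the summary does not say which lattice or model the paper uses), the following samples a grid at the numerically estimated threshold p_c ≈ 0.5927 and labels its 4-connected clusters. The function names are hypothetical, and this is not the paper's dataset pipeline.

```python
import random
from collections import deque

# Numerical estimate of the site-percolation threshold on the 2D square lattice.
P_C_SQUARE_SITE = 0.592746

def sample_site_percolation(n, p=P_C_SQUARE_SITE, seed=0):
    """Return an n x n grid where each site is occupied (True) with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]

def label_clusters(grid):
    """Label 4-connected clusters of occupied sites; 0 marks empty sites.

    Returns (labels, sizes), where sizes maps each 1-based cluster id to its size."""
    n_rows, n_cols = len(grid), len(grid[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    sizes = {}
    next_label = 1
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] and labels[r][c] == 0:
                # Breadth-first flood fill starting from an unlabeled occupied site.
                labels[r][c] = next_label
                queue = deque([(r, c)])
                count = 0
                while queue:
                    y, x = queue.popleft()
                    count += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and grid[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                sizes[next_label] = count
                next_label += 1
    return labels, sizes
```

For example, `label_clusters(sample_site_percolation(64))` returns per-site cluster ids plus a size table; near the threshold, cluster sizes spread over many scales, which is the property that makes such grids a candidate toy dataset.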
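The HydraHead item describes mixing linear and full attention per head rather than per layer. The sketch below shows only that mechanical idea, under stated assumptions: attention is non-causal, linear attention uses the elu(x)+1 feature map common in the linear-attention literature, and the per-head assignment (`head_types`) is supplied by hand, whereas the paper reportedly derives it from head-level functional heterogeneity. All names are hypothetical, and the code is plain Python for readability, not speed.

```python
import math

def softmax_attention(q, k, v):
    """Non-causal scaled dot-product attention for one head (lists of vectors)."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out

def elu_plus_one(x):
    """Positive feature map phi(x) = elu(x) + 1."""
    return x + 1.0 if x > 0 else math.exp(x)

def linear_attention(q, k, v):
    """Kernelized attention: phi(q)^T S / phi(q)^T z, with S = sum_j phi(k_j) v_j^T."""
    d_k, d_v = len(k[0]), len(v[0])
    phi_k = [[elu_plus_one(x) for x in kj] for kj in k]
    # Key/value summaries are built once, so cost grows linearly with length.
    s = [[sum(pk[a] * vj[b] for pk, vj in zip(phi_k, v)) for b in range(d_v)]
         for a in range(d_k)]
    z = [sum(pk[a] for pk in phi_k) for a in range(d_k)]
    out = []
    for qi in q:
        pq = [elu_plus_one(x) for x in qi]
        denom = sum(pq[a] * z[a] for a in range(d_k))
        out.append([sum(pq[a] * s[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out

def hybrid_heads(heads, head_types):
    """Apply full or linear attention to each (q, k, v) head as head_types dictates."""
    ops = {"full": softmax_attention, "linear": linear_attention}
    return [ops[t](q, k, v) for (q, k, v), t in zip(heads, head_types)]
```

Full-attention heads pay a cost quadratic in sequence length, while linear heads summarize keys and values once; a head-level mix lets a model keep full attention only where it seems to matter.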
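The over-privileged tool-selection item asks when a lower-privilege tool would have sufficed. A minimal least-privilege baseline for that question might look like the following; the `Tool` record, the capability sets, and the integer privilege scale are hypothetical assumptions, not the paper's setup or metric.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset  # hypothetical capability labels, e.g. {"read"}
    privilege: int           # hypothetical ordinal scale: higher means more access

def select_least_privileged(tools, required):
    """Return the lowest-privilege tool whose capabilities cover `required`, or None."""
    candidates = [t for t in tools if required <= t.capabilities]
    if not candidates:
        return None
    # Break privilege ties by name so the choice is deterministic.
    return min(candidates, key=lambda t: (t.privilege, t.name))

def is_over_privileged(chosen, tools, required):
    """True if a tool with strictly lower privilege would also have sufficed."""
    best = select_least_privileged(tools, required)
    return best is not None and chosen.privilege > best.privilege
```

An agent that picks a shell tool for a read-only task would be flagged by `is_over_privileged`, while picking a read-only tool would not.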
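The Manifold Bandits item treats prompt selection for reinforcement learning as a Bayesian curriculum problem. The sketch below is a generic Beta-Bernoulli Thompson-sampling curriculum, swapped in for illustration: it keeps an independent posterior per problem and does not model the latent geometry of the LLM, which is the paper's actual contribution. What counts as a "useful" problem is an assumption left to the caller (for instance, a problem the current policy neither always nor never solves); the class name is hypothetical.

```python
import random

class ThompsonCurriculum:
    """Beta-Bernoulli Thompson sampling over a fixed pool of training problems.

    Each problem keeps a Beta(alpha, beta) posterior over the probability that
    training on it yields a useful learning signal (defined by the caller)."""

    def __init__(self, problem_ids, seed=0):
        self.rng = random.Random(seed)
        self.alpha = {p: 1.0 for p in problem_ids}  # uniform Beta(1, 1) prior
        self.beta = {p: 1.0 for p in problem_ids}

    def select(self, k):
        """Draw one posterior sample per problem and return the k highest."""
        draws = {p: self.rng.betavariate(self.alpha[p], self.beta[p])
                 for p in self.alpha}
        return sorted(draws, key=draws.get, reverse=True)[:k]

    def update(self, problem_id, useful):
        """Conjugate update after observing whether the problem was useful."""
        if useful:
            self.alpha[problem_id] += 1.0
        else:
            self.beta[problem_id] += 1.0
```

Sampling from the posterior, rather than ranking by its mean, keeps occasionally revisiting uncertain problems, which is the exploration behavior a curriculum needs early in training.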
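The semantic-caching item argues that PR-AUC, a ranking metric, says little about behavior at the single similarity threshold a deployed cache uses. A minimal sketch, assuming cosine similarity over precomputed embeddings and hypothetical names throughout: a threshold-gated cache plus a helper that scores serve decisions at one fixed threshold. It illustrates the problem, not the paper's calibration fix.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Hypothetical cache that reuses a stored response for a similar enough query."""

    def __init__(self, threshold):
        self.threshold = threshold  # fixed similarity cutoff chosen at deployment
        self.entries = []           # list of (query_embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        """Return the nearest entry's response if it clears the threshold, else None."""
        scored = [(cosine(embedding, e), r) for e, r in self.entries]
        best = max(scored, key=lambda pair: pair[0], default=None)
        if best is not None and best[0] >= self.threshold:
            return best[1]
        return None  # cache miss: the caller would fall back to the LLM

def precision_recall_at(scores, labels, threshold):
    """Precision and recall of 'serve from cache' decisions at one fixed threshold.

    labels[i] is True when serving the cached answer for pair i would be correct.
    Precision is reported as 1.0 when nothing is served (a convention)."""
    served = [y for s, y in zip(scores, labels) if s >= threshold]
    true_hits = sum(served)
    positives = sum(labels)
    precision = true_hits / len(served) if served else 1.0
    recall = true_hits / positives if positives else 0.0
    return precision, recall
```

Squaring every score preserves the ranking, and therefore the PR curve and PR-AUC, yet moves the operating point: with scores `[0.95, 0.9, 0.85, 0.8, 0.7]` and correctness `[True, True, False, True, False]`, a threshold of 0.8 gives precision 0.75 and recall 1.0, while the squared scores give precision 1.0 and recall 2/3.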