Developer Tools B

  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    A Hybrid LSTM–Vision Transformer Architecture for Predicting HRRR Forecast Errors
    Hybrid LSTM–Vision Transformer predicts HRRR forecast errors
    Reinforcement Learning Transformer
    Forecast errors in high-resolution numerical weather prediction such as HRRR often stem from unresolved planetary boundary layer processes, convection, and terrain-induced circulations. This work uses a hybrid LSTM–Vision Transformer architecture to predict HRRR forecast errors from vertically structured features.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Sumi: Open Uniform Diffusion Language Model from Scratch
    Sumi: an open uniform diffusion language model from scratch
    Deep Learning Reinforcement Learning
    Diffusion models are a promising alternative to autoregressive ones, and uniform diffusion language models (UDLMs) let any token be updated at any step. This work releases Sumi, an open uniform diffusion language model built from scratch, supporting research and reproducibility in diffusion LMs.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment
    G-IdiomAlign: a gloss-pivoted cross-lingual idiom benchmark
    Embeddings
    Idioms resist literal cross-lingual mapping because they are non-compositional. G-IdiomAlign anchors each idiom to an English Wiktionary gloss and adds a high-confidence reference alignment set. Two protocols (multiple-choice idiom equivalence and gloss-contrastive generation) isolate the effect of explicit glosses.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    ThinkDeception: A Progressive Reinforcement Learning Framework for Interpretable Multimodal Deception Detection
    ThinkDeception: interpretable multimodal deception detection via RL
    Machine Learning Neural Network Reinforcement Learning
    Existing multimodal deception detection relies on end-to-end black boxes that offer no transparent reasoning. ThinkDeception is a progressive reinforcement learning framework that explicitly captures subtle cross-modal cues and produces interpretable reasoning trajectories for deception detection.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering
    Direct timestep embedding and contrastive alignment for time-series QA
    Embeddings Machine Learning Retrieval-Augmented Generation (RAG) Software Engineering
    Time-series question answering casts analysis as natural-language QA. Instead of tokenizing the series, this work embeds timesteps directly and uses contrastive alignment to match language representations, avoiding the information loss of tokenization.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
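    The contrastive alignment mentioned in the summary above can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not the paper's method: the function name `info_nce`, the temperature value, and the toy embeddings are all assumptions, and the vectors are assumed to be already normalized. Row i of the time-series embeddings is treated as the positive match for row i of the text embeddings.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(series_embs, text_embs, temperature=0.1):
    """Average InfoNCE loss over a batch (hypothetical helper).

    series_embs[i] and text_embs[i] form a positive pair; every other
    text embedding in the batch acts as a negative for series_embs[i].
    """
    total = 0.0
    for i, s in enumerate(series_embs):
        logits = [dot(s, t) / temperature for t in text_embs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true pair.
        total += log_denom - logits[i]
    return total / len(series_embs)


# Toy data (assumed): two timestep-derived embeddings and two text embeddings.
SERIES = [[1.0, 0.0], [0.0, 1.0]]
TEXT_ALIGNED = [[1.0, 0.0], [0.0, 1.0]]
TEXT_SWAPPED = [[0.0, 1.0], [1.0, 0.0]]

if __name__ == "__main__":
    print("aligned loss:", info_nce(SERIES, TEXT_ALIGNED))
    print("swapped loss:", info_nce(SERIES, TEXT_SWAPPED))
```

    The loss is small when each series embedding is closest to its own text embedding and large when the pairs are swapped, which is the signal a contrastive objective trains on.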
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Mitigating Scoring Errors and Compensating for Nonverbal Subtests in Speech-Based Dementia Assessment
    Mitigating scoring errors in speech-based dementia assessment
    Embeddings Retrieval-Augmented Generation (RAG) Reinforcement Learning Speech Processing
    Early detection of cognitive impairment relies on neuropsychological tests whose scoring is subjective. This work mitigates scoring errors and compensates for nonverbal subtests in speech-based dementia assessment, aiming for more objective and reliable screening.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    A Controlled Benchmark of Quantum-Latent GAN Augmentation for Brain MRI
    A controlled benchmark of quantum-latent GAN augmentation for brain MRI
    Medical image classification is constrained by limited labeled data. This paper builds a controlled benchmark evaluating quantum-latent GAN data augmentation for brain MRI classification, measuring its effect under standardized conditions.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    GraphPO: Graph-based Policy Optimization for Reasoning Models
    GraphPO: graph-based policy optimization for reasoning models
    Neural Network Reinforcement Learning Software Engineering
    Reinforcement learning with verifiable rewards has become standard for reasoning models. GraphPO introduces a graph-based policy optimization method that exploits structure across reasoning steps to improve reasoning performance.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    RTSGameBench: An RTS Benchmark for Strategic Reasoning by Vision-Language Models
    RTSGameBench: an RTS benchmark for strategic reasoning by VLMs
    AI Agents Computer Vision Neural Network Retrieval-Augmented Generation (RAG)
    Modern vision-language models struggle with strategic reasoning. RTSGameBench uses real-time strategy games to benchmark VLMs on planning and situational judgment, probing their strategic reasoning abilities.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    As Easy as Rocket Science: Assessing the Ability of Large Language Models to Interpret Negation in Figurative Language
    Can LLMs interpret negation in figurative language?
    Neural Network Reinforcement Learning
    Figurative language and negation both challenge current language models. This study assesses how well large language models interpret negation embedded in figurative expressions, revealing model limitations where the two phenomena intersect.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Learning Robust Pair Confidence for Multimodal Emotion-Cause Pair Extraction
    Learning robust pair confidence for multimodal emotion-cause extraction
    Inference Retrieval-Augmented Generation (RAG)
    Multimodal emotion-cause pair extraction requires reliable pairing of emotions and their causes. This work learns robust pair confidence, yielding emotion-cause extraction that is more resilient to noise and ambiguity.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Improving Medical Communication using Rubric-Guided Counterfactual Recommendations
    Rubric-guided counterfactual recommendations for medical communication
    Inference Meta
    Text-based telemedicine increasingly relies on lightweight patient feedback. This work improves medical communication using rubric-guided counterfactual recommendations, enhancing the quality of patient-clinician interactions.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Stratechery (free posts) · EN Safety & Evaluation extract
    The State of Fable, The Jailbreak Problem, SpaceX Acquires Cursor
    Stratechery on Fable's state, jailbreaks, and SpaceX buying Cursor
    Anthropic
    A Stratechery column by Ben Thompson on three topics: the state of Anthropic's Fable model, the AI jailbreak problem, and SpaceX's acquisition of Cursor. Thompson argues the administration is likely wrong about Fable but that responsibility ultimately lies with Anthropic. Views are the author's; deal specifics are unverified.
    Read original (Stratechery (free posts)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Efficient Financial Language Understanding via Distillation with Synthetic Data
    Efficient financial language understanding via distillation with synthetic data
    Neural Network Natural Language Processing (NLP) Reinforcement Learning
    Large instruction-following models are powerful but costly to deploy, especially in finance. This work distills capabilities using synthetic data to build lightweight models that understand financial language efficiently.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining
    Aligning implied statements for generalizable implicit hate detection
    Speech Processing
    Classifying implicit hate speech is hard because intent is rarely explicit. This work aligns implied statements and applies context-bounded semi-hard negative mining to improve the generalizability of implicit hate speech detection.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement
    ScholarSum: student-teacher abstractive summarization with KG reasoning
    Abstractive summarization enables efficient understanding. ScholarSum combines a student-teacher framework with knowledge-graph reasoning and reflective refinement to produce summaries with improved factuality and coherence.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Agents & Tool Use extract
    Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning
    Beyond reward engineering: a data recipe for long-context RL
    AI Agents Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Long-context reasoning is essential for large language models. Rather than relying on reward engineering, this work presents a data recipe for long-context reinforcement learning that drives effective training.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • ITmedia AI+ · JA New Model Releases extract
    Cursor announces Git hosting service "Origin" right after SpaceX acquisition announcement
    Cursor unveils 'Origin' Git hosting, seen as a GitHub rival
    Cursor, the AI coding tool, announced "Origin", a Git hosting service that the article frames as a rival to GitHub. The reveal reportedly came right after news of SpaceX acquiring Cursor. The acquisition terms and Origin's features are as reported in the article and have not been independently verified.
    Read original (ITmedia AI+) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Beyond Scalar Scores: Exploring LLM-based Metrics for Clinical Significance Evaluation in Radiology Reports
    Beyond scalar scores: LLM-based metrics for radiology report significance
    Inference Machine Learning
    Reliable evaluation of generated radiology reports requires strict clinical validity. Going beyond scalar scores, this work explores LLM-based metrics for clinical significance evaluation, assessing report quality in clinically meaningful terms.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    HandwritingAgent: Language-Driven Handwriting Synthesis in Scalable Vector Space
    HandwritingAgent: language-driven handwriting synthesis in vector space
    Deep Learning Neural Network Retrieval-Augmented Generation (RAG)
    Emulating natural handwriting styles remains an open problem. HandwritingAgent synthesizes handwriting in a scalable vector space from language-driven instructions, enabling generation of diverse, resolution-independent handwriting styles.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    RedactionBench
    RedactionBench: a benchmark for redacting sensitive information
    Neural Network Reinforcement Learning
    Large language models are increasingly applied to sensitive domains. RedactionBench evaluates how well models redact sensitive information in such settings, supporting verification toward safer deployment.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation
    Improving long-document retrieval with chunk evidence aggregation
    Deep Learning Inference Reinforcement Learning
    Dense retrieval matches one query vector against one document vector, but long documents get lost in a single vector. This work splits documents into chunks and aggregates per-chunk evidence to improve long-document retrieval.
    Read original (arXiv cs.CL (Computation and Language)) ↗
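    The chunk-and-aggregate idea in the summary above can be sketched as follows. The summary does not say which aggregation rule the paper uses, so the top-k mean here is an assumption, as are the function names and the toy vectors. Setting `top_k=1` reduces to max-chunk scoring.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na and nb else 0.0


def aggregate(scores, top_k=2):
    """Mean of the top_k chunk scores (assumed rule, not from the paper)."""
    if not scores:
        return 0.0
    top = sorted(scores, reverse=True)[:top_k]
    return sum(top) / len(top)


def rank_documents(query_vec, docs, top_k=2):
    """Score each document by aggregating its per-chunk similarities.

    docs maps a document id to a list of chunk embedding vectors.
    Returns (doc_id, score) pairs sorted from best to worst.
    """
    scored = []
    for doc_id, chunks in docs.items():
        chunk_scores = [cosine(query_vec, c) for c in chunks]
        scored.append((doc_id, aggregate(chunk_scores, top_k)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed): doc "a" has one perfect chunk and two unrelated ones,
# doc "b" has two moderately relevant chunks.
QUERY = [1.0, 0.0]
DOCS = {
    "a": [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    "b": [[1.0, 1.0], [1.0, 1.0]],
}

if __name__ == "__main__":
    print(rank_documents(QUERY, DOCS, top_k=1))
    print(rank_documents(QUERY, DOCS, top_k=2))
```

    On this toy data the ranking flips with `top_k`: max-chunk scoring favors the document with a single strong chunk, while averaging two chunks favors the document whose evidence is spread across chunks. The choice of aggregator is therefore a real design decision, not a detail.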
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    LLMs Struggle to Measure What Distinguishes Students of Different Proficiency Levels: A Study of Item Discrimination in Reading Comprehension Assessment
    LLMs struggle to measure item discrimination in reading assessment
    Software Engineering
    Item discrimination is a fundamental psychometric property that distinguishes students of different proficiency. This study shows that large language models struggle to measure item discrimination in reading comprehension assessment, exposing limits of automated evaluation.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Attention as Frustrated Synchronization
    Attention as frustrated synchronization
    Transformer
    A network of oscillators that synchronizes perfectly computes nothing. This work frames attention as frustrated synchronization, offering a physics-inspired view that interprets the workings of attention through partial, non-trivial synchronization.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • ITmedia AI+ · JA Developer Tools extract
    Hitachi ramps up collaboration with OpenAI: legacy system modernization with "Codex", plus cyber defense
    Hitachi deepens OpenAI tie-up, using Codex to modernize legacy systems
    OpenAI
    Hitachi is expanding its partnership with OpenAI, pairing the code-analysis AI "Codex" with its own systems-development expertise. It aims to establish an AI-driven workflow that runs from recovering upstream specifications out of existing code through to migration testing, and also cites cybersecurity defense as a use case.
    Read original (ITmedia AI+) ↗
  • ITmedia AI+ · JA Developer Tools extract
    SpaceX acquires AI coding tool "Cursor" for 9.6 trillion yen; "major improvements coming soon"
    SpaceX reported to acquire AI coding tool Cursor for 9.6 trillion yen
    SpaceX is reported to be acquiring the AI coding tool "Cursor" for 9.6 trillion yen. Cursor said on its official X account that "major improvements are coming soon," according to the article. Deal details and the headline figure are based on the report and remain unverified by third parties.
    Read original (ITmedia AI+) ↗
  • Hacker News (Front Page) · EN Developer Tools extract
    GrapheneOS has been ported to Android 17
    GrapheneOS ported to Android 17, releases coming soon
    A forum post reports that the privacy-focused mobile OS GrapheneOS has been ported to Android 17, with official releases said to be coming soon. Porting details are based on the community announcement.
    Read original (Hacker News (Front Page)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Variable-Width Transformers
    Variable-width transformer cuts FLOPs ~22% via x-shaped layer widths
    Deep Learning Mixture of Experts (MoE) Retrieval-Augmented Generation (RAG) Reinforcement Learning Transformer
    The paper proposes an x-shaped transformer that keeps early and late layers wide while narrowing the middle, using a parameter-free residual resizing mechanism. Across dense 200M-2B and 3B MoE decoder-only models it outperforms parameter-matched uniform baselines and reduces FLOPs by about 22% under loss-matched scaling, with smaller KV cache.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues
    ReproRepo scales reproducibility audits using GitHub repo issues
    AI Agents GPT Machine Learning Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Reproducing results from papers and code is central to science but existing benchmarks are hard to scale. ReproRepo leverages GitHub repository issues to evaluate, at scale, how well LLM agents can assist with reproducibility tasks, addressing the manual effort that limits prior reproducibility benchmarks.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation
    EvolveNav: a self-evolving framework for zero-shot object-goal navigation
    AI Agents Neural Network Retrieval-Augmented Generation (RAG)
    The paper proposes a self-evolving zero-shot object-goal navigation framework that builds an agentic rule memory by extracting actionable knowledge from past trajectories and uses a retrieval strategy to enable continuous test-time improvement.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗