Industry Adoption C
-
Understanding the Behaviors of Environment-aware Information Retrieval
Paper: RL adapts LLM query formulation per retriever
An arXiv paper presents a systematic analysis of how LLMs can learn, via reinforcement learning, to adapt their query formulation strategies to different retrievers in retrieval-augmented generation.
-
Building llm-driven “ai” still requires domain knowledge
Building LLM-driven tools still hinges on capturing domain knowledge
A developer shares lessons from building an LLM-driven tool that answers user questions via a customer API. Capturing and writing down domain knowledge is much of the work. That is easier than with earlier generations of AI because the knowledge need not be rigidly structured, yet it is exactly where prior efforts foundered.
-
Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations
Gen-VCoT uses generated RGB visual intermediates for multimodal reasoning
Gen-VCoT replaces text-only chain-of-thought with generated RGB intermediates, staging visual grounding (SAM), depth (Marigold), and semantic reasoning (Qwen2-VL) under an adaptive router. It improves spatial (+25%) and depth (+50%) questions but can hurt simple factual ones; text CoT still wins on CLEVR, suggesting task-dependent representations.
-
GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization
GD²PO eases multi-reward conflicts in LLM RL via dynamic reward decoupling
As LLM post-training RL uses multi-dimensional rewards, conflicting signals across reward groups can cancel out and hinder training. GD²PO decouples rewards into groups and, inspired by DAPO, dynamically filters near-zero-advantage rollouts, reducing conflicts and improving RL training efficiency.
-
Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning
Can LLM agents infer world models? Evidence from automata learning
This arXiv paper proposes agentic automata learning to assess how well tool-calling LLM agents can uncover hidden environments through interaction. An agent must infer a hidden deterministic finite automaton (DFA) via membership and equivalence queries, yielding a scalable testbed with controlled task complexity. Evaluating state-of-the-art LLMs, the authors find performance drops sharply as DFA size grows, with reasoning models markedly stronger.
-
SkillWiki: A Living Knowledge Infrastructure for Agent Skills
SkillWiki: a living knowledge infrastructure for agent skills
While knowledge is managed via Wikipedia and software via GitHub, agent skills still lack infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure turning heterogeneous knowledge into reusable skill assets linked to their originating evidence. It presents the full skill lifecycle, from knowledge ingestion to provenance-aware exploration, governance, and execution-driven evolution, with a live demo and source code available.
-
daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization
daVinci-kernel: an RL framework co-evolving skills for GPU kernel tuning
GPU kernel optimization assumes correctness and targets execution efficiency. The authors present daVinci-kernel, an RL framework coupling skill discovery and exploitation via a dynamically evolving skill library. Three agents share one LLM backbone: a Selection Agent retrieving techniques via BM25 and LLM reranking, a Policy Agent generating CUDA/Triton kernels, and a Summary Agent distilling rollouts into reusable skills. Skills are added only after execution verification confirms speedups.
-
Java app updates sped up from one month to three days: how 'IBM Bob' goes beyond being just a source-code generation AI
IBM unveils 'IBM Bob', an AI that speeds Java app modernization
IBM's new AI tool 'IBM Bob' reportedly cut Java application modernization from 30 days to 3 at early adopters. Its distinguishing feature is going beyond mere source-code generation.
-
Cohere triples UK footprint with new London office to support R&D growth
Cohere triples its UK footprint with a new London R&D office
Cohere announced it will move to a larger London office at 100 New Oxford Street, nearly tripling its UK footprint. The expansion backs the city's AI talent and R&D base and supports growing demand for secure, enterprise-grade sovereign AI across the UK and Europe.
-
Introducing the OpenAI Partner Network
OpenAI launches Partner Network, investing $150M to speed enterprise AI
OpenAI introduced its Partner Network, committing $150M to help global partners accelerate enterprise AI adoption, deployment, and transformation. The program aims to broaden OpenAI's reach into enterprise markets through a structured partner ecosystem.
-
Sakana AI begins offering its first commercial product, 'Sakana Marlin'
Sakana AI launches Marlin, its first commercial autonomous research assistant
Sakana AI has launched Sakana Marlin, its first commercial product: an autonomous research assistant for business. Given a research theme, it works autonomously for up to about eight hours, forming hypotheses and gathering and verifying information, then outputs structured summary slides and a report spanning dozens of pages. Built on the firm's long-horizon reasoning technology, it aims to act as a 'virtual CSO'. It is self-serve and available the same day, with plans from free pay-per-use to Enterprise.
-
Statement on the US government directive to suspend access to Fable 5 and Mythos 5
Willison on the US directive to suspend Fable 5 and Mythos 5
Simon Willison comments on the US government's national-security export-control directive suspending all foreign-national access to Fable 5 and Mythos 5, calling the move extraordinary and questioning its rationale and impact.
-
I tried making a YouTube video with the latest AI, 'Fable 5': astonished by better-than-expected results, but with one major weakness
Hands-on: making a YouTube video with the new Fable 5 AI
A hands-on test of using the new Fable 5 AI to produce a YouTube video. The author is impressed by the surprisingly high quality of the output but also flags a significant weakness in the workflow.
-
TCS and Anthropic partner to bring Claude to regulated industries
Anthropic partners with TCS to bring Claude to regulated industries
Anthropic announced a partnership with Tata Consultancy Services. TCS will deploy Claude to 50,000 employees across 56 countries, build Claude-powered products for finance, healthcare and the public sector, and join the Claude Partner Network.
-
AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition
AgentSpec dissects embodied agent scaffolds via controlled composition
AgentSpec studies scaffolded LLM agents that combine reasoning, memory, reflection, and action through controlled composition. It aims to isolate how each component contributes to overall performance.
-
Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models
Entropy-guided explainability for Transformer-based audio models
Transformer-based ASR models like Whisper are accurate but hard to interpret, and existing XAI methods lack faithfulness and temporal precision. The paper proposes an entropy-guided explainability approach for such audio models.
-
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
When good verifiers go bad: self-improving VLMs can regress on new tasks
Verifier-driven self-DPO, where a frozen verifier scores candidates to form preference pairs, is a common recipe for self-improving vision-language models. The paper shows that under this setup VLMs can regress on new tasks when the verifier misbehaves.
-
A Statistical and Machine Learning Framework for Operational Threshold Detection and Deployable Dispatch Controller Development in Hydrogen Multi-Energy Systems
ML framework for threshold detection in hydrogen multi-energy systems
The study presents a statistical and machine learning framework characterizing a hydrogen-based multi-energy system. It targets operational threshold detection and deployable dispatch controller development.
-
When Errors Become Narratives: A Longitudinal Taxonomy of Silent Failures in a Production LLM Agent Runtime
A longitudinal taxonomy of silent failures in a production LLM agent runtime
LLM agents increasingly run as long-lived autonomous runtimes that schedule jobs, call tools, maintain memory, and push results to humans. This longitudinal study of one persistent system presents a taxonomy of its silent failures.
-
VISTA: View-Consistent Self-Verified Training for GUI Grounding
VISTA: view-consistent self-verified training for GUI grounding
Applying GRPO to GUI grounding samples rollouts from a single screenshot, so groups often end up all-failure or all-success and yield a weak training signal. VISTA introduces view-consistent, self-verified training to stabilize GUI grounding.
-
Securing the Future of IoMT in the Post-Quantum Era: An Edge-Native Federated Learning Approach
Edge-native federated learning to secure IoMT in the post-quantum era
Internet of Medical Things devices handle sensitive health data under tight resource constraints, making security and privacy critical, while federated learning adds complexity. The paper proposes an edge-native federated learning approach to secure IoMT in the post-quantum era.
-
Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure
NVIDIA details deploying MiniMax M3 for long-context agentic workflows
NVIDIA's developer blog explains how to deploy MiniMax M3 on NVIDIA accelerated infrastructure for long-context reasoning and agentic workflows, addressing fragmented enterprise AI pipelines spanning text, vision, and other modalities.
-
Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias
Evaluating predictions under distribution shift and selection bias
Knowing how a model will perform in a new environment before deployment helps prevent harm. The paper evaluates predictions under two common sources of degradation: distribution shift and selection bias.