Agents & Tool Use
-
"Waiting for customers" no longer works: Honda turns to AI agents to create richer deals that don't miss sales opportunities
Honda brings AI agents to car sales to drive higher-quality deals
Honda has deployed AI agents in new-car sales to help create higher-quality deals that capture sales opportunities as buyer behavior shifts. The system helps salespeople move beyond passively waiting for customers to come in, and it has already produced closed sales.
-
Workload cut by 76%: why the Ajinomoto Group was able to lead the way in adopting an "accounting AI agent"
Ajinomoto deploys autonomous accounting AI agent, cuts workload 76%
Ajinomoto's finance arm has begun running an accounting AI agent that autonomously handles expense-approval work, cutting workload by 76%. This makes it an early mover in a field where intolerance for errors had bred caution about adopting AI.
-
How the much-discussed "Claude Mythos" changes security: defenses for the AI agent era
Claude Mythos reshapes security as AI attack timelines shrink to hours
The new AI model "Claude Mythos" brings the threat of AI-driven attacks much closer, shrinking the attack timeline from months to hours. As AI-driven vulnerability discovery grows more capable, corporate AI rules and governance are lagging behind. The article outlines defenses for the AI agent era.
-
Probe-and-Refine Tuning of Repository Guidance for Coding Agents
Probe-and-Refine: tuning repository guidance for coding agents
The paper presents Probe-and-Refine, a method for tuning the repository guidance (such as AGENTS.md files) that LLM-based coding agents rely on. It targets higher-level operational knowledge (file layout, test workflows, and error-prone patterns) that is not contained in the code itself.
-
Efficient and Sound Probabilistic Verification for AI Agents
Efficient and sound probabilistic verification for AI agents
Securing AI agents that operate in complex digital environments has become critical, motivating runtime verification. This paper presents an efficient and sound probabilistic verification approach for AI agents.
-
When Does Streaming Tool Use Help? Characterizing Tool-Intent Stabilization in Streaming Retrieval-Augmented Generation
When does streaming tool use help in streaming RAG?
The paper characterizes when streaming tool use helps in streaming retrieval-augmented generation, which issues tool queries in parallel with ongoing user input to cut perceived latency. It argues that whether streaming helps is intrinsic to the query itself, and studies how tool intent stabilizes before an utterance is complete.
-
When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
Investigating over-privileged tool selection in LLM agents
The paper investigates over-privileged tool selection in LLM agents, which autonomously choose among tools with different privilege levels. It addresses a gap in prior tool-selection research, which focuses on safety-agnostic metadata preferences, by studying when lower-privilege tools would suffice.
-
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
Connect the Dots: RL training for long-lifecycle LLM agents
The paper presents Connect the Dots (CoD), a reinforcement-learning framework for training large language models as long-lifecycle agents. It targets the meta-capability of solving a long sequence of tasks while continuously exploring an environment, aiming for cross-domain generalization.
-
Japan Post Insurance uses AI for sales support, turning "a remark at the post office" into insurance proposals: a skit shows how it works
Japan Post Insurance adds AI agents to its sales workflow
Japan Post Insurance, which serves 17 million customers, has embedded AI agents into its sales workflow, turning offhand remarks made at post offices into insurance proposals. A skit demonstrates how the technology changes the way frontline staff prepare for client meetings.
-
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
STARE reweights token advantages to stabilize policy entropy
Reinforcement learning with verifiable rewards, such as GRPO, dominates post-training for complex LLM reasoning but often suffers from policy entropy collapse. STARE introduces surprisal-guided token-level advantage reweighting to stabilize policy entropy and preserve exploration during training.
-
Towards an Agent-First Web: Redesigning the Web for AI Agents
Towards an agent-first web: redesigning the Web for AI agents
For three decades the Web has been built on the assumption that its primary content consumer is a human, an assumption that permeates every layer of its access model. This work argues for an agent-first web, redesigning the Web for AI agents and rethinking access, structure, and interaction for an agent-driven era.
-
RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents
RODS: reward-driven online data synthesis for tool-use agents
Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. Observing that GRPO's gradient signal concentrates on certain tasks, RODS performs reward-driven online data synthesis to continually supply informative samples for multi-turn tool-use agents.
-
Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents
Decoupling search from reasoning: a vendor-agnostic grounding architecture
Production LLM agents increasingly depend on real-time search but get locked into vendor-specific grounding. This work decouples search from reasoning with a vendor-agnostic grounding architecture, letting search backends be swapped while preserving reasoning quality.
-
Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning
Beyond reward engineering: a data recipe for long-context RL
Long-context reasoning is essential for large language models. Rather than relying on reward engineering, this work presents a data recipe for long-context reinforcement learning that drives effective training.
-
Development contest for "Pokémon TCG battle AI agents" begins: how to master an "imperfect-information game"
Contest launches to build AI agents for the Pokémon TCG, an imperfect-information game
A development contest has begun for AI agents that play the Pokémon Trading Card Game. Unlike chess or shogi, it is an "imperfect-information game" in which the opponent's hand is hidden, testing how well AI can handle strategic uncertainty.
-
Building AI Agents for AR Glasses and XR Devices with NVIDIA XR AI
NVIDIA unveils XR AI to build AI agents for AR glasses and XR devices
NVIDIA introduced NVIDIA XR AI, a framework for developers building AI agents for AR glasses and wearable XR devices. It targets the gap between hardware that is already available and the work of integrating live, real-time AI experiences into it. Capabilities are per NVIDIA's own announcement; third-party verification is pending.
-
GitLab announces "Project Switch", a next-generation Git-compatible source code management service for AI agents, up to 50x faster and using half the tokens
GitLab unveils 'Project Switch,' a Git-compatible SCM service for AI agents
GitLab announced Project Switch, a next-generation Git-compatible source code management service aimed at AI agents, at its GitLab Transcend event in London. Reports cite up to 50x faster performance and roughly half the token usage; these figures come from the announcement and remain unverified.
-
Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models
An agentic benchmark for implicit animal welfare in frontier AI
AI agents are shifting from advisors to actors that book travel and run procurement. Existing animal-welfare benchmarks grade only text answers, so this work introduces an agentic benchmark testing whether implicit animal-welfare reasoning transfers to agent actions in frontier models.
-
Securing the future of AI agents
DeepMind outlines an AI Control Roadmap to secure AI agents
Google DeepMind presents an AI Control Roadmap for securing the future of AI agents, combining traditional safeguards with real-time monitoring to protect internal systems. The framework lays out layered defenses against agent misuse and unsafe behavior as agents proliferate.
-
Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose
Compositional skill routing for LLM agents: decompose, retrieve, compose
LLM agents rely on reusable tool specifications (skills), but real tasks require composing multiple skills. The paper formalizes compositional skill routing: decomposing a complex query into atomic sub-tasks, retrieving relevant skills, and composing them.
-
ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents
ProvenanceGuard: source-aware factuality verification for MCP agents
Tool-using LLM agents use the Model Context Protocol to answer from heterogeneous sources such as search, APIs, databases, and clinical records. ProvenanceGuard provides source-aware factuality verification to catch provenance-sensitive failure modes that standard metrics miss.
-
LLM Consumer Behavior Theory: Foundations of a Novel Research Field
LLM Consumer Behavior Theory: a new field for agentic markets
The paper introduces LLM Consumer Behavior Theory, a proposed field analyzing consumer behavior in agentic markets where LLMs make consumption decisions on behalf of users, drawing on classical and behavioral economics alongside NLP.
-
Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns
Reusing web skills via transferable interaction patterns
LLM web agents are usually deployed as tool callers that read a fresh page observation each turn and emit a structured action. The paper proposes reusing web skills across domains via transferable interaction patterns rather than domain-specific behaviors.
-
datasette-agent 0.3a0
Simon Willison releases datasette-agent 0.3a0 with approval-gated SQL writes
Simon Willison released datasette-agent 0.3a0, adding a new 'execute_write_sql' tool that requests user approval before writing to a database while respecting user permissions. It extends the approval mechanism introduced in the prior 0.2a0 release, enabling agent-driven write operations under explicit user consent.
-
Stack Overflow releases beta of "Stack Overflow for Agents", where AI agents share technical information with one another on a message board
Stack Overflow launches 'Stack Overflow for Agents' beta
Stack Overflow has launched a beta of 'Stack Overflow for Agents,' a service where AI agents share technical solutions and other information on an open message board. The move appears aimed at extending its human Q&A knowledge base into information exchange among agents.
-
GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents
GIST-CMTF adds goal-state inference to causal minimal tool filtering
The paper introduces GIST-CMTF, which augments Causal Minimal Tool Filtering with goal-state inference for tool-augmented LLM agents. It addresses wrong-goal execution, where ambiguous requests such as "handle my appointment" map to multiple goals and an agent may follow a valid causal tool path toward an unintended objective.
-
OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models
OpenClaw-Skill: collective skill tree search for LLM agents
The paper proposes Collective Skill Tree Search (CSTS), a tree-search framework that automatically builds reusable skills for LLM agents via iterative collective generation and assessment across multiple models. Claims reflect the abstract.
-
Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents
Paper on evaluator preference collapse in self-evolving agents
This arXiv paper appears to examine preference collapse in multimodal evaluators and how it spreads across modalities ("cross-modal contagion") within self-evolving agent systems. The source excerpt was unavailable, so this summary is based on the title only; see the original for methods and findings.
-
SING: Synthetic Intention Graph for Scalable Active Tool Discovery in LLM Agents
SING: synthetic intention graph for scalable active tool discovery
This arXiv paper addresses tool selection for LLM agents whose harnesses connect to hundreds or thousands of APIs, where injecting every tool schema is costly and imposes a closed-world assumption. Noting that one-shot retrieval often fails to align isolated tool descriptions with the agent's true intent, especially in long-horizon tasks, the authors propose SING, a Synthetic Intention Graph for scalable, active tool discovery.
-
Sakana AI releases its first commercial product, "Marlin": how capable is it? [Full output report included]
Sakana AI launches its first commercial product, Sakana Marlin
Sakana AI has launched Sakana Marlin, an AI research agent, commercializing the beta it had offered since April. Ahead of the release it held a hands-on session for the press, where it showed reporters reports the AI had generated on themes collected in advance.