Agents & Tool Use

  • ITmedia AI+ · JA Industry Adoption extract
    Passive "waiting" sales has reached its limit: Honda turns to AI agents to create "rich" sales talks that never miss an opportunity
    Honda brings AI agents to car sales to drive higher-quality deals
    AI Agents
    Honda has deployed AI agents in new-car sales to help salespeople hold higher-quality sales conversations and avoid missed opportunities as buyer behavior shifts. The system moves salespeople beyond passive 'wait-and-see' selling and has already produced closed sales.
    Read original (ITmedia AI+) ↗
  • ITmedia AI+ · JA Industry Adoption extract
    Workload cut by 76%: why the Ajinomoto Group was able to lead the way in adopting an "accounting AI agent"
    Ajinomoto deploys autonomous accounting AI agent, cuts workload 76%
    AI Agents
    Ajinomoto's finance arm has begun running an accounting AI agent that autonomously handles expense-approval work, cutting workload by 76%. The move makes it an early mover in a field where intolerance for errors had bred caution about adopting AI.
    Read original (ITmedia AI+) ↗
  • ITmedia AI+ · JA Agents & Tool Use extract
    How security changes with the arrival of the much-discussed "Claude Mythos": defenses for the AI agent era
    Claude Mythos reshapes security as AI attack timelines shrink from months to hours
    AI Agents Claude
    The new AI model "Claude Mythos" makes AI-driven attacks feel imminent, shifting the timeline from months to hours. As vulnerability discovery grows more capable, corporate AI rules and governance lag behind. The article outlines defenses for the AI agent era.
    Read original (ITmedia AI+) ↗
  • arXiv cs.LG (Machine Learning) · EN New Model Releases extract
    Probe-and-Refine Tuning of Repository Guidance for Coding Agents
    Probe-and-Refine: tuning repository guidance for coding agents
    AI Agents Fine-tuning Retrieval-Augmented Generation (RAG) Software Engineering
    The paper presents Probe-and-Refine, a method for tuning the repository guidance (such as AGENTS.md files) that LLM-based coding agents rely on. It targets the higher-level operational knowledge—file layout, test workflows, and error-prone patterns—that is not contained in the code itself.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Agents & Tool Use extract
    Efficient and Sound Probabilistic Verification for AI Agents
    Efficient and sound probabilistic verification for AI agents
    AI Agents Deep Learning Inference Neural Network
    Securing AI agents that operate in complex digital environments has become critical, motivating runtime verification. This paper presents an efficient and sound probabilistic verification approach for AI agents.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    When Does Streaming Tool Use Help? Characterizing Tool-Intent Stabilization in Streaming Retrieval-Augmented Generation
    When does streaming tool use help in streaming RAG?
    Retrieval-Augmented Generation (RAG) Reinforcement Learning Software Engineering
    The paper characterizes when streaming tool use helps in streaming retrieval-augmented generation, which issues tool queries in parallel with ongoing user input to cut perceived latency. It argues the benefit is query-intrinsic and studies how tool intent stabilizes before an utterance is complete.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
    Investigating over-privileged tool selection in LLM agents
    AI Agents Meta Neural Network
    The paper investigates over-privileged tool selection in LLM agents, which autonomously choose among tools with different privilege levels. It addresses a gap in prior tool-selection research, which focuses on safety-agnostic metadata preferences, by studying when lower-privilege tools would suffice.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Agents & Tool Use extract
    Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
    Connect the Dots: RL training for long-lifecycle LLM agents
    AI Agents Meta Neural Network Reinforcement Learning
    The paper presents Connect the Dots (CoD), a reinforcement-learning framework for training large language models as long-lifecycle agents. It targets the meta-capability of solving a long sequence of tasks while continuously exploring an environment, aiming for cross-domain generalization.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • ITmedia AI+ · JA Industry Adoption extract
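    Japan Post Insurance supports sales with AI, picking up "a passing remark at the post office" and turning it into an insurance proposal: use cases shown in a skit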
    Japan Post Insurance adds AI agents to its sales workflow
    AI Agents
    Japan Post Insurance, which serves 17 million customers, has embedded AI agents into its sales workflow, turning offhand remarks at post offices into insurance proposals. A skit-style demonstration shows how the technology changes the way frontline staff prepare for client meetings.
    Read original (ITmedia AI+) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
    STARE reweights token advantages to stabilize policy entropy
    Algorithms & Theory Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Reinforcement learning with verifiable rewards (RLVR), typically run with algorithms such as GRPO, dominates post-training for complex LLM reasoning but often suffers from policy entropy collapse. STARE introduces surprisal-guided token-level advantage reweighting to stabilize policy entropy and preserve exploration during training.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Agents & Tool Use extract
    Towards an Agent-First Web: Redesigning the Web for AI Agents
    Towards an agent-first web: redesigning the web for AI agents
    AI Agents Meta Reinforcement Learning
    For three decades the Web has been built on the assumption that its primary content consumer is human, an assumption that permeates every layer of its access model. This work argues for an agent-first web, rethinking access, structure, and interaction for an agent-driven era.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents
    RODS: reward-driven online data synthesis for tool-use agents
    AI Agents Inference Reinforcement Learning
    Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. Observing that GRPO's gradient signal concentrates on certain tasks, RODS performs reward-driven online data synthesis to continually supply informative samples for multi-turn tool-use agents.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Inference & Efficiency extract
    Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents
    Decoupling search from reasoning: a vendor-agnostic grounding architecture
    AI Agents Deep Learning Model Context Protocol (MCP) Reinforcement Learning Software Engineering
    Production LLM agents increasingly depend on real-time search but get locked into vendor-specific grounding. This work decouples search from reasoning with a vendor-agnostic grounding architecture, letting search backends be swapped while preserving reasoning quality.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Agents & Tool Use extract
    Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning
    Beyond reward engineering: a data recipe for long-context RL
    AI Agents Retrieval-Augmented Generation (RAG) Reinforcement Learning
    Long-context reasoning is essential for large language models. Rather than relying on reward engineering, this work presents a data recipe for long-context reinforcement learning that drives effective training.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • ITmedia AI+ · JA Agents & Tool Use extract
    Development contest begins for "Pokémon card battle AI agents": how to master an "imperfect-information game"
    Contest launches to build AI agents for Pokémon TCG, an imperfect-info game
    AI Agents
    A development contest has begun for AI agents that play the Pokémon Trading Card Game. Unlike chess or shogi, it is an "imperfect-information game" in which the opponent's hand is hidden, testing how well AI can handle strategic uncertainty.
    Read original (ITmedia AI+) ↗
  • NVIDIA Developer Blog · EN Agents & Tool Use extract
    Building AI Agents for AR Glasses and XR Devices with NVIDIA XR AI
    NVIDIA unveils XR AI to build AI agents for AR glasses and XR devices
    AI Agents Computer Vision Generative AI NVIDIA
    NVIDIA introduced NVIDIA XR AI, a framework for developers to build AI agents for AR glasses and wearable XR devices. It targets the gap between ready hardware and the work of integrating live, real-time AI experiences. Capabilities are per NVIDIA's own announcement; third-party verification pending.
    Read original (NVIDIA Developer Blog) ↗
  • Publickey · JA New Model Releases extract
    GitLab announces "Project Switch," a next-generation Git-compatible source code management service for AI agents, up to 50x faster and usable with half the tokens
    GitLab unveils 'Project Switch,' a Git-compatible SCM service for AI agents
    AI Agents Machine Learning
    GitLab announced Project Switch, a next-generation Git-compatible source code management service aimed at AI agents, at its GitLab Transcend event in London. Reports cite up to 50x speed and roughly half the token usage; figures reflect the announcement and remain unverified.
    Read original (Publickey) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models
    An agentic benchmark for implicit animal welfare in frontier AI
    AI Agents Claude DeepSeek Gemini GPT
    AI agents are shifting from advisors to actors that book travel and run procurement. Existing animal-welfare benchmarks grade only text answers, so this work introduces an agentic benchmark testing whether implicit animal-welfare reasoning transfers to agent actions in frontier models.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Google DeepMind Blog · EN Agents & Tool Use extract
    Securing the future of AI agents
    DeepMind outlines an AI Control Roadmap to secure AI agents
    AI Agents
    Google DeepMind presents an AI Control Roadmap for securing the future of AI agents, combining traditional safeguards with real-time monitoring to protect internal systems. The framework lays out layered defenses against agent misuse and unsafe behavior as agents proliferate.
    Read original (Google DeepMind Blog) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose
    Compositional skill routing for LLM agents: decompose, retrieve, compose
    AI Agents Model Context Protocol (MCP) Neural Network Reinforcement Learning
    LLM agents rely on reusable tool specifications (skills), but real tasks require composing multiple skills. The paper formalizes compositional skill routing: decomposing a complex query into atomic sub-tasks, retrieving relevant skills, and composing them.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Agents & Tool Use extract
    ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents
    ProvenanceGuard: source-aware factuality verification for MCP agents
    AI Agents Model Context Protocol (MCP) Software Engineering
    Tool-using LLM agents use the Model Context Protocol to answer from heterogeneous sources like search, APIs, databases and clinical records. ProvenanceGuard provides source-aware factuality verification to catch provenance-sensitive failure modes that standard metrics miss.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    LLM Consumer Behavior Theory: Foundations of a Novel Research Field
    LLM Consumer Behavior Theory: a new field for agentic markets
    AI Agents Natural Language Processing (NLP) Retrieval-Augmented Generation (RAG)
    The paper introduces LLM Consumer Behavior Theory, a proposed field analyzing consumer behavior in agentic markets where LLMs make consumption decisions on behalf of users, drawing on classical and behavioral economics alongside NLP.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns
    Reusing web skills via transferable interaction patterns
    AI Agents Meta Retrieval-Augmented Generation (RAG)
    LLM web agents are usually deployed as tool callers that read a fresh page observation each turn and emit a structured action. The paper proposes reusing web skills across domains via transferable interaction patterns rather than domain-specific behaviors.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Simon Willison's Weblog · EN New Model Releases extract
    datasette-agent 0.3a0
    Simon Willison releases datasette-agent 0.3a0 with approval-gated SQL writes
    Neural Network
    Simon Willison released datasette-agent 0.3a0, adding a new 'execute_write_sql' tool that requests user approval before writing to a database while respecting user permissions. It extends the approval mechanism introduced in the prior 0.2a0 release, enabling agent-driven write operations under explicit user consent.
    Read original (Simon Willison's Weblog) ↗
  • Publickey · JA New Model Releases extract
    Stack Overflow releases beta of "Stack Overflow for Agents," where AI agents share technical information with each other on a message board
    Stack Overflow launches 'Stack Overflow for Agents' beta
    AI Agents Machine Learning
    Stack Overflow has launched a beta of 'Stack Overflow for Agents,' a service where AI agents share technical solutions and other information on an open message board. The move appears aimed at extending its human Q&A knowledge base into information exchange among agents.
    Read original (Publickey) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents
    GIST-CMTF adds goal-state inference to causal minimal tool filtering
    AI Agents Deep Learning Inference
    The paper introduces GIST-CMTF, which augments Causal Minimal Tool Filtering with goal-state inference for tool-augmented LLM agents. It addresses wrong-goal execution, where ambiguous requests such as "handle my appointment" map to multiple goals and an agent may follow a valid causal tool path toward an unintended objective.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models
    OpenClaw-Skill: collective skill tree search for LLM agents
    AI Agents Retrieval-Augmented Generation (RAG) Reinforcement Learning
    The paper proposes Collective Skill Tree Search (CSTS), a tree-search framework that automatically builds reusable skills for LLM agents via iterative collective generation and assessment across multiple models. Claims reflect the abstract.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents
    Paper on evaluator preference collapse in self-evolving agents
    AI Agents DeepSeek GPT
    This arXiv paper reportedly examines preference collapse in multimodal evaluators and its cross-modal contagion within self-evolving agent systems. The source excerpt was unavailable (content filter), so this summary is based on the title only; see the original for methods and findings.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Agents & Tool Use extract
    SING: Synthetic Intention Graph for Scalable Active Tool Discovery in LLM Agents
    SING: synthetic intention graph for scalable active tool discovery
    AI Agents Neural Network Reinforcement Learning
    This arXiv paper addresses tool selection for LLM agents whose harnesses connect to hundreds or thousands of APIs, where exhaustive tool-schema injection is costly and imposes a closed-world assumption. Noting that one-shot retrieval often fails to align isolated tool descriptions with the agent's true intent—especially in long-horizon tasks—the authors propose SING, a Synthetic Intention Graph for scalable, active tool discovery.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • ITmedia AI+ · JA New Model Releases extract
    Sakana AI releases its first commercial product, "Marlin": how capable is it? [Full output report included]
    Sakana AI launches its first commercial product, Sakana Marlin
    AI Agents Reinforcement Learning
    Sakana AI has launched Sakana Marlin, an AI research agent, commercializing the beta it had offered since April. Ahead of the release it held a hands-on session for the press, showing reporters the reports the AI had generated on themes collected in advance.
    Read original (ITmedia AI+) ↗