Agents & Tool Use｜AI/Tech News Trends

ITmedia AI+ · 2026-08-02 JA Agents & Tool Use extract

「AI、結局使えないじゃん」問題　セールスフォースが431万件対応で導いた正解

Salesforce turns AI into ROI, handling 4.31M cases in-house

AI Agents

Salesforce completed 4.31 million customer interactions through its own 'customer zero' practice, lifting deal counts by 40-50%. COO Ryota Tanaka outlines the playbook—choosing domains with solid data and KPIs, and starting with simple, high-frequency tasks—to convert AI investment into clear returns.

Read original (ITmedia AI+) ↗

ITmedia AI+ · 2026-08-02 JA Agents & Tool Use extract

賞金1000万のAIコンテスト、でも「実現性は問わず」　サイバーエージェントのAI推進策

CyberAgent's ¥10M AI contest ignores feasibility to shift staff mindset

AI Agents

CyberAgent ran an AI contest with a 10-million-yen prize that deliberately ignores feasibility, aiming to change how employees engage with generative AI. Since tools yield no value if unused, lowering the barrier to entry is meant to spur company-wide adoption.

Read original (ITmedia AI+) ↗

Sakana AI Blog (ja) · 2026-08-02 JA New Model Releases extract

Sakana AI、日本語特化のLLM API「Sakana Namazu」を提供開始

Sakana AI launches Namazu, a Japanese-focused OpenAI-compatible LLM API

AI Agents Inference Machine Learning Meta OpenAI

Sakana AI released Namazu, an LLM API tuned for Japanese and local business use. Built on Moonshot AI's open Kimi K2.6 and refined with in-house data, it adds built-in web search and code execution. Being OpenAI-compatible, existing code works by swapping the base_url, filling the gap between costly frontier models and raw open ones.

Read original (Sakana AI Blog (ja)) ↗

Simon Willison's Weblog · 2026-08-02 EN New Model Releases extract

July 2026 newsletter

Simon Willison publishes his latest monthly newsletter

Anthropic Claude DeepSeek GPT Model Context Protocol (MCP)

Developer Simon Willison released the latest edition of his sponsors-only monthly newsletter. It rounds up recent developments across AI models and tooling—spanning GPT, Claude, DeepSeek, Anthropic, and MCP—offering an individual's closely watched view of the fast-moving AI landscape.

Read original (Simon Willison's Weblog) ↗

ITmedia AI+ · 2026-08-02 JA New Model Releases extract

Google、パーソナルAI「Gemini Spark」を日本でも利用可能に　Chrome統合は米国から

Google expands Gemini Spark personal AI to 160+ countries incl. Japan

AI Agents Gemini Google

Google extended its Gemini Spark personal AI agent to more than 160 countries, including Japan. Running on Google's cloud, it can act even when a PC is off or a phone is locked, handling tasks based on triggers. Chrome integration will roll out first in the US.

Read original (ITmedia AI+) ↗

Simon Willison's Weblog · 2026-07-31 EN New Model Releases extract

Stateless MCP has recaptured my interest (and inspired mcp-explorer and datasette-mcp)

Simon Willison: stateless MCP (MCP 2.0) has recaptured my interest

Anthropic Claude Model Context Protocol (MCP) OpenAI Reinforcement Learning

Simon Willison wrote that the rollout of stateless MCP—the MCP 2.0 or 2026-07-28 Model Context Protocol specification—has renewed his interest in the protocol. He says it inspired him to build tools such as mcp-explorer and datasette-mcp on top of the new stateless design.

Read original (Simon Willison's Weblog) ↗

Simon Willison's Weblog · 2026-07-31 EN New Model Releases extract

llm-mcp-client 0.1a0

Simon Willison releases llm-mcp-client 0.1a0

Model Context Protocol (MCP)

Simon Willison released llm-mcp-client 0.1a0, a tool for connecting his LLM utility to Model Context Protocol (MCP) servers. Detailed in an accompanying blog post, the release adds to the growing set of tooling built around the MCP ecosystem.

Read original (Simon Willison's Weblog) ↗

Simon Willison's Weblog · 2026-07-31 EN New Model Releases extract

datasette-agent 0.4a0

Datasette Agent 0.4a0 lets agent tools run code in the user's browser

Datasette Agent 0.4a0 adds an await context.browser_task() mechanism that lets agent tools execute custom JavaScript directly in the user's browser. The release makes it easier for Datasette Agent plugins to provide tools that run client-side.

Read original (Simon Willison's Weblog) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Infrastructure & Hardware

Beyond Component Testing: Validating Agentic AI Systems

Neural Network Retrieval-Augmented Generation (RAG)

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Agents & Tool Use

Tool Specifications Matter: Uncovering and Mitigating Safety Risks in AI Agents

AI Agents Deep Learning Inference Retrieval-Augmented Generation (RAG) Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-31 EN Agents & Tool Use

Data Turnstile: A Scalable Open Framework for Function-Calling Data Generation

Neural Network

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-31 EN Safety & Evaluation

Don't Mix Rewards, Mix Policies: Policy Decomposition and Optimization for Multi-Reward RL

Inference Reinforcement Learning Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

ITmedia AI+ · 2026-07-31 JA Developer Tools extract

PerplexityがAIエージェントの“暴走”対策ツールをオープンソースに　Claude CodeやCodexを監視

Perplexity open-sources 'Numbat' to rein in runaway AI agents

AI Agents Claude

Perplexity open-sourced Numbat, a set of tools to detect and prevent dangerous AI agent behavior. Integrated with Claude Code and Codex, it aims to stop task-obsessed agents from going rogue before actions execute, adding a safeguard for autonomous agent workflows.

Read original (ITmedia AI+) ↗

ITmedia AI+ · 2026-07-31 JA Inference & Efficiency extract

Chromeに13年以上潜んでいた脆弱性、AIで発見　直近2回のアプデで過去23回分を上回るバグ修正

Google's Gemini agent finds 13-year-old Chrome flaw; tests twice-weekly updates

AI Agents Gemini Google

Google detailed its use of AI for Chrome security, saying a Gemini-based agent uncovered a vulnerability hidden for over 13 years and that its last two updates fixed more bugs than the previous 23 combined. To counter faster AI-driven attacks, Google is trialing twice-weekly security updates.

Read original (ITmedia AI+) ↗

NVIDIA Developer Blog · 2026-07-30 EN Agents & Tool Use extract

Four Ways to Deploy More Secure AI Agents

NVIDIA outlines four ways to deploy more secure AI agents

AI Agents Generative AI NVIDIA

NVIDIA outlined four approaches to deploying AI agents more securely in production, covering access controls, guardrails, and monitoring. The guidance targets security risks that arise as autonomous agents take on real workloads.

Read original (NVIDIA Developer Blog) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Agents & Tool Use

Benchmarks Are Not Validation: A System-Level View of Financial LLM Applications

Generative AI Reinforcement Learning

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN Agents & Tool Use

AskChem: Claim-Centered Infrastructure for Chemistry Literature Synthesis

AI Agents GPT Model Context Protocol (MCP) Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

Publickey · 2026-07-30 JA New Model Releases extract

JetBrains、AIが少ないトークンでコンテキストを取得しやすく、よりよいコード生成を可能にする「JetBrains Context」発表

JetBrains unveils 'JetBrains Context' to feed AI agents code context efficiently

AI Agents Machine Learning

JetBrains announced JetBrains Context, a service that builds an intelligence layer over code repositories. By supplying AI agents with the right code context using fewer tokens, it aims to enable better code generation from agentic coding tools.

Read original (Publickey) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

MANTA: Multi-Agent Network Topology Adaptation for Self-Evolving Multi-Agent Systems

Inference Neural Network Retrieval-Augmented Generation (RAG)

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

Microsoft Research Blog · 2026-07-30 EN Agents & Tool Use extract

Echoverse: Deep, evolving environments for computer-use agents

Microsoft's Echoverse trains computer-use agents in evolving environments

AI Agents Microsoft

Microsoft Research unveiled Echoverse, a set of deep, evolving environments for training computer-use agents that struggle with multi-step workflows such as email and customer support. Training in realistic settings aims to improve agents' ability to complete complex tasks.

Read original (Microsoft Research Blog) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-30 EN New Model Releases

EMBL AI Librarian: Life-Sciences Knowledge Layer for AI Agents

AI Agents GPT Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.LG (Machine Learning) · 2026-07-30 EN Agents & Tool Use

ClawTrack: Towards Trace-Level Evaluation and Improvement of Real-World Autonomous Agents

AI Agents Reinforcement Learning

Read original (arXiv cs.LG (Machine Learning)) ↗

arXiv cs.CL (Computation and Language) · 2026-07-30 EN Training & Fine-tuning

FinanceHarness: Autonomous Financial Deep Research Framework

AI Agents

Read original (arXiv cs.CL (Computation and Language)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Agents & Tool Use

Can AI agents conduct open-ended AI research? Early evidence from two case studies

AI Agents Reinforcement Learning Software Engineering

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN New Model Releases

Partner Capability Estimation for Task-Agnostic Adaptation in Ad-Hoc Teamwork

AI Agents Deep Learning Neural Network

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

NVIDIA Developer Blog · 2026-07-29 EN Agents & Tool Use extract

How to Self-Host a Validated AI Coding Assistant with NVIDIA NeMo Guardrails

NVIDIA: self-host a validated AI coding assistant via NeMo Guardrails

AI Agents Generative AI NVIDIA

An NVIDIA developer-blog post on self-hosting a validated AI coding assistant using NeMo Guardrails, framed around agent operation, infrastructure and safety. Note: the raw excerpt was blocked by a content guard, so specific components, supported models and guardrail rules are inferred from the title and URL and remain unverified from the body.

Read original (NVIDIA Developer Blog) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Agents & Tool Use

Scores Are Not Decisions: Cost-Aware Stopping for Tool Acquisition in LLM Agents

AI Agents Neural Network

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

Sakana AI Blog (ja) · 2026-07-29 JA New Model Releases extract

Sakana AI防衛・インテリジェンスチーム、「DIVER OSINT CTF 2026」で5位入賞　Fuguを活用したOSINTエージェントの可能性

Sakana AI's defense team places 5th at DIVER OSINT CTF 2026

AI Agents

Sakana AI's defense and intelligence team placed fifth at the DIVER OSINT CTF 2026 competition, using its Fugu tool to power an OSINT agent. The result highlights the potential of AI agents for open-source intelligence and information-analysis tasks.

Read original (Sakana AI Blog (ja)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Agents & Tool Use

What Does It Take to Detect an AI Agent? Minimal Feature Sets for Behavioral Detection under Browser Automation

AI Agents Machine Learning Neural Network Transformer

Read original (arXiv cs.AI (Artificial Intelligence)) ↗

arXiv cs.AI (Artificial Intelligence) · 2026-07-29 EN Agents & Tool Use

SecRespond: Benchmarking AI Agents for Real-World Post-Compromise Incident Response

AI Agents Neural Network Reinforcement Learning

Read original (arXiv cs.AI (Artificial Intelligence)) ↗