-
GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
GateMem: benchmarking memory governance in shared-memory agents
Memory benchmarks for LLM agents largely assume single-user settings, leaving shared-memory governance untested. GateMem benchmarks memory governance, such as access control and management, in multi-principal shared-memory agents.
-
ForecastBench-Sim: A Simulated-World Forecasting Benchmark
ForecastBench-Sim: a simulated-world forecasting benchmark
Forecasting benchmarks for general-purpose AI usually inherit real-world events, making evaluation hard to control. ForecastBench-Sim introduces a simulated-world forecasting benchmark, enabling controlled assessment of AI forecasting ability.
-
Hitachi steps up collaboration with OpenAI: legacy-system modernization with "Codex", plus cyber defense
Hitachi deepens OpenAI tie-up, using Codex to modernize legacy systems
Hitachi is expanding its partnership with OpenAI, pairing the code-analysis AI "Codex" with its own systems-development expertise. It aims to establish an AI-driven workflow that runs from visualizing upstream specifications out of existing code through to migration testing, and also cites cybersecurity defense as a use case.
-
Using PLaMo-3.0-Prime-β in day-to-day LLM development
Preferred Networks shows PLaMo-3.0-Prime-β in real LLM development
Preferred Networks continues developing its large language model PLaMo and shares how to use the latest PLaMo-3.0-Prime-β in real development work. Beyond training large models, it covers the many surrounding tasks involved in building high-performance LLMs in practice.
-
LLM Serving Fairness: No more noisy neighbours
Cohere ensures fair compute sharing across LLM serving tenants
Cohere details how it ensures every tenant gets a fair share of compute in LLM serving, tackling the 'noisy neighbour' problem where one user monopolizes resources. The design allocates capacity fairly across tenants to deliver stable, predictable multi-tenant performance.
-
Self-service refueling was actually being approved manually by staff!? Can Cosmo Oil's "AI monitoring" save disappearing gas stations?
Cosmo Oil and ELEMENTS build AI to approve self-service refueling
At self-service gas stations in Japan, staff still manually approve refueling after a safety check. Cosmo Oil Marketing and ELEMENTS have jointly developed a monitoring system in which AI judges whether to permit refueling, aiming to support that task. The article cites labor shortages and a declining number of service stations as background. Details are per the article and the companies.
-
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
VERITAS steers and self-improves robot policies at inference time
The paper proposes VERITAS, a generator-verifier framework pairing a pre-trained generalist robot policy with a gradient-free visual verifier that evaluates actions at inference time, improving performance without extra training and enabling self-improvement.
-
Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Lexical Markup Framework and TEI Lex-0
Encoding the Al-Mawrid Arabic-English dictionary with LMF and TEI Lex-0
The paper presents a methodology to systematically digitize and encode the legacy print Al-Mawrid Arabic-English dictionary using the ISO Lexical Markup Framework (LMF) and TEI Lex-0, addressing a gap in Arabic lexical infrastructure by producing a standardized computational lexicon.
-
All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code
Study finds agent-authored test code often lacks real verification logic
The paper examines test code generated by AI coding agents in open-source pull requests, arguing that test files lacking explicit assertions verify no behavior, so presence-based quality gates overestimate verification strength.
-
Memory as a Wasting Asset: Pricing Flash Endurance for Embodied Agents, and the Limits of Doing So
Pricing flash endurance as a wasting asset for embodied agents
A robot's flash endurance is a non-renewable stock: each persisted write spends one of a few thousand program/erase cycles and never refills. The paper frames flash endurance as a wasting asset, proposes pricing it for embodied agents, and examines the limits of doing so.
-
Knowledge Reutilization in Meta-Reinforcement Learning
A meta-knowledge reutilization framework for meta-RL across agents
The paper proposes a meta-knowledge reutilization framework for meta-reinforcement learning that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents, using a Bayesian non-parametric prior to organize latent task modes.
-
Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models
Ternary Mamba: grouped QAT for W1.58A16 state space models
Ternary Mamba applies grouped quantization-aware training to Mamba state space models with ternary (W1.58) weights and 16-bit activations, targeting efficient low-bit training and inference of sequence models while preserving accuracy.
-
PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience
PseudoBench measures how agentic auto-research fuels pseudoscience
As LLM-based agents enter autonomous scientific research, resisting pseudoscience matters. PseudoBench is an adversarial benchmark measuring how such agents may rapidly generate plausible yet misleading studies that contaminate academic literature.
-
INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities
INI-VPINN: a variational PINN for multi-material domains
INI-VPINN is a weak-form physics-informed neural network that naturally incorporates Neumann boundary and interface conditions into a variational formulation, targeting multi-material domains with geometric singularities.
-
Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model
Hybrid Ret-DNN with XGBoost for e-commerce behavior forecasting
E-commerce platforms struggle to understand customer behavior and predict future purchases. The study proposes predictive analytics using a hybrid Ret-DNN combined with an XGBoost model to forecast customer behavior.
-
Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models
Dynamic rollout editing reduces overthinking in RL reasoning models
Long chain-of-thought reasoning helps, but models often keep generating unnecessary reasoning after reaching a correct answer. Framing this as overthinking in GRPO-style RL post-training, the paper proposes dynamic rollout editing to reduce it.
-
OpenAI's advanced AI finds 10,000 vulnerabilities at SoftBank; Masayoshi Son calls it "a grave crisis"; diagnostic service to be offered to Japan's critical-infrastructure companies
SoftBank unveils OpenAI-powered Patching-as-a-Service security offering
SoftBank Group announced "Patching as a Service" on June 16, a cybersecurity offering built on OpenAI technologies such as "GPT-5.5 Cyber." It simulates attacks on corporate systems to find vulnerabilities, then handles the rest end-to-end, from proposing remediation plans to implementing them. SoftBank says it will prioritize select firms supporting Japan's critical infrastructure, while chairman Masayoshi Son stressed the gravity of the cyber threat.
-
EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning
EnvRL learns from environment dynamics in agentic RL
EnvRL is a method that learns from environment dynamics in agentic reinforcement learning, leveraging the structure of agent-environment interaction to improve learning efficiency and performance.
-
Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs
Prompt perturbation for reliable LLM evaluation over comparison graphs
Evaluating LLMs is important but can be fragile to small prompt changes. The paper proposes using prompt perturbation to achieve more reliable LLM evaluation over comparison graphs.
-
Predicting model behavior before release by simulating deployment
OpenAI unveils Deployment Simulation to predict model behavior pre-release
OpenAI introduced Deployment Simulation, a method to predict an AI model's behavior before deployment by using real conversation data to simulate responses, aiming to improve safety and evaluation accuracy. The claims are OpenAI's own and not independently verified.
-
How Osaka Gas and others use Notion × AI to eliminate 2,000 wasted hours a month: putting "unused information" to work
Osaka Gas cuts 2,000 hours/month via Notion-plus-AI knowledge reuse
Two companies including Osaka Gas sharply reduced the burden of hunting for documents by combining Notion with AI. With 2,000 hours saved per month, the case shows how to turn buried information into organizational knowledge assets and how to build systems that prevent over-reliance on individuals.
-
Trying out how far generative AI × 3D CAD can go
Testing generative AI with 3D CAD using Autodesk Fusion's Assistant
Generative AI is expanding beyond text, images, and video into 3D CAD, with environments emerging that draft 3D models from natural-language prompts alone. The article tries Autodesk Fusion's Autodesk Assistant to model a plastic bottle, illustrating both the promise and current limits of pairing generative AI with 3D CAD.
-
30 billion yen with "no ROI required": for SMBC, the force behind Olive and Trunk, the essence of new business lies in "withdrawal"
SMBC plans 50B yen in generative-AI investment; key to ventures is exit
Sumitomo Mitsui Financial Group grew its Olive and Trunk services and unveiled a 50-billion-yen generative-AI investment plan. A decade ago the bank lagged rivals in mobile; it has since become an organization that repeatedly ships new ventures, finding the essence of new business in knowing when to exit.
-
The Value Axis: Language Models Encode Whether They're on the Right Track
LLMs encode a 'value axis' tracking if their strategy works
Researchers built a 'value axis' for Qwen3-8B that captures whether its current strategy is likely to reach its goal. The axis separates high- and low-confidence rollouts, backtracking, and correct vs. corrupted code; steering it up suppresses self-correction while steering down induces exploration. DPO can raise the internal value of rewarded behaviors.
-
ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning
ROVE: RL that learns humanoid manipulation from imperfect interventions
ROVE is an RL framework for post-training humanoid Vision-Language-Action models from imperfect human interventions. It pairs a human-in-the-loop data pipeline with Optimistic Value Estimation to prioritize high-value behaviors in mixed-quality trajectories, and adds cross-embodiment human videos to robustify value estimation.
-
A Causal Model of Theory of Mind in Conflict for Artificial Intelligence
A structural causal model for when AI should engage theory of mind in conflict
Theory of mind (ToM), ascribing mental states to others for prediction and inference, is widely assumed essential for human-machine integration. Existing AI-ToM models address how to mentalize but leave the question of when largely unaddressed. The paper asks under what situational and agent-level conditions ToM engagement is causally warranted in conflict, presenting a structural causal model as a directed acyclic graph that treats ToM as a mechanism activated by conditions rather than an always-on capacity.
-
CrossMaps: Confidence-Aware Open-Vocabulary Semantic Mapping for Rover Navigation
CrossMaps: confidence-aware open-vocabulary semantic mapping for rovers
Rovers rely on perception to maintain spatial maps encoding objects and sensor quality (range reliability, lighting artifacts, data density) to guide fusion, embedding updates, and navigation under partial observability. The paper presents CrossMaps, a real-time confidence-aware open-vocabulary semantic mapping pipeline that builds language-queryable maps from RGB-D data, extending VLMaps-style approaches with multi-scale CLIP embeddings, confidence-aware fusion, and a dual-memory architecture.
-
Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter
Study probes extrinsic and intrinsic traits of code-interpreter reasoning
This paper studies reasoning with a Code Interpreter (CI) in LLMs from two angles: extrinsic properties (crucial tokens) and intrinsic properties (code-specific cognitive behaviors). It reports that stronger CI reasoning models show more crucial tokens and behaviors, especially verification, backtracking, and backward chaining, and explores leveraging these at inference and training time. Summarized neutrally from the abstract.
-
Beyond Models: Reflections on Engineering AI-enabled Systems in a Project-Based Course
Reflections on teaching the engineering of AI-enabled systems in a course
This paper reflects on a project-based master's course at the University of Bremen on engineering AI-enabled systems. It argues that machine learning courses emphasize model development while students lack experience in architectural design, deployment, and monitoring, and reports on the course's design and implementation.
-
Does Traversal Order Matter? A Systematic Study of Tree Traversal Methods in Transformer Grammars
Paper: compares tree traversal orders in Transformer Grammars
An arXiv paper systematically studies tree linearization orders in Transformer Grammars, exploring breadth-first and a novel Production-Rule Traversal alongside the conventional depth-first approach. Summarized neutrally from the abstract.