New Model Releases A
-
RIKEN names its AI for Science supercomputer 'Rikyu' (理究): what is the origin of the name?
RIKEN, Japan's national research institute, has decided to name its supercomputer dedicated to AI for Science 'Rikyu' (理究). The announcement also explains the origin and reasoning behind the chosen name.
-
GMO subsidiary becomes Unitree's official distributor in Japan, with end-to-end support from humanoid robot deployment to maintenance
GMO AI & Robotics, a GMO Internet Group subsidiary, has signed an official distributor agreement with China's Unitree Robotics for the Japanese market. It will offer end-to-end support, from deployment to maintenance, aiming to accelerate humanoid robot adoption across Japan.
-
'Record' on-screen operations and the AI does the work: Codex gains a new 'Record & Replay' feature
OpenAI has added a new 'Record & Replay' feature to its Codex coding agent. Users record on-screen operations, and the AI then reproduces those steps to carry out the task automatically, according to ITmedia.
-
Gartner sounds the alarm: as privacy-law enforcement ramps up, what should CISOs review?
Gartner reports that US state authorities imposed about $3.425 billion in privacy-law violation fines in 2025, exceeding the combined total of the previous five years. It expects enforcement to keep accelerating through 2028 and urges CISOs to reconsider their privacy and compliance posture.
-
ChatGPT ad testing begins in Japan too: how do you hide the ads?
OpenAI's Japan arm announced that it has started testing ad displays within ChatGPT in Japan. The article explains how the ads appear and how users can hide them.
-
Datasette Apps: Host custom HTML applications inside Datasette
Simon Willison introduced Datasette Apps, letting developers host custom HTML/JS applications inside a Datasette instance. The apps can read Datasette's databases, enabling lightweight, data-backed web apps served directly from the data exploration tool itself.
-
UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning
UNIEGO is a unified egocentric video encoder trained via a hierarchical multi-teacher distillation framework. Representation-specific proxy models translate knowledge from teachers spanning multiple viewpoints, modalities, and foundation models into a single egocentric space, while remaining deployable from egocentric video alone.
-
Predictability as a Fine-Grained Measure for Privacy
The paper introduces 'privacy via predictability,' a fine-grained privacy framework that explicitly incorporates an attacker's core prior knowledge. It aims to ease the costly privacy-accuracy tradeoff imposed by the worst-case guarantees of differential privacy.
-
Multi-Task Bayesian In-Context Learning
The paper studies multi-task Bayesian in-context learning, using in-context learning to perform Bayesian predictive inference across tasks. It targets the intractability of exact inference and the cost or restrictiveness of scalable approximations, aiming for uncertainty quantification and data efficiency.
-
Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving
The paper introduces Execution-State Capsules, a graph-bound mechanism to checkpoint and restore execution state for low-latency, small-batch, on-device physical-AI serving. It targets scenarios beyond the high-throughput, high-concurrency regime that paged or radix KV caches mainly serve.
-
LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents
Policy-adherent tool-calling agents in customer-service domains must track task state across turns while following rules. LedgerAgent introduces structured state to help such agents stay consistent and policy-compliant.
-
StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
StylisticBias investigates the visual cues that shape how multimodal large language models (MLLMs) judge people. The study finds that a small set of human visual cues drives most of the social biases exhibited by MLLMs, which are increasingly deployed in consequential settings.
-
DeepSWIP: Quotient-WMC Counterfactuals for Neural Probabilistic Logic Programs
Neurosymbolic systems such as DeepProbLog combine neural perception with probabilistic logic, but standard inference has limits. DeepSWIP introduces quotient-WMC counterfactuals to enable counterfactual reasoning in neural probabilistic logic programs.
-
Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes
Autonomous agents are increasingly wired into cloud, deployment, and data-control workflows, straining production security. This work proposes sovereign execution brokers that enforce certificate-bound authority within agentic control planes.
-
FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS
Flow-matching text-to-speech achieves strong zero-shot quality but stays static after deployment. FlowEdit uses associative memory to enable lifelong pronunciation adaptation without full retraining.
-
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
LiveCodeBench has become a widely adopted benchmark for evaluating large language models on code. Multi-LCB extends it to multiple programming languages to assess multilingual code generation.
-
Probe-and-Refine Tuning of Repository Guidance for Coding Agents
The paper presents Probe-and-Refine, a method for tuning the repository guidance (such as AGENTS.md files) that LLM-based coding agents rely on. It targets higher-level operational knowledge that is not contained in the code itself: file layout, test workflows, and error-prone patterns.
-
Efficient and Sound Probabilistic Verification for AI Agents
Securing AI agents that operate in complex digital environments has become critical, motivating runtime verification. This paper presents an efficient and sound probabilistic verification approach for AI agents.
-
FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
Style-content dual-reference generation aims to synthesize an image that preserves structure while adopting a reference style. FreeStyle leverages community LoRA mining to give free control over style and content.
-
Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
It is unclear whether LLMs that score well on vulnerability benchmarks truly reason about security or merely pattern-match. This work diagnoses the limits of fine-tuning LLMs for vulnerability detection in systems software.
-
Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems
When large language models act as evaluators in multi-agent systems, their systematic evaluation biases can spread through the system. This work analyzes how such evaluator bias propagates across agents.
-
Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems
The paper proposes a hierarchical recovery mechanism for cross-device agent systems, moving beyond coarse-grained global replanning. It targets real-world computer-use tasks that span multiple applications and devices and must coordinate heterogeneous environments under dynamic runtime failures.
-
Optimal Order of Multi-Agent and General Many-Body Systems
This paper develops a general framework for analyzing multi-agent systems with feedback loops between agents, as well as general many-body systems, and characterizes their optimal order.
-
New usage analytics and updated spend controls for enterprises
OpenAI introduced new usage analytics and updated spend controls for ChatGPT Enterprise, helping organizations track and manage AI costs while scaling with confidence. Admins gain visibility into per-team consumption and can set limits to optimize spend.
-
Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology
The paper studies how to train spatially grounded vision-language models for radiology without manual spatial annotations. It introduces RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with VQA and spatial grounding subsets.
-
Sparsity, Superposition, and Forgetting: A Mechanistic Study of Representation Retention in Continual Learning
The paper presents a mechanistic study of representation retention in continual learning, using a controlled toy-world framework to make the drivers of forgetting observable and testable. It examines how sparsity and superposition relate to forgetting, isolating mechanisms that real datasets usually entangle.
-
Neural network surrogates with uncertainty quantification for inverse problems in partial differential equations
The paper develops neural network surrogates with uncertainty quantification for inverse problems in partial differential equations. It targets the inference of unknown model parameters from noisy or incomplete observations, where traditional numerical methods are costly, particularly in Bayesian settings.
-
Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids
The paper proposes Pseudo-Feature Padding, a lightweight defense against false data injection attacks in power grids. It targets the vulnerability of deep neural network detectors in cyber-physical systems, where attackers can craft inputs to evade detection during critical operations.
-
Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning
The paper improves Direct Advantage Estimation (DAE) for scalable and sample-efficient deep reinforcement learning. It addresses DAE's reliance on full environment observability and the computational overhead of modeling transition probabilities, which limit its use in realistic settings.
-
DataMagic: Transforming Tabular Data into Data Insight Video
Data videos combine dynamic charts, voice narration, and synchronized animation to convey insights. DataMagic automatically transforms tabular data into such data-insight videos.