Infrastructure & Hardware B
-
How Transparent is DiffusionGemma?
Probing DiffusionGemma's reasoning transparency in latent space
DiffusionGemma performs much of its computation in a continuous latent space, raising the question of whether this reduces reasoning transparency. The authors decompose transparency into variable transparency (understanding intermediate computational states) and algorithmic transparency (reconstructing the process behind a model's answer).
-
Optimal Deterministic Multicalibration and Omniprediction
A deterministic algorithm achieving optimal multicalibration
The paper gives a minimax-optimal multicalibration algorithm that outputs a deterministic predictor, resolving the open question of whether randomization is needed for optimal sample complexity. The result is extended to deterministic predictors satisfying outcome indistinguishability and omniprediction.
-
Predictability as a Fine-Grained Measure for Privacy
Privacy via predictability, a fine-grained privacy measure
The paper introduces 'privacy via predictability,' a fine-grained privacy framework that explicitly incorporates an attacker's core prior knowledge. It aims to ease the costly privacy-accuracy tradeoff imposed by the worst-case guarantees of differential privacy.
-
Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving
Execution-State Capsules: checkpoint/restore for on-device AI serving
The paper introduces Execution-State Capsules, a graph-bound mechanism to checkpoint and restore execution state for low-latency, small-batch, on-device physical-AI serving. It targets scenarios beyond the high-throughput, high-concurrency regime that paged or radix KV caches mainly serve.
-
How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech
Cross-attention attribution for style-captioned text-to-speech
Style-captioned text-to-speech systems use natural language to control voice characteristics. This work uses cross-attention attribution to analyze how individual instruction words shape the generated speech.
-
Optimal Order of Multi-Agent and General Many-Body Systems
Optimal order of multi-agent and general many-body systems
This paper develops a general framework for analyzing multi-agent systems with feedback loops between agents, as well as general many-body systems, and characterizes their optimal order.
-
UltraQuant: 4-bit KV Caching for Context-Heavy Agents
UltraQuant: 4-bit KV caching for context-heavy agents
Context-heavy agents put unusual pressure on the key-value cache as long prefixes are reused across calls. UltraQuant applies 4-bit quantization to compress the KV cache while preserving quality.
-
SSH-Net: A Deep Neural Network for Predicting Failure Time Distribution Functions under Competing Risks with Application to GPU Data
SSH-Net: predicting failure-time distributions under competing risks
The paper proposes SSH-Net, a deep neural network for predicting failure-time distribution functions under competing risks. It targets time-to-event modeling in complex engineering settings and is demonstrated on GPU failure data, building on the flexibility of neural networks for competing-risk prediction.
-
HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction
HEPTv2: an efficient point transformer for particle tracking
The paper presents HEPTv2, an end-to-end efficient point transformer for charged-particle reconstruction. It targets tracking (reconstructing trajectories from sparse detector measurements under extreme combinatorial ambiguity) and aims to stay accurate and efficient at the High-Luminosity LHC.
-
Multi-View Decompilation for LLM-Based Malware Classification
Multi-view decompilation for LLM-based malware classification
Malware analysts often inspect compiled binaries through decompiled pseudo-C when source code is unavailable. This work uses multi-view decompilation to improve LLM-based malware classification.
-
Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids
Pseudo-Feature Padding: a defense against grid false-data injection
The paper proposes Pseudo-Feature Padding, a lightweight defense against false data injection attacks in power grids. It targets the vulnerability of deep neural network detectors in cyber-physical systems, where attackers can craft inputs to evade detection during critical operations.
-
On the Redundancy of Timestep Embeddings in Diffusion Models
Are timestep embeddings redundant in diffusion models?
The paper challenges the necessity of explicit timestep embeddings in diffusion models, which are typically used to modulate denoising across noise scales. Through empirical analysis of U-Net and Diffusion Transformer architectures, together with theoretical arguments, it examines whether these temporal signals are redundant.
-
Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe
Rethinking shrinkage bias in LLM FP4 pretraining with a UFP4 recipe
FP4 training promises large memory and compute savings for LLM pretraining but suffers from shrinkage bias. This paper analyzes its geometric origin and systemic impact and proposes a UFP4 recipe to address it.
-
Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation
Using a de-biased VLM 3D judge to improve single-image 3D generation
The paper presents a de-biased VLM-as-3D-judge protocol for single-image 3D generation. Building on a cross-model judge that ranks single-image-to-3D mesh quality where geometry and CLIP proxies fall short, it asks whether the judge's preferences can cheaply specialize a strong open generator, TRELLIS, on one asset class such as furniture without human labels.
-
Shifting-based Optimizable Linear Relaxations for General Activation Functions
Shifting-based linear relaxations for general activation functions
The paper proposes a shifting-based method for constructing optimizable linear relaxations of general activation functions, used in formal verification of neural networks. It removes the need for hand-crafted relaxations tailored to each activation, supporting formal guarantees in safety- and security-critical settings.
-
Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference
Explicit knowledge conflict resolution for LLM inference
Large language models perform strongly across language tasks but can hold conflicting parametric and contextual knowledge. This work proposes explicit knowledge conflict resolution to navigate unreliable knowledge during inference.
-
Beyond Accuracy: Measuring Logical Compliance of Predictive Models
Beyond accuracy: measuring logical compliance of predictive models
Machine learning models are mostly evaluated through predictive metrics such as accuracy. This work goes beyond accuracy to measure the logical compliance of predictive models.
-
Pitch Spelling Jazz Lead Sheets, Solo Transcriptions, Classical Piano and Monophonic Scores
An algorithm for pitch spelling and key estimation from MIDI
The paper presents an algorithm for pitch spelling and key estimation across jazz lead sheets, solo transcriptions, classical piano, and monophonic scores. Given MIDI-like input with note pitches and bar boundaries, it estimates note names, a global key signature, and local scales.
-
HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization
HydraHead: head-level hybridization of linear and full attention
The paper proposes HydraHead, a hybrid attention design that exploits head-level functional heterogeneity to combine linear and full attention. It moves beyond the common layer-wise hybridization strategy, addressing the difficulty of integrating linear attention with full attention for efficient long-context processing.
-
GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs
GEMS: geometric constraints for multi-semantic activation steering
The paper introduces GEMS, which uses geometric constraints to enable superposing multiple semantic directions in LLM activation steering. It addresses the collapse that occurs when existing single-direction steering methods inject several semantic directions at once without constraints.
-
AMD silently removes memory encryption from consumer Ryzen CPUs
The article reports that AMD silently removed memory encryption from consumer Ryzen CPUs. It raises concerns about losing protection against physical attacks and the impact on security-conscious users.
-
Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models
A control-window law for single-neuron steering in LLMs
The paper develops a budget-normalized control-window framework for single-neuron steering in language models. It seeks to predict when intervening on one neuron coherently controls a behavior (such as refusal or language routing gated by sparse feed-forward neurons) rather than collapsing the output.
-
GLM-5.2 is probably the most powerful text-only open weights LLM
GLM-5.2 may be the most powerful text-only open weights LLM
Chinese AI lab Z.ai released GLM-5.2 to coding-plan subscribers on June 13 and then published full open weights under an MIT license on June 16. Similar in size to GLM-5 and GLM-5.1, it may be the most powerful text-only open weights LLM, per Simon Willison.
-
Explaining Attention with Program Synthesis
Explaining attention via program synthesis for interpretability
A longstanding goal of interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. This paper approximates the behavior of attention components with synthesized programs, offering a route to explain attention and improve interpretability.
-
P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution
P-K-GCN fuses physics and Koopman for spatiotemporal super-resolution
High-fidelity simulation of spatiotemporal dynamics is computationally prohibitive, demanding efficient super-resolution. P-K-GCN integrates physical constraints and Koopman operator theory into a graph convolutional network to reconstruct high-resolution spatiotemporal fields from coarse data.
-
Optimal scenario design for climate emulation
Optimal scenario design improves climate emulation surrogates
As deep learning for physical systems grows, efforts to improve generalizability have focused on architectures embedding physical constraints. This work instead studies optimal scenario design for machine-learning surrogate models of climate, improving generalization and predictive accuracy.
-
Detecting Hidden ML Training With Zero-Overhead Telemetry
Zero-overhead telemetry detects hidden ML training runs
Hardware-enabled monitoring of GPU workloads underpins many AI compute-governance proposals, but if developers can defeat monitoring, such schemes fail. This work evaluates detecting hidden ML training using zero-overhead telemetry, testing how robustly monitoring can support compute governance.
-
A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development
Human-in-the-loop Bayesian optimization for bioprocess development
This work extends Pareto Front Guided Sampling (PFGS), a human-in-the-loop Bayesian optimization framework, by reformulating Gaussian-process surrogate quantities as objectives. It enables constraint-aware bioprocess development, blending expert input with efficient search for optimal conditions.
-
RECOM: A Validity Discrimination Tradeoff in Automatic Metrics for Open Ended Reddit Question Answering
RECOM analyzes validity vs discrimination in automatic metrics
Automatic metrics are the default for evaluating LLM-generated text, yet a metric is quietly asked to do two jobs: tell genuine content alignment from surface coincidence (validity) and discriminate quality. Using open-ended Reddit QA, RECOM analyzes this validity–discrimination trade-off.
-
Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis
Hybrid LLM-ML system uses LLMs as interfaces for appendicitis
LLMs can broaden clinical decision support by interpreting free-text documentation, but using them directly as diagnostic engines is limited by sensitivity to prompts and information order. This work treats LLMs as interfaces, not oracles, pairing them with ML for pediatric appendicitis diagnosis.