Developer Tools B
-
Arch Linux Now Believes Malware Incident Under Control: More Than 1,500 Packages
Arch Linux says 1,500+ package malware incident is contained
Arch Linux now believes a malware incident affecting more than 1,500 packages is under control, an episode that highlights ongoing supply-chain security risks in software package ecosystems.
-
Statement on the US government directive to suspend access to Fable 5 and Mythos 5
Willison on the US directive to suspend Fable 5 and Mythos 5
Simon Willison comments on the US government's national-security export-control directive suspending all foreign-national access to Fable 5 and Mythos 5, calling the move extraordinary and questioning its rationale and impact.
-
OpenAI WebRTC Audio Session, now with document context
Simon Willison adds document context to his OpenAI WebRTC audio tool
Simon Willison updated his browser tool for OpenAI's WebRTC realtime audio API. It now supports the newer realtime voice model touting GPT-5-class reasoning, and lets users paste document text as context for spoken conversations about it.
-
Ire identifies another LOTUSLITE specimen
Microsoft's Project Ire AI flags LOTUSLITE malware missed by EDR tools
Microsoft Research reports its autonomous malware-analysis agent Project Ire reverse-engineered a new specimen and identified LOTUSLITE traits that most major EDR tools failed to detect, underscoring AI's expanding role in threat analysis.
-
Gaze Heads: How VLMs Look at What They Describe
'Gaze heads' in VLMs track and steer described image regions
The paper identifies a small set of attention heads, dubbed gaze heads, that track the image region a vision-language model is currently describing. Intervening on the top ~100 of them can steer the model to describe any chosen region.
-
Persona-Pruner: Sculpting Lightweight Models for Role-Playing
Persona-Pruner sculpts lightweight role-playing language models
Persona-Pruner is a pruning approach that sculpts lightweight language models specialized for role-playing. It aims to retain consistent, persona-driven interaction while reducing model size.
-
A Complexity Measure for Active Learning in Multi-group Mean Estimation
A complexity measure for active multi-group mean estimation
The paper studies active learning for multi-group mean estimation framed as a d-armed bandit minimizing max-risk. It introduces a complexity measure characterizing the difficulty of adaptive budget allocation.
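To make "adaptive budget allocation" concrete, here is a minimal sketch of a generic greedy plug-in rule, not the paper's algorithm. It assumes each group's risk is its sample variance divided by its sample count, that max-risk is the largest of these, and that each group is exposed through a hypothetical `samplers` callable; each round spends one sample on the currently riskiest group.

```python
import random
import statistics


def allocate_greedy(samplers, budget, warmup=10, seed=0):
    """Greedy plug-in allocation for multi-group mean estimation.

    samplers: one callable per group; each takes a random.Random and
    returns one observation (hypothetical interface, assumed here).
    Each round samples the group whose estimated risk, sample variance
    divided by sample count, is currently the largest.
    """
    rng = random.Random(seed)
    # Warm-up: every group needs >= 2 points before variance is defined.
    obs = [[draw(rng) for _ in range(warmup)] for draw in samplers]
    spent = warmup * len(samplers)
    while spent < budget:
        risks = [statistics.variance(o) / len(o) for o in obs]
        worst = max(range(len(obs)), key=lambda k: risks[k])
        obs[worst].append(samplers[worst](rng))
        spent += 1
    means = [statistics.fmean(o) for o in obs]
    counts = [len(o) for o in obs]
    return means, counts


# Two toy groups: a quiet one (std 1) and a noisy one (std 3).
means, counts = allocate_greedy(
    [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(5.0, 3.0)], budget=400
)
print(means, counts)
```

Balancing variance-over-count across groups pushes the allocation roughly in proportion to each group's variance, so the noisier group should receive most of the budget; how hard that balancing is to achieve adaptively is the kind of difficulty a complexity measure would quantify.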
-
CottonLeafVision: An Explainable and Robust Deep Learning Framework for Cotton Leaf Disease Classification
CottonLeafVision: explainable, robust deep learning for cotton leaf disease
Cotton underpins the textile industry, so accurate detection of cotton leaf disease is crucial for economic stability. The paper proposes CottonLeafVision, an explainable and robust deep learning framework for classifying cotton leaf diseases.
-
HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification
HumP-KD: uncertainty-aware distillation for efficient fire classification
HumP-KD is a hybrid, uncertainty-aware multi-stage progressive knowledge distillation framework for fire classification. It targets models that are simultaneously accurate and efficient for real-time use.
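For readers new to knowledge distillation, the sketch below shows the standard single-stage, Hinton-style distillation loss that multi-stage schemes build on. It is not HumP-KD's uncertainty-aware, progressive method; the temperature, the `alpha` weighting, and the three-class fire/smoke/neither labels are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Hinton-style distillation loss for a single example.

    alpha weights the softened teacher-matching term, KL(teacher || student)
    at the given temperature scaled by T^2; (1 - alpha) weights ordinary
    cross-entropy against the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical 3-class setup: 0 = fire, 1 = smoke, 2 = neither.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.5, -2.0]
print(kd_loss(student, teacher, label=0))
```

The T^2 factor keeps the soft-target gradients on a comparable scale as the temperature changes; an uncertainty-aware variant would presumably reweight or gate the teacher term per example, which this sketch does not attempt.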
-
Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets
Optimal hidden-target learning for online inventory optimization
The work casts online inventory optimization as online convex optimization with memory, where carryover makes the feasible set history-dependent. It develops an optimal hidden-target learning method on general convex sets.
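The phrase "carryover makes the feasible set history-dependent" can be seen in a few lines. The sketch below is plain projected online subgradient descent on a newsvendor-style cost, not the paper's hidden-target method; the cost parameters `h` and `b`, the step size, and the lost-sales assumption are all illustrative. Leftover stock cannot be disposed of, so each period's order-up-to level must be at least the inventory carried in.

```python
def run_ogd_inventory(demands, h=1.0, b=3.0, eta=0.5, y0=0.0):
    """Projected online (sub)gradient descent on a newsvendor-style cost
    with inventory carryover.

    Each period the learner picks an order-up-to level y. Stock left over
    from the previous period (x) cannot be disposed of, so the feasible set
    is y >= x, which depends on past decisions and demands.
    h: holding cost per unit of excess; b: lost-sales cost per unit short.
    """
    x = 0.0          # inventory carried into the current period
    y_target = y0    # unconstrained gradient-step proposal
    levels, costs = [], []
    for d in demands:
        y = max(y_target, x)                 # project onto the feasible set [x, inf)
        levels.append(y)
        costs.append(h * max(y - d, 0.0) + b * max(d - y, 0.0))
        grad = h if y > d else -b            # a subgradient of this period's cost at y
        y_target = y - eta * grad
        x = max(y - d, 0.0)                  # leftover stock carries over; shortages are lost
    return levels, costs


demands = [10.0] * 50
levels, costs = run_ogd_inventory(demands)
print(levels[-5:], sum(costs))
```

With constant demand the levels climb toward 10 and then oscillate just above it, because overshooting leaves stock that the next period's projection cannot undo; that coupling between periods is the "memory" the summary refers to.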
-
AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition
AgentSpec dissects embodied agent scaffolds via controlled composition
AgentSpec studies scaffolded LLM agents that combine reasoning, memory, reflection, and action through controlled composition. It aims to isolate how each component contributes to overall performance.
-
Giving AI a Headache: Acoustic Adversarial Attacks to Computer Vision Applications
Acoustic adversarial attacks that disrupt computer vision systems
As AI automates real-world computer vision applications such as autonomous vehicle control, this paper demonstrates acoustic adversarial attacks that can disrupt CV systems, highlighting a new physical, sound-based attack surface.
-
Abstracting Cross-Domain Action Sequences into Interpretable Workflows
Time-stamped interaction logs objectively record digital app usage, but their granularity and noise obscure meaningful insights into work. The paper proposes abstracting cross-domain action sequences into interpretable workflows.
-
Graph Structured Combinatorial Semi-Bandit with Nonlinear Reward Associations through Separable Signals
Graph-structured combinatorial semi-bandits with nonlinear rewards
The paper addresses identifying optimal structures in combinatorial semi-bandits under nonlinear reward associations. It leverages separable signals to reduce sampling and computational cost.
-
Which Directions Matter? Sparse Design for Affine Robust Optimization
Sparse design identifies which directions matter in robust optimization
The work studies which uncertainty directions a model must cover in affine robust optimization defined by a finite dictionary and budget. It proposes a sparse design selecting the directions that matter.
-
Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models
Entropy-guided explainability for Transformer-based audio models
Transformer-based ASR models like Whisper are accurate but hard to interpret, and existing XAI methods lack faithfulness and temporal precision. The paper proposes an entropy-guided explainability approach for such audio models.
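One common way entropy enters attention-based explanations is as a sharpness score: a low-entropy attention row points at a few audio frames, a high-entropy row spreads evenly and says little. The sketch below is that generic heuristic only, not the paper's method; the row-per-output-token layout, the normalization by log(frames), and the 0.5 threshold are assumptions for illustration.

```python
import math


def normalized_entropy(weights):
    """Shannon entropy of one attention distribution, scaled to [0, 1]
    by dividing by log(number of frames)."""
    if len(weights) < 2:
        return 0.0
    h = -sum(w * math.log(w) for w in weights if w > 0)
    return h / math.log(len(weights))


def focused_steps(attn, threshold=0.5):
    """Return (output_step, most_attended_frame) for attention rows whose
    normalized entropy falls below threshold, i.e. sharply focused rows."""
    result = []
    for step, row in enumerate(attn):
        if normalized_entropy(row) < threshold:
            frame = max(range(len(row)), key=lambda k: row[k])
            result.append((step, frame))
    return result


# Toy cross-attention: 3 output tokens over 4 audio frames.
attn = [
    [0.97, 0.01, 0.01, 0.01],   # sharply focused on frame 0
    [0.25, 0.25, 0.25, 0.25],   # uniform: no useful attribution
    [0.02, 0.02, 0.06, 0.90],   # mostly frame 3
]
print(focused_steps(attn))
```

Raw attention is known to be an imperfect proxy for faithfulness, which is the gap the summary says existing XAI methods leave open; the sketch only shows how entropy separates focused from diffuse rows.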
-
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
Verifier-driven self-DPO, where a frozen verifier scores candidates to form preference pairs, is a common recipe for self-improving vision-language models. The paper shows that under this setup VLMs can regress on new tasks when the verifier misbehaves.
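The recipe described above can be sketched in two pieces: a frozen verifier ranks candidates to pick a chosen/rejected pair, and the standard DPO loss then rewards the policy for raising the chosen response's log-probability relative to a reference model. This is a minimal sketch under assumptions: the best-versus-worst pairing rule, `beta=0.1`, and the keyword-counting verifier are hypothetical stand-ins, and real log-probabilities would come from the VLM.

```python
import math


def make_preference_pair(candidates, verifier):
    """Score candidates with a frozen verifier and pair the best
    (chosen) against the worst (rejected)."""
    ranked = sorted(candidates, key=verifier)
    return ranked[-1], ranked[0]


def softplus(z):
    """log(1 + e^z), computed without overflow."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss -log(sigmoid(margin)) for one pair, where margin is
    beta times the policy's log-ratio gain on chosen over rejected,
    measured relative to a frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return softplus(-margin)


# Hypothetical verifier: counts how many expected objects a caption mentions.
EXPECTED = {"dog", "grass"}


def verifier(caption):
    return len(EXPECTED & set(caption.split()))


chosen, rejected = make_preference_pair(["a cat", "a dog on grass", "grass"], verifier)
print(chosen, "|", rejected)
```

Because the pair labels come entirely from verifier scores, a verifier that mis-scores a new task flips which response is "chosen", and minimizing this loss then pushes the policy in the wrong direction.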
-
Characterizing Cultural Localization in AI-Generated Stories
The paper assesses how well AI generates culturally localized stories. It characterizes the ways cultural localization appears in generated narratives.
-
Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens
How DiffusionGemma actually commits tokens, neither parallel nor sequential
Diffusion language models are marketed as parallel decoders, yet their real token-commit order is rarely measured. Instrumenting DiffusionGemma, the paper shows it is neither purely parallel nor sequential.
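What "measuring commit order" might look like in practice: given a log of the denoising step at which each output position was finalized, a few summary numbers distinguish sequential, parallel, and in-between behaviour. The log format and these particular statistics are hypothetical choices for illustration, not the paper's instrumentation.

```python
from itertools import combinations


def commit_order_stats(commit_step):
    """Summarize a diffusion decoder's commit log.

    commit_step[i] is the denoising step at which output position i was
    committed (a hypothetical log format). Purely sequential decoding gives
    one token per step in left-to-right order; purely parallel decoding
    commits everything in a single step.
    """
    if len(commit_step) < 2:
        raise ValueError("need at least two positions")
    n_steps = len(set(commit_step))
    in_order = same_step = total = 0
    for i, j in combinations(range(len(commit_step)), 2):   # always i < j
        total += 1
        if commit_step[i] < commit_step[j]:
            in_order += 1        # earlier position committed first
        elif commit_step[i] == commit_step[j]:
            same_step += 1       # committed together
    return {
        "steps": n_steps,
        "tokens_per_step": len(commit_step) / n_steps,
        "left_to_right_fraction": in_order / total,
        "same_step_fraction": same_step / total,
    }


print(commit_order_stats([0, 1, 2, 3]))        # sequential
print(commit_order_stats([0, 0, 0, 0]))        # fully parallel
print(commit_order_stats([0, 0, 2, 1, 1, 3]))  # mixed
```

A decoder that is "neither parallel nor sequential" would show more than one token per step on average while still committing most position pairs left to right, as in the mixed example.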
-
Moonlight in Latent Space: Chirality and Structural Correspondence Between Beethoven's Op. 27 No. 2 and Machine Learning Mechanisms
Structural correspondence between Beethoven's Moonlight Sonata and ML
Through computational analysis, this paper argues that the three movements of Beethoven's Moonlight Sonata (Op. 27 No. 2) instantiate three distinct machine learning architectures by structural correspondence rather than mere analogy.
-
Expert-Driven Survival Machines: Improving Stratification and Interpretability in Multiple Clinical Cohorts
Expert-driven survival machines for stratification across clinical cohorts
Survival prediction is central to the work of healthcare providers and clinical researchers. The paper introduces expert-driven survival machines that improve risk stratification and interpretability across multiple clinical cohorts.
-
A Comparative Study of Deep Learning Architectures for Multi-Horizon Behavioural Forecasting for Mobile Health
Comparing deep learning for multi-horizon behavioural forecasting in mHealth
Wearables and smartphones generate rich behavioural time series for proactive health interventions, yet systematic comparisons of forecasting architectures are lacking. The paper benchmarks deep learning architectures for multi-horizon behavioural forecasting in mobile health.
-
LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations
LoSoNA benchmarks local social norm adaptation in group chats
Online group chats often follow local conversational norms that are rarely stated explicitly. LoSoNA is a benchmark measuring whether LLM-based agents can recognize and adapt to these local social norms.
-
AudioDER: A Deduplication-Enhanced Reasoning Dataset for Post-Training Large Audio-Language Models
AudioDER: a deduplication-enhanced reasoning dataset for audio LLMs
Large audio-language models perform well on audio understanding yet still struggle with reasoning. The paper introduces AudioDER, a deduplication-enhanced reasoning dataset for post-training large audio-language models.
-
When Errors Become Narratives: A Longitudinal Taxonomy of Silent Failures in a Production LLM Agent Runtime
A longitudinal taxonomy of silent failures in a production LLM agent runtime
LLM agents increasingly run as long-lived autonomous runtimes that schedule jobs, call tools, maintain memory, and push results to humans. This longitudinal study of one persistent system presents a taxonomy of its silent failures.
-
Persuasion Index: A Theory-Guided Framework for Persuasion Analysis
Identifying persuasive rhetorical cues matters for detecting manipulation, AI safety, and health communication. The paper proposes Persuasion Index, a theory-guided framework for persuasion analysis.
-
StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance
StreamMemBench: streaming evaluation of agent memory for assistance
A core role of personal-agent memory is turning stored information and prior interactions into future-oriented assistance. StreamMemBench provides a streaming evaluation of agent memory using cues from what the agent observes and how users interact.
-
Regional Climate Model Emulation with Diffusion Approaches: What is the Added Value of Generative Machine Learning?
Added value of diffusion-based generative ML for climate model emulation
Emulators cheaply reproduce regional climate models' downscaling, linking global-model predictors to high-resolution fields. The paper assesses the added value of diffusion-based generative machine learning for regional climate model emulation.
-
CANN-EUCLID: unsupervised constitutive artificial neural network model discovery from full-field data
CANN-EUCLID: unsupervised constitutive model discovery from full-field data
Constitutive artificial neural networks (CANNs) offer interpretable material model discovery but have relied on stress-supervised data. CANN-EUCLID enables unsupervised constitutive model discovery directly from full-field measurement data.
-
ORCA: A Platform for Open-Source Dexterity Research
ORCA: an open-source platform for dexterity research
Two-finger grippers dominate manipulation research but are limited by their form factor. ORCA is an open-source platform to support research on more dexterous robotic manipulation.