Multimodal A
-
CottonLeafVision: An Explainable and Robust Deep Learning Framework for Cotton Leaf Disease Classification
CottonLeafVision: explainable, robust deep learning for cotton leaf disease
Cotton underpins the textile industry, so accurate detection of cotton leaf disease is crucial for economic stability. The paper proposes CottonLeafVision, an explainable and robust deep learning framework for classifying cotton leaf diseases.
-
HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification
HumP-KD: uncertainty-aware distillation for efficient fire classification
HumP-KD is a hybrid, uncertainty-aware multi-stage progressive knowledge distillation framework for fire classification. It targets models that are simultaneously accurate and efficient for real-time use.
-
Giving AI a Headache: Acoustic Adversarial Attacks to Computer Vision Applications
Acoustic adversarial attacks that disrupt computer vision systems
As AI automates real-world computer vision applications such as autonomous vehicle control, this paper demonstrates acoustic adversarial attacks that can disrupt CV systems, highlighting a new physical, sound-based attack surface.
-
Regulating the Machine Contributor: Governance and Policy Alignment in Open Source
Governance and policy alignment for AI contributors in open source
AI-assisted development has moved from autocomplete to agents that plan changes, edit files, and submit pull requests with limited supervision, while open source evolves through human processes. The paper examines governance and policy alignment for regulating such machine contributors.
-
AudioDER: A Deduplication-Enhanced Reasoning Dataset for Post-Training Large Audio-Language Models
AudioDER: a deduplication-enhanced reasoning dataset for audio LLMs
Large audio-language models perform well on audio understanding yet still struggle with reasoning. The paper introduces AudioDER, a deduplication-enhanced reasoning dataset for post-training large audio-language models.
-
Sensitivity Shaping for Latent Modeling
Sensitivity shaping for detecting OOD transitions in dynamics models
Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution transitions. The paper proposes sensitivity shaping for latent modeling to improve such OOD detection.
-
NEST3D: A High-Resolution Multimodal Dataset of Sociable Weaver Tree Nests
NEST3D: a high-resolution multimodal dataset of weaver bird nests
Sociable weaver nests are complex ecological structures providing thermoregulatory microhabitats. NEST3D is a high-resolution multimodal dataset of these tree nests to support ecological and structural study.
-
Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure
NVIDIA details deploying MiniMax M3 for long-context agentic workflows
NVIDIA's developer blog explains how to deploy MiniMax M3 on NVIDIA accelerated infrastructure for long-context reasoning and agentic workflows, addressing fragmented enterprise AI pipelines spanning text, vision, and other modalities.
-
Dense Coordinate-List Fine-Tuning Induces a Controllable Interference Surface in Vision-Language Models
Dense coordinate-list fine-tuning induces a controllable interference surface
Fine-tuning vision-language models to emit dense coordinate lists improves grounding but alters how they serialize, repeat, and terminate structured output. The paper shows this induces a controllable interference surface in VLMs.
-
From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
From chatbot to digital colleague: the shift to persistent autonomous AI
LLMs are transforming from conversational generators into integrated systems capable of reasoning, action, memory, and self-improvement. The paper conceptualizes this as a paradigm shift from chatbot to digital colleague, that is, to persistent autonomous AI.
-
A Fixed-Point Neural Operator for Size- and Functional-Transferable Hamiltonian Prediction
A fixed-point neural operator for transferable Hamiltonian prediction
Predicting the Kohn-Sham Hamiltonian with ML can accelerate density functional theory while retaining orbitals and energy levels. The paper proposes a fixed-point neural operator for size- and functional-transferable Hamiltonian prediction.