Multimodal A

  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    CottonLeafVision: An Explainable and Robust Deep Learning Framework for Cotton Leaf Disease Classification
    CottonLeafVision: explainable, robust deep learning for cotton leaf disease
    Deep Learning · Neural Network · Reinforcement Learning
    Cotton underpins the textile industry, so accurate detection of cotton leaf disease is crucial for economic stability. The paper proposes CottonLeafVision, an explainable and robust deep learning framework for classifying cotton leaf diseases.
  • arXiv cs.LG (Machine Learning) · EN Inference & Efficiency extract
    HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification
    HumP-KD: uncertainty-aware distillation for efficient fire classification
    Machine Learning · Meta · Neural Network · Transformer
    HumP-KD is a hybrid, uncertainty-aware, multi-stage progressive knowledge distillation framework for fire classification. It aims for models that are both accurate and efficient enough for real-time use.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Giving AI a Headache: Acoustic Adversarial Attacks to Computer Vision Applications
    Acoustic adversarial attacks that disrupt computer vision systems
    Computer Vision · Deep Learning · Reinforcement Learning
    AI increasingly automates real-world computer vision applications such as autonomous vehicle control. This paper demonstrates acoustic adversarial attacks that can disrupt such systems, highlighting a new physical, sound-based attack surface.
  • arXiv cs.AI (Artificial Intelligence) · EN Policy & Regulation extract
    Regulating the Machine Contributor: Governance and Policy Alignment in Open Source
    Governance and policy alignment for AI contributors in open source
    AI Agents · Retrieval-Augmented Generation (RAG) · Software Engineering
    AI-assisted development has moved from autocomplete to agents that plan changes, edit files, and submit pull requests with limited supervision, while open source projects continue to evolve through human processes. The paper examines governance and policy alignment for regulating such machine contributors.
  • arXiv cs.AI (Artificial Intelligence) · EN Multimodal extract
    AudioDER: A Deduplication-Enhanced Reasoning Dataset for Post-Training Large Audio-Language Models
    AudioDER: a deduplication-enhanced reasoning dataset for audio LLMs
    Neural Network · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering · Speech Processing
    Large audio-language models perform well on audio understanding yet still struggle with reasoning. The paper introduces AudioDER, a deduplication-enhanced reasoning dataset for post-training large audio-language models.
  • arXiv cs.AI (Artificial Intelligence) · EN New Model Releases extract
    Sensitivity Shaping for Latent Modeling
    Sensitivity shaping for detecting OOD transitions in dynamics models
    Neural Network
    Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution transitions. The paper proposes sensitivity shaping for latent modeling to improve such OOD detection.
  • arXiv cs.LG (Machine Learning) · EN Policy & Regulation extract
    NEST3D: A High-Resolution Multimodal Dataset of Sociable Weaver Tree Nests
    NEST3D: a high-resolution multimodal dataset of weaver bird nests
    Algorithms & Theory · Deep Learning · Neural Network · Reinforcement Learning · Transformer
    Sociable weaver nests are complex ecological structures that provide thermoregulatory microhabitats. NEST3D is a high-resolution multimodal dataset of these tree nests, intended to support ecological and structural study.
  • NVIDIA Developer Blog · EN Industry Adoption extract
    Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure
    NVIDIA details deploying MiniMax M3 for long-context agentic workflows
    Generative AI · NVIDIA · Retrieval-Augmented Generation (RAG)
    NVIDIA's developer blog explains how to deploy MiniMax M3 on NVIDIA accelerated infrastructure for long-context reasoning and agentic workflows, addressing fragmented enterprise AI pipelines spanning text, vision, and other modalities.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Dense Coordinate-List Fine-Tuning Induces a Controllable Interference Surface in Vision-Language Models
    Dense coordinate-list fine-tuning induces a controllable interference surface
    Computer Vision · Fine-tuning · Reinforcement Learning from Human Feedback (RLHF) · Software Engineering
    Fine-tuning vision-language models to emit dense coordinate lists improves grounding but alters how they serialize, repeat, and terminate structured output. The paper shows this induces a controllable interference surface in VLMs.
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
    From chatbot to digital colleague: the shift to persistent autonomous AI
    AI Agents · Inference · Neural Network · Retrieval-Augmented Generation (RAG) · Software Engineering
    LLMs are transforming from conversational generators into integrated systems capable of reasoning, action, memory, and self-improvement. The paper frames this as a paradigm shift from chatbot to digital colleague, that is, toward persistent autonomous AI.
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    A Fixed-Point Neural Operator for Size- and Functional-Transferable Hamiltonian Prediction
    A fixed-point neural operator for transferable Hamiltonian prediction
    Fine-tuning · Inference · Machine Learning · Neural Network
    Predicting the Kohn-Sham Hamiltonian with machine learning can accelerate density functional theory (DFT) calculations while retaining access to orbitals and energy levels. The paper proposes a fixed-point neural operator for Hamiltonian prediction that transfers across system sizes and functionals.