Developer Tools B

  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Understanding Scam Trends and Rail Paths from Reddit Self-Disclosure Narratives
    Tracking scam trends and rail paths from Reddit self-disclosures
    Neural Network Retrieval-Augmented Generation (RAG) Reinforcement Learning
    An arXiv paper studies online scams as multi-stage lifecycles of temporally ordered rails and events, tracking scam trends and rail paths from Reddit self-disclosure narratives. It introduces an annotated dataset to address the lack of open resources covering scam-type relations. Neutral, abstract-based summary.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection
    Benchmark suite for federated noisy-label medical image segmentation
    Meta Reinforcement Learning
    Federated learning enables collaborative medical image segmentation without centralizing sensitive data, but real-world deployment faces label imperfections like contour disagreement and confused labels. The authors argue existing federated noisy-label learning relies on synthetic noise and simplified settings, and introduce a benchmark suite combining diverse real-world noisy datasets, deployment-relevant client-noise scenarios, and label-noise-targeted evaluation to guide method selection.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    Understanding the Behaviors of Environment-aware Information Retrieval
    Paper: RL adapts LLM query formulation per retriever
    Deep Learning Embeddings Retrieval-Augmented Generation (RAG) Reinforcement Learning
    An arXiv paper presents a systematic analysis of how LLMs can learn, via reinforcement learning, to adapt their query formulation strategies to different retrievers in retrieval-augmented generation. Summarized neutrally from the abstract.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    A Perception vs. Distortion Perspective on Score-Based Generative Channel Estimation
    Score-based channel estimation analyzed via perception-distortion tradeoff
    Computer Vision Neural Network
    Score-based models are increasingly used for wireless physical-layer tasks, but it is unclear when they beat discriminative learning. Using channel estimation as a case study, the paper interprets score-based estimation through the perception-distortion tradeoff, identifying when score matching excels and quantifying the excess risk of distortion-minimizing approaches.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.CL (Computation and Language) · EN Multimodal extract
    Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
    Paper: semi-supervised LLM reasoning from minimal labels
    Neural Network Software Engineering
    An arXiv paper presents a semi-supervised framework that scales LLM reasoning from minimal supervision, using a lightweight reasoning-correctness classifier to turn verification into a data-creation mechanism. Summarized neutrally from the abstract.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Adaptive and Explicit safe: Triggering Latent Safety Awareness in Large Reasoning Models
    Triggering latent safety awareness to harden large reasoning models
    DeepSeek Fine-tuning Llama Retrieval-Augmented Generation (RAG) Reinforcement Learning from Human Feedback (RLHF)
    The paper observes that large reasoning models can recognize safety risks when re-presented with the original query alongside their own reasoning trace—a property it calls latent safety awareness. To exploit this without heavy manual annotation, it uses supervised fine-tuning to induce safe tags that trigger safety analysis.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    LLM-based Visual Code Completion for Aerospace Geometric Design
    Paper: LLM visual-programming copilot for aerospace design
    GPT Inference Neural Network
    An arXiv paper presents an LLM-based visual programming copilot for aerospace geometric design tasks, using a visual-programming variant of the ReAct methodology. Summarized neutrally from the abstract; claims are the authors' and not independently verified.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Lobste.rs (AI tagged) · EN Developer Tools extract
    Building llm-driven “ai” still requires domain knowledge
    Building LLM-driven tools still hinges on capturing domain knowledge
    Software Engineering
    A developer shares lessons from building an LLM-driven tool that answers user questions via a customer API. Capturing and writing down domain knowledge is much of the work. This is easier than with earlier generations of AI because the knowledge need not be rigidly structured, yet it is exactly where prior efforts foundered.
    Read original (Lobste.rs (AI tagged)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    The Art of Mixology: Mixup-based Obfuscation for Privacy-Preserving Split Learning in Large Language Models
    MIXGUARD: mixup-based privacy for LLM split learning
    Fine-tuning
    The paper presents MIXGUARD, a mixup-based privacy-preserving split-learning framework for LLMs combining token- and representation-level obfuscation with adaptive gradient perturbation to balance utility, privacy, and efficiency. Claims reflect the abstract.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.LG (Machine Learning) · EN Training & Fine-tuning extract
    Decision-Weighted Flow Matching for Contextual Stochastic Optimization
    DW-FM reweights flow matching toward decision-sensitive regions
    Computer Vision Neural Network Reinforcement Learning from Human Feedback (RLHF)
    Standard generative scenario models optimize uniform distributional fit rather than downstream decision quality. Decision-Weighted Flow Matching (DW-FM) reweights the velocity-regression objective using decision-sensitive endpoint information, linking downstream regret to pathwise velocity mismatch and providing regret-aligned objectives with guarantees.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.LG (Machine Learning) · EN Multimodal extract
    We Need Explanation Cards to Connect Explanation Algorithms to the Real World
    'Explanation Cards' add robustness and validity context to explanations
    Algorithms & Theory Neural Network Reinforcement Learning
    Algorithmic explanations often need expert knowledge to read and can be uninformative about complex decision functions. The authors propose Explanation Cards that augment explanations with robustness and validity information plus clear interpretation instructions, making otherwise uninformative explanations practically useful while flagging when they are not.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Training & Fine-tuning extract
    Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents
    S2L replaces runtime SKILL.md text with skill-specific LoRA adapters
    AI Agents Deep Learning Software Engineering
    The paper proposes Skill-to-LoRA (S2L), a behavior-centric representation that replaces runtime skill text—commonly distributed as SKILL.md files—with skill-specific LoRA adapters. Rather than compressing the document, S2L models the behavioral change the skill text induces, aiming at more token-efficient LLM agents.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Automated jailbreak attack targeting multiple defense strategies
    UNIATTACK: a defense-oriented framework for automated black-box LLM jailbreaks
    Retrieval-Augmented Generation (RAG) Speech Processing
    The paper presents UNIATTACK, an adversarial testing framework that systematically builds effective black-box attack prompts on LLMs from a defense-oriented perspective. Unlike static templates or model-specific tuning, it extracts minimal but high-impact features from diverse existing attacks and optimizes them.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents
    MyPCBench: benchmarking personal computer-use agents
    AI Agents Claude Neural Network Reinforcement Learning
    MyPCBench evaluates computer-use agents as personal assistants on a Linux desktop with 17 simulated web apps and 184 persona-seeded tasks, benchmarking six closed and open-weight models. Reported scores reflect the paper and are not independently verified.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Developer Tools extract
    Revealing Artifacts via Noise Amplification: A Novel Perspective for AI-Generated Video Detection
    A noise-amplification perspective for detecting AI-generated videos
    Reinforcement Learning
    The paper proposes detecting AI-generated videos, especially those from text-to-video models, by amplifying noise to reveal subtle artifacts that distinguish them from authentic footage. It notes that prior work largely targeted GAN-generated samples and frames text-to-video detection as still underexplored.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text
    Distinguishing affect prediction from forecasting in longitudinal text
    The paper separates current-affect estimation from future affective-change forecasting in longitudinal self-report text, proposing the TSAP/E-TSAP and ACF-Hybrid frameworks. It reports that textual representations differ in usefulness across the two tasks; figures reflect the abstract.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Inference & Efficiency extract
    Progressive Knowledge-Guided Large Language Model Framework for Bearing Fault Diagnosis
    Physics-guided multi-scale framework for bearing fault diagnosis
    Inference Reinforcement Learning
    The paper proposes a progressive, physics-guided multi-scale vibration-processing pipeline for bearing fault diagnosis, using a kinematics-derived descriptor for real-time screening and fault-adaptive segmentation. Reported figures reflect the abstract and are not independently verified.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Sycophancy as Material Failure under Pushback Loading: A Multi-Axis Characterization Across Three Loading Cases and up to Seventeen Material Charges
    Paper frames LLM sycophancy as material failure (title only)
    GPT Neural Network Retrieval-Augmented Generation (RAG)
    Note: the abstract was unavailable, so this is summarized neutrally from the title alone. The paper appears to characterize LLM sycophancy as a 'material failure under pushback loading,' across three loading cases and up to seventeen 'material charges.' Specific methods and findings cannot be confirmed from the title.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Agents & Tool Use extract
    SING: Synthetic Intention Graph for Scalable Active Tool Discovery in LLM Agents
    SING: synthetic intention graph for scalable active tool discovery
    AI Agents Neural Network Reinforcement Learning
    This arXiv paper addresses tool selection for LLM agents whose harnesses connect to hundreds or thousands of APIs, where exhaustive tool-schema injection is costly and imposes a closed-world assumption. Noting that one-shot retrieval often fails to align isolated tool descriptions with the agent's true intent—especially in long-horizon tasks—the authors propose SING, a Synthetic Intention Graph for scalable, active tool discovery.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning
    Can LLM agents infer world models? Evidence from automata learning
    AI Agents Algorithms & Theory Deep Learning Neural Network Reinforcement Learning
    This arXiv paper proposes agentic automata learning to assess how well tool-calling LLM agents can uncover hidden environments through interaction. An agent must infer a hidden deterministic finite automaton (DFA) via membership and equivalence queries, yielding a scalable testbed with controlled task complexity. Evaluating state-of-the-art LLMs, the authors find performance drops sharply as DFA size grows, with reasoning models markedly stronger.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Developer Tools extract
    Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation
    Audio-only two-stage pipeline for multiparty turn-taking prediction
    Reinforcement Learning
    This arXiv paper studies multiparty turn-taking—essential for spoken dialogue systems but hard amid overlap and rapid speaker changes—on the VoxConverse dataset. The authors propose an audio-only two-stage pipeline that separates when to trigger a turn boundary from whether the floor actually transfers: a fast trigger proposes candidate end-of-turn times, and a lightweight verifier runs only at those times to decide Hold or Shift.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN New Model Releases extract
    The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage
    BD-LSC: a new benchmark dataset for lexical semantic change detection
    Embeddings GPT Machine Learning Neural Network Transformer
    This arXiv paper introduces two complementary benchmark datasets for computational lexical semantic change (LSC) detection. The Bi-Directional Lexical Semantic Change (BD-LSC) dataset captures sense gain, loss, and stability across three time periods, targeting cases—especially slang versus standard usage—where words simultaneously gain and lose senses, which existing benchmarks struggle to capture.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Training & Fine-tuning extract
    SkillWiki: A Living Knowledge Infrastructure for Agent Skills
    SkillWiki: a living knowledge infrastructure for agent skills
    While knowledge is managed via Wikipedia and software via GitHub, agent skills still lack infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure turning heterogeneous knowledge into reusable skill assets linked to their originating evidence. It presents the full skill lifecycle, from knowledge ingestion to provenance-aware exploration, governance, and execution-driven evolution, with a live demo and source code available.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Simon Willison's Weblog · EN Policy & Regulation extract
    Why AI hasn’t replaced software engineers, and won’t
    Essay argues AI hasn't replaced software engineers, and won't
    Software Engineering
    Arvind Narayanan and Sayash Kapoor examine AI-driven job loss through the lens of software engineering, a field unusually exposed to AI disruption. They argue that the evidence contradicts the narrative that AI will trigger mass layoffs once it crosses a capability threshold, and that more regulated, less exposed professions are likely to be even more cushioned.
    Read original (Simon Willison's Weblog) ↗
  • Lobste.rs (AI tagged) · EN Developer Tools extract
    AI, Gods and Selves: Incredibly Effective Illusions
    Talk frames AI, gods, and the self as strikingly effective illusions
    A shared video essay frames AI, gods, and the self as 'incredibly effective illusions' — constructs that shape human experience despite lacking real solidity, drawing a parallel between AI and long-standing notions of deity and identity.
    Read original (Lobste.rs (AI tagged)) ↗
  • Simon Willison's Weblog · EN Developer Tools extract
    Publishing WASM wheels to PyPI for use with Pyodide
    Machine Learning Neural Network
    The Pyodide 314.0 release lets developers publish Python packages built for Pyodide, or any runtime compatible with the PyEmscripten platform defined in PEP 783, as WASM wheels on PyPI, a long-awaited step for browser-based Python.
    Read original (Simon Willison's Weblog) ↗
  • Simon Willison's Weblog · EN New Model Releases extract
    luau-wasm 0.1a0
    luau-wasm 0.1a0 released, bringing Luau to WebAssembly
    An early 0.1a0 release of luau-wasm packages Luau, Roblox's typed Lua dialect, for WebAssembly, built using the newly enabled approach for publishing WASM wheels to PyPI for Pyodide.
    Read original (Simon Willison's Weblog) ↗
  • Simon Willison's Weblog · EN Developer Tools extract
    Mapping SQLite result columns back to their source `table.column`
    Mapping SQLite result columns back to their source table.column
    Claude Machine Learning Neural Network
    A technical note exploring how to map columns in arbitrary SQLite query results back to their originating table.column, so that Datasette could render queries with richer, source-aware metadata.
    Read original (Simon Willison's Weblog) ↗
  • Hacker News (Front Page) · EN Developer Tools extract
    Police officer investigated for using AI to 'create evidence' in multiple cases
    Police officer investigated for using AI to 'create evidence'
    A police officer is under investigation for allegedly using AI to 'create evidence' in multiple cases, raising concerns about the misuse of generative AI within the justice system.
    Read original (Hacker News (Front Page)) ↗
  • IEEE Spectrum (AI section) · EN Developer Tools extract
    Visual Language Models Train Robots to Read Human Emotions
    Visual language models teach robots to read human emotions
    Gemini Robotics
    An IEEE Journal Watch piece on using visual language models to let robots interpret human emotions. As robots gain dexterity and increasingly work alongside people, reading facial and emotional cues could make human-robot collaboration safer and smoother.
    Read original (IEEE Spectrum (AI section)) ↗