Developer Tools B
-
Understanding Scam Trends and Rail Paths from Reddit Self-Disclosure Narratives
Tracking scam trends and rail paths from Reddit self-disclosures
An arXiv paper studies online scams as multi-stage lifecycles of temporally ordered rails and events, tracking scam trends and rail paths from Reddit self-disclosure narratives. It introduces an annotated dataset to address the lack of open resources covering scam-type relations. Neutral, abstract-based summary.
-
Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection
Benchmark suite for federated noisy-label medical image segmentation
Federated learning enables collaborative medical image segmentation without centralizing sensitive data, but real-world deployment faces label imperfections like contour disagreement and confused labels. The authors argue existing federated noisy-label learning relies on synthetic noise and simplified settings, and introduce a benchmark suite combining diverse real-world noisy datasets, deployment-relevant client-noise scenarios, and label-noise-targeted evaluation to guide method selection.
-
Understanding the Behaviors of Environment-aware Information Retrieval
Paper: RL adapts LLM query formulation per retriever
An arXiv paper presents a systematic analysis of how LLMs can learn, via reinforcement learning, to adapt their query formulation strategies to different retrievers in retrieval-augmented generation. Summarized neutrally from the abstract.
-
A Perception vs. Distortion Perspective on Score-Based Generative Channel Estimation
Score-based channel estimation analyzed via perception-distortion tradeoff
Score-based models are increasingly used for wireless physical-layer tasks, but it is unclear when they beat discriminative learning. Using channel estimation as a case study, the paper interprets score-based estimation through the perception-distortion tradeoff, identifying when score matching excels and quantifying the excess risk of distortion-minimizing approaches.
-
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
Paper: semi-supervised LLM reasoning from minimal labels
An arXiv paper presents a semi-supervised framework that scales LLM reasoning from minimal supervision, using a lightweight reasoning-correctness classifier to turn verification into a data-creation mechanism. Summarized neutrally from the abstract.
-
Adaptive and Explicit safe: Triggering Latent Safety Awareness in Large Reasoning Models
Triggering latent safety awareness to harden large reasoning models
The paper observes that large reasoning models can recognize safety risks when re-presented with the original query alongside their own reasoning trace, a property it calls latent safety awareness. To exploit this without heavy manual annotation, it uses supervised fine-tuning to induce safe tags that trigger safety analysis.
-
LLM-based Visual Code Completion for Aerospace Geometric Design
Paper: LLM visual-programming copilot for aerospace design
An arXiv paper presents an LLM-based visual programming copilot for aerospace geometric design tasks, using a visual-programming variant of the ReAct methodology. Summarized neutrally from the abstract; claims are the authors' and not independently verified.
-
Building llm-driven “ai” still requires domain knowledge
Building LLM-driven tools still hinges on capturing domain knowledge
A developer shares lessons from building an LLM-driven tool that answers user questions via a customer API. Capturing and writing down domain knowledge is much of the work. That step is easier than with earlier generations of AI, since the knowledge need not be rigidly structured, yet it is exactly where prior efforts foundered.
-
The Art of Mixology: Mixup-based Obfuscation for Privacy-Preserving Split Learning in Large Language Models
MIXGUARD: mixup-based privacy for LLM split learning
The paper presents MIXGUARD, a mixup-based privacy-preserving split-learning framework for LLMs combining token- and representation-level obfuscation with adaptive gradient perturbation to balance utility, privacy, and efficiency. Claims reflect the abstract.
-
Decision-Weighted Flow Matching for Contextual Stochastic Optimization
DW-FM reweights flow matching toward decision-sensitive regions
Standard generative scenario models optimize uniform distributional fit rather than downstream decision quality. Decision-Weighted Flow Matching (DW-FM) reweights the velocity-regression objective using decision-sensitive endpoint information, linking downstream regret to pathwise velocity mismatch and providing regret-aligned objectives with guarantees.
-
We Need Explanation Cards to Connect Explanation Algorithms to the Real World
'Explanation Cards' add robustness and validity context to explanations
Algorithmic explanations often need expert knowledge to read and can be uninformative about complex decision functions. The authors propose Explanation Cards that augment explanations with robustness and validity information plus clear interpretation instructions, making otherwise uninformative explanations practically useful while flagging when they are not.
-
Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents
S2L replaces runtime SKILL.md text with skill-specific LoRA adapters
The paper proposes Skill-to-LoRA (S2L), a behavior-centric representation that replaces runtime skill text (commonly distributed as SKILL.md files) with skill-specific LoRA adapters. Rather than compressing the document, S2L models the behavioral change the skill text induces, aiming at more token-efficient LLM agents.
-
Automated jailbreak attack targeting multiple defense strategies
UNIATTACK: a defense-oriented framework for automated black-box LLM jailbreaks
The paper presents UNIATTACK, an adversarial testing framework that systematically builds effective black-box attack prompts on LLMs from a defense-oriented perspective. Unlike static templates or model-specific tuning, it extracts minimal but high-impact features from diverse existing attacks and optimizes them.
-
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents
MyPCBench: benchmarking personal computer-use agents
MyPCBench evaluates computer-use agents as personal assistants on a Linux desktop with 17 simulated web apps and 184 persona-seeded tasks, benchmarking six closed and open-weight models. Reported scores reflect the paper and are not independently verified.
-
Revealing Artifacts via Noise Amplification: A Novel Perspective for AI-Generated Video Detection
A noise-amplification perspective for detecting AI-generated videos
The paper proposes detecting AI-generated videos, especially those from text-to-video models, by amplifying noise to reveal subtle artifacts that distinguish them from authentic footage. It notes that prior work largely targeted GAN-generated samples and frames text-to-video detection as still underexplored.
-
From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text
Distinguishing affect prediction from forecasting in longitudinal text
The paper separates current-affect estimation from future affective-change forecasting in longitudinal self-report text, proposing the TSAP/E-TSAP and ACF-Hybrid frameworks. It reports textual representations differ in usefulness across the two tasks; figures reflect the abstract.
-
Progressive Knowledge-Guided Large Language Model Framework for Bearing Fault Diagnosis
Physics-guided multi-scale framework for bearing fault diagnosis
The paper proposes a progressive, physics-guided multi-scale vibration-processing pipeline for bearing fault diagnosis, using a kinematics-derived descriptor for real-time screening and fault-adaptive segmentation. Reported figures reflect the abstract and are not independently verified.
-
Sycophancy as Material Failure under Pushback Loading: A Multi-Axis Characterization Across Three Loading Cases and up to Seventeen Material Charges
Paper frames LLM sycophancy as material failure (title only)
Note: the abstract was unavailable, so this is summarized neutrally from the title alone. The paper appears to characterize LLM sycophancy as a 'material failure under pushback loading,' across three loading cases and up to seventeen 'material charges.' Specific methods and findings cannot be confirmed from the title.
-
SING: Synthetic Intention Graph for Scalable Active Tool Discovery in LLM Agents
SING: synthetic intention graph for scalable active tool discovery
This arXiv paper addresses tool selection for LLM agents whose harnesses connect to hundreds or thousands of APIs, where exhaustive tool-schema injection is costly and imposes a closed-world assumption. Noting that one-shot retrieval often fails to align isolated tool descriptions with the agent's true intent, especially in long-horizon tasks, the authors propose SING, a Synthetic Intention Graph for scalable, active tool discovery.
-
Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning
Can LLM agents infer world models? Evidence from automata learning
This arXiv paper proposes agentic automata learning to assess how well tool-calling LLM agents can uncover hidden environments through interaction. An agent must infer a hidden deterministic finite automaton (DFA) via membership and equivalence queries, yielding a scalable testbed with controlled task complexity. Evaluating state-of-the-art LLMs, the authors find performance drops sharply as DFA size grows, with reasoning models markedly stronger.
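The two query types named in this entry can be sketched as follows. This is a minimal illustration, not the paper's testbed: the `DFA` and `Teacher` classes, the example target language (an even number of `a` symbols), and the bounded-length equivalence check are all assumptions made for the example. An exact equivalence oracle would compare the two automata directly rather than enumerate words.

```python
from itertools import product


class DFA:
    """A deterministic finite automaton over single-character symbols."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = transitions  # maps (state, symbol) -> next state

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting


class Teacher:
    """Hypothetical oracle that hides a target DFA behind two query types."""

    def __init__(self, target, alphabet, max_len=6):
        self._target = target
        self._alphabet = alphabet
        self._max_len = max_len  # assumption: equivalence is checked up to this length only

    def membership(self, word):
        # Membership query: is this word in the hidden language?
        return self._target.accepts(word)

    def equivalence(self, hypothesis):
        # Equivalence query, approximated by a bounded search:
        # return the shortest disagreeing word found, or None if there is none.
        for length in range(self._max_len + 1):
            for word in product(self._alphabet, repeat=length):
                if self._target.accepts(word) != hypothesis.accepts(word):
                    return "".join(word)
        return None


# Hidden target: words over {a, b} containing an even number of 'a'.
target = DFA(
    start=0,
    accepting={0},
    transitions={(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
)
teacher = Teacher(target, alphabet="ab")

# A naive first hypothesis that accepts every word.
accept_all = DFA(start=0, accepting={0}, transitions={(0, "a"): 0, (0, "b"): 0})

print(teacher.membership("aa"))         # membership query on a single word
print(teacher.equivalence(accept_all))  # counterexample that refutes the hypothesis
```

A learner in this setting would alternate between the two calls, using each counterexample to refine its hypothesis until `equivalence` returns `None`.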
-
Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation
Audio-only two-stage pipeline for multiparty turn-taking prediction
This arXiv paper studies multiparty turn-taking, which is essential for spoken dialogue systems but hard amid overlap and rapid speaker changes, on the VoxConverse dataset. The authors propose an audio-only two-stage pipeline that separates when to trigger a turn boundary from whether the floor actually transfers: a fast trigger proposes candidate end-of-turn times, and a lightweight verifier runs only at those times to decide Hold or Shift.
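The control flow of such a trigger-then-verify pipeline can be sketched as below. Everything concrete here is an assumption for illustration: the energy-threshold trigger, the function names, and the toy verifier are hypothetical stand-ins, not the models the paper uses. The sketch shows only the structural point that the verifier runs at candidate times rather than on every frame.

```python
from typing import Callable, List, Sequence, Tuple


def propose_boundaries(
    energy: Sequence[float],
    silence_thresh: float = 0.1,
    min_silence: int = 3,
) -> List[int]:
    """Cheap trigger (hypothetical): flag the frame at which a silent run
    that follows some speech first reaches `min_silence` frames."""
    candidates = []
    run = 0
    seen_speech = False
    for i, value in enumerate(energy):
        if value < silence_thresh:
            run += 1
            if seen_speech and run == min_silence:
                candidates.append(i)
        else:
            run = 0
            seen_speech = True
    return candidates


def run_pipeline(
    energy: Sequence[float],
    verifier: Callable[[int], str],
) -> List[Tuple[int, str]]:
    """Stage 1 proposes candidate times; stage 2 is called only at those times."""
    return [(t, verifier(t)) for t in propose_boundaries(energy)]


# Toy per-frame energies: speech, a long pause, a short pause, then a long pause.
energy = [0.9, 0.8, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0]

verifier_calls = []


def toy_verifier(t: int) -> str:
    # Placeholder decision rule; a real verifier would inspect audio around frame t.
    verifier_calls.append(t)
    return "Shift" if t == 4 else "Hold"


decisions = run_pipeline(energy, toy_verifier)
print(decisions)
print(verifier_calls)
```

Keeping the verifier behind a plain callable means a heavier model could be swapped in without changing the trigger stage.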
-
The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage
BD-LSC: a new benchmark dataset for lexical semantic change detection
This arXiv paper introduces two complementary benchmark datasets for computational lexical semantic change (LSC) detection. The Bi-Directional Lexical Semantic Change (BD-LSC) dataset captures sense gain, loss, and stability across three time periods, targeting cases, especially slang versus standard usage, where words simultaneously gain and lose senses, which existing benchmarks struggle to capture.
-
SkillWiki: A Living Knowledge Infrastructure for Agent Skills
SkillWiki: a living knowledge infrastructure for agent skills
While knowledge is managed via Wikipedia and software via GitHub, agent skills still lack infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure turning heterogeneous knowledge into reusable skill assets linked to their originating evidence. It presents the full skill lifecycle, from knowledge ingestion to provenance-aware exploration, governance, and execution-driven evolution, with a live demo and source code available.
-
Why AI hasn’t replaced software engineers, and won’t
Essay argues AI hasn't replaced software engineers, and won't
Arvind Narayanan and Sayash Kapoor examine AI-driven job loss through software engineering, a field unusually exposed to AI disruption. They argue the evidence rejects the narrative that AI will trigger mass layoffs once it crosses a capability threshold, and that more regulated, less exposed professions are likely even more cushioned.
-
AI, Gods and Selves: Incredibly Effective Illusions
Talk frames AI, gods, and the self as strikingly effective illusions
A shared video essay frames AI, gods, and the self as 'incredibly effective illusions': constructs that shape human experience despite lacking real solidity, drawing a parallel between AI and long-standing notions of deity and identity.
-
Publishing WASM wheels to PyPI for use with Pyodide
The Pyodide 314.0 release lets developers publish Python packages built for Pyodide, or any runtime compatible with the PyEmscripten platform defined in PEP 783, as WASM wheels on PyPI, a long-awaited step for browser-based Python.
-
luau-wasm 0.1a0
luau-wasm 0.1a0 released, bringing Luau to WebAssembly
An early 0.1a0 release of luau-wasm packages Luau, Roblox's typed Lua dialect, for WebAssembly, built using the newly enabled approach for publishing WASM wheels to PyPI for Pyodide.
-
Mapping SQLite result columns back to their source `table.column`
Mapping SQLite result columns back to their source table.column
A technical note exploring how to map columns in arbitrary SQLite query results back to their originating `table.column`, so that Datasette could render queries with richer, source-aware metadata.
-
Police officer investigated for using AI to 'create evidence' in multiple cases
Police officer investigated for using AI to 'create evidence'
A police officer is under investigation for allegedly using AI to 'create evidence' in multiple cases, raising concerns about the misuse of generative AI within the justice system.
-
Visual Language Models Train Robots to Read Human Emotions
Visual language models teach robots to read human emotions
An IEEE Journal Watch piece on using visual language models to let robots interpret human emotions. As robots gain dexterity and increasingly work alongside people, reading facial and emotional cues could make human-robot collaboration safer and smoother.