Developer Tools B
-
INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities
INI-VPINN: a variational PINN for multi-material domains
INI-VPINN is a weak-form physics-informed neural network that naturally incorporates Neumann boundary and interface conditions into a variational formulation, targeting multi-material domains with geometric singularities.
-
LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI
LegalHalluLens audits typed legal-AI hallucinations with calibrated debate
Legal-AI systems hallucinate at aggregate rates near 52%, but averages hide where and how errors concentrate. LegalHalluLens is an auditing framework pairing typed hallucination auditing with calibrated multi-agent debate to give compliance officers actionable signals for trustworthy legal AI.
-
A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions
A T-API-compliant ReAct agentic loop for optical networks
The paper presents the first T-API-compliant ReAct loop for intent-driven, closed-loop optical network management, reporting that domain-specific composite tools achieve 90% oracle-validated correctness with threefold token savings versus generic tools.
-
Differential Privacy of Gaussian Process Posterior Sampling
Differential privacy of Gaussian process posterior sampling
The paper studies privacy when releasing posterior sample paths from a Gaussian process where the entire training set is private. Unlike DP mechanisms that add external noise, it shows the intrinsic randomness of posterior sampling itself yields differential-privacy guarantees.
-
Recover Semantics First, Generate Better: Improved Latent Modeling for 3D MRI Reconstruction and Cross-Contrast Synthesis
Improved latent modeling for 3D MRI reconstruction and synthesis
The paper proposes an improved latent modeling approach for 3D MRI reconstruction and cross-contrast synthesis, addressing the heavy computational cost of large 3D volumes by recovering semantics first to better infer absent MRI contrasts.
-
STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training
STAR: spatiotemporal adaptive reward allocation for text-to-image RL
The paper proposes STAR, a spatiotemporal adaptive reward allocation method for text-to-image RL post-training, replacing a single scalar advantage applied uniformly with rewards that account for the temporal and spatial structure of generation.
-
Learning task-specific subspaces via interventional post-training of speech foundation models
Learning task-specific subspaces in speech foundation models
Speech foundation models produce general-purpose representations encoding salient variables in a distributed way, while downstream tasks use only some variability. The paper learns task-specific subspaces via interventional post-training of speech foundation models.
-
Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation
CoT-enhanced reasoning for semi-supervised medical image segmentation
Semi-supervised medical image segmentation mitigates annotation scarcity via consistency regularization but relies mostly on pixel-level visual matching. The paper adds chain-of-thought-enhanced reasoning to go beyond visual cues for segmentation.
-
Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model
Hybrid Ret-DNN with XGBoost for e-commerce behavior forecasting
E-commerce platforms struggle to understand customer behavior and predict future purchases. The study proposes predictive analytics using a hybrid Ret-DNN combined with an XGBoost model to forecast customer behavior.
-
ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions
ChLogic evaluates logical reasoning robustness in Chinese
LLMs do well on standardized logical reasoning benchmarks, but whether this holds beyond English is unclear. ChLogic is an English-Chinese aligned benchmark testing whether models preserve logical reasoning when the same latent structure is expressed in Chinese.
-
Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models
Dynamic rollout editing reduces overthinking in RL reasoning models
Long chain-of-thought reasoning helps, but models often keep generating unnecessary reasoning after reaching a correct answer. Framing this as overthinking in GRPO-style RL post-training, the paper proposes dynamic rollout editing to reduce it.
-
Dimensionality Controls When Modularity Helps in Continual Learning
Dimensionality controls when modularity helps in continual learning
Compositional learning systems must balance plasticity and stability. The paper analyzes when modularity helps in continual learning and shows that the dimensionality of representations controls whether modular structure is beneficial.
-
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?
GameCraft-Bench: can agents build playable games end-to-end?
Game generation is an emerging coding-agent application requiring natural-language specs to become playable interactive systems. GameCraft-Bench evaluates whether agents can build games end-to-end inside a real game engine, where scripts, scenes, assets, rendering and runtime must cohere.
-
SpaceX Is Buying Cursor
Report: SpaceX is said to be acquiring AI coding tool Cursor
A headline report states that SpaceX is acquiring Cursor, the AI-assisted code editor. No article body, deal terms, timing, or rationale is available, so this is noted neutrally as an unverified report rather than a confirmed transaction.
-
Meta-classification of one-class classification models using ranking correlation and nearest neighbor
Meta-classification of one-class models via ranking correlation and kNN
ML has been applied widely, but applying ML to ML models is underexplored. Treating models as approximable by one-class classification (OCC), the paper proposes meta-classification of OCC models using ranking correlation and nearest-neighbor methods.
-
WallZero: Mastering the Game of WallGo with Strategic Analysis
WallZero masters the board game WallGo with strategic analysis
WallGo is a recently introduced strategic board game. WallZero masters WallGo through an approach incorporating strategic analysis, demonstrating game-playing performance and strategic insights.
-
Perceptual compensation for tonal context in self-supervised speech models
Perceptual compensation for tonal context in self-supervised speech models
The study examines the extent to which self-supervised speech models exhibit perceptual compensation for tonal context, analyzing how context effects seen in human speech perception are reflected in the models' learned representations.
-
When Multiple Scripts Matter: Evaluating ASR in Clinical Settings
Evaluating ASR in clinical settings when multiple scripts matter
Automatic speech recognition in non-English clinical settings faces multiscript variability, where a term appears in multiple valid orthographies. String-matching metrics treat variants as errors and underestimate performance; the paper studies ASR evaluation when multiple scripts matter.
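The scoring problem described in this item can be illustrated with a minimal sketch. It is not the paper's method: the `canonical` map, the sample sentence, and the function names are hypothetical, and the approach shown (mapping each orthographic variant to one canonical form before computing word error rate) is just one plausible way to stop valid variants from counting as errors.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp, canonical=None):
    """Word error rate; tokens are mapped through `canonical` first, if given."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    canonical = canonical or {}
    ref_n = [canonical.get(t, t) for t in ref]
    hyp_n = [canonical.get(t, t) for t in hyp]
    return edit_distance(ref_n, hyp_n) / len(ref_n)


# Hypothetical example: the same term written in two scripts.
variants = {"डायबिटीज": "diabetes"}
reference = "patient has diabetes".split()
hypothesis = "patient has डायबिटीज".split()

plain_wer = wer(reference, hypothesis)               # variant counted as an error
variant_wer = wer(reference, hypothesis, variants)   # variant accepted
```

With plain string matching the hypothesis is penalized for one of three words; with the variant map applied, the same transcript scores as error-free.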
-
A Framework for Evaluating Agentic Skills at Scale
A framework for evaluating agentic skills at scale
Agent skills, structured reusable knowledge artifacts that augment LLM agents, have been rapidly adopted, yet their cross-domain impact and a reusable methodology for evaluating individual skills are lacking. The paper presents a framework for evaluating agentic skills at scale.
-
Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors
Auditing deployment-interface exposure of CLIP backdoors
CLIP models are reused across downstream interfaces including feature extraction, retrieval, reranking and selection. Existing CLIP backdoors are validated on small attack-native tasks; the paper audits backdoor exposure across deployment interfaces beyond native success.
-
The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports
The Slop Paradox: AI-rewritten radiology reports erode clinical uncertainty
AI clinical documentation tools increasingly summarize and reformat radiology reports with LLMs. Using 450 chest X-ray reports from the Indiana University dataset, the paper measures resulting information degradation, showing erosion of clinical uncertainty and cross-modal alignment in AI-rewritten reports.
-
Toward Accessible Psychotherapy Training Using AI-Driven Interactive Patient Avatars
AI-driven patient avatars for more accessible psychotherapy training
Training psychotherapists in evidence-based interventions like Acceptance and Commitment Therapy needs repeated practice with feedback, limited by ethical, logistical and resource constraints. The paper introduces AI-driven interactive patient avatars to make such training more accessible.
-
SpaceX to buy Cursor for $60B
SpaceX to acquire Cursor (Anysphere) for $60B, per Reuters
Reuters reports that SpaceX plans to acquire the AI coding tool Cursor (Anysphere) for $60 billion, drawing attention to the deal's scale and rationale. Deal value and details are per reporting and unverified.
-
LLMs Infer Cultural Context but Fail to Apply It When Responding
LLMs infer cultural context but fail to apply it when responding
LLMs are known to overrepresent dominant, often Western cultures while marginalizing others. The paper evaluates how this affects culturally adapted response generation, finding that models can infer cultural context but fail to apply it when responding.
-
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
From trainee to trainer: LLM-designed RL training environments
RL pipelines for LLM training often rely on manually redesigned environments between stages, forcing heuristic guesses about good configurations. The paper has the LLM itself design training environments for reinforcement learning with multi-agent reasoning, moving from trainee to trainer.
-
Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs
Prompt perturbation for reliable LLM evaluation over comparison graphs
Evaluating LLMs is important but can be fragile to small prompt changes. The paper proposes using prompt perturbation to achieve more reliable LLM evaluation over comparison graphs.
-
Why adding ontologies to LLMs won't yield machine intelligence
Talk: adding ontologies to LLMs won't yield machine intelligence
A video shared via the lobste.rs AI feed argues that adding ontologies (explicit symbolic knowledge structures) onto LLMs will not yield genuine machine intelligence. It treats symbolic augmentation and LLMs' statistical processing as fundamentally distinct, concluding that ontology integration alone is insufficient for intelligence. Neutral summary based on title and context, as the excerpt is minimal.
-
Cloudflare CAPTCHA on at least one ampersand
Cloudflare WAF: fire CAPTCHA only on search URLs with an ampersand
A Simon Willison TIL: to stop crawlers hammering his faceted search engine he used Cloudflare's WAF Managed Challenge, but plain single-term searches kept triggering it. Working with Claude Code, he added a custom rule so the CAPTCHA only fires when a search URL contains at least one ampersand, letting single-keyword queries through.
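The decision that rule makes can be sketched in a few lines. This is an illustration of the logic only, not Cloudflare's rule syntax and not the rule from the TIL itself: the `/search/` path prefix, the function name, and the example URLs are assumptions made for the sketch.

```python
from urllib.parse import urlsplit


def should_challenge(url: str, search_path: str = "/search/") -> bool:
    """Return True when a search URL carries more than one query parameter.

    `search_path` is a hypothetical prefix; the real site's path may differ.
    An ampersand in the query string means at least two parameters, which is
    the faceted-search pattern that crawlers tend to enumerate.
    """
    parts = urlsplit(url)
    return parts.path.startswith(search_path) and "&" in parts.query


single_term = should_challenge("https://example.com/search/?q=python")
faceted = should_challenge("https://example.com/search/?q=python&tag=ai")
other_page = should_challenge("https://example.com/about/?a=1&b=2")
```

A single-keyword search passes straight through, while a faceted URL that combines several parameters is the one sent to the challenge.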
-
Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes
NVIDIA details LoRA fine-tuning of biological foundation models via BioNeMo
An NVIDIA developer blog post explains how to efficiently fine-tune biological foundation models (pretrained on large protein or genomic sequence corpora, such as the ESM2 protein language model) using LoRA, illustrated with the company's BioNeMo Recipes. A technical piece on applying foundation models in computational biology.
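For readers unfamiliar with LoRA, the idea behind the post can be sketched without any framework. This is the generic low-rank update from the LoRA technique, not the BioNeMo Recipes API: the helper names, matrix sizes, and the `alpha` and `rank` values are illustrative assumptions. The pretrained weight `W` stays frozen and only two small matrices `A` and `B` are trained, giving an effective weight of `W + (alpha / rank) * B @ A`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha, rank):
    """Frozen weight W (d_out x d_in) plus the scaled update B (d_out x r) @ A (r x d_in)."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, rank):
    """Parameters trained by LoRA for one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# Toy 2x3 layer with rank 1 (all numbers are made up for illustration).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
A = [[0.5, -0.5, 1.0]]   # r x d_in, usually randomly initialized
B_zero = [[0.0], [0.0]]  # d_out x r, zero at initialization
B_trained = [[1.0], [2.0]]

at_init = lora_effective_weight(W, A, B_zero, alpha=2, rank=1)
after_training = lora_effective_weight(W, A, B_trained, alpha=2, rank=1)

full = 1280 * 1280                         # hypothetical square layer
low_rank = trainable_params(1280, 1280, 8)
```

Because `B` starts at zero, the adapted model is identical to the pretrained one before training begins, and for the hypothetical 1280 x 1280 layer the trainable parameter count drops from 1,638,400 to 20,480.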
-
The Value Axis: Language Models Encode Whether They're on the Right Track
LLMs encode a 'value axis' tracking if their strategy works
Researchers built a 'value axis' for Qwen3-8B that captures whether its current strategy is likely to reach its goal. The axis separates high- and low-confidence rollouts, backtracking, and correct vs. corrupted code; steering it up suppresses self-correction while steering down induces exploration. DPO can raise the internal value of rewarded behaviors.