-
The State of Fable, The Jailbreak Problem, SpaceX Acquires Cursor
Stratechery on Fable's state, jailbreaks, and SpaceX buying Cursor
A Stratechery column by Ben Thompson on three topics: the state of Anthropic's Fable model, the AI jailbreak problem, and SpaceX's acquisition of Cursor. Thompson argues the administration is likely wrong about Fable but that responsibility ultimately lies with Anthropic. Views are the author's; deal specifics are unverified.
-
Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining
Aligning implied statements for generalizable implicit hate detection
Classifying implicit hate speech is hard because intent is rarely explicit. This work aligns implied statements and applies context-bounded semi-hard negative mining to improve the generalizability of implicit hate speech detection.
-
Cursor announces Git hosting service 'Origin' right after SpaceX acquisition announcement
Cursor unveils 'Origin' Git hosting, seen as a GitHub rival
Cursor, the AI coding tool, announced 'Origin', a Git hosting service that the article frames as aimed at rivaling GitHub. The reveal reportedly came right after news of SpaceX acquiring Cursor. Acquisition terms and Origin's features are article-based, and third-party verification is unconfirmed.
-
LegalWorld: A Life-Cycle Interactive Environment for Legal Agents
LegalWorld: a life-cycle interactive environment for legal agents
Civil litigation is inherently a life-cycle process where documents connect across stages. LegalWorld provides an interactive environment covering the full litigation life cycle, enabling legal agents to be evaluated and trained within that flow.
-
LLMs Struggle to Measure What Distinguishes Students of Different Proficiency Levels: A Study of Item Discrimination in Reading Comprehension Assessment
LLMs struggle to measure item discrimination in reading assessment
Item discrimination is a fundamental psychometric property that distinguishes students of different proficiency. This study shows that large language models struggle to measure item discrimination in reading comprehension assessment, exposing limits of automated evaluation.
-
SpaceX to acquire AI coding tool "Cursor" for 9.6 trillion yen; "major improvements coming soon"
SpaceX reported to acquire AI coding tool Cursor for 9.6 trillion yen
SpaceX is reported to be acquiring the AI coding tool "Cursor" for 9.6 trillion yen. Cursor said on its official X account that "major improvements are coming soon," according to the article. Deal details and the headline figure are based on the report and remain unverified by third parties.
-
The Stanford EDGAR Filings Dataset: Reconstructing U.S. Corporate and Financial Disclosures into Layout-Faithful and Token-Efficient Pretraining Data
SEFD: an open, layout-faithful reconstruction of SEC filings for LLMs
The paper introduces the Stanford EDGAR Filings Dataset (SEFD), an open reconstruction of SEC filings into layout-faithful MultiMarkdown, providing audited financial disclosures as token-efficient pretraining and evaluation data for financial language modeling.
-
A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise
A diffusion approximation for TD learning under Markovian noise
The classical continuous-time description of temporal-difference learning with linear features is an ODE capturing asymptotic mean dynamics but neglecting stochasticity. This work provides a diffusion approximation for TD learning under Markovian noise to capture those fluctuations.
-
Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports
Evaluating open-source LLMs for ATT&CK multi-label CTI classification
The paper evaluates open-source LLMs on multi-label classification of cyber threat intelligence (CTI) reports using MITRE ATT&CK techniques. Summary is title-based and neutral; details and figures are as presented by the source and not independently verified.
-
The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act
Benchmarking doctrinal legal reasoning under the EU AI Act
LLMs produce legal text of at least median quality, yet no benchmark evaluates doctrinal legal reasoning, the interpretive core of legal work. The paper benchmarks doctrinal reasoning under the EU AI Act and discusses the measurement gap in legal automation.
-
Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines
A pipeline survey of embedded ML for microcontroller-class devices
Embedded machine learning moves inference from the cloud to resource-constrained devices. This practice-oriented synthesis lays out data, feature, evaluation and deployment pipelines for an embedded ML workflow on microcontroller-class platforms.
-
Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability
Edge Flow: a tractable continuous-time model for GD at the edge of stability
Gradient descent in deep learning can operate at the edge of stability, where the loss Hessian's top eigenvalue hovers near the stability threshold. Classical tools fail there, so Edge Flow offers a tractable, predictive continuous-time model of this regime.
-
When LLMs Analyze Scars: From Images to Clinically-Meaningful Features
When LLMs analyze scars: images to clinically-meaningful features
Medical image classification excels at scale but real clinics face data scarcity from annotation cost, privacy and disease rarity. Focusing on pathological scar classification, the paper uses LLMs to derive clinically-meaningful features from images.
-
C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift
C2FL: clustered continual federated learning under drift
Collective adaptive systems let nodes learn from locally sensed data, but privacy-sensitive data and node mobility hinder scaling. C2FL proposes clustered continual federated learning that handles spatial and temporal drift.
-
Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting
Multiple cyclicity and wavelet decomposition for long-term forecasting
Cyclicity and trend are key components of time series, but prior work often neglects real-world inter-channel correlations. The paper combines multiple cyclicity with wavelet decomposition and channel correlation to improve long-term time series forecasting.
-
Dimensionality Controls When Modularity Helps in Continual Learning
Dimensionality controls when modularity helps in continual learning
Compositional learning systems must balance plasticity and stability. The paper analyzes when modularity helps in continual learning and shows that the dimensionality of representations controls whether modular structure is beneficial.
-
A Framework for Evaluating Agentic Skills at Scale
A framework for evaluating agentic skills at scale
Agent skills, structured reusable knowledge artifacts that augment LLM agents, have been rapidly adopted, yet their cross-domain impact and a reusable methodology for evaluating individual skills are lacking. The paper presents a framework for evaluating agentic skills at scale.
-
Fox Buys Roku, The Problem With Fox’s Smart Strategy, Streaming That Works
Stratechery analyzes Fox's acquisition of Roku and its strategy
Tech-analysis newsletter Stratechery examines Fox's acquisition of Roku. While the market reacted negatively, the piece argues Fox is trading direct extraction from rights holders for the leverage of operating as a distribution 'renter,' and reflects on streaming business models and media-company positioning.
-
Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning
DP can hide backdoors in federated learning, enabling RING attack
Challenging the belief that differential privacy (DP) makes federated learning robust to backdoors, the authors show empirically that complying with DP masks the statistical signatures defenses rely on, rendering them ineffective. They exploit this with RING, an attack that uses DP to conceal malicious contributions while maximizing impact, acting as a perturbation layer agnostic to the underlying backdoor technique.
-
Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis
A mathematical review of shape space analysis for geometric data
This survey synthesizes the fast-growing literature on shape space analysis, a framework for data whose observations carry rich geometric form across biology, medicine, anthropology and vision. Drawing on differential geometry, statistics and ML, it organizes the work around a shared pipeline of shape representation, parameterization and metric construction.
-
A Multi-Center Benchmark for Abdominal Disease Diagnosis and Report Generation from Non-Contrast CT
Multi-center benchmark diagnoses abdominal disease from non-contrast CT
The paper introduces a multi-center benchmark for multi-organ abdominal disease diagnosis and automated radiology report generation that synthesizes contrast-enhanced findings from single-phase non-contrast CT, aiming to cut contrast risks and radiologist workload. Using paired NCCT-CECT studies from two centers, it benchmarks five deep-learning architectures under a unified protocol.
-
Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals
Contrastive latent mapping of nanopore signals into molecular coordinates
Nanopores are versatile single-molecule sensors, but stochastic translocation dynamics warp encoded information, limiting their utility. The paper shifts from time-domain analysis to a learned latent-space mapping via a contrastive encoder trained only on simulated signals from a physics-informed model. It maps nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system that responds to structural parameters but stays invariant to acquisition conditions.
-
A Unified Causal-Origin Taxonomy of Distributional Shifts in Reinforcement Learning
A unified causal-origin taxonomy of distributional shifts in RL
Reinforcement learning systems degrade when operating conditions diverge from training, reflecting distributional shifts in the data-generating process. These shifts arise between training and evaluation (ID vs. OOD generalization) or in non-stationary settings where dynamics evolve, yet their formal relationship is unclear and prior work emphasizes mitigation over causes. The paper proposes a unified taxonomy of the causal origins of shift within the agent-environment interaction.
-
IMPACTeen: Intentions, Manipulation, Persuasion, Annotations, and Consequences in Teen Communication Dataset
IMPACTeen: a teen-context dataset of social-influence scenarios and labels
The paper introduces IMPACTeen, a dataset of textual social-influence scenarios in adolescent interpersonal, media, and digital settings. It contains 1,021 texts and 5,100 annotation records labeled from five perspectives (teens, parents, psychologists, communication experts, teachers), built via constrained LLM generation plus two-step human editing, with Polish and English versions. Summarized neutrally from the abstract.
-
Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability
Tighter deep-learning generalization bounds via local robustness
Robustness-based generalization bounds are often vacuous in practice. The authors trace much of the looseness to the robustness term itself, especially for 0-1 loss, which is usually treated as a global measure. They propose a bound that scales the robustness term by the number of stable and unstable samples across input sub-regions, yielding tighter estimates.
-
HawkesNest: A Multi-Axis Synthetic Benchmark for Spatiotemporal Pattern Complexity
HawkesNest: a synthetic benchmark for spatiotemporal point process models
Evaluating spatiotemporal point process (STPP) models relies on opaque real datasets where failures are hard to attribute. HawkesNest is a generator-aligned synthetic benchmark built on a multivariate Hawkes backbone, defining four complexity axes with deterministic indices so models can be stress-tested under known structural difficulty.
-
Beyond Models: Reflections on Engineering AI-enabled Systems in a Project-Based Course
Reflections on teaching the engineering of AI-enabled systems in a course
This paper reflects on a project-based master's course at the University of Bremen on engineering AI-enabled systems. It argues that machine learning courses emphasize model development while students lack experience in architectural design, deployment, and monitoring, and reports on the course's design and implementation.
-
Robust Spoofed Speech Detection via Temporal Pyramid Modeling
Temporal Pyramid modeling for robust, generalizable spoofed-speech detection
The paper proposes a Temporal Pyramid Adapter for spoofed speech detection, using parallel temporal convolutions with varying receptive fields to capture multi-scale cues from local artifacts to global prosodic irregularities. It combines self-supervised XLS-R representations with front-end adapters to improve cross-dataset generalization.
-
ATOM-Bench: A Real-World Benchmark for Atomic Skills and Compositional Generalization in Manipulation Policies
ATOM-Bench evaluates atomic skills and compositional generalization in robots
The paper presents ATOM-Bench, a real-world benchmark for evaluating both atomic skills and compositional generalization in robotic manipulation policies. It factorizes tabletop manipulation into motor and instruction atoms, noting that a policy may succeed on demonstrated tasks yet fail to execute fine-grained skills or recombine them in new structures.
-
LabOSBench: Benchmarking Computer Use Agents for Scientific Instrument Control
LabOSBench: a simulated testbed for computer-use agents controlling instruments
The paper proposes LabOSBench, a simulated yet realistic testbed for evaluating computer-use agents on scientific instrument control. It notes that existing benchmarks focus on software tasks in virtual systems, while real instruments require coordinated interface control and feedback-driven parameter tuning that are costly and risky to evaluate directly.