Funding & M&A C
-
The deciding factor in lending shifts from financial statements to "data" and "future scenarios": what is the key to successful fundraising for small and midsize businesses?
SME lending shifts from balance sheets to data and future scenarios
Executives from 01 Bank, a lending-focused digital bank, and Hokkoku Bank discuss how small and midsize firms' fundraising is changing. They argue the decisive factor in lending is shifting from financial statements toward "data" and "future scenarios".
-
The founder of Craigslist has given away half a billion dollars
Craigslist founder has given away half a billion dollars
Craig Newmark, founder of the classifieds site Craigslist, has donated half a billion dollars over the years. He is known for philanthropy supporting causes such as journalism, cybersecurity, and military veterans.
-
Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems
Analyzing defensive misdirection against attacks on agentic AI
Agentic AI systems increasingly rely on language-model components to interpret instructions, exposing them to attacks. This paper analyzes defensive misdirection as a countermeasure against model-guided automated attacks.
-
Topological Data Analysis for High-Dimensional Dynamic Process Monitoring
Topological data analysis for high-dimensional process monitoring
The paper presents a process-monitoring approach that combines topological data analysis with machine learning to extract actionable information from high-dimensional time series. It represents multivariate time-series data for real-time monitoring of dynamic processes.
-
Sparsity, Superposition, and Forgetting: A Mechanistic Study of Representation Retention in Continual Learning
A mechanistic study of forgetting in continual learning
The paper presents a mechanistic study of representation retention in continual learning, using a controlled toy-world framework to make the drivers of forgetting observable and testable. It examines how sparsity and superposition relate to forgetting, isolating mechanisms that real datasets usually entangle.
-
CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges
CATCH-ME: a counterspeech dataset against hate and misinformation
The paper introduces CATCH-ME, a dataset of contextually annotated multi-turn counterspeech against overlapping hate speech and misinformation. It addresses NLP's tendency to treat the two threats in isolation and the tendency of zero-shot LLMs to produce repetitive, vague counterspeech.
-
QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation
QMFOL: benchmarking LLM reasoning via first-order logic test generation
Large language models have advanced in reasoning, especially deduction. QMFOL benchmarks LLM reasoning through quantifiable monadic first-order logic test-case generation.
-
Learner-based Concept Drift Detection: Analysis and Evaluation
Learner-based concept drift detection: analysis and evaluation
Machine learning deployed in evolving streaming environments must handle non-stationarity. This work analyzes and evaluates learner-based approaches to concept drift detection.
-
CzechDocs: A Multiway Parallel Dataset of Formatted Documents for Minority Languages in Czechia
CzechDocs: a parallel formatted-document MT dataset for Czechia
The paper presents CzechDocs, a multiway parallel dataset of formatted documents in HTML, DOCX, and PDF covering Czech and minority languages used in Czechia, primarily Ukrainian and English, with smaller amounts of Vietnamese, Russian, and others. It is designed to support evaluation of machine translation.
-
IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources
IHUBERT: a Persian language model with semantic dedup pretraining
The paper presents IHUBERT, a monolingual Persian pretrained language model trained from scratch on a RoBERTa-base encoder. It uses vector-based semantic deduplication and domain-balanced pretraining to address the scarcity of large, high-quality Persian corpora and limited evaluation.
-
Improving health intelligence in ChatGPT
OpenAI improves ChatGPT health responses with GPT-5.5 Instant
OpenAI says GPT-5.5 Instant strengthens ChatGPT's health and wellness responses through better reasoning, richer context, and clearer communication. The work is backed by physician-informed evaluations aimed at delivering more reliable, trustworthy health guidance.
-
Source-Grounded Data Generation for Text-to-JSON Learning
Source-grounded data generation for text-to-JSON extraction
The paper proposes source-grounded data generation for text-to-JSON learning, where models extract information from long unstructured documents into structured, machine-readable JSON. It targets domains such as financial filings and clinical records that store high-value information in unstructured text.
-
Generative Engine Optimization at Scale: Measuring Brand Visibility Across AI Search Engines
Measuring brand visibility across AI search engines at scale
The paper studies generative engine optimization at scale, measuring how brands are represented, cited, and recommended across AI search engines such as ChatGPT, Claude, Perplexity, and Gemini. It frames the shift from traditional SEO as users increasingly get answers directly from these engines.
-
Large Language Models Do Not Always Need Readable Language
LLMs don't always need human-readable language
The paper investigates whether semantic information can be encoded in compact, non-standard text that sacrifices human readability while remaining usable by models. It argues large language models do not always need human-readable language, especially when the intended reader is another model.
-
Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives
Zero-shot agentic LLM workflows for lung pathology extraction
The paper presents Prompt, Plan, Extract, a zero-shot agentic LLM workflow for extracting lung pathology information from clinical narrative reports. It targets the labor-intensive, error-prone manual extraction needed for cancer staging and tumor registries, avoiding fully supervised NLP pipelines.
-
JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines
JAMER: a project-level code benchmark on game engines
The paper introduces JAMER, a project-level code framework dataset and benchmark for professional game engines. It addresses the lack of large-scale datasets and deterministic evaluation for project-level code engineering, which has remained underexplored despite progress in AI-driven game asset and gameplay generation.
-
CREDENCE: Claim Reduction for Decomposition & Enhanced Credibility -- Semantic Metrics and Convergence Analysis
CREDENCE: semantic metrics for claim decomposition in fact-checking
The paper presents CREDENCE, an approach to decomposing compound sentences into atomic, verifiable claims for automated fact-checking. It introduces semantic metrics that avoid token-overlap measures, which underestimate quality for paraphrastic claims, and adds convergence and termination analysis.
-
AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA
AgentFinVQA: an auditable multi-agent pipeline for financial chart QA
The paper presents AgentFinVQA, a deployable multi-agent pipeline for auditable financial chart question answering. It targets regulated settings where practitioners must know which answers to trust and cannot send client data to external model providers, unlike existing accuracy-focused, opaque chart-QA agents.
-
Leaked financial docs show OpenAI is losing billions of dollars a year
Leaked financial docs show OpenAI is losing billions a year
The article cites leaked financial documents indicating that OpenAI is losing billions of dollars a year. It feeds the debate over the enormous costs of developing and running generative AI and the challenge of turning a profit.
-
Learning User Simulators with Turing Rewards
User simulators learned with Turing rewards for agent training
Simulating human users in interactive settings could advance training of agent assistants, evaluation of personalization systems, and social-science research. This work learns user simulators using Turing rewards, aiming to reproduce more realistic user behavior.
-
NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning
NeSyCat Torch unifies neurosymbolic semantics via category theory
Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic, and neural systems each define truth by their own rules. Extending ULLER, NeSyCat subsumes them under a single inductive definition of truth, delivered as a differentiable tensor implementation for neurosymbolic learning.
-
Beyond Algorithms: Conceptual Innovation in Medical Imaging AI
Beyond algorithms: the case for conceptual innovation in medical imaging AI
AI has driven rapid progress in medical imaging, yielding ever more sophisticated algorithms and steady benchmark gains. Yet this algorithm-centric trajectory is showing its limits. This work argues for conceptual innovation beyond algorithms to achieve clinically meaningful advances in medical imaging AI.
-
SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering
SCAN boosts time-series anomaly detection via neighborhood clustering
Time-series anomaly detection is crucial across applications. Reconstruction-based methods dominate but suffer from over-generalization: they reconstruct anomalies too well, which makes them hard to flag. SCAN uses multi-scale neighborhood-centered clustering to curb this over-generalization and improve detection.
-
A Taxonomy of Mental Health and Technology Needs for Alzheimer's and Dementia Caregivers
A taxonomy of mental-health and tech needs for dementia caregivers
Family members caring for people with Alzheimer's and related dementias form the foundation of long-term care worldwide; in 2023 over 11 million U.S. relatives provided unpaid care. This work presents a taxonomy of caregivers' mental-health and technology needs to guide supportive design.
-
When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift
Polarization-aware evaluation of deepfake detectors under domain shift
Advances in diffusion models and face-swapping enable highly realistic deepfakes and real-world harm. This work shows AUC can mislead when evaluating detectors under domain shift, and proposes a polarization-aware evaluation that better reflects deepfake detector performance across domains.
-
A Clinician-Centered Pipeline for Annotation and Evaluation in Ultrasound AI Studies
A clinician-centered annotation and evaluation pipeline for ultrasound AI
Clinician-centered evaluation is critical for validating medical AI, especially in ultrasound imaging where quantitative metrics do not always capture clinical usability. This work proposes a clinician-centered pipeline for annotation and evaluation in ultrasound AI studies to ground validation clinically.
-
Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition
Dango: an L1-only 1.8B LLM for studying second-language acquisition
The authors introduce Dango, a 1.8B-parameter language model designed for controlled studies of L1-to-L2 (Japanese-to-English) transfer in second language acquisition. Because it is trained strictly on L1 data, Dango enables controlled experiments on transfer phenomena that prior SLA model studies could not.
-
Human-AI Coevolution Dynamics: A Formal Theory of Social Intelligence Emergence Through Long-Term Interaction
A formal theory of human-AI coevolution and social intelligence
Conversational AI has advanced in language generation, personalization, and long-context interaction, but most methods model social behavior through isolated components. This work offers a formal theory of human-AI coevolution dynamics, explaining how social intelligence emerges through long-term interaction.
-
Urdu Katib Handwritten Dataset: A Historical Document Dataset for Offline Urdu Handwritten Text Recognition with CRNN-Based Baseline Evaluation
Urdu Katib: a historical dataset for offline Urdu handwriting recognition
Automatic handwritten text recognition is challenging, especially for cursive scripts. This work introduces the Urdu Katib Handwritten Dataset, a historical-document dataset for offline Urdu handwritten text recognition, providing resources to advance recognition research on cursive scripts.
-
Mitigating Scoring Errors and Compensating for Nonverbal Subtests in Speech-Based Dementia Assessment
Mitigating scoring errors in speech-based dementia assessment
Early detection of cognitive impairment relies on neuropsychological tests whose scoring is subjective. This work mitigates scoring errors and compensates for nonverbal subtests in speech-based dementia assessment, aiming for more objective and reliable screening.