Funding & M&A C

  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs
    P3B3: a benchmark for Portuguese variety bias in LLMs
    The paper introduces P3B3, an expert-curated benchmark and framework for measuring bias between the European and Brazilian varieties of Portuguese in LLMs. It reports that most models lean strongly toward pt-BR and argues for more balanced multilingual representation.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Automated jailbreak attack targeting multiple defense strategies
    UNIATTACK: a defense-oriented framework for automated black-box LLM jailbreaks
    Retrieval-Augmented Generation (RAG) · Speech Processing
    The paper presents UNIATTACK, an adversarial testing framework that systematically builds effective black-box attack prompts against LLMs from a defense-oriented perspective. Unlike approaches that rely on static templates or model-specific tuning, it extracts minimal but high-impact features from diverse existing attacks and optimizes them.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
  • arXiv cs.CL (Computation and Language) · EN Funding & M&A extract
    Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents
    Paper on evaluator preference collapse in self-evolving agents
    AI Agents · DeepSeek · GPT
    This arXiv paper reportedly examines preference collapse in multimodal evaluators and its cross-modal contagion within self-evolving agent systems. The source excerpt was unavailable (content filter), so this summary is based on the title only; see the original for methods and findings.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Funding & M&A extract
    SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG
    SCAR: semantic continuity-aware retrieval for RAG context expansion
    Embeddings · Retrieval-Augmented Generation (RAG)
    Note: the abstract was unavailable, so this summary is based on the title alone. The paper appears to propose SCAR, a 'semantic continuity-aware retrieval' method aimed at efficient context expansion in retrieval-augmented generation (RAG). Specific mechanisms and evaluation results cannot be confirmed from the title.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    Islamic Large Language Models: From Knowledge Acquisition to Trustworthy and Hallucination-Resistant AI
    Survey reviews Islamic LLMs and trustworthy, hallucination-resistant AI
    Natural Language Processing (NLP) · Retrieval-Augmented Generation (RAG) · Reinforcement Learning · Software Engineering
    This survey reviews the emerging field of Islamic LLMs and trustworthy Islamic AI, spanning Arabic NLP, Qur'anic question answering, knowledge benchmarks, retrieval-augmented generation, and legal reasoning. It argues that Arabic fluency alone is insufficient, and that reliable systems need curated sources, verification modules, and citation-aware generation.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Safety & Evaluation extract
    How Far Can Machine Translation Quality Take You? Extrinsic Discourse Evaluation in Goal-Oriented Setups
    Extrinsic discourse evaluation of machine translation quality
    This arXiv paper argues that standard machine-translation (MT) metrics assess quality intrinsically and miss the downstream consequences of translation errors. The authors evaluate under two regimes. In a static regime, they propose an entity-counting task that probes referential consistency and show that high intrinsic MT quality does not reliably predict downstream discourse success. In an interactive regime, they use the goal-oriented multi-agent Welfare Diplomacy game as a probe.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • arXiv cs.CL (Computation and Language) · EN Funding & M&A extract
    Can LLM Coding Agents Reason About Time Series?
    Can LLM coding agents reason about time series? A benchmark study
    AI Agents · Software Engineering
    This arXiv study tests whether LLM agents can analyze the time series data that is ubiquitous in finance, healthcare, and environmental monitoring. The authors compare three approaches: feeding the model raw numerical data, using the LLM as a coding agent, and combining the two. They find that agents with code access can outperform models processing raw data by up to 10%, though even the best agent still answers roughly 22-34% incorrectly.
    Read original (arXiv cs.CL (Computation and Language)) ↗
  • Cohere Blog · EN Funding & M&A extract
    Cohere triples UK footprint with new London office to support R&D growth
    Cohere triples its UK footprint with a new London R&D office
    Neural Network · Reinforcement Learning
    Cohere announced that it will move to a larger London office at 100 New Oxford Street, nearly tripling its UK footprint. The expansion invests in the city's AI talent and R&D base and supports growing demand for secure, enterprise-grade sovereign AI across the UK and Europe.
    Read original (Cohere Blog) ↗
  • Hacker News (Front Page) · EN Funding & M&A extract
    How to Earn a Billion Dollars
    Paul Graham essay lays out how to build a billion-dollar fortune
    In a new essay, Y Combinator co-founder Paul Graham examines how to earn a billion dollars, arguing the most reliable path is founding a fast-growing startup that makes something people genuinely want. He frames wealth creation in terms of scale, market choice, and entrepreneurial ambition, in his characteristically plain-spoken style.
    Read original (Hacker News (Front Page)) ↗
  • Simon Willison's Weblog · EN Funding & M&A extract
    Quoting Andrew Singleton
    Simon Willison shares a quote from Andrew Singleton
    Meta · Neural Network · Software Engineering
    Simon Willison's blog features a quotation from Andrew Singleton. Based on its tags, the post touches on Meta, software engineering, and neural networks; see the original post for the full quote and context.
    Read original (Simon Willison's Weblog) ↗
  • arXiv cs.LG (Machine Learning) · EN Safety & Evaluation extract
    Which Directions Matter? Sparse Design for Affine Robust Optimization
    Sparse design identifies which directions matter in robust optimization
    Machine Learning · Retrieval-Augmented Generation (RAG)
    The work studies which uncertainty directions a model must cover in affine robust optimization problems defined by a finite dictionary and a budget. It proposes a sparse design that selects the directions that matter.
    Read original (arXiv cs.LG (Machine Learning)) ↗
  • Hugging Face Blog · EN Safety & Evaluation extract
    olmo-eval: An evaluation workbench for the model development loop
    AllenAI releases olmo-eval, evaluation workbench for model dev loop
    The Allen Institute for AI published olmo-eval, an evaluation workbench for the model development loop. The tool appears to support continuous evaluation of models during training, building on the team's OLMo open-model development work.
    Read original (Hugging Face Blog) ↗
  • arXiv cs.AI (Artificial Intelligence) · EN Safety & Evaluation extract
    Rethinking Global Average Pooling: Your Classifier Is Secretly a Multi-Instance Learner
    Rethinking GAP: your classifier is secretly a multi-instance learner
    Retrieval-Augmented Generation (RAG)
    Modern image classifiers widely use global average pooling followed by a linear head. The paper shows this linearity makes GAP-based classifiers behave as multi-instance learners, prompting a rethink of global average pooling.
    Read original (arXiv cs.AI (Artificial Intelligence)) ↗
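
The linearity point in the last summary can be checked numerically. What follows is a minimal sketch, not the paper's method: it assumes a toy feature map with four spatial locations and a two-class linear head, and every name and value in it is hypothetical. Because the head is linear, applying it to the pooled feature gives the same logits as applying it to each location and averaging the results, which is one sense in which each location acts like an instance in a bag. The paper's own formulation may differ.

```python
# Sketch: global average pooling (GAP) followed by a linear head yields the
# same logits as scoring every spatial location and averaging those scores.
# Shapes and values are made up for illustration only.

def linear_head(feature, weights, bias):
    """Apply a linear classifier to one feature vector: one logit per class."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, bias)]

def mean_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical feature map: 4 spatial locations ("instances"), 3 channels each.
locations = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.0],
    [2.0, 2.0, 1.0],
    [0.0, 0.5, 0.5],
]
weights = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]  # 2 classes x 3 channels
bias = [0.05, -0.02]

# Path 1: pool first, then classify (the usual GAP classifier).
logits_gap = linear_head(mean_vectors(locations), weights, bias)

# Path 2: classify every location, then average the per-location logits.
logits_per_location = [linear_head(f, weights, bias) for f in locations]
logits_mil = mean_vectors(logits_per_location)

# The two paths agree up to floating-point rounding.
agree = all(abs(a - b) < 1e-9 for a, b in zip(logits_gap, logits_mil))
print(agree)
```

The per-location logits in `logits_per_location` are the quantities that a multi-instance reading would inspect; the pooled prediction is just their mean.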