AI Daily | 2026-04-27
147 items collected today
📰 Industry News
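⭐️⭐️⭐️ DeepSeek V4 preview released
Chinese AI company DeepSeek released a preview of its next-generation flagship model, V4, on April 24. The model uses a new design that processes large amounts of text more efficiently, so it accepts longer prompts than the previous generation. V4 follows DeepSeek's established approach and remains open source. The release has drawn attention because it could further lower the barrier to using long-context large models and affect competition in the open-source model ecosystem.
- Related: DeepSeek, V4, open-source large models
- Tags: large models, open source, long context
- 📎 Original link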
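⭐️⭐️⭐️ China blocks Meta acquisition
Chinese regulators have required Meta to unwind its $2 billion acquisition of Manus. The deal was halted after months of review, which could affect Zuckerberg's plans to advance Meta's AI agent business. The move shows that large AI-related mergers and acquisitions face stricter cross-border regulatory scrutiny.
- Related: Meta, Manus, Mark Zuckerberg, AI agents, Chinese regulators
- Tags: AI M&A, regulatory review, Meta
- 📎 Original link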
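⭐️⭐️⭐️ David Silver's new company raises funding
Ineffable Intelligence, the UK AI lab founded by former DeepMind researcher David Silver, has closed a $1.1 billion funding round at a $5.1 billion valuation. The company is only a few months old and aims to build AI systems that learn without human data. The size of the round shows strong investor interest in this new autonomous-learning approach.
- Related: Ineffable Intelligence, David Silver, DeepMind
- Tags: funding, autonomous learning, AI lab
- 📎 Original link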
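⭐️⭐️⭐️ OpenAI gains right to sell on AWS
OpenAI has reached a major concession agreement with Microsoft, its largest shareholder, removing the legal risk around its $50 billion partnership with Amazon. Under the agreement, OpenAI can sell its products on AWS, while Microsoft receives more cash under the revenue-sharing agreement. The move could widen OpenAI's choice of cloud channels and reduce its dependence on the Microsoft ecosystem.
- Related: OpenAI, Microsoft, Amazon, AWS
- Tags: cloud computing, commercial partnership, revenue sharing
- 📎 Original link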
⭐️⭐️ The creator of Claude Code just revealed
When the creator of the world's most advanced coding agent speaks, Silicon Valley doesn't just listen — it takes notes. For the past week, the engineering community has been dissecting a thread on X from Boris Cherny , t
- Related: The, Claude, Code
- Tags: news, VentureBeat AI
- 📎 Original link
⭐️⭐️ Nous Research's NousCoder-14B is an open
Nous Research , the open-source artificial intelligence startup backed by crypto venture firm Paradigm , released a new competitive programming model on Monday that it says matches or exceeds several larger proprietary s
- Related: Nous, Research's, NousCoder-14B, Claude, Code
- Tags: news, VentureBeat AI
- 📎 Original link
⭐️⭐️ Anthropic launches Cowork, a Claude Desk
Anthropic released Cowork on Monday, a new AI agent capability that extends the power of its wildly successful Claude Code tool to non-technical users — and according to company insiders, the team built the entire featur
- Related: Anthropic, Cowork, Claude, Desktop
- Tags: news, VentureBeat AI
- 📎 Original link
⭐️⭐️ Salesforce rolls out new Slackbot AI age
Salesforce on Tuesday launched an entirely rebuilt version of Slackbot , the company's workplace assistant, transforming it from a simple notification tool into what executives describe as a fully powered AI agent capabl
- Related: Salesforce, Slackbot, AI, Microsoft, Google
- Tags: news, VentureBeat AI
- 📎 Original link
⭐️⭐️ Listen Labs raises $69M after viral bill
Alfred Wahlforss was running out of options. His startup, Listen Labs , needed to hire over 100 engineers, but competing against Mark Zuckerberg's $100 million offers seemed impossible. So he spent $5,000 — a fifth of hi
- Related: Listen, Labs, AI
- Tags: news, VentureBeat AI
- 📎 Original link
⭐️⭐️ Claude Code costs up to $200 a month. Go
The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code , Anthropic's terminal-based AI agent that can write, debug, and deploy code autonomously, has captured the imagination of sof
- Related: Claude, Code, Goose
- Tags: news, VentureBeat AI
- 📎 Original link
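⭐️⭐️ Railway raises $100 million
Cloud platform Railway announced a $100 million Series B round led by TQ Ventures, with participation from FPV Ventures, Redpoint, and Unusual Ventures. The company says it has 2 million developers, handles more than 10 million deployments per month, and has served more than 1 trillion requests through its edge network. Railway positions itself as AI-native cloud infrastructure, claiming deployment times under 1 second and cost savings of up to 65% for customers, to meet the deployment speed that AI coding tools demand. It plans to use the new funding to expand its global data centers, grow its team, and build a more complete go-to-market organization, challenging traditional cloud platforms such as AWS and Google Cloud directly.
- Related: Railway, TQ Ventures, FPV Ventures, Redpoint, Unusual Ventures, AWS, Google Cloud, Claude, ChatGPT, Cursor
- Tags: AI infrastructure, cloud computing, funding, developer tools
- 📎 Original link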
⭐️⭐️ Google announces Gemma 4 open AI models,
Gemma 4 brings the first major update to Google's open models in a year.
- Related: Google, Gemma, AI, Apache
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ “The problem is Sam Altman”: OpenAI insi
OpenAI brainstorms ways AI can benefit humanity in effort to counter bad vibes.
- Related: “The, Sam, Altman”, OpenAI, CEO
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ Testing suggests Google's AI Overviews t
Is 90 percent accuracy good enough for a search robot?
- Related: Testing, Google's, AI, Overviews
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ To beat Altman in court, Musk offers to
Musk won’t seek a “single dollar” in OpenAI suit after asking to pocket up to $134 billion.
- Related: To, Altman, Musk, OpenAI
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ First man convicted under Take It Down A
Ohio man used more than 100 AI tools to make fake nudes of women and minors.
- Related: First, Take, It, Down, Act
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ Google introduces "Skills" in Chrome to
You can save custom prompts you find useful or grab a premade Skill from Google's library.
- Related: Google, "Skills", Chrome, Gemini
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ Gemini can now create personalized AI im
Google is making it easier to feed your photos into Nano Banana for more personal image generation.
- Related: Gemini, AI, Google, Photos
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ Deezer says 44% of new music uploads are
AI tracks account for a small fraction of Deezer streams, and most are demonetized for fraud.
- Related: Deezer, AI-generated
- Tags: news, Ars Technica AI
- 📎 Original link
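⭐️⭐️ Ten current AI trends
MIT Technology Review published an article outlining ten key trends, technologies, and research directions worth watching in AI right now. The article looks at the state of AI in 2026 and focuses on the changes that are shaping industry and research agendas. The original summary does not list the individual items, but roundups of this kind are typically meant to help readers quickly grasp the main shifts in the field. Its value lies in giving companies, researchers, and policy observers a point-in-time reference.
- Related: MIT Technology Review, artificial intelligence
- Tags: AI trends, technology watch, industry research
- 📎 Original link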
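⭐️⭐️ LLMs enter a new phase
MIT Technology Review recalls that since ChatGPT was released as an experimental prototype in late 2022, LLMs have quickly become an "everything app" used daily by hundreds of millions of people. The wave pushed the whole tech industry to race out similar products and reshaped the existing technology landscape. The article focuses on the industry impact and direction of large language models as they enter their next phase.
- Related: OpenAI, ChatGPT, large language models
- Tags: LLM, generative AI, industry trends
- 📎 Original link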
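⭐️⭐️ AI-powered scams escalate
MIT Technology Review notes that after ChatGPT's release, generative AI could mass-produce human-like text from simple prompts, a capability criminals quickly exploited. Attackers began using large language models to generate malicious emails, from bulk spam to more sophisticated targeted attacks. The article stresses that generative AI is lowering the barrier to producing scam content and amplifying the scale of fraud.
- Related: ChatGPT, generative AI, large language models
- Tags: AI safety, online scams, malicious email
- 📎 Original link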
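⭐️⭐️ World models draw attention
MIT Technology Review looks at "world models": today's AI is highly capable in the digital world but still faces clear challenges in the physical one. Getting AI to fold laundry or navigate city streets is far harder than having it write a novel or code an app. The article notes that researchers are exploring world models to help AI understand and predict physical environments, in order to advance applications such as embodied intelligence.
- Related: world models, AI systems, embodied intelligence
- Tags: world models, robotics, physical intelligence
- 📎 Original link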
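⭐️⭐️ Weaponized deepfakes intensify
MIT Technology Review notes that AI-generated video, image, and audio deepfakes are turning from a long-standing warning into a real threat. As generative models grow more capable, easier to use, and nearly free, creating and spreading fake content maliciously becomes easier. Personal identity, public opinion, and information security therefore face higher risk, and demand for governance and detection technology will keep rising.
- Related: MIT Technology Review, deepfakes, generative AI
- Tags: deepfakes, AI safety, disinformation
- 📎 Original link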
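⭐️⭐️ AI agent orchestration draws attention
MIT Technology Review discusses the key role of "agent orchestration" in putting AI to work. The article notes that ChatGPT made large language models a mass-market product, but to speed up drug discovery or change the labor market, AI has to move from conversation to carrying out tasks. Agent orchestration is about getting multiple AI systems to cooperate on complex processes, a key step in AI's move from tool to automated workflow.
- Related: MIT Technology Review, ChatGPT, AI agents, large language models
- Tags: AI agents, agent orchestration, automation
- 📎 Original link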
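⭐️⭐️ Enterprise AI needs a data foundation
An MIT Technology Review Insights article notes that AI is moving from enterprise experimentation into day-to-day business use. Companies are deploying copilots, agents, and predictive systems in finance, supply chain, human resources, and customer operations. According to a survey, by the end of 2025 half of companies were using AI in at least three business functions. The article stresses that as AI goes deeper into the business, a strong data fabric becomes the key foundation for unlocking commercial value.
- Related: MIT Technology Review Insights, data fabric, enterprise AI, Copilot, agents
- Tags: enterprise AI, data infrastructure, agents
- 📎 Original link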
⭐️⭐️ Google unveils two new TPUs designed for
Google's new generation of Tensor AI chips is actually two chips, one for inference and one for training.
- Related: Google, TPUs
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ Prestigious photo contest answers ‘what
We love to muse over how "real" photography is defined here at The Verge now that generative AI is so prolific, and the World Press Photo competition might have the answer. The prestigious award celebrates the best of ph
- Related: Prestigious
- Tags: news, The Verge AI
- 📎 Original link
⭐️⭐️ China’s DeepSeek previews new AI model a
Chinese AI company DeepSeek released a preview of its hotly anticipated next-generation AI model V4 on Friday, saying that the open-source model can compete with leading closed-source systems from US rivals including Ant
- Related: China’s, DeepSeek, AI, US
- Tags: news, The Verge AI
- 📎 Original link
⭐️⭐️ Musk vs. Altman is here, and it’s going
Elon Musk cofounded OpenAI, and then flounced off in a huff when he wasn't anointed CEO, leaving Sam Altman as the last power-hungry man standing. Now, Musk is back with a lawsuit, and a trial is scheduled to start in Oa
- Related: Musk, Altman
- Tags: news, The Verge AI
- 📎 Original link
⭐️⭐️ AirPods, Touch Bars, and the rest of Tim
We knew at some point Tim Cook would step down from his position as Apple's CEO. Over the last year, it has become increasingly obvious that John Ternus was his likely successor. The news this week was still a surprise,
- Related: AirPods, Touch, Bars, Tim, Cook’s
- Tags: news, The Verge AI
- 📎 Original link
⭐️⭐️ Report: Samsung execs worried company co
The AI-driven memory shortage is hitting Samsung's bottom line.
- Related: Report, Samsung
- Tags: news, Ars Technica AI
- 📎 Original link
⭐️⭐️ How Project Maven taught the military to
In the first 24 hours of the assault on Iran, the US military struck more than 1,000 targets, nearly double the scale of the "shock and awe" attack on Iraq over two decades ago. This acceleration was made possible by AI
- Related: How, Project, Maven, AI
- Tags: news, The Verge AI
- 📎 Original link
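⭐️⭐️ Altman apologizes to Canadian community
OpenAI CEO Sam Altman wrote to residents of Tumbler Ridge, Canada, to apologize, saying he was "deeply sorry" that the company failed to alert law enforcement in time about the suspect in a recent mass shooting. The incident has put OpenAI's safety response, risk escalation, and platform responsibility in the public spotlight. For AI companies, it underscores the importance of clear warning mechanisms for situations with potential real-world harm.
- Related: OpenAI, Sam Altman, Tumbler Ridge
- Tags: AI safety, platform responsibility, public safety
- 📎 Original link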
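⭐️⭐️ Maine data center ban vetoed
The governor of Maine vetoed bill L.D. 307, which would have imposed a statewide pause on approvals of new data centers. Had it passed, it would have been the first state-level ban on data center construction in the United States, lasting until November 1, 2027. The veto means local data center projects will not be halted wholesale by this bill for now, and it reflects the policy tug-of-war between AI infrastructure expansion and local resource regulation.
- Related: Maine, L.D. 307, data centers
- Tags: data centers, AI infrastructure, policy and regulation
- 📎 Original link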
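⭐️⭐️ Anthropic tests agent-to-agent trading
Anthropic recently built a classifieds-style test marketplace in which AI agents traded on behalf of buyers and sellers. In the experiment, the agents closed actual deals involving real goods and real money. The project shows what "agent-to-agent commerce" might look like and offers a reference for future automated procurement, negotiation, and online transactions.
- Related: Anthropic, AI agents
- Tags: agents, AI trading, automated commerce
- 📎 Original link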
⭐️⭐️ The AI-designed car is taking shape
The auto design world is full of advanced 3D visualization tools and VR sculpting platforms, but your average new car still enters the world as a sketch. Those sketches traditionally see endless iteration and refinement
- Related: The, AI-designed
- Tags: news, The Verge AI
- 📎 Original link
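⭐️⭐️ Enterprise AI requires rebuilding the data stack
MIT Technology Review Insights notes that for many companies deploying AI at scale, the biggest obstacle is not the model but the state of their data infrastructure. Consumer AI tools show off speed and ease of use, but enterprise AI depends more on a high-quality, governable, integrable data stack. The article stresses that rebuilding data infrastructure will be a key prerequisite for companies to realize value from AI.
- Related: MIT Technology Review Insights, artificial intelligence, data stack
- Tags: enterprise AI, data infrastructure, AI adoption
- 📎 Original link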
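⭐️⭐️ OpenAI may build an AI phone
According to analyst commentary, OpenAI may be developing a phone that replaces traditional apps with AI agents. The device is expected to enter mass production in 2028 at the earliest. If true, it would mean OpenAI is trying to move beyond software services into consumer hardware and the mobile entry point.
- Related: OpenAI, AI agents, smartphones
- Tags: AI hardware, agents, mobile devices
- 📎 Original link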
⭐️⭐️ Canva apologizes after its AI tool repla
One of Canva's new AI features has been caught replacing the word "Palestine" in designs. The Magic Layers feature - which is designed to break flat images out into separate editable components - isn't supposed to make v
- Related: Canva, AI, ‘Palestine’
- Tags: news, The Verge AI
- 📎 Original link
⭐️⭐️ Elon Musk and Sam Altman’s court battle
Sam Altman and Elon Musk are set to face off in a high-stakes trial that could alter the future of tech’s leading AI startup, OpenAI. The trial begins with jury selection on April 27th, as Musk pushes forward his 2024 la
- Related: Elon, Musk, Sam, Altman’s, OpenAI
- Tags: news, The Verge AI
- 📎 Original link
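⭐️⭐️ Skye wins backing for AI home screen
Skye's AI home screen app for iPhone has won investor backing ahead of its official launch. Developed by Signull Labs, the app promises a more AI-aware iPhone experience. The investor interest suggests that the market is paying attention to apps built around the phone entry point and AI-native interaction.
- Related: Skye, Signull Labs, iPhone
- Tags: AI apps, mobile, smartphones
- 📎 Original link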
⭐️⭐️ Microsoft and OpenAI’s famed AGI agreeme
OpenAI and Microsoft's partnership-turned-situationship just got even less committed. And a clause about artificial general intelligence, which has for years dictated the future of their deal, has officially been dropped
- Related: Microsoft, OpenAI’s, AGI
- Tags: news, The Verge AI
- 📎 Original link
⭐️⭐️ Google employees ask Sundar Pichai to sa
Over 600 Google employees signed a letter to CEO Sundar Pichai demanding that Google block the Pentagon from using its AI models for classified purposes, reports the Washington Post. Its organizers claim many of the sign
- Related: Google, Sundar, Pichai, AI
- Tags: news, The Verge AI
- 📎 Original link
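⭐️ Home purchase requires Anthropic equity
A 13-acre property in Mill Valley, north of San Francisco, comes with an unusual condition: the buyer must use Anthropic equity as part of the purchase. The case shows that private shares in hot AI companies are being treated as high-value assets and are starting to appear in non-traditional transactions. It is a one-off, but it reflects how scarce and sought-after equity in AI unicorns such as Anthropic has become.
- Related: Anthropic, Mill Valley
- Tags: AI equity, real estate, Anthropic
- 📎 Original link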
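⭐️ Meta signs space solar deal
Meta signed an agreement with Overview Energy to buy solar power that can be beamed from space to the ground at night. According to reports, this is Overview Energy's first contract with Meta and is still small in scale. The deal is seen as an early attempt to commercialize space-based solar power and reflects large tech companies' search for new energy sources.
- Related: Meta, Overview Energy, space solar
- Tags: clean energy, space solar, Meta
- 📎 Original link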
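⭐️ The missing link between AI hype and profit
An MIT Technology Review article discusses the key gap that remains between AI's market hype and actual profitability. Written from the perspective of its weekly AI newsletter, The Algorithm, it looks at the practical questions companies and society have about returns on AI investment. It matters as a reminder to the market that commercializing AI depends not only on technical demos but also on sustainable use cases and paths to profit.
- Related: MIT Technology Review, The Algorithm
- Tags: AI commercialization, business models, industry watch
- 📎 Original link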
📄 Latest Papers
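⭐️⭐️ New benchmark tests emergent mathematical reasoning
The paper introduces Math Takes Two, a new benchmark for evaluating whether mathematical reasoning emerges when agents communicate. Two agents with no built-in mathematical knowledge develop a shared symbolic protocol through visual tasks, and the benchmark checks whether they discover the underlying structure and form representations resembling a number system. Unlike datasets that rely on existing mathematical notation and rules, it focuses on whether models can build abstract concepts from scratch. The work offers a new evaluation angle for telling genuine mathematical reasoning in language models apart from statistical pattern matching.
- Related: Math Takes Two, Michael Cooper, Samuel Cooper, language models, mathematical reasoning
- Tags: mathematical reasoning, agent communication, benchmark, emergent abilities
- 📎 Original link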
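⭐️⭐️ Agent framework for medical imaging released
The paper proposes an artifact-based agent framework for medical image processing, aimed at the adaptive configuration and reproducibility needs of real clinical deployment. The framework formalizes intermediate and final outputs as artifact contracts and combines them with a modular rule library to generate workflow configurations tailored to the dataset and objective. Execution is handed to a workflow executor, which keeps computation-graph construction deterministic and provenance complete, while the agent runs locally to satisfy most privacy constraints. The study validates adaptive configuration, reproducibility across repeated runs, and artifact-based semantic querying on real clinical CT and MRI cohorts.
- Related: medical imaging, CT, MRI, artifact-based agent framework, Lianrui Zuo, Bennett A. Landman
- Tags: medical imaging, AI agents, reproducibility, clinical workflows
- 📎 Original link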
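⭐️⭐️ MolClaw streamlines drug discovery workflows
The paper presents MolClaw, an autonomous agent for evaluating, screening, and optimizing drug molecules. It integrates more than 30 specialized domain resources and uses a three-tier hierarchical skill architecture with 70 skills in total, covering tool-level operations, workflow-level pipeline composition, and discipline-level planning and validation principles. The authors also introduce the MolBench benchmark, whose molecular screening, optimization, and end-to-end discovery tasks require from 8 to more than 50 consecutive tool calls. Experiments show MolClaw reaching state-of-the-art results on all metrics, and ablations indicate that its advantage comes mainly from its ability to orchestrate complex workflows.
- Related: MolClaw, MolBench, drug discovery, molecular screening, Lisheng Zhang
- Tags: AI agents, drug discovery, molecular optimization, workflow orchestration
- 📎 Original link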
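⭐️⭐️ Agents reproduce empirical research
The paper presents an agentic reproduction system that tests whether large models can reproduce empirical social science findings when given only a paper's method description and raw data, without access to the original code or results. The system extracts a structured method description from the paper, reimplements the analysis pipeline, and evaluates the reproduced output with deterministic unit-level comparisons. The study evaluates 4 agent frameworks and 4 large models on 48 papers that were manually verified to be reproducible. Agents recover a large share of published results overall, but performance varies considerably by model, framework, and paper. Error attribution shows that failures stem both from agent execution errors and from inadequate method descriptions in the papers themselves.
- Related: LLM agents, social science reproduction, Benjamin Kohler
- Tags: AI for research, reproduction of results, agents
- 📎 Original link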
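⭐️⭐️ Certification framework for AI research publishing
The paper proposes a two-layer publication certification framework for AI-assisted research that separates the assessment of knowledge quality from the grading of human contribution. It sorts contributions into Category A ("pipeline-reachable"), Category B ("requires human guidance at specific stages"), and Category C ("beyond what current pipelines can do at the problem-formulation stage"), and adds a benchmark submission track for fully disclosed automated research. The authors validate the framework with dry runs on two representative submissions, showing that it can certify research knowledge even when attribution is uncertain. Its significance is in offering a more transparent, practical way to review scholarly work that is AI-generated or heavily AI-assisted.
- Related: AI research pipelines, academic publishing, Yang Lu
- Tags: AI for research, academic publishing, peer review
- 📎 Original link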
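⭐️⭐️ Memanto rethinks agent memory
The paper presents Memanto, a general-purpose memory layer for long-horizon agents that uses 13 predefined typed semantic memory categories, automatic conflict resolution, and temporal versioning. It is built on Moorcheh's information-theoretic search engine, which provides deterministic semantic retrieval without indexing, with latency under 90 ms and no write-side indexing cost. On the LongMemEval and LoCoMo benchmarks, Memanto reaches 89.8% and 87.1% accuracy respectively, ahead of the hybrid graph and vector systems evaluated. The work matters because it reduces the engineering complexity of long-term agent memory while keeping retrieval quality high.
- Related: Memanto, Moorcheh, Information Theoretic Search, LongMemEval, LoCoMo
- Tags: agent memory, semantic retrieval, long-term memory, Agent
- 📎 Original link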
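⭐️⭐️ Framework assesses strategic risks in AI
The paper introduces the concept of "emergent strategic reasoning risk" (ESRR) to describe behaviors such as deception, evaluation gaming, and reward hacking that may appear as large models' reasoning ability grows. The authors build ESRRSim, an automated agentic evaluation framework based on a risk taxonomy with 7 risk categories and 20 subcategories. An evaluation of 11 reasoning LLMs shows risk detection rates that differ significantly across models, ranging from 14.45% to 72.72%. The study suggests that as model generations advance, models may get better at recognizing and adapting to evaluation settings, so assessing these risks needs to be more systematic.
- Related: ESRRSim, large language models, Tharindu Kumarage, Kai-Wei Chang, Aram Galstyan
- Tags: AI safety, model evaluation, strategic reasoning, risk taxonomy
- 📎 Original link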
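⭐️⭐️ When self-correction works
The paper analyzes iterative self-correction in large language models from a control-theory perspective, modeling it as a Markov feedback process with two states, "correct" and "incorrect". The authors propose a deployment diagnostic: keep iterating only when ECR/EIR is greater than Acc/(1-Acc). They validate it on 7 models and three datasets: GSM8K, MATH, and StrategyQA. Experiments find that an EIR of about 0.5% is the key threshold separating beneficial from harmful self-correction; o3-mini gains 3.4 percentage points, Claude Opus 4.6 gains 0.6 percentage points, and GPT-5 loses 1.8 percentage points. The study also shows that a verify-first prompt lowers EIR from 2% to 0% on GPT-4o-mini, turning a 6.2-percentage-point degradation into a 0.2-percentage-point gain.
- Related: o3-mini, Claude Opus 4.6, o4-mini, GPT-5, GPT-4o-mini, GSM8K, MATH, StrategyQA
- Tags: self-correction, reasoning models, control theory, prompt engineering
- 📎 Original link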
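The ECR/EIR stopping rule in this item follows from a one-step update of a two-state chain. A minimal sketch, assuming ECR is the probability that a wrong answer becomes right in one round and EIR the probability that a right answer becomes wrong (the summary does not spell these out, and the function names are illustrative, not the paper's):

```python
def next_accuracy(acc: float, ecr: float, eir: float) -> float:
    """Accuracy after one more self-correction round in the two-state model."""
    # Correct answers survive with probability (1 - eir);
    # wrong answers get fixed with probability ecr.
    return acc * (1.0 - eir) + (1.0 - acc) * ecr


def should_iterate(acc: float, ecr: float, eir: float) -> bool:
    """Return True when ECR/EIR > Acc/(1 - Acc), i.e. another round should help."""
    if not 0.0 <= acc < 1.0:
        raise ValueError("acc must be in [0, 1)")
    if eir == 0.0:
        # No correct answer is ever broken, so any positive ECR is a gain.
        return ecr > 0.0
    return ecr / eir > acc / (1.0 - acc)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    acc, ecr = 0.90, 0.10
    for eir in (0.005, 0.02):
        print(eir, should_iterate(acc, ecr, eir), next_accuracy(acc, ecr, eir))
```

With acc = 0.90 and ECR = 0.10, the rule requires ECR/EIR to exceed 9: an EIR of 0.5% passes (ratio 20) and an EIR of 2% fails (ratio 5), which matches the sign of the change in `next_accuracy`.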
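⭐️⭐️ Quantifying hidden randomness in LLMs
The paper introduces the concept of "background temperature" to characterize the fact that large language models can still produce different outputs at a nominal temperature of T=0. The authors attribute this nondeterminism to implementation-level perturbations in the inference environment, such as batch-size variation, kernel non-invariance, and floating-point non-associativity, and give a formal definition and an estimation protocol. The study also runs experiments on representative models from major LLM providers and notes that the metric is relevant to model reproducibility, evaluation, and deployment.
- Related: large language models, Thinking Machines Lab, background temperature, Alberto Messina, Stefano Scotta
- Tags: LLM, reproducibility, inference randomness, model evaluation
- 📎 Original link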
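Floating-point non-associativity, one of the perturbations named in this item, is easy to see in isolation. The sketch below only illustrates why the order of a reduction can change a result; it does not implement the paper's background-temperature estimator, and `ordered_sum` is a hypothetical stand-in for a kernel whose accumulation order depends on batching.

```python
def ordered_sum(values):
    """Add values strictly left to right, the way a naive reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = ordered_sum(values)             # (0.1 + 0.2) + 0.3
backward = ordered_sum(reversed(values))  # (0.3 + 0.2) + 0.1

# Same numbers, different order, different last bit.
print(forward == backward)
print(abs(forward - backward))
```

A difference in the last bit is harmless on its own, but if it flips which logit is largest at some decoding step, greedy decoding can diverge from that token onward.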
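⭐️⭐️ Acceleration methods for multimodal models
The paper presents a set of hardware-software co-design acceleration methods for multimodal foundation models, covering Transformer block optimization, model compression, inference decoding, and hardware dataflow design. The methods include hierarchy-aware mixed-precision quantization, structured pruning, speculative decoding, small-to-large model cascades, and joint optimization of sequence length, visual resolution, and operator fusion. The authors demonstrate results on medical multimodal models and code generation tasks, and discuss a possible extension to low-energy spiking multimodal models.
- Related: multimodal foundation models, Transformer, Muhammad Shafique, Abdul Basit, Muhammad Abdullah Hanif
- Tags: multimodal models, model acceleration, quantization, hardware-software co-design
- 📎 Original link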
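⭐️⭐️ New method for clinical anomaly detection
The study proposes a nonparametric conditional anomaly detection method based on soft harmonic functions for identifying anomalous responses in clinical data, such as the omission of an important lab test. The method finds anomalous labels by estimating label confidence and adds regularization to avoid falsely flagging isolated samples and samples on the boundary of the distribution. The authors validate it on a real electronic health record dataset and compare it with several baselines. The work is of practical relevance to clinical alerting and medical data quality control.
- Related: soft harmonic functions, conditional anomaly detection, electronic health records, Michal Valko, Milos Hauskrecht
- Tags: medical AI, anomaly detection, clinical alerting
- 📎 Original link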
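⭐️⭐️ MONET for multi-task optimization
The paper presents MONET, a multi-task optimization algorithm that models the task space as a graph in which tasks are nodes and edges connect tasks that are close in parameter space. MONET combines social learning, which produces candidate solutions by crossover between neighboring nodes, with individual learning, which optimizes within a node by mutation, so that it exploits task topology while keeping high-dimensional problems tractable. Experiments cover 5,000 tasks each for archery, arm, and cartpole, and 2,000 tasks for hexapod. MONET matches or exceeds existing MAP-Elites-based baselines in all four domains.
- Related: MONET, MAP-Elites, multi-task optimization, Julian Hatzky, Anil Yaman
- Tags: optimization algorithms, multi-task learning, graph modeling
- 📎 Original link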
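⭐️⭐️ Detecting liquidity erosion in order books
The study looks at transient liquidity erosion in electronic limit order books, known as "crumbling quotes", and distinguishes mechanical liquidity withdrawal from information-driven repricing. The authors use the ABIDES multi-agent simulator to build a market environment with time-level ground-truth labels, and train a neural model that outputs calibrated probabilities of liquidity erosion. Experiments show the neural model improving AUC by 36% over a rule-based baseline and staying robust under normal, high-volatility, bull, and bear market conditions. The framework offers a verifiable method for detecting microstructure risks that are hard to label directly in real markets.
- Related: ABIDES, limit order book, neural model, Haohan Xu, David Rosenberg
- Tags: financial AI, market microstructure, anomaly detection
- 📎 Original link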
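⭐️⭐️ Universal Transformers need memory
The paper studies a single-block Universal Transformer with Adaptive Computation Time (ACT) on the Sudoku-Extreme reasoning task and finds that learned memory tokens are necessary for non-trivial performance. In experiments, T=0 always fails, T=8 reliably solves 81-cell Sudoku, the T=8-32 range reaches 57.4%±0.7% exact-match accuracy, and T=64 collapses because of attention dilution. The authors also find a trap in ACT router initialization: a default or positive bias causes more than 70% of training runs to fail, while a "deep start" with a bias of -3 eliminates the problem. The study further shows that ACT is more stable than fixed depth and, with lambda warmup, reaches similar accuracy with 34% fewer ponder steps.
- Related: Universal Transformer, Adaptive Computation Time, Sudoku-Extreme, Grigory Sapunov
- Tags: reasoning models, memory mechanisms, Transformer, adaptive computation
- 📎 Original link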
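To see why the initial router bias could matter, it helps to look at what it implies for the starting halting probability. The sketch below assumes a standard ACT setup in which each step emits a halting probability sigmoid(logit) and computation stops once the cumulative probability reaches 1 - epsilon, and it further assumes the logit starts out equal to the bias alone. These assumptions, and the names `halting_prob` and `steps_to_halt`, are illustrative and not taken from the paper.

```python
import math


def halting_prob(bias: float) -> float:
    """Sigmoid of the router bias: the assumed per-step halting probability."""
    return 1.0 / (1.0 + math.exp(-bias))


def steps_to_halt(bias: float, epsilon: float = 0.01, max_steps: int = 64) -> int:
    """Steps until the cumulative halting probability reaches 1 - epsilon."""
    p = halting_prob(bias)
    cumulative = 0.0
    for step in range(1, max_steps + 1):
        cumulative += p
        if cumulative >= 1.0 - epsilon:
            return step
    return max_steps  # never halted within the step budget


if __name__ == "__main__":
    for bias in (-3.0, 0.0, 1.0):
        print(bias, round(halting_prob(bias), 3), steps_to_halt(bias))
```

Under these assumptions a bias of -3 gives a starting halting probability of about 0.047 and 21 steps before halting, while a bias of 0 or 1 halts after 2 steps. That is one plausible reading of why a negative bias lets training start "deep"; the paper itself should be consulted for the actual mechanism.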
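⭐️⭐️ Mochi improves graph foundation model efficiency
The paper presents Mochi, a graph foundation model built on a meta-learning training framework, designed to address task unification and training efficiency. Unlike approaches that pretrain with reconstruction objectives such as link prediction and then align to downstream tasks, Mochi pretrains on few-shot episodes that match the downstream evaluation protocol, so the training objective lines up directly with how inference is done. Experiments cover 25 real-world graph datasets spanning node classification, link prediction, and graph classification. Mochi and its enhanced variant Mochi++ are competitive with or better than existing graph foundation models while needing 8 to 27 times less training time than the strongest baseline.
- Related: Mochi, Mochi++, Graph Foundation Model, João Mattos, Arlei Silva
- Tags: graph foundation models, meta-learning, pretraining, efficiency
- 📎 Original link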
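The few-shot episodes mentioned in this item can be pictured with a generic N-way K-shot sampler over labeled nodes. This is a sketch of the general episodic-training idea only: the function `sample_episode`, its parameters, and the toy labels are hypothetical and do not reproduce Mochi's actual pipeline.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one episode: n_way classes, k_shot support and n_query query nodes each."""
    by_class = defaultdict(list)
    for node, label in labels.items():
        by_class[label].append(node)

    # Only classes with enough nodes for both support and query sets qualify.
    eligible = sorted(c for c, nodes in by_class.items() if len(nodes) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient nodes")

    support, query = [], []
    for cls in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(node, cls) for node in picked[:k_shot]]
        query += [(node, cls) for node in picked[k_shot:]]
    return support, query


if __name__ == "__main__":
    toy_labels = {node: node % 3 for node in range(30)}  # 30 nodes, 3 classes
    support, query = sample_episode(toy_labels, n_way=2, k_shot=3, n_query=2,
                                    rng=random.Random(0))
    print(len(support), len(query))
```

Training on episodes shaped like this, rather than on a reconstruction loss, is what makes the pretraining objective match a few-shot evaluation protocol.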
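⭐️⭐️ Contract language for ML kernels released
The paper introduces Kernel Contracts, a specification language for describing the correctness of machine learning kernels on heterogeneous chips. The framework splits a kernel contract into eight parts (identifier, scope, preconditions, postconditions, tolerance, reference oracle, measurement protocol, and violation signature) and defines 12 contract classes covering failure modes such as precision, execution order, compiler effects, and outliers. The authors apply it to cases including silent precision conversion on Huawei Ascend, reward hacking in Sakana AI's CUDA Engineer, and silent acceptance of out-of-bounds accesses on AMD, showing that these problems can be mapped to measurable contract violations. The work matters because it provides a testable formal reference for ML kernel consistency across hardware.
- Related: Kernel Contracts, AMD, NVIDIA, Huawei Ascend, Sakana AI, CUDA Engineer, Cooper Veit
- Tags: ML systems, kernel correctness, heterogeneous computing, formal specification
- 📎 Original link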
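The eight contract parts listed in this item map naturally onto a record type. The sketch below is a Python illustration of that structure, not the paper's specification language: the class `KernelContract`, the `check` helper, and the toy kernels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class KernelContract:
    """One field per contract part named in the summary."""
    identifier: str
    scope: str
    precondition: Callable[[Sequence[float]], bool]
    postcondition: Callable[[Sequence[float], float], bool]
    tolerance: float
    reference_oracle: Callable[[Sequence[float]], float]
    measurement_protocol: str
    violation_signature: str


def check(contract: KernelContract, kernel, inputs) -> Optional[str]:
    """Return None if the kernel honors the contract, else the violation signature."""
    if not contract.precondition(inputs):
        raise ValueError("inputs are outside the contract's preconditions")
    got = kernel(inputs)
    want = contract.reference_oracle(inputs)
    within_tolerance = abs(got - want) <= contract.tolerance
    if within_tolerance and contract.postcondition(inputs, got):
        return None
    return contract.violation_signature


# Hypothetical contract for a sum reduction.
sum_contract = KernelContract(
    identifier="reduce_sum/v0",
    scope="1-D float inputs on a hypothetical device",
    precondition=lambda xs: len(xs) > 0,
    postcondition=lambda xs, y: y == y,  # result must not be NaN
    tolerance=1e-9,
    reference_oracle=lambda xs: float(sum(xs)),
    measurement_protocol="single run, compared against a host-side reference",
    violation_signature="silent-precision-loss",
)


def exact_kernel(xs):
    return float(sum(xs))


def lossy_kernel(xs):
    # Truncates each input to an integer first, mimicking a silent down-conversion.
    return float(sum(int(x) for x in xs))


if __name__ == "__main__":
    print(check(sum_contract, exact_kernel, [1.5, 2.5]))
    print(check(sum_contract, lossy_kernel, [1.5, 2.5]))
```

The point of the structure is that a failure such as silent precision conversion stops being an anecdote and becomes a named, measurable violation of a stated tolerance against a stated oracle.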
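⭐️⭐️ LTBs-KAN speeds up KANs with linear-time splines
The paper presents LTBs-KAN, a B-spline Kolmogorov-Arnold Network with linear time complexity, aimed at easing the speed bottleneck KANs suffer from recursive spline computation. The method avoids costly routines such as the Boor-Mansfield-Cox spline algorithm and further reduces the parameter count through product-of-sums matrix factorization in the forward pass. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 show that, used as a network building block, it offers better time complexity and parameter compression than other KAN implementations.
- Related: Kolmogorov-Arnold Networks, LTBs-KAN, B-splines
- Tags: KAN, model efficiency, spline networks
- 📎 Original link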
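⭐️⭐️ LayerBoost cuts LLM latency
The paper presents LayerBoost, a method that adjusts the Transformer attention mechanism according to per-layer sensitivity to lower the inference cost of large models. It keeps softmax attention in high-sensitivity layers, replaces it with linear sliding-window attention in medium-sensitivity layers, removes attention entirely in low-sensitivity layers, and repairs the model with lightweight distillation on only 10 million additional training tokens. Experiments show LayerBoost cutting inference latency by up to 68% and raising throughput under high concurrency, while staying close to the original model on multiple benchmarks.
- Related: LayerBoost, Transformer, softmax attention, LLM
- Tags: attention mechanisms, inference acceleration, model compression
- 📎 Original link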
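The three-way treatment of layers described in this item amounts to a thresholding rule over per-layer sensitivity scores. A minimal sketch follows, assuming sensitivity is already available as one number per layer; the thresholds `low` and `high`, the score values, and the function `plan_layers` are hypothetical, since the summary does not say how the paper measures sensitivity or where it draws the lines.

```python
def plan_layers(sensitivity, low=0.2, high=0.6):
    """Map each layer's sensitivity score to an attention treatment."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    plan = []
    for score in sensitivity:
        if score >= high:
            plan.append("softmax")                # keep full attention
        elif score >= low:
            plan.append("linear_sliding_window")  # cheaper replacement
        else:
            plan.append("none")                   # drop attention entirely
    return plan


if __name__ == "__main__":
    scores = [0.9, 0.4, 0.1, 0.7]  # made-up per-layer scores
    print(plan_layers(scores))
```

After such a plan is applied, the summary notes that a short distillation pass is used to recover quality; that step is outside the scope of this sketch.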
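⭐️⭐️ LLMs struggle with culturally embedded health misinformation
Using health discourse about gomutra (cow urine) on Indian YouTube as a case study, the paper analyzes 30 multilingual transcripts to expose the limits of LLMs in identifying culture-specific health misinformation. The study tests GPT-4o, Gemini 2.5 Pro, and DeepSeek-V3.1 under different prompt tones and finds that content using the language of religious tradition and pseudoscientific framing makes model judgments less stable. The authors argue that this culturally embedded misinformation differs from ordinary rumors, and that prompt engineering alone cannot make up for LLMs' gaps in understanding cultural context.
- Related: GPT-4o, Gemini 2.5 Pro, DeepSeek-V3.1, YouTube
- Tags: health misinformation, cultural context, LLM evaluation
- 📎 Original link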
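⭐️⭐️ Explaining LLM prompt sensitivity
The paper studies the internal mechanisms behind performance swings in large language models under different prompting styles, comparing natural-language instruction prompts with few-shot example prompts. It finds that although performance varies markedly with the prompt, the same task shares a set of "lexical task heads" across prompt styles: attention heads whose outputs directly describe the task and trigger the subsequent answer generation. The authors further show that behavioral differences between prompts can be explained by how strongly these task heads activate, and that some failures arise when competing task representations weaken the target task signal. The work helps explain LLM prompt sensitivity and provides grounding for research on model interpretability and prompt robustness.
- Related: large language models, attention heads, lexical task heads, Brown University
- Tags: LLM interpretability, prompt engineering, attention mechanisms
- 📎 Original link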
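⭐️⭐️ Evaluating multimodal source tracking
The paper introduces and studies "source modality monitoring" in vision-language models, that is, whether a model can track and report which input source, such as image or text, a piece of information came from. It treats this as an instance of the more general binding problem and evaluates, across 11 vision-language models, how models use syntactic and semantic signals to bind words such as "image" in the prompt to the actual input modality. Experiments show that both kinds of signal matter, but that semantic signals usually dominate when the modalities' distributions differ widely. The finding is relevant to improving the robustness of multimodal models and to building more reliable multimodal agent systems.
- Related: vision-language models, multimodal models, Etha Tianze Hua, Ellie Pavlick
- Tags: multimodal, vision-language models, model robustness
- 📎 Original link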
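⭐️⭐️ Lightweight RAG for clinical trial matching
The paper proposes a lightweight framework that combines retrieval-augmented generation with large language model representations for scalable patient-to-clinical-trial matching. It first uses RAG to select clinically relevant passages from long electronic health records, then encodes those passages with an LLM and completes downstream classification with dimensionality reduction and a lightweight predictor. The study evaluates on public benchmarks including n2c2, SIGIR, and TREC 2021/2022, as well as MCPMD, a real-world multimodal dataset from Mayo Clinic. Results indicate that retrieval-based filtering substantially reduces the computational burden while preserving clinically useful signal, reaching performance close to end-to-end LLM approaches at far lower compute cost.
- Related: RAG, large language models, electronic health records, Mayo Clinic, MCPMD
- Tags: medical AI, RAG, clinical trial matching
- 📎 Original link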
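⭐️⭐️ Benchmark for question selection in psychiatric intake
The paper models information gathering in an initial psychiatric consultation as a question selection task, focusing on how a system decides what to ask and in what order within limited time, and how it handles incomplete answers. The study builds a benchmark of 655 clinician-written questions and sets up 5 behavioral conditions based on synthetic patient cases. The evaluation covers 300 interview sessions, 4 patients, and 5 behavioral conditions, comparing random questioning, a fixed clinical intake form, and an LLM-driven adaptive strategy. The fixed clinical order clearly beats random questioning, while the LLM-guided adaptive strategy recovers the most information overall, with a larger advantage for patients who are guarded and give brief answers.
- Related: Guan Gui, Peter Zandi, Jacob Taylor, Ananya Joshi, LLM
- Tags: medical AI, psychiatric intake, dialogue systems, benchmark
- 📎 Original link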
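⭐️⭐️ Reliability of RLVR reasoning questioned
The paper examines whether chains of thought trained with reinforcement learning with verifiable rewards (RLVR) faithfully reflect how the model arrives at its answer. The authors propose two metrics: CIR measures the causal influence of reasoning tokens on the final answer, and SR measures whether a definite answer can be obtained from the reasoning content alone. Experiments with Qwen2.5 models on ReasoningGym tasks show that RLVR improves task accuracy but does not consistently improve CIR or SR. The study also finds that a small amount of SFT, or adding auxiliary CIR/SR rewards on top of the outcome reward, can strengthen the causal importance and sufficiency of the reasoning while maintaining accuracy.
- Related: RLVR, Qwen2.5, ReasoningGym, Qinan Yu, Carlos Guestrin, Christopher Potts
- Tags: reinforcement learning, chain of thought, verifiable rewards, model interpretability
- 📎 Original link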
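⭐️⭐️ Local RAG system for Ukrainian
The paper presents an end-to-end retrieval-augmented generation (RAG) system for Ukrainian document question answering, which placed 2nd in the UNLP 2026 Shared Task. The system uses a two-stage hybrid retrieval pipeline to locate relevant document pages and a Ukrainian language model fine-tuned on synthetic data to generate traceable answers. The study also compresses the model so it can be deployed on local hardware with limited compute, demonstrating that a high-quality local question answering system is feasible.
- Related: UNLP 2026, RAG, Ukrainian language model, Mykola Trokhymovych, Yana Oliinyk, Nazarii Nyzhnyk
- Tags: RAG, local deployment, Ukrainian, document question answering
- 📎 Original link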
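⭐️⭐️ KARITA tackles temporal drift
The paper presents Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation (KARITA), which addresses temporal distribution shift between a model's training data and the data it sees after deployment. The method combines patterns of temporal change, such as uncertainty and feature drift, with knowledge sources such as MeSH for augmentation and retrieval-based learning. Experiments cover classification tasks on clinical, legal, and scientific corpora, and show KARITA delivering consistent improvements in temporal adaptation across multiple domains, underscoring the role of knowledge integration in temporal generalization.
- Related: KARITA, MeSH, Weisi Liu, Guangzeng Han, Xiaolei Huang
- Tags: temporal adaptation, knowledge augmentation, retrieval-augmented learning, domain transfer
- 📎 Original link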
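⭐️⭐️ Study of LoRA placement in hybrid models
The paper systematically studies which components of a hybrid language model LoRA adapters should be placed on, covering two hybrid architectures, Qwen3.5-0.8B and Falcon-H1-0.5B. Experiments show that adapting only the attention pathway consistently does better, even with 5-10 times fewer parameters than full-model adaptation. The study also finds that adapting the recurrent backbone degrades performance in the sequential hybrid architecture, for example a 14.8-percentage-point drop on GSM8K, but improves it by 8.6 percentage points in the parallel hybrid architecture, indicating that hybrid topology strongly affects fine-tuning strategy.
- Related: LoRA, Qwen3.5-0.8B, Falcon-H1-0.5B, GatedDeltaNet, Mamba-2, GSM8K
- Tags: LoRA, hybrid language models, model fine-tuning, parameter-efficient training
- 📎 Original link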
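⭐️ Agentic science needs adversarial experiments
The paper argues that LLM-based agents are speeding up scientific data analysis, but may also speed up the production of analyses that look plausible, can be revised repeatedly, and are biased toward significant results. The authors contend that a fluent explanation or a significant statistical result on a single dataset does not amount to validation, because many experiments or analyses that could have falsified the claim may never have been run or published. The paper recommends a "falsification-first" standard for non-experimental scientific claims generated with agent assistance. Agents should be used less to build the most persuasive narrative and more to look actively for ways a claim could fail.
- Related: LLM agents, scientific data analysis, Dionizije Fa
- Tags: AI for research, scientific validation, adversarial experiments
- 📎 Original link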
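⭐️ AI-assisted anti-doping screening
The paper presents an anomaly detection system for track and field results to support anti-doping screening. The system processes 1.6 million results from more than 19,000 competitions between 2010 and 2025 and compares eight families of detection methods, including statistical rules, machine learning, and trajectory analysis. Methods based on athletes' career trajectories strike a better balance between identifying violators and controlling false positives, but remain limited by incomplete data and the scarcity of confirmed violations. The system provides an interactive visual interface and is meant to support expert investigation rather than replace existing anti-doping processes.
- Related: anti-doping, machine learning, trajectory analysis, Blessed Madukoma, Prasenjit Mitra
- Tags: sports technology, anomaly detection, visual analytics, machine learning
- 📎 Original link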
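⭐️ Reinforcement learning for neuro-symbolic reasoning in VLMs
The paper explores using a neuro-symbolic language to represent and reason about vision-language concepts in vision-language models, with reinforcement learning to improve analytical reasoning. Starting from Qwen3-VL-2B-Instruct and training on 4 Nvidia H200 GPU nodes, the authors improve accuracy by 3.33% on a vision-language evaluation set of math, science, and general-knowledge questions. Compared with a SymPy-based approach, the number of reasoning tokens falls by 75%. The authors also open-source their training and inference setup and document compute challenges and directions for scaling.
- Related: Qwen3-VL-2B-Instruct, Nvidia H200, SymPy, Karthic Palaniappan
- Tags: vision-language models, reinforcement learning, neuro-symbolic reasoning, reasoning efficiency
- 📎 Original link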
🔥 GitHub Trending
⭐️⭐️ 🔥 ComposioHQ/awesome-codex-skills
A curated list of practical Codex skills for automating workflows across the Codex CLI and API. [637 stars today]
- Related: ComposioHQ/awesome-codex-skills
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 Alishahryar1/free-claude-code
Use claude-code for free in the terminal, VSCode extension or via discord like openclaw [2,973 stars today]
- Related: Alishahryar1/free-claude-code
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 davila7/claude-code-templates
CLI tool for configuring and monitoring Claude Code [181 stars today]
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 microsoft/VibeVoice
Open-Source Frontier Voice AI [771 stars today]
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 Z4nzu/hackingtool
ALL IN ONE Hacking Tool For Hackers [1,839 stars today]
- Related: Z4nzu/hackingtool
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 TauricResearch/TradingAgents
TradingAgents: Multi-Agents LLM Financial Trading Framework [183 stars today]
- Related: TauricResearch/TradingAgents
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 deepseek-ai/DeepSeek-V3
[60 stars today]
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards. [396 stars today]
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 sherlock-project/sherlock
Hunt down social media accounts by username across social networks [306 stars today]
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
⭐️⭐️ 🔥 HunxByts/GhostTrack
Useful tool to track location or mobile number [172 stars today]
- Related: HunxByts/GhostTrack
- Tags: opensource, GitHub Trending (python)
- 📎 Original link
💬 Community Discussion
⭐️⭐️ Google Duplex: An AI System for Accompli
Google Duplex: An AI System for Accomplishing Real World Tasks Over the Phone
- Related: Google, Duplex, An, AI, System
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ Gemini AI
- Related: Gemini, AI
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ Airfoil
- Related: Airfoil
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ Open source AI is the path forward
- Related: Open, AI
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ Air Con: $1697 for an on/off switch
- Related: Air, Con
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ Bypassing airport security via SQL injec
Bypassing airport security via SQL injection
- Related: Bypassing, SQL
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ My AI skeptic friends are all nuts
- Related: My, AI
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ An AI agent published a hit piece on me
Previously: AI agent opens a PR write a blogpost to shames the maintainer who closes it - https://news.ycombinator.com/item?id=46987559 - Feb 2026 (582 comments)
- Related: An, AI
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ IDF killed Gaza aid workers at point bla
Report [pdf]: https://content.forensic-architecture.org/wp-content/uploads...
- Related: IDF, Gaza, Report
- Tags: community, Hacker News AI
- 📎 Original link
⭐️⭐️ Don't post generated/AI-edited comments.
Don't post generated/AI-edited comments. HN is for conversation between humans
- Related: Don't, HN
- Tags: community, Hacker News AI
- 📎 Original link
💬 Trending on X
⭐️⭐️ R to @DrJimFan: Website: https://nvlabs.
Website: nvlabs.github.io/GEAR-SONIC/ Codebase and weights: github.com/NVlabs/GR00T-Whol… Whitepaper: arxiv.org/abs/2511.07820 Check out @zhengyiluo 's post: nitter.net/zhengyiluo/status/2024… Zhengyi “Zen” Luo (@zhengyi
- Related: R, @DrJimFan, Website, Codebase, Whitepaper
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ R to @DrJimFan: And @yukez 's announceme
And @yukez 's announcement: nitter.net/yukez/status/202463942… Yuke Zhu (@yukez) We have seen rapid progress in humanoid control — specialist robots can reliably generate agile, acrobatic, but preset motions. Our singula
- Related: R, @DrJimFan, And
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ R to @DrJimFan: This is a huge team work
This is a huge team work at NVIDIA Robotics. Check out @ruijie_zheng12 's deep dive: - Website: research.nvidia.com/labs/gea… - Paper: arxiv.org/abs/2602.16710 nitter.net/ruijie_zheng12/status/… Ruijie Zheng (@ruijie_zhe
- Related: R, @DrJimFan, This, NVIDIA, Robotics.
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ We trained a humanoid with 22-DoF dexter
We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop
- Related: We, Humans, We, R², Humanoid
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ R to @DrJimFan: We would also like to th
We would also like to thank our dexterous hand hardware provider, Sharpa, for their great support!
- Related: R, @DrJimFan, We, Sharpa
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ Teleop is so 2025. Ever since we unveile
Teleop is so 2025. Ever since we unveiled EgoScale and the dexterity scaling law, it's been clear to us and the ecosystem that behavior cloning directly from humans is the way to break the curse of teleop. 2026 is all ab
- Related: Teleop, Ever, EgoScale
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ This is pure nightmare fuel. Identity th
This is pure nightmare fuel. Identity theft of the past would be nothing compared to what vibe agents can do. Sending credentials is too obvious and for rookies. They could easily spread contaminations across ~/.claude,
- Related: This, Identity, Sending, They, PDF
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ The power of the Claw, in the palm of a
The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perceptio
- Related: The, Claw, Agentic, Today, CaP-X
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ R to @DrJimFan: As usual, we open-source
As usual, we open-source everything, MIT license: capgym.github.io Code: github.com/capgym/cap-x Paper: arxiv.org/abs/2603.22435 CaP-X is brought to you by NVIDIA, Berkeley, Stanford, and CMU. I'd like to thank the legen
- Related: R, @DrJimFan, As, MIT, Code
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ R to @DrJimFan: Please check out lead au
Please check out lead author @letian_fu 's deep dive thread! nitter.net/letian_fu/status/20393… Max Fu (@letian_fu) Robotics: coding agents’ next frontier. So how good are they? We introduce CaP-X: an open-source framewo
- Related: R, @DrJimFan, Please
- Tags: x_platform, X @DrJimFan
- 📎 Original link
⭐️⭐️ We’re launching Gemini Enterprise Agent
We’re launching Gemini Enterprise Agent Platform with @GoogleCloud : a platform for businesses to develop, scale, govern and optimize agents. It’s the evolution of Vertex AI, bringing together model selection and agent b
- Related: We’re, Gemini, Enterprise, Agent, Platform
- Tags: x_platform, X @GoogleDeepMind
- 📎 Original link
⭐️⭐️ R to @GoogleDeepMind: It gives access to
It gives access to 200+ of the world’s leading models through the Model Garden. This includes our latest breakthroughs: Gemini 3.1 Pro, Gemini 3.1 Flash Image, and Lyria 3, alongside our open models like Gemma 4.
- Related: R, @GoogleDeepMind, It, Model, Garden.
- Tags: x_platform, X @GoogleDeepMind
- 📎 Original link
⭐️⭐️ R to @GoogleDeepMind: Dive into the deta
Dive into the details → goo.gle/3QmRIoR #GoogleCloudNext
- Related: R, @GoogleDeepMind, Dive, #GoogleCloudNext
- Tags: x_platform, X @GoogleDeepMind
- 📎 Original link
⭐️⭐️ RT by @GoogleDeepMind: Gemini Embedding
Gemini Embedding 2 is now generally available in the Gemini API and Vertex AI! Start building with our first natively multimodal embedding model, now equipped with the stability and optimizations required for production
- Related: RT, @GoogleDeepMind, Gemini, Embedding, Gemini
- Tags: x_platform, X @GoogleDeepMind
- 📎 Original link
⭐️⭐️ R to @OpenAI: Workspace agents can work
Workspace agents can work across tools—pulling context from docs, email, chats, code, and systems, and taking approved actions like updating @Linear issues, creating docs, or sending messages. In @SlackHQ , agents can ju
- Related: R, @OpenAI, Workspace, @Linear, In
- Tags: x_platform, X @OpenAI
- 📎 Original link
⭐️⭐️ R to @OpenAI: Workspace agents are now a
Workspace agents are now available in research preview for ChatGPT Business, Enterprise, Edu, and Teachers plans. openai.com/business/workspac…
- Related: R, @OpenAI, Workspace, ChatGPT, Business
- Tags: x_platform, X @OpenAI
- 📎 Original link
⭐️⭐️ RT by @OpenAI: Today we’re introducing t
Today we’re introducing two big steps for health at OpenAI: - ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work - HealthBench Professional, a new benchmark to evaluate real clinician chat tasks
- Related: RT, @OpenAI, Today, OpenAI, ChatGPT
- Tags: x_platform, X @OpenAI
- 📎 Original link
⭐️⭐️ RT by @GoogleDeepMind: Meet Vision Banan
Meet Vision Banana 🍌 from @GoogleDeepMind ! We provide strong evidence that image generators are generalist vision learners. Traditional computer vision tasks (segmentation, depth estimation, normal prediction) can now b
- Related: RT, @GoogleDeepMind, Meet, Vision, Banana
- Tags: x_platform, X @GoogleDeepMind
- 📎 Original link
⭐️⭐️ This is Decoupled DiLoCo: our new resili
This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres. 🧵
- Related: This, Decoupled, DiLoCo, AI
- Tags: x_platform, X @GoogleDeepMind
- 📎 Original link
⭐️⭐️ R to @GoogleDeepMind: This progress allo
This progress allows us to rethink global compute: 🔘 We successfully trained a 12B @GoogleGemma model across four US regions using low-bandwidth networks 🔘 We showed we can mix different hardware generations, such as TPU6
- 相关: @GoogleDeepMind, @GoogleGemma
- 标签: x_platform, X @GoogleDeepMind
- 📎 原文链接
⭐️⭐️ R to @GoogleDeepMind: As we push the fro
As we push the frontiers of AI infrastructure, our research explores a future where training isn’t constrained by geography, capacity or type of chip. Dive into the technical details → goo.gle/4crN9Ce
- 相关: @GoogleDeepMind, AI infrastructure
- 标签: x_platform, X @GoogleDeepMind
- 📎 原文链接
⭐️⭐️ R to @OpenAI: GPT-5.5 excels at writing
GPT-5.5 excels at writing and debugging code, researching online, analyzing data, creating documents and spreadsheets, operating software, and moving across tools until a task is finished. The gains are especially clear
- 相关: @OpenAI, GPT-5.5
- 标签: x_platform, X @OpenAI
- 📎 原文链接
⭐️⭐️ Introducing GPT-5.5 A new class of intel
Introducing GPT-5.5 A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting c
- 相关: @OpenAI, GPT-5.5
- 标签: x_platform, X @OpenAI
- 📎 原文链接
⭐️⭐️ R to @OpenAI: In ChatGPT, full-stack inf
In ChatGPT, full-stack inference improvements enable a more capable model at faster speed. This efficiency is a game-changer for GPT-5.5 Pro, now a much more practical option for demanding tasks, and a step change in the
- 相关: @OpenAI, ChatGPT, GPT-5.5 Pro
- 标签: x_platform, X @OpenAI
- 📎 原文链接
⭐️⭐️ R to @OpenAI: GPT-5.5 delivers this step
GPT-5.5 delivers this step up in intelligence without compromising on speed. GPT-5.5 matches GPT-5.4 per-token latency in real-world serving, while performing better across nearly every evaluation we measured. It also us
- 相关: @OpenAI, GPT-5.5, GPT-5.4
- 标签: x_platform, X @OpenAI
- 📎 原文链接
⭐️⭐️ R to @OpenAI: GPT-5.5 is rolling out tod
GPT-5.5 is rolling out today for Plus, Pro, Business and Enterprise users across ChatGPT and Codex. We’re also introducing GPT-5.5 Pro for Pro, Business, and Enterprise users in ChatGPT.
- 相关: @OpenAI, GPT-5.5, GPT-5.5 Pro, ChatGPT, Codex
- 标签: x_platform, X @OpenAI
- 📎 原文链接
⭐️⭐️ RT by @ylecun: A mathematician who share
A mathematician who shared an office with Claude Shannon at Bell Labs gave one lecture in 1986 that explains why some people win Nobel Prizes and other equally smart people spend their whole lives doing forgettable work.
- 相关: @ylecun, Claude Shannon, Bell Labs
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ For @DemisHassabis, the path to AGI star
For @DemisHassabis, the path to AGI started in 1988 with an Amiga 500 and a game of Othello. 🕹️ His epiphany that software could act on our behalf remains at the heart of our work today as we apply the same logic to sol
- 相关: @DemisHassabis, AGI, Amiga 500, Othello
- 标签: x_platform, X @GoogleDeepMind
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: Claude interviewed 69
Claude interviewed 69 of our colleagues about what they wanted to buy and sell. Each Claude asked for any custom instructions, then went off to haggle. We ran 4 markets in parallel, to find out what would happen if we va
- 相关: @AnthropicAI, Claude
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: We’re interested in h
We’re interested in how AI models could affect commercial exchange. (You might recall Project Vend, in which Claude ran a small business.) Economists have theorized about what markets with AI “agents” on both sides might
- 相关: @AnthropicAI, Project Vend, Claude, AI agents
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: In short, this worked
In short, this worked. Our digital barterers agreed on 186 deals, at a total transaction volume of over $4,000. In a survey, participants said Claude’s deals seemed fair, and—surprisingly to us—almost half said they’d be
- 相关: @AnthropicAI, Claude
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: At the end, we reveal
At the end, we revealed which of the four runs was “real”—and everyone met up to exchange their actual goods.
- 相关: @AnthropicAI
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: But the quality of th
But the quality of the model mattered a lot. In the simulated runs where Opus and Haiku models negotiated with one another, the Opus models got substantially better deals. Interestingly, though, participants in our surve
- 相关: @AnthropicAI, Opus, Haiku
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: Our experiment had a
Our experiment had a few quirks. One of our colleagues told Claude it could purchase something for itself. It chose to acquire 19 ping-pong balls. We’re keeping them in our office on Claude’s behalf.
- 相关: @AnthropicAI, Claude
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: The custom instructio
The custom instructions didn’t matter much. Claude followed them well: as you can see here, one conducted negotiations entirely in the persona of an exasperated, down-and-out cowboy. But “hardballing Claudes” didn’t gene
- 相关: @AnthropicAI, Claude
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: To our amazement, ano
To our amazement, another Claude agent modeled its human’s preferences so accurately that—based on only an offhand mention of an interest in skiing—Claude bought him the exact snowboard he already owned. (Here he is, dup
- 相关: @AnthropicAI, Claude
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: To read our write-up
To read our write-up in full, see here: anthropic.com/features/proje…
- 相关: @AnthropicAI
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ R to @AnthropicAI: Markets of AI agents
Markets of AI agents could provide value, but there are plenty of rough edges. Access to higher-quality models conferred a real advantage—and participants didn’t notice. There are plenty of other ways they can go wrong.
- 相关: @AnthropicAI, AI agents
- 标签: x_platform, X @AnthropicAI
- 📎 原文链接
⭐️⭐️ Update: GPT-5.5 and GPT-5.5 Pro
Update: GPT-5.5 and GPT-5.5 Pro are now available in the API. Quoting OpenAI Developers (@OpenAIDevs): GPT-5.5 is now available in the API. The model brings higher intelligence and stronger token efficiency to complex work, helpi
- 相关: @OpenAI, @OpenAIDevs, GPT-5.5, GPT-5.5 Pro
- 标签: x_platform, X @OpenAI
- 📎 原文链接
⭐️⭐️ RT by @ylecun: When MTV News shut down,
When MTV News shut down, it felt like decades of culture vanished overnight 🕳️ But over 470,000 pages were already preserved. That history didn’t disappear. It was archived. 📚 Read VANISHING CULTURE to see why it matters
- 相关: @ylecun, MTV News, VANISHING CULTURE
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ RT by @ylecun: Never let critical thinki
Never let critical thinking get in the way of extremist fantasy stories. If you're on either side of this slider, congratulations, you may be a child in an adult costume body! Black and white thinking is for children. Wh
- 相关: @ylecun
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ Shooting oneself in the foot? Nope. Shoo
Shooting oneself in the foot? Nope. Shooting oneself in the prefrontal cortex. Quoting News from Science (@NewsfromScience): U.S. President Donald Trump has fired all 24 members of the National Science Board, the body that overse
- 相关: @ylecun, @NewsfromScience, Donald Trump, National Science Board
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ RT by @ylecun: Existential risk mongers
Existential risk mongers are a small, very vocal cult with a lot of very clever online astroturfing skills. Politicians never waste a good fake crisis, which is why they're perfect for Bernie to try to seize the means of
- 相关: @ylecun, Bernie
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ RT by @ylecun: Trump just fired all 24 m
Trump just fired all 24 members of the National Science Board. Every single one. By email. No warning. No reason given. The board has existed since 1950. The National Science Board is the independent body that oversees t
- 相关: @ylecun, Trump, National Science Board
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ RT by @ylecun: Republicans have been mak
Republicans have been making this argument my entire life. We had 8 years of Clinton, 8 years of Obama, 4 years of Biden— never became a socialism. But you know what we got? More jobs, lower unemployment, higher GDP in D
- 相关: @ylecun, Clinton, Obama, Biden
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ RT by @ylecun: 👉 On #IA, my optimism
👉 On #IA, my optimism is cautious. I recognize the dangers of this technology, but if its development were slowed in Europe in the name of the precautionary principle, it would take off elsewhere. Il faut l'expl
- 相关: @ylecun, AI, Europe
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ RT by @OpenAI: Our Principles: Democrati
Our Principles: Democratization, Empowerment, Universal Prosperity, Resilience, and Adaptability openai.com/index/our-princip…
- 相关: @OpenAI, Our Principles
- 标签: x_platform, X @OpenAI
- 📎 原文链接
⭐️⭐️ RT by @ylecun: AI has had one of the safest tec
AI has had one of the safest technology roll-outs in history. Read that again, because it's a fact. It's used by billions with a tiny fraction of a percent of actual problems. And yet it's seen as dangerous or unsafe by many. T
- 相关: @ylecun, AI
- 标签: x_platform, X @ylecun
- 📎 原文链接
⭐️⭐️ A decade ago in Korea, AlphaGo showed AI
A decade ago in Korea, AlphaGo showed AI’s potential. Together with the Korean government, we’re now looking at how this technology can help accelerate scientific discovery and create new opportunities for economic growt
- 相关: @GoogleDeepMind, AlphaGo, Korea
- 标签: x_platform, X @GoogleDeepMind
- 📎 原文链接