FT Business School

AI groups rush to redesign model testing and create new benchmarks

Rapidly advancing technology is outpacing current methods of evaluating and comparing large language models
OpenAI, Microsoft, Meta and Anthropic all plan to build AI agents that can carry out tasks autonomously on behalf of humans

Tech groups are rushing to redesign how they test and evaluate their artificial intelligence models, as the fast-advancing technology outpaces current benchmarks.

OpenAI, Microsoft, Meta and Anthropic have all recently announced plans to build AI agents that can carry out tasks autonomously on behalf of humans. To do this effectively, the systems must be able to perform increasingly complex actions that require reasoning and planning.
