Benchmark for Evaluating Large Language Models in the Industrial Domain: ID-Eval

  • Abstract: With the rapid development of artificial intelligence technology, large language models are increasingly applied in the industrial domain, particularly in critical scenarios such as smart manufacturing and intelligent monitoring, where they play an irreplaceable role and have become a core driver of new industrialization. Benchmark testing, as an important tool for evaluating the performance of large language models in the industrial domain, is essential for advancing artificial intelligence technology and its industrial applications. To address the complexity of evaluating large language models in the industrial domain, the Industrial Domain-Evaluation (ID-Eval) benchmark is proposed, focusing on the performance validation and optimization of these models. ID-Eval assesses models from three perspectives: industrial general capability, industrial application capability, and industrial trustworthiness capability. The assessment covers scenarios such as knowledge understanding, commonsense question answering, engineering modeling, and procurement analysis. The aim is to drive the transition of artificial intelligence technology from general scenarios to specific industrial contexts, promote its deep integration into key processes such as design, production, and inspection, and accelerate the intelligent transformation of the industrial domain.
