Abstract:
With the rapid development of artificial intelligence technology, large language models are increasingly being applied in the industrial domain, particularly in critical scenarios such as smart manufacturing and intelligent monitoring, where they play an irreplaceable role and have become a core driver of new industrialization. Benchmark testing, as a crucial tool for evaluating the performance of large language models in the industrial domain, is essential for advancing artificial intelligence technology and its industrial applications. To address the complexity of evaluating large language models in the industrial domain, this paper proposes the industrial domain evaluation (ID-Eval) benchmark, which focuses on validating and optimizing the performance of large language models in industrial settings. ID-Eval comprehensively assesses the application of large language models in scenarios such as knowledge understanding, commonsense question answering, engineering modeling, and procurement analysis from three perspectives: industrial general capability, industrial application capability, and industrial trustworthiness capability. The aim is to drive the transformation of artificial intelligence technology from general scenarios to specific industrial contexts, promote its deep integration into key processes such as design, production, and inspection, and accelerate the intelligent advancement of the industrial domain.