SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading. Dinh, T. A.; Mullov, C.; Bärmann, L.; Li, Z.; Liu, D.; Reiß, S.; Lee, J.; Lerzer, N.; Gao, J.; Peller-Konrad, F.; Röddiger, T.; Waibel, A.; Asfour, T.; Beigl, M.; Stiefelhagen, R.; Dachsbacher, C.; Böhm, K.; Niehues, J. 2024. In Y. Al-Onaizan, M. Bansal & Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, 12–16 November 2024, pp. 11592–11610. Association for Computational Linguistics (ACL). doi:10.18653/v1/2024.emnlp-main.647