Detection of GenAI-produced and student-written C# code: A comparative study of classifier algorithms and code stylometry features

DOI: https://doi.org/10.23962/ajic.i35.21309

Keywords: C# code, generative AI (GenAI) code, student-written code, machine-learning, code classification, code stylometry features

Abstract
The prevalence of students using generative artificial intelligence (GenAI) to produce program code is such that certain courses are rendered ineffective because students can avoid learning the required skills. Meanwhile, detecting GenAI code and differentiating between GenAI-produced and human-written code are becoming increasingly challenging. This study tested the ability of six classifier algorithms to detect GenAI C# code and to distinguish it from C# code written by students at a South African university. A large dataset of verified student-written code was collated from first-year students at South Africa's University of the Free State, and corresponding GenAI code produced by Blackbox.AI, ChatGPT, and Microsoft Copilot was generated and collated. Code metric features were extracted using modified Roslyn APIs. The data was organised into four sets with an equal number of student-written and AI-generated code, and a machine-learning model was deployed with the four sets using six classifiers: extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), AdaBoost, random forest, and soft voting (with XGBoost, KNN and SVM as inputs). It was found that the GenAI C# code produced by Blackbox.AI, ChatGPT, and Copilot could, with a high degree of accuracy, be identified and distinguished from student-written C# code through use of the classifier algorithms, with XGBoost performing strongest in detecting GenAI code and random forest performing best in identification of student-written code.
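The soft-voting scheme named in the abstract combines the probability estimates of its member classifiers (here XGBoost, KNN, and SVM) by averaging them and predicting the class with the highest mean probability. The following is a minimal pure-Python sketch of that mechanism only; the probability values are invented for illustration and are not taken from the study, which used trained classifiers rather than hand-set numbers.

```python
def soft_vote(prob_sets):
    """Average per-classifier class-probability lists and pick the best class.

    prob_sets: one probability list per classifier, each summing to 1,
    with the same class ordering across classifiers.
    """
    n_classes = len(prob_sets[0])
    means = [sum(p[c] for p in prob_sets) / len(prob_sets)
             for c in range(n_classes)]
    return means.index(max(means)), means

# Hypothetical probabilities for one code sample from three classifiers
# (index 0 = student-written, index 1 = GenAI-produced):
xgb = [0.30, 0.70]  # stand-in for an XGBoost predict_proba output
knn = [0.45, 0.55]  # stand-in for a KNN predict_proba output
svm = [0.20, 0.80]  # stand-in for an SVM predict_proba output

label, means = soft_vote([xgb, knn, svm])
print(label, [round(m, 4) for m in means])  # → 1 [0.3167, 0.6833]
```

In practice the same averaging is what scikit-learn's VotingClassifier with voting="soft" performs over fitted estimators; the sketch above just makes the arithmetic explicit.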
License
Copyright (c) 2025 Adewuyi Adetayo Adegbite, Eduan Kotzé

This work is licensed under a Creative Commons Attribution 4.0 International License.