Recognition of GUI Web Elements using the Deep Learning Approach and Yolo Architecture

Helga Prifti
Bekir Karlik

Abstract

Context: The growing complexity of modern web and mobile applications, shaped by rapid UI/UX advancements and agile software development practices, demands efficient and automated recognition of graphical user interface (GUI) elements, as manual testing remains costly and time-consuming.

Objective: This study evaluates the performance of the YOLOv8 deep learning architecture for accurate recognition of diverse GUI components, such as buttons, fields, headings, links, and images, from real-world application screenshots.

Method: A small YOLOv8 model was trained on the Roboflow GUI element detection dataset, which contains over 1,000 annotated website screenshots, using 35 training epochs, a batch size of 8, and an image resolution of 640×640.

Results: The model achieved measurable improvements, with precision rising from 0.368 to 0.454, recall increasing from 0.296 to 0.425, and mAP50–95 improving from 0.101 to 0.232. Detection was strongest for buttons and input fields and weaker for iframes and labels, owing to their inherent ambiguity.

Conclusions: The results demonstrate that YOLOv8 has significant potential for automating GUI recognition, reducing reliance on manual testing in agile workflows, and improving interface validation. Further optimization with larger datasets and advanced augmentation methods is recommended to enhance robustness and generalization.
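The training setup stated in the Method can be sketched with the Ultralytics Python package. This is a minimal illustration and not the authors' actual script. Only the 35 epochs, batch size of 8, and 640×640 resolution come from the abstract. The dataset file name `data.yaml` is hypothetical, and the checkpoint `yolov8s.pt` is an assumption, since the abstract says only "a small YOLOv8 model".

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the abstract, plus one assumed path."""
    data: str = "data.yaml"  # hypothetical dataset config exported from Roboflow
    epochs: int = 35         # stated in the abstract
    batch: int = 8           # stated in the abstract
    imgsz: int = 640         # 640x640 input resolution, stated in the abstract


def train_kwargs(cfg: TrainConfig) -> dict:
    """Convert the config into keyword arguments for YOLO.train()."""
    return asdict(cfg)


def train(cfg: TrainConfig, weights: str = "yolov8s.pt"):
    """Fine-tune a pretrained YOLOv8 checkpoint (assumed: the 's' variant)."""
    # Imported lazily so the configuration can be inspected without
    # the ultralytics package installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**train_kwargs(cfg))


if __name__ == "__main__":
    # Print the arguments only; call train(TrainConfig()) to start a real run.
    print(train_kwargs(TrainConfig()))
```

Keeping the hyperparameters in one frozen dataclass makes it easy to log the exact settings alongside the resulting precision, recall, and mAP50–95 values.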

Article Details

Section

Research Articles

How to Cite

[1] H. Prifti and B. Karlik, “Recognition of GUI Web Elements using the Deep Learning Approach and Yolo Architecture”, Systems and Computing, vol. 2, no. 1, Jan. 2026, doi: 10.64409/sycom.v2.i1.28.
