Comparative Analysis of Trademark Class Identification Using IndoBERT and Multilingual BERT
Keywords: Trademark Classification, NICE Classification, IndoBERT, Transformer, Multilingual BERT

Abstract
The rapid growth of trademark registrations in Indonesia has increased the demand for efficient and accurate classification into the internationally recognized NICE system. Manual class assignment remains time-consuming and prone to human error, motivating an automated approach. This study investigates Transformer-based language models for trademark class identification based solely on product and service descriptions. Two models were evaluated: multilingual BERT (mBERT) and the monolingual IndoBERT, both fine-tuned for sequence classification across the 45 NICE classes using 59,948 trademark entries collected from the Directorate General of Intellectual Property (DGIP) database. The methodology comprised data preprocessing, a stratified 80:20 train-test split, and tokenization with a maximum sequence length of 64 tokens. Both models were trained for two epochs with the AdamW optimizer and evaluated on accuracy, precision, recall, F1-score, and per-class (one-vs-all) accuracy. Experimental results show that IndoBERT significantly outperforms mBERT, achieving 0.90 overall accuracy, precision, recall, and F1-score, compared with 0.85 for mBERT. IndoBERT was particularly robust on low-support classes, indicating a superior ability to capture domain-specific linguistic nuances in Indonesian trademark descriptions. These findings underscore the potential of monolingual Transformer models for automating trademark classification in national intellectual property systems: integrating such models can accelerate trademark registration, reduce examiner workload, and improve consistency in class assignment. The results advance the deployment of AI in legal and administrative contexts and provide a foundation for future work on multimodal features and explainable AI for comprehensive trademark management.
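The per-class (one-vs-all) accuracy mentioned above treats each NICE class in turn as the positive label and scores binary correctness against all other classes. A minimal sketch in plain Python, using toy labels rather than the paper's 45 classes and DGIP data:

```python
def per_class_accuracy(y_true, y_pred, classes):
    """One-vs-all accuracy per class: a prediction is counted correct for
    class c when both the true and predicted labels agree on membership
    in c (both are c, or both are not c)."""
    n = len(y_true)
    return {
        c: sum((t == c) == (p == c) for t, p in zip(y_true, y_pred)) / n
        for c in classes
    }

# Illustrative labels only; class numbers stand in for NICE classes.
y_true = [9, 9, 25, 25, 35]
y_pred = [9, 25, 25, 25, 35]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall)                                   # 0.8
print(per_class_accuracy(y_true, y_pred, [9, 25, 35]))
```

Because every non-member counts as a correct negative, one-vs-all accuracy stays high for rare classes even when they are often misclassified, which is why it is reported alongside precision, recall, and F1 rather than in place of them.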
Copyright (c) 2025 Journal of Artificial Intelligence and Legal Technology

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.