Entity Extraction and Annotation for Job Title and Job Descriptions Using Bert-Based Model


Abstract
This paper investigates Named Entity Recognition (NER) in Indonesia’s job vacancy domain using BERT-based models. It presents a detailed data collection and preprocessing methodology, followed by fine-tuning of a BERT-based model for NER. The dataset comprises 48,673 job vacancies collected from the JobStreet website in July 2023, and the task is multi-entity recognition covering job titles and job descriptions. An original annotation algorithm was developed using Python and Laravel to annotate these entities precisely. The paper also provides an extensive literature review of NER and BERT-based models and discusses their relevance to the Indonesian job market. On the NER task, the BERT-based model attains an average accuracy of 78.5%, a precision of 79.7%, a recall of 81.1%, and an F1 score of 80.8%. The study concludes by discussing implications, limitations, and future directions, underscoring the model’s potential to streamline job matching and recruitment in Indonesia and beyond. Its contributions are a robust framework for NER in job vacancies, evidence of its potential to improve job matching, and proposed enhancements for future model development and application in other languages and regions.
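As a rough illustration of the annotation and evaluation steps summarized above, the following minimal sketch tags known job-title phrases with BIO labels and computes token-level precision, recall, and F1. It is not the authors' annotation algorithm: the function names `bio_annotate` and `precision_recall_f1`, the `TITLE` label, the phrase list, and the sample sentence are all hypothetical, and exact phrase matching is assumed only to keep the example small.

```python
from typing import List, Sequence, Tuple


def bio_annotate(tokens: Sequence[str],
                 phrases: Sequence[Sequence[str]],
                 label: str) -> List[str]:
    """Tag each occurrence of a known phrase with B-/I- labels; all else is O.

    Hypothetical helper: case-insensitive exact matching on whole tokens.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in phrases:
        target = [p.lower() for p in phrase]
        n = len(target)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            # Skip windows that overlap an already tagged span.
            window_free = all(tag == "O" for tag in tags[start:start + n])
            if window_free and lowered[start:start + n] == target:
                tags[start] = f"B-{label}"
                for i in range(start + 1, start + n):
                    tags[i] = f"I-{label}"
    return tags


def precision_recall_f1(gold: Sequence[str],
                        pred: Sequence[str]) -> Tuple[float, float, float]:
    """Token-level scores computed over non-O tags only."""
    true_pos = sum(1 for g, p in zip(gold, pred) if p != "O" and g == p)
    predicted = sum(1 for p in pred if p != "O")
    actual = sum(1 for g in gold if g != "O")
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only; not taken from the JobStreet dataset.
tokens = ["Senior", "Data", "Analyst", "wanted", "in", "Jakarta"]
gold = ["B-TITLE", "I-TITLE", "I-TITLE", "O", "O", "O"]
pred = bio_annotate(tokens, [["data", "analyst"]], "TITLE")

print(pred)
print(precision_recall_f1(gold, pred))
```

In this toy case the phrase list misses the modifier "Senior", so the predicted span starts one token late and both precision and recall fall below 1. The paper's reported figures may be aggregated differently (for example, per entity type or per span), so this sketch should not be read as the evaluation protocol used in the study.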
Copyright (c) 2025 Seftin Fitri Ana Wati, Anindo Saka Fitri, Herlambang Haryo Putra, Suryo Widodo, Arizia Aulia Aziiza

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with Inform: Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.