Entity Extraction and Annotation for Job Title and Job Descriptions Using Bert-Based Model

DOI:

https://doi.org/10.25139/inform.v10i1.7367

Keywords:

Named Entity Recognition, Bert-Based Model, Job Vacancy, Deep Learning, Language Processing

Abstract

This paper investigates Named Entity Recognition (NER) in Indonesia’s job vacancy domain using BERT-based models. It describes the data collection and preprocessing methodology, followed by the fine-tuning of a BERT-based model for the NER task. The dataset comprises 48,673 job vacancies collected from the JobStreet website in July 2023, and the work focuses on multi-entity recognition covering job titles and job descriptions. An original annotation algorithm was developed using Python and Laravel to label these entities. The paper also reviews the literature on NER and BERT-based models and discusses their relevance to the Indonesian job market. The fine-tuned BERT-based model attains an average accuracy of 78.5%, a precision of 79.7%, a recall of 81.1%, and an F1 score of 80.8% on the NER task. The study concludes by discussing implications, limitations, and future directions, including the model’s potential use in streamlining job matching and recruitment in Indonesia and beyond. Its contributions are a framework for NER on job vacancies and proposed enhancements for future model development and for application to other languages and regions.
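The abstract does not describe the annotation algorithm in detail, so the following is only a minimal sketch of how entity labels for a token-classification model are commonly prepared, assuming a BIO tagging scheme and whitespace tokenization. The function names (`bio_tag`, `annotate`), the `TITLE` label, and the sample vacancy text are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def bio_tag(tokens: List[str], entity_tokens: List[str], label: str, tags: List[str]) -> None:
    """Tag the first untagged, case-insensitive match of entity_tokens with B-/I- labels."""
    n = len(entity_tokens)
    if n == 0:
        return
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in entity_tokens]
    for start in range(len(tokens) - n + 1):
        # Only tag spans that no other entity has claimed yet.
        window_free = all(tag == "O" for tag in tags[start:start + n])
        if window_free and lowered[start:start + n] == target:
            tags[start] = f"B-{label}"
            for i in range(start + 1, start + n):
                tags[i] = f"I-{label}"
            return


def annotate(text: str, title: str) -> List[Tuple[str, str]]:
    """Assumed scheme: whitespace tokens, one hypothetical TITLE entity per vacancy."""
    tokens = text.split()
    tags = ["O"] * len(tokens)
    bio_tag(tokens, title.split(), "TITLE", tags)
    return list(zip(tokens, tags))


# Hypothetical vacancy text; not drawn from the JobStreet dataset.
example = annotate("Dicari Data Analyst untuk tim Jakarta", "data analyst")
for token, tag in example:
    print(f"{token}\t{tag}")
```

Token and tag pairs of this shape are the usual input for fine-tuning a BERT-style token classifier. A second entity type, such as job description spans, could be added with another call to `bio_tag` using a different label.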

Published

2025-01-31

How to Cite

Entity Extraction and Annotation for Job Title and Job Descriptions Using Bert-Based Model. (2025). Inform : Jurnal Ilmiah Bidang Teknologi Informasi Dan Komunikasi, 10(1), 73–77. https://doi.org/10.25139/inform.v10i1.7367

Section

Articles
