Sentiment Analysis for IMDb Movie Review Using Support Vector Machine (SVM) Method

D. Diffran Nur Cahyo; Fidya Farasalsabila; Verra Budhi Lestari; Hanafi; Tutik Lestari; Fahmi Rusdi Al Islami; M. Akbar Maulana

doi:10.25139/inform.v8i2.5700

D. Diffran Nur Cahyo, Magister Teknik Informatika, Universitas Amikom Yogyakarta
Fidya Farasalsabila, Universitas Amikom Yogyakarta
Verra Budhi Lestari, Universitas Amikom Yogyakarta
Hanafi, Informatics Department, Universitas Amikom Yogyakarta
Tutik Lestari, Informatics Department, Institut Teknologi Tangerang Selatan
Fahmi Rusdi Al Islami, Informatics Department, Institut Teknologi Tangerang Selatan
M. Akbar Maulana, Informatics Department, Institut Teknologi Tangerang Selatan

Keywords: Sentiment Analysis, IMDb, Movie Review, TF-IDF, SVM

Abstract

Many researchers currently apply supervised machine learning methods to sentiment analysis. Such analysis can be carried out on movie reviews, Twitter posts, online product reviews, blogs, discussion forums, Myspace comments, and social networks. Support Vector Machine (SVM) classifiers have been used to analyze Twitter data sets under different parameter settings. The analysis and discussion in this study lead to the conclusion that SVM was successfully implemented on the IMDb data. The study used a total of 50,000 reviews, 25,000 positive and 25,000 negative; after collection, 40,000 reviews served as training data and 10,000 as testing data. The preprocessing phase consisted of filtering, followed by classification with SVM. We adopted evaluation metrics including accuracy, precision, recall, and F1-score. In our experiments, SVM with Bag of Words (BoW) features reached a highest test accuracy of 88.59%. We then used grid search to find the best parameters for the SVM model. With Term Frequency–Inverse Document Frequency (TF-IDF) features, the model reached the highest test accuracy, 91.27%.
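
The four evaluation measures named in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal pure-Python example that assumes binary labels (1 = positive review, 0 = negative review). The function name `evaluate_binary`, the `BinaryMetrics` container, and the ten example labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_binary(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels for ten test reviews (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

metrics = evaluate_binary(y_true, y_pred)
print(f"accuracy={metrics.accuracy:.4f} precision={metrics.precision:.4f} "
      f"recall={metrics.recall:.4f} f1={metrics.f1:.4f}")
```

In this toy case 7 of the 10 predictions are correct, so accuracy is 0.7, while precision (4 of 6 predicted positives) and recall (4 of 5 actual positives) differ. This is why the study reports all four measures and not accuracy alone.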
