Sentiment Analysis for IMDb Movie Review Using Support Vector Machine (SVM) Method


Abstract
Many researchers currently apply supervised machine learning methods to sentiment analysis. Such analysis can be performed on movie reviews, Twitter posts, online product reviews, blogs, discussion forums, Myspace comments, and social networks. In this study, Support Vector Machine (SVM) classifiers with different parameters are used to analyze the IMDb movie review data set. The analysis and discussion support the conclusion that SVM was successfully implemented on the IMDb data. The data set contains 50,000 reviews, 25,000 positive and 25,000 negative, of which 40,000 were used as training data and 10,000 as testing data. The reviews were preprocessed and filtered before being classified with SVM. We evaluated the models using accuracy, precision, recall, and F1-score. In the experiments, SVM with Bag of Words (BoW) features gave the highest test accuracy, 88.59%. We then used grid search to find the best SVM parameters, after which SVM with Term Frequency-Inverse Document Frequency (TF-IDF) features gave the highest test accuracy, 91.27%.
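The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function name `evaluate`, the `BinaryMetrics` container, and the toy labels below are hypothetical and are not taken from the study. The sketch assumes binary labels (1 for a positive review, 0 for a negative one) and, as a convention chosen here, reports a metric as 0.0 when its denominator is zero.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one label")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

On a balanced test set such as the one described above (equal numbers of positive and negative reviews), accuracy is a reasonable headline figure; precision, recall, and F1 additionally show whether errors are concentrated in one class.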
Copyright (c) 2023 Fidya Farasalsabila, Verra Budhi Lestari, D. Diffran Nur Cahyo, Hanafi, Tutik Lestari, Fahmi Rusdi Al Islami, M. Akbar Maulana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with Inform: Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.