Systematic XGBoost Pipeline for Phishing Website Detection: Hyperparameter Tuning Approach with Nested Cross-Validation

Najma Prameswari; Wildanil Ghozi; Fauzi Adi Rafrastara

doi:10.25139/inform.v11i1.11221

Keywords:

URL-Based Phishing Detection, XGBoost, Hyperparameter Tuning, Nested Cross-Validation

Abstract

Phishing attacks have become increasingly sophisticated and pose a critical threat to cybersecurity, with more than 4.7 million attacks reported in 2023. Traditional blacklist- and rule-based detection struggles to keep pace with evolving URL patterns and impersonation techniques. Rather than proposing a new classifier, this study presents a systematic and reproducible XGBoost-based phishing detection pipeline intended as an academic baseline with operationally motivated evaluation (not a production-integrated system). The study uses the Mendeley Phishing Websites dataset (58,645 URLs; 30,647 phishing and 27,998 legitimate), which provides 111 URL- and website-based features. The pipeline applies data cleaning, column-transformer-based pre-processing, and a stratified 80:20 train–test split, with all pre-processing steps fit on the training data only to reduce leakage risk. The final model uses 98 active features after removing 13 constant attributes; quasi-constant features are analyzed and retained. Continuous features are sanitised, log-transformed, and standardised, while binary features are left unchanged. Hyperparameters are tuned via stratified cross-validation using the ROC-AUC metric, followed by early stopping, probability calibration, and simple threshold tuning. On the hold-out test set, the optimized model, evaluated at a 0.50 decision threshold, achieves 96.34% accuracy, 96.31% precision, 96.70% recall, and 96.51% F1-score, improving over a default XGBoost baseline and yielding fewer false positives and false negatives. These results show that a systematically designed XGBoost pipeline provides a strong and reproducible baseline for URL-based phishing website detection and offers a practical foundation for future work on cost-sensitive learning and temporal validation. This study is limited to tabular URL/website feature-based detection and does not include visual content analysis, HTML/DOM parsing, or deep learning on raw text/images.
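The metric definitions and the "simple threshold tuning" step mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names (`metrics_at`, `tune_threshold`, `f1_from`), the toy data, and the choice of F1 as the tuning criterion are assumptions made for illustration, since the abstract does not state which criterion was used. The last lines only check that figures quoted in the abstract are arithmetically consistent with one another.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def metrics_at(y_true: Sequence[int], y_prob: Sequence[float],
               threshold: float = 0.5) -> Metrics:
    """Confusion-matrix metrics, labelling a URL phishing (1) when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return Metrics(accuracy, precision, recall, f1_from(precision, recall))


def tune_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                   candidates: Iterable[float]) -> float:
    """Pick the candidate threshold with the highest F1 (assumed criterion)."""
    return max(candidates, key=lambda t: metrics_at(y_true, y_prob, t).f1)


# Toy validation labels and probabilities, invented for illustration only.
y_val = [1, 1, 1, 0, 0, 0]
p_val = [0.9, 0.6, 0.4, 0.55, 0.2, 0.1]

best = tune_threshold(y_val, p_val, [0.4, 0.5, 0.6])
print(best, metrics_at(y_val, p_val, best))

# Consistency checks on figures quoted in the abstract.
assert 30_647 + 27_998 == 58_645  # class counts sum to the dataset size
assert 111 - 13 == 98             # features left after dropping constant columns
print(round(f1_from(0.9631, 0.9670), 4))  # F1 implied by the rounded precision/recall
```

The F1 implied by the rounded precision and recall comes out at about 96.5%, which agrees with the reported 96.51% up to rounding. In a sketch like this, the threshold would normally be chosen on validation data rather than on the hold-out test set; the abstract reports its final results at a 0.50 threshold.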
