Please use this identifier to cite or link to this item: https://repository.uksw.edu//handle/123456789/30729
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Hartomo, Kristoko Dwi | -
dc.contributor.author | Lopo, Joanito Agili | -
dc.date.accessioned | 2023-08-02T04:58:15Z | -
dc.date.available | 2023-08-02T04:58:15Z | -
dc.date.issued | 2023-05-29 | -
dc.identifier.uri | https://repository.uksw.edu//handle/123456789/30729 | -
dc.description.abstract | Detecting fraud in healthcare insurance datasets is challenging due to severe class imbalance, where fraud cases are rare compared to non-fraud cases. Various techniques have been applied to address this problem, such as oversampling and undersampling methods. However, there is a lack of comparison and evaluation of these sampling methods. Therefore, the research contribution of this study is to conduct a comprehensive evaluation of the different sampling methods in different class distributions, utilizing multiple evaluation metrics, including AUC-ROC, G-mean, F1-macro, Precision, and Recall. In addition, a model evaluation approach is proposed to address the issue of inconsistent scores across different metrics. This study employs a real-world dataset with the XGBoost algorithm, used alongside widely used data sampling techniques such as Random Oversampling and Undersampling, SMOTE, and Instance Hardness Threshold. Results indicate that Random Oversampling and Undersampling perform well in the 50% distribution, while SMOTE and Instance Hardness Threshold are more effective in the 70% distribution. Instance Hardness Threshold performs best in the 90% distribution. The 70% distribution is more robust with SMOTE and Instance Hardness Threshold, particularly in the consistency of scores across different metrics, although these methods have longer computation times. These models consistently performed well across all evaluation metrics, indicating their ability to generalize to new unseen data in both the minority and majority classes. The study also identifies key features such as costs, diagnosis codes, type of healthcare service, gender, and severity level of diseases, which are important for accurate healthcare insurance fraud detection. These findings could be valuable for healthcare providers to make informed decisions with lower risks. A well-performing fraud detection model ensures the accurate classification of fraud and non-fraud cases. The findings can also be used by healthcare insurance providers to develop more effective fraud detection and prevention strategies. | en
dc.language.iso | en | en_US
dc.subject | Healthcare Insurance | en_US
dc.subject | Imbalanced Dataset | en_US
dc.subject | Oversampling | en_US
dc.subject | XGBoost | en_US
dc.subject | Fraud Detection | en_US
dc.subject | Undersampling | en_US
dc.title | Evaluating Sampling Techniques for Healthcare Insurance Fraud Detection in Imbalanced Dataset | en_US
dc.type | Thesis | en_US
uksw.department | Sistem Informasi | en_US
uksw.faculty | Fakultas Teknologi Informasi | en_US
uksw.identifier.kodeprodi | KODEPRODI57201#Sistem Informasi | -
uksw.identifier.nidn | NIDN0626127301 | -
uksw.identifier.nim | NIM682019013 | -
uksw.program | Sistem Informasi | en_US
Appears in Collections: T1 - Information Systems

Files in This Item:
File | Description | Size | Format
T1_682019013_Judul.pdf | | 15.88 MB | Adobe PDF
T1_682019013_Isi.pdf (Until 9999-01-01) | | 1.05 MB | Adobe PDF
T1_682019013_Daftar Pustaka.pdf | | 770.85 kB | Adobe PDF
T1_682019013_Formulir Pernyataan Persetujuan Penyerahan Lisensi Nonekslusif Tugas Akhir dan Pilihan Embargo.pdf (Restricted Access) | | 466.94 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.