Heart Disease Classification: A Feature Engineering Approach

Authors

  • S Mirudula, Akram Pasha, Sathya Rupa M, Ritu Shenoy

Abstract

Healthcare applications demand rigorous medical data analytics algorithms. Machine Learning (ML) has taken a leading role in various data analytics fields, including healthcare. The performance of ML algorithms is influenced by the features of medical data sets, and well-chosen features can enhance the accuracy of disease classification. This paper examines how strongly the features of the Heart Disease Data set (HDD) influence the classification accuracy of various ML algorithms, using both feature selection and feature extraction techniques. Six ML classification algorithms are employed in this study: k-Nearest Neighbor (kNN), Decision Tree (DT), Gaussian Naive Bayes (GNB), Logistic Regression (LR), Support Vector Machines (SVM), and Random Forest (RF). The HDD consists of 303 records with 14 attributes of 165 patients being tested on heart disease. The HDD was normalized and partitioned into training and testing sets in the ratio 0.8 to 0.2 before training the ML classifiers. After scaling, the accuracy of the SVM classifier rose from 65% to 87%, the highest among all models. The weight of each attribute was computed using RF-based feature importance. The Principal Component Analysis (PCA) based SVM gave the highest accuracy, 90.16%, among all the classification models employed in the study.

Keywords: Classification; Machine Learning; Feature Selection; Feature Extraction; PCA
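
The preprocessing described in the abstract (normalize, then partition 0.8/0.2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say which normalization was used, so min-max scaling is assumed here; the helper names `min_max_scale` and `partition` are hypothetical; and the data are random placeholders with 303 rows and 13 feature columns plus a label (assumed to correspond to the 14 attributes), not the actual HDD. The last line checks a piece of arithmetic: if the test partition holds 61 records (the ceiling of 0.2 × 303, also an assumption), the reported 90.16% is consistent with 55 correct predictions.

```python
import math
import random


def min_max_scale(rows):
    """Rescale each column to [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def partition(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out ceil(test_fraction * n) records for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_fraction * len(rows))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    take = lambda seq, ids: [seq[i] for i in ids]
    return (take(rows, train_idx), take(rows, test_idx),
            take(labels, train_idx), take(labels, test_idx))


# Synthetic placeholder data (NOT the HDD): 303 records, 13 features, binary label.
rng = random.Random(42)
X = [[rng.uniform(0, 500) for _ in range(13)] for _ in range(303)]
y = [rng.randint(0, 1) for _ in range(303)]

# Same order as in the abstract: normalize first, then partition.
X_scaled = min_max_scale(X)
X_train, X_test, y_train, y_test = partition(X_scaled, y)

print(len(X_train), len(X_test))             # sizes of the two partitions
print(round(100 * 55 / len(X_test), 2))      # accuracy if 55 test records are correct
```

Scaling matters for distance- and margin-based classifiers such as kNN and SVM because unscaled features with large numeric ranges dominate the others, which is in line with the jump in SVM accuracy reported after scaling.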

Published

2020-05-16

Section

Articles