Predicting Breast Cancer Survivability Rates: A Differentiation of Three Data Mining Models

Authors

  • Ghofran Othoum
  • Wadee Al-Halabi
  • Houria Oudghiri

Abstract

This project is motivated by the shift in cancer research from purely long-term biological and clinical experiments toward computational experiments. Three data mining models (decision trees, neural networks and Naïve Bayes) were built and compared on three main measures: accuracy, sensitivity and specificity. The experiment used a multi-layer perceptron as the baseline scheme and a significance level of 0.05. The models were built using data collected in Saudi Arabia, specifically from King Faisal Specialist Hospital and Research Center. Prediction is based on 8 attributes: age, birth location, reason for no radiation, laterality, grade, sex, primary site and marital status. However, the collected data comprised only around 680 instances, which was not sufficient to build the models, so sampling with a random seed was used to double the size of the training dataset. The results showed that the decision tree had the highest accuracy (0.979) and sensitivity (0.988). Naïve Bayes had the highest classification error (0.094), and the neural network had the highest specificity (0.896).
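As an illustration of the evaluation measures named in the abstract, the following minimal Python sketch computes accuracy, sensitivity, specificity and classification error from the four cells of a binary confusion matrix. It also shows one possible reading of "sampling with a random seed to double the dataset", namely drawing as many extra rows as the dataset holds, with replacement. The function names, the confusion-matrix counts, the seed value and the choice of sampling with replacement are all assumptions made for illustration; they are not taken from the paper, and which outcome the authors treat as the positive class is not stated here.

```python
import random


def confusion_metrics(tp, fn, tn, fp):
    """Return standard binary-classification measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,              # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "classification_error": (fp + fn) / total,  # 1 - accuracy
    }


def double_by_resampling(rows, seed=42):
    """Assumed scheme: append len(rows) rows drawn with replacement, using a fixed seed."""
    rng = random.Random(seed)
    extra = [rng.choice(rows) for _ in rows]
    return list(rows) + extra


# Hypothetical counts, not the paper's results.
metrics = confusion_metrics(tp=90, fn=10, tn=80, fp=20)

# Hypothetical stand-in for the roughly 680 collected instances.
records = list(range(680))
doubled = double_by_resampling(records)
```

Because the sampling is seeded, repeated runs produce the same doubled dataset, which keeps a comparison between models reproducible. Note that rows duplicated this way can appear in both training and test folds, which tends to inflate measured accuracy.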

Published

2019-12-19

Section

Articles