A Study on Improved Outlier Detection and Prediction Based on Hybrid Machine Learning
Data collection and preprocessing are important steps in training prediction models. If researchers fail to find outliers caused by incorrect data collection, the prediction model also learns this noisy data during training. Removing outliers is therefore an essential part of preprocessing if the prediction model is to be trained more accurately.
This paper proposes a hybrid machine learning approach to outlier detection and prediction that combines two algorithms. Two medical datasets were selected for the experiment. Before applying DBSCAN, the information gain was calculated and the two attributes most relevant to the label value were extracted; the DBSCAN parameters were then selected based on these attributes. First, in the outlier detection step, the proposed algorithm exploited the characteristics of DBSCAN, a density-based algorithm, and preprocessed the training data three times. Second, the preprocessed data was evaluated with a Neural Network and a Boosted Decision Tree. Experimental results show that the accuracy of models trained on the preprocessed data is similar to or better than that of models trained on the raw data. Applying the proposed hybrid model is expected to yield higher accuracy and better generalization for medical data with frequent outliers.
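
The outlier detection step described above can be illustrated with a minimal sketch: rank attributes by information gain, keep the two highest-ranked ones, run DBSCAN on them, and drop the points it marks as noise. The sketch is not the implementation used in this paper. The function names, the equal-width binning with four bins, the values of eps and min_pts, and the toy data are all assumptions made for illustration, and the evaluation with a Neural Network and a Boosted Decision Tree is left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """Information gain of one numeric attribute after equal-width binning.

    The binning scheme is an assumption; the paper does not specify one.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant attribute
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    remainder = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one cluster id per point, with -1 marking noise."""
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


def remove_outliers(rows, labels, eps, min_pts, top_k=2):
    """Run DBSCAN on the top_k attributes by information gain; drop noise rows."""
    n_attrs = len(rows[0])
    gains = [information_gain([r[a] for r in rows], labels) for a in range(n_attrs)]
    top = sorted(range(n_attrs), key=lambda a: gains[a], reverse=True)[:top_k]
    projected = [tuple(r[a] for a in top) for r in rows]
    clusters = dbscan(projected, eps, min_pts)
    keep = [i for i, c in enumerate(clusters) if c != -1]
    return [rows[i] for i in keep], [labels[i] for i in keep], top


# Toy data (hypothetical): two dense groups plus one isolated record.
rows = [
    (1.0, 1.0, 5.0), (1.2, 0.9, 1.0), (0.9, 1.1, 3.0), (1.1, 1.2, 2.0),
    (5.0, 5.0, 5.0), (5.1, 4.9, 1.0), (4.9, 5.2, 3.0), (5.2, 5.1, 2.0),
    (9.0, 0.5, 3.0),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

kept_rows, kept_labels, top = remove_outliers(rows, labels, eps=0.5, min_pts=3)
print("selected attributes:", top)
print("records kept:", len(kept_rows), "of", len(rows))
```

In this toy run the third attribute carries almost no information about the label, so the first two attributes are selected and the isolated record is the only one removed. On real data, eps and min_pts would have to be chosen from the distribution of the selected attributes, as the text describes.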