Data mining is an integral part of the KDD (Knowledge Discovery in Databases) process. It deals with discovering previously unknown patterns and knowledge hidden in data. Classification is a pivotal data mining technique with a very wide range of applications. Diabetes has become a major disease that severely affects people across the globe. It is a medical condition in which the metabolism becomes dysfunctional and the blood sugar level rises, and it is a major concern for medical practitioners and the public at large. An early diagnosis is the starting point for living well with diabetes. Classification analysis of a diabetes dataset can support this diagnosis process by helping to distinguish diabetic patients from non-diabetic ones. In this paper, classification algorithms are applied to the Pima Indians Diabetes Database, obtained from the UCI Machine Learning Repository. Six classification algorithms (Naïve Bayes, Logistic Regression, Decision Tree, Random Forest, Support Vector Classifier and XGBoost) are analyzed and compared on the basis of the accuracy delivered by the models.
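The accuracy-based comparison described above can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it uses synthetic two-feature data rather than the Pima dataset, and it compares a from-scratch Gaussian Naïve Bayes against a majority-class baseline instead of the six classifiers studied in the paper. All names (`make_synthetic`, `compare`, `MajorityBaseline`, `GaussianNaiveBayes`) and the 70/30 split are assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.stats_ = {}
        for label, rows in rows_by_class.items():
            prior = len(rows) / len(X)
            params = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                params.append((mean, var + 1e-9))  # small floor avoids zero variance
            self.stats_[label] = (prior, params)
        return self

    def _log_posterior(self, row, label):
        # Unnormalised log posterior: log prior plus per-feature Gaussian log density.
        prior, params = self.stats_[label]
        total = math.log(prior)
        for value, (mean, var) in zip(row, params):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.stats_, key=lambda c: self._log_posterior(row, c)) for row in X]


def make_synthetic(n_per_class, seed=0):
    """Hypothetical stand-in data: two Gaussian clusters, NOT the Pima dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


def compare(models, X, y, test_fraction=0.3, seed=0):
    """Fit every model on the same training split and score held-out accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    X_te, y_te = [X[i] for i in test], [y[i] for i in test]
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        results[name] = accuracy(y_te, model.predict(X_te))
    return results


X, y = make_synthetic(200)
results = compare(
    {"majority baseline": MajorityBaseline(), "Gaussian naive Bayes": GaussianNaiveBayes()},
    X,
    y,
)
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Every model is fitted and scored on the same split, so the accuracies are directly comparable. Any object with `fit` and `predict` methods, including library implementations of the other classifiers, could be added to the dictionary passed to `compare`.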
Sahoo, Sipra; Mitra, Tushar; Mohanty, Arup Kumar; Sahoo, Bharat Jyoti Ranjan; and Rath, Smita
"Diabetes Prediction: A Study of Various Classification based Data Mining Techniques,"
International Journal of Computer Science and Informatics: Vol. 4: Iss. 3, Article 1.
Available at: https://www.interscience.in/ijcsi/vol4/iss3/1