Journal of Health and Biomedical Informatics

Detection of Type 2 Diabetes Using the C4.5 Decision Tree
Section: Data Mining. Article type: Original Article.
<div style="text-align: justify;">Introduction: Diabetes is one of the most common diseases in the world today, and its global prevalence rises by about 6% each year. Data mining techniques can be used to build predictive models that identify people at risk, which helps reduce the complications of the disease. In this study, a C4.5 decision tree was used to investigate the prevention and diagnosis of type 2 diabetes.
Methods: In this applied, descriptive study, we used the standard Pima Indians Diabetes data set (pima-indians-diabetes) from the UCI repository. The database contains 768 records with 8 fields. The analysis was carried out in Weka 3.6 following the CRISP-DM methodology. In the modeling phase, a C4.5 decision tree was built by specifying the input variables and the target variable. The model was evaluated using sensitivity, specificity, accuracy, and positive and negative predictive values.
Results: According to the model, the variables with the greatest effect on developing type 2 diabetes were, in decreasing order: high two-hour blood glucose, a high number of pregnancies, older age, high diastolic blood pressure, family history, and high body mass index (BMI). The classification rate was 73.8% and the accuracy of the C4.5 algorithm was 79%.
Conclusion: Compared with the results of previous data mining studies on diabetes, the accuracy of the proposed algorithm is acceptable. The most influential factors for diabetes were identified, and rules were extracted that can serve as a pattern for predicting an individual's risk of developing diabetes.</div>
Keywords: Data mining, Type 2 diabetes, C4.5 decision tree
Pages: 293-303
http://jhbmi.ir/browse.php?a_code=A-10-247-2&slc_lang=fa&sid=1
Hamed Sabbagh Gol (hamedsabbagh@gmail.com), corresponding author. Lecturer and faculty member, Department of Computer Engineering, Faculty of Computer, Payame Noor University (PNU), Iran; M.Sc. in Computer Engineering.
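The abstract says a C4.5 tree was built in Weka but does not describe how the tree chooses its splits. C4.5 ranks candidate splits by gain ratio: information gain divided by split information. The sketch below shows that criterion for a binary split on one numeric attribute. It is a simplified illustration, not the study's code or Weka's implementation. The candidate thresholds (midpoints between distinct values) are a simplifying assumption, and refinements of the full algorithm, such as restricting the choice to splits with at least average information gain and pruning, are omitted. The glucose values and outcomes are hypothetical toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio_for_threshold(values, labels, threshold):
    """Gain ratio of the binary split (value <= threshold) vs (value > threshold)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the partition sizes; dividing by it
    # penalises splits that fragment the data, giving the gain ratio.
    split_info = entropy(["<="] * len(left) + [">"] * len(right))
    return info_gain / split_info

def best_threshold(values, labels):
    """Return (threshold, gain ratio) of the best binary split on one numeric attribute.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0
    scored = [(t, gain_ratio_for_threshold(values, labels, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical toy data (not from the study): two-hour glucose and diabetes outcome.
glucose = [85, 90, 100, 120, 140, 150, 160, 180]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_threshold(glucose, outcome))  # prints (130.0, 1.0)
```

In this toy case the split at 130 separates the two classes perfectly, so its gain ratio is 1.0. A full tree builder would apply this scoring to every attribute at each node and recurse on the resulting subsets.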
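The model was evaluated with sensitivity, specificity, accuracy, and positive and negative predictive values. All five measures come from a single 2×2 confusion matrix. The sketch below computes them, treating "diabetic" as the positive class. The class and function names are illustrative assumptions, and the counts in the example are hypothetical. They are not the study's results.

```python
from dataclasses import dataclass

@dataclass
class ConfusionMatrix:
    tp: int  # diabetic cases predicted as diabetic
    fn: int  # diabetic cases predicted as non-diabetic
    fp: int  # non-diabetic cases predicted as diabetic
    tn: int  # non-diabetic cases predicted as non-diabetic

def _ratio(num, den):
    """Safe division: returns NaN when the denominator is zero."""
    return num / den if den else float("nan")

def evaluate(cm):
    """Compute the five evaluation measures named in the abstract."""
    total = cm.tp + cm.fn + cm.fp + cm.tn
    return {
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),  # true positive rate
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),  # true negative rate
        "accuracy": _ratio(cm.tp + cm.tn, total),     # share of correct predictions
        "ppv": _ratio(cm.tp, cm.tp + cm.fp),          # positive predictive value
        "npv": _ratio(cm.tn, cm.tn + cm.fn),          # negative predictive value
    }

# Hypothetical counts for illustration only; these are not the study's results.
metrics = evaluate(ConfusionMatrix(tp=8, fn=2, fp=1, tn=9))
print(metrics)
```

Sensitivity and specificity depend only on the true class of each record. The predictive values also depend on how common diabetes is in the evaluated data, so they can change when the class balance changes.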
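The conclusion mentions rules extracted from the tree for predicting diabetes risk. One simple way to obtain such rules is to turn each root-to-leaf path into an IF ... THEN ... rule. The sketch below does only that path enumeration. C4.5's own rule generator goes further and simplifies and prunes the resulting rules, which is not reproduced here. The tree, its attribute names (`glucose_2h`, `bmi`), and its thresholds are hypothetical placeholders, not the model reported in the study.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    attribute: str
    threshold: float
    left: "Tree"   # branch taken when attribute <= threshold
    right: "Tree"  # branch taken when attribute > threshold

Tree = Union[Leaf, Node]

def extract_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Turn every root-to-leaf path into one IF ... THEN ... rule."""
    if isinstance(tree, Leaf):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {tree.label}"]
    return (
        extract_rules(tree.left, conditions + (f"{tree.attribute} <= {tree.threshold}",))
        + extract_rules(tree.right, conditions + (f"{tree.attribute} > {tree.threshold}",))
    )

# Hypothetical tree: attribute names and thresholds are placeholders, not the study's model.
example_tree = Node("glucose_2h", 125,
                    Leaf("non-diabetic"),
                    Node("bmi", 30, Leaf("non-diabetic"), Leaf("diabetic")))
rules = extract_rules(example_tree)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so the rule set is mutually exclusive and covers every possible input. This is what makes path-derived rules usable directly as a screening pattern.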