Research Details



Volume: Journal of Al-Azhar University, Natural Sciences Series, December 2011, Vol. 13, No. 1
Publication date: 2011
Title

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data

Abstract

Cluster analysis is a statistical technique widely used to analyze genetic data, in particular to cluster gene expressions, and it is applied in many other fields as well. A recurring problem in the literature, however, is the choice of the number of clusters. Estimating the number of clusters in a given population, particularly for gene expression data, is of great interest and needs to be addressed. Many algorithms are used in practice for this purpose in different fields. In this paper we examine clustering algorithms for estimating the number of clusters that are based on probabilities, the covariance matrix, and eigenvalues, applying them to real data sets using algorithms available in R packages. Specifically, we examine the model-based algorithm (Mclust) and the hierarchical clustering algorithm (hclust), and compare them with the Partitioning Around Medoids (PAM) algorithm. We find that the first algorithm can be used only for large data sets, while the second can be safely used for small data sets. Mclust is a model-based clustering approach that selects a model using the Bayesian Information Criterion (BIC), with parameters fitted by the expectation-maximization (EM) algorithm. The results of these two algorithms are compared with a third approach based on PAM, in which the number of clusters is selected manually as the number that maximizes the average silhouette width. Although the latter approach requires the number of clusters to be estimated manually, it has the best performance. The first two algorithms, however, can be automated to produce the best estimate of the number of clusters in a given data set. These algorithms can be applied not only to genetic data but also in many other fields, such as market research.

Keywords: clustering, model-based algorithm, hierarchical clustering, Partitioning Around Medoids, Bayesian Information Criterion, average silhouette width, hierarchical tree, gene expression.
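
The silhouette-based selection described in the abstract (choose the number of clusters that maximizes the average silhouette width) can be illustrated with a minimal sketch. It is not the authors' code: the paper uses R packages, whereas this sketch uses plain Python, and every name in it (k_medoids_exhaustive, average_silhouette_width, choose_k) is hypothetical. The toy points are made up for illustration, and an exhaustive search over medoid sets stands in for the PAM algorithm, which is only practical for very small data sets.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def k_medoids_exhaustive(points, k):
    """Stand-in for PAM on tiny data: try every set of k medoids and keep
    the set with the smallest total distance from points to nearest medoid."""
    best = min(
        combinations(range(len(points)), k),
        key=lambda meds: sum(min(dist(p, points[m]) for m in meds) for p in points),
    )
    # Label each point with the position of its nearest medoid.
    return [min(range(k), key=lambda j: dist(p, points[best[j]])) for p in points]


def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b the mean distance to the
    nearest other cluster. Singleton clusters contribute s(i) = 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 by convention
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_max):
    """Return the k in 2..k_max with the largest average silhouette width."""
    scores = {
        k: average_silhouette_width(points, k_medoids_exhaustive(points, k))
        for k in range(2, k_max + 1)
    }
    return max(scores, key=scores.get), scores


# Made-up toy data: three well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_k, scores = choose_k(points, 5)
print(best_k, scores)
```

On real gene expression data the exhaustive search would be replaced by a proper PAM implementation, and the silhouette curve would be inspected by hand, which corresponds to the manual step the abstract mentions.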

Language: English
Authors

Khaled I. A. Almghari

Mahmoud K. Okasha

Attachment: 9-محمود عكاشة وخالد مغاري للنشر.pdf
   
