Research details
Volume | |
||
Publication date | |
||
Research title |
|
||
Abstract |
Cluster analysis is a statistical technique widely used in many fields,
including the analysis of genetic data, where it is used to cluster gene
expressions. A recurring problem in the literature, however, is the choice of
the number of clusters. In particular, estimating the number of clusters in a
given population, especially for gene expression data, is of great interest and
needs to be addressed. Many algorithms are used in practice for this purpose in
different fields. In this paper we examine clustering algorithms for estimating
the number of clusters that are based on probabilities, covariance matrices,
and eigenvalues, applying their R package implementations to real data sets.
Specifically, we examine the model-based algorithm (Mclust) and the
hierarchical clustering algorithm (hclust) and compare them with the
Partitioning Around Medoids (PAM) algorithm. We find that the first algorithm
can be used only for large data sets, whereas the second can be safely used for
small data sets. Mclust is a model-based clustering approach that fits mixture
models with the Expectation-Maximization (EM) algorithm and selects among them
using the Bayesian Information Criterion (BIC). With PAM, the number of
clusters is selected manually as the value that maximizes the average
silhouette width. Although this latter approach requires the number of clusters
to be estimated manually, it has the best performance. The first two
algorithms, however, can be automated to produce the best estimate of the
number of clusters in a given data set. These algorithms can be applied not
only to genetic data but also in many other fields, such as market research.
Keywords: clustering, model-based algorithm, hierarchical clustering,
Partitioning Around Medoids, Bayesian Information Criterion, average
silhouette, hierarchical tree, gene expression. |
||
Research language | ENGLISH | ||
Researchers |
|
||
Attached file | 9-محمود عكاشة وخالد مغاري للنشر.pdf | ||
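
The abstract's third approach, choosing the number of clusters as the value that maximizes the average silhouette width, can be illustrated with a minimal Python sketch. This is not the authors' code: the paper uses R package implementations, whereas the sketch below is pure Python, uses made-up toy points, and replaces PAM's build/swap heuristic with an exhaustive search over medoid sets (feasible only for tiny data). All function names and the `POINTS` data are hypothetical.

```python
import math
from itertools import combinations

# Toy data (an assumption for illustration): three well-separated groups.
POINTS = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]

def k_medoids_exhaustive(points, k):
    """Find the k medoids with the lowest total distance by brute force.

    Stand-in for PAM: same objective, but every medoid set is tried,
    so it is only usable on very small data sets.
    """
    best_cost, best_medoids = math.inf, None
    for medoids in combinations(range(len(points)), k):
        cost = sum(min(math.dist(p, points[m]) for m in medoids) for p in points)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return assign(points, best_medoids)

def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values):
    """Return the k with the largest average silhouette width, plus all widths."""
    widths = {
        k: average_silhouette_width(points, k_medoids_exhaustive(points, k))
        for k in k_values
    }
    return max(widths, key=widths.get), widths

best_k, widths = choose_k(POINTS, range(2, 6))
print(best_k, widths)
```

In the paper this selection is done by hand, by inspecting the average silhouette width for each candidate number of clusters. The `choose_k` helper only shows that the same rule could be scripted.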