Research Details
Volume | |
Publication date | |
Research title |
Abstract |
In this paper, we address the problems of line segmentation and character segmentation for discrete Arabic script documents. We present a robust algorithm that efficiently segments lines in very general textual documents. We also present an approach to the character segmentation problem that places no restriction on the style of text, so more realistic documents can be handled. Vertical white cuts are supported by connected component analysis to aid segmentation. Our dataset is based on an existing dataset of natural Arabic text with diacritics. The line segmentation algorithm was tested on the whole dataset of 157 images and achieved a line success rate exceeding 97%. Some extra lines were generated; these can be appended to adjacent actual lines in a subsequent OCR stage. Our character segmentation approach was tested on two pages of discrete Arabic script, with an overall character success rate of 94.4%. The algorithms were implemented and run on a Pentium III 866 MHz PC with 128 MB RAM. The average time required to segment one character was 62 ms. Programming optimisations and more powerful computers can be used to speed up the segmentation process. Keywords: OCR, Discrete Arabic Script, Line Segmentation, Character Segmentation |
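The abstract names two ingredients for character segmentation, vertical white cuts and connected component analysis, without giving details. The following is a minimal sketch of how the two could be combined, not the authors' implementation. It assumes a binarised text line stored as a list of rows (1 = ink, 0 = background); the function names `vertical_white_cuts`, `connected_components`, and `segment_characters` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: binary image, 1 = ink, 0 = background


def vertical_white_cuts(line_img: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) separated by ink-free columns."""
    if not line_img:
        return []
    width = len(line_img[0])
    # A column is "white" when no row contains ink in it.
    has_ink = [any(row[x] for row in line_img) for x in range(width)]
    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new inked run begins
        elif not ink and start is not None:
            segments.append((start, x))    # a white column closes the run
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def connected_components(img: Image) -> List[List[Tuple[int, int]]]:
    """Collect 8-connected ink components using an iterative flood fill."""
    height = len(img)
    width = len(img[0]) if img else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not img[y][x] or seen[y][x]:
                continue
            stack = [(y, x)]
            seen[y][x] = True
            pixels = []
            while stack:
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            components.append(pixels)
    return components


def segment_characters(line_img: Image) -> List[Tuple[int, int, int]]:
    """For each white-cut segment, report (start, end, number of components)."""
    result = []
    for start, end in vertical_white_cuts(line_img):
        sub_img = [row[start:end] for row in line_img]
        result.append((start, end, len(connected_components(sub_img))))
    return result
```

In this sketch, a segment with more than one component may be a letter body plus dots or diacritics, which is one plausible reason to pair white cuts with component analysis; how the paper actually uses the component information is not stated in the abstract.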
Language | English | |
Researchers |
Attached file | ابراهيم أبو هيبة فهرس رقم 8.pdf | |