Segmentation of Discrete Arabic Script Document Images

Volume

Al-Azhar University Journal, Natural Sciences Series, December 2006, Volume 8, Issue 1

Publication date

2006

Title

Segmentation of Discrete Arabic Script Document Images

Abstract

In this paper, we address the problems of line segmentation and character segmentation for discrete Arabic script documents. We present a robust algorithm that efficiently segments lines in very general textual documents. We also present an approach to character segmentation that places no restriction on the style of the text, so more realistic documents can be handled. Vertical white cuts are combined with connected component analysis to aid segmentation. Our dataset is based on a collection of natural Arabic text with diacritics. The line segmentation algorithm was tested on the whole dataset of 157 images and achieved a line success rate exceeding 97%. Some extra lines were generated; these can be appended to the adjacent actual lines in a subsequent OCR stage. The character segmentation approach was tested on two pages of discrete Arabic script, with an overall character success rate of 94.4%. The algorithms were implemented and run on a Pentium III 866 MHz PC with 128 MB of RAM. The average time required to segment one character was 62 ms. Programming optimisations and more powerful computers could speed up the segmentation process.

Keywords: OCR, Discrete Arabic Script, Line Segmentation, Character Segmentation
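
The abstract mentions line segmentation and vertical white cuts but gives no implementation details. As a rough illustration only, the sketch below shows one common way such steps are done on a binarised page: a row projection profile to find text lines, then a column projection within each line to find vertical white gaps. It is not the paper's algorithm. The function names (`runs_of_ink`, `segment_lines`, `vertical_white_cuts`) and the 0/1 image representation are assumptions made for this example, and the connected component analysis that the paper combines with the cuts is omitted.

```python
from typing import List, Tuple

# Assumed representation: a binarised page as rows of pixels, 1 = ink, 0 = background.
Image = List[List[int]]


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges where the profile is non-zero."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # a white gap ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the end of the profile
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the row (horizontal) projection profile."""
    row_profile = [sum(row) for row in image]
    return runs_of_ink(row_profile)


def vertical_white_cuts(image: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Split rows [top, bottom) into column ranges separated by all-white columns."""
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(col_profile)


# Tiny hypothetical page: two text lines, the first holding two separate pieces.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

for top, bottom in segment_lines(page):
    print((top, bottom), vertical_white_cuts(page, top, bottom))
```

A plain projection like this treats any ink separated from the main text by a white row, such as a diacritic, as its own line. Real documents therefore need post-processing of the kind the abstract describes, such as merging extra lines into adjacent ones.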

Language

ENGLISH

Researchers

ابراهيم سليمان ابراهيم ابو هيبة

Attached file

ابراهيم أبو هيبة فهرس رقم 8.pdf
