The Department of Computer Science at the University of Cyprus cordially invites you to the Colloquium entitled:
Approximate pattern matching for OCR texts
Speaker: Dr. Manolis Christodoulakis
The process of digitising old books and manuscripts is of immense importance to a variety of people, such as librarians, academics and publishers. This task is achieved by scanning the documents and then performing Optical Character Recognition (OCR) to obtain text that can be stored, searched, indexed, etc. Quite often the original paper copies of the publications are of poor print quality, leading to digital texts that contain errors. Consequently, any attempt at exact pattern matching will fail, and algorithms for approximate pattern matching must be used, where matches similar (rather than identical) to the pattern can be identified. There exist several different ways of defining text similarity, which, however, fail to incorporate the specific nature of the errors that occur in OCR texts. In this talk I will present a recently developed similarity measure that is specifically tailored for this purpose. In particular, it incorporates optical similarities of characters as well as matching combinations of characters to yield better approximate matching. Early implementations suggest that it is a promising method, and there are a number of variants worth exploring in the future.
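To give a flavour of the idea, the following is a minimal illustrative sketch and not the measure presented in the talk. It assumes a standard dynamic-programming approximate matcher (in the style of Sellers' algorithm) extended with two hypothetical tables: SIMILAR, which makes substitutions between optically similar characters cheap, and COMBOS, which lets one character match a look-alike combination of characters. All names and cost values are invented for illustration.

```python
# Hypothetical costs: optically similar characters are cheap to confuse.
SIMILAR = {frozenset("l1"): 0.2, frozenset("O0"): 0.2, frozenset("ec"): 0.4}
# Hypothetical combination rules: one character vs. a look-alike pair.
COMBOS = {("m", "rn"): 0.3, ("w", "vv"): 0.3, ("d", "cl"): 0.3}


def sub_cost(a, b):
    """Cost of reading character a as character b."""
    if a == b:
        return 0.0
    return SIMILAR.get(frozenset((a, b)), 1.0)


def best_match(pattern, text):
    """Return (cost, end): the cheapest approximate occurrence of
    pattern in text, which ends just before index end."""
    m, n = len(pattern), len(text)
    # Apply every combination rule in both directions.
    rules = list(COMBOS.items()) + [((t, p), c) for (p, t), c in COMBOS.items()]
    # D[i][j] = cheapest alignment of pattern[:i] with a substring of
    # text ending at position j. Row 0 is all zeros, so an occurrence
    # may start anywhere in the text.
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + 1.0  # pattern characters left unmatched
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(
                D[i - 1][j - 1] + sub_cost(pattern[i - 1], text[j - 1]),
                D[i - 1][j] + 1.0,  # pattern character missing from text
                D[i][j - 1] + 1.0,  # spurious character in text
            )
            for (p, t), c in rules:
                lp, lt = len(p), len(t)
                if (i >= lp and j >= lt
                        and pattern[i - lp:i] == p and text[j - lt:j] == t):
                    best = min(best, D[i - lp][j - lt] + c)
            D[i][j] = best
    end = min(range(n + 1), key=lambda j: D[m][j])
    return D[m][end], end


if __name__ == "__main__":
    print(best_match("modern", "the rnodem age"))
```

Under these assumed costs, searching for "modern" in "the rnodem age" finds the occurrence "rnodem" at a cost of about 0.6 (two combination rules at 0.3 each), whereas a plain unit-cost edit distance would charge at least 4 for the same occurrence.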
Dr. Manolis Christodoulakis received his BSc from the Department of Computer Engineering and Informatics, University of Patras, and his PhD from the Department of Computer Science at King's College London. He previously worked as a Research Associate and later as an External Lecturer at King's College. Since September 2007, he has served as a Lecturer in the Secure Systems and Software Development (SD) field, in the School of Computing, Information Technology & Engineering. His research interests include the design and analysis of combinatorial algorithms, sequence analysis (pattern matching, repetition finding, etc.), computational biology/bioinformatics, and computational music analysis.
Sponsor: The CS Colloquium Series is supported by a generous donation from