  • A physicist by education, I received my PhD in Computer Science from the University of Liverpool, UK in 2003. I worke...
Results of the ICDAR 2017 Robust Reading Challenge on Omnidirectional Video are presented. This competition uses the Downtown Osaka Scene Text (DOST) Dataset, which was captured in Osaka, Japan with an omnidirectional camera. Hence, it consists of sequential images (videos) of different view angles. Regarding the sequential images as videos (video mode), two tasks are prepared: localisation and end-to-end recognition. Regarding them as a set of still images (still image mode), three tasks are prepared: localisation, cropped word recognition and end-to-end recognition. As the dataset was captured in Japan, it contains Japanese text but also includes text consisting of alphanumeric characters (Latin text). Hence, a submitted result for each task is evaluated in three ways: using Japanese-only ground truth (GT), using Latin-only GT and using the combined GT of both. Finally, by the submission deadline, we have received two submissions in the text localisation task of the still imag...
A number of mechanisms are responsible for the generation of reversible or irreversible drift in the response of a sensor. In this letter, we discuss three approaches for the identification of reversible state-dependent drift in sensors through the use of the Extended Kalman Filter. We compare their performance by simulation and demonstrate their validity by estimating the drift of an accelerometer, modeled as a weakly nonlinear system. A. Chorti, D. Karatzas, N. White and C. Harris are with the Electronic Systems Design Group, Department of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK (e-mails: ac2@ecs.soton.ac.uk, dk3@ecs.soton.ac.uk, nmw@ecs.soton.ac.uk, cjh@ecs.soton.ac.uk). Corresponding author: A. Chorti, ac2@ecs.soton.ac.uk
There is a significant need to extract and recognise the semantically important text contained in images on Web pages. This paper proposes a new approach to text extraction from this special class of images. The method attempts to emulate, more closely than before, the way humans perceive colour differences in order to differentiate between text and background regions. Pixels of similar colour (as humans see it) are merged into components, and a fuzzy inference mechanism (using connectivity and colour distance features) is devised to group components into larger character-like regions.
Most colour‐normal human observers have no difficulty adjusting a coloured light such that it appears neither red nor green, or such that it appears neither yellow nor blue. Moreover, it has been shown that these hue judgements are not significantly influenced by language or age, and that individual differences in colour sensitivity are not reflected in the unique‐hue settings. Here we show how the invariance of these unique‐hue judgements can be used to develop a colour calibration technique for display devices, which eliminates the need for an external calibration standard or a measurement device.
Searching for text objects in real scene images is an open problem and a very active research area in computer vision. A large number of methods have been proposed, either extending methods from document analysis or inspired by object detection methods. Nevertheless, searching for objects in real scene images remains an extremely difficult problem because of the great variability in the appearance of the objects. This thesis builds on the most recent findings in the visual attention literature, introducing a new computational model of guided vision that aims to describe the search for text in real scene images. First, the most pertinent results from the scientific literature on visual attention, eye movements and visual search are presented. The most relevant models of attention are discussed and integrated with recent observations on the role of so-called 'top-down constraints' and the emerging need for a layered model of attention in which saliency is not the only factor guiding attention. Visual attention is explained by the interaction of several modulating factors, such as objects, value, plans and saliency. We then introduce our probabilistic formulation of attention mechanisms in real scenes for the task of object search. The model is based on the argument that the deployment of attention depends on two distinct but interacting processes: an attentional process that assigns value to sources of information and a motor process that flexibly links information with action. In this framework, the choice of where to look next is task-dependent and oriented towards the classes of objects embedded in real scene images. Task dependence is taken into account by exploiting the value and the reward of gazing at certain parts or proto-objects of the image that provide a sparse representation of the objects in the scene.
The experimental section tests the model under laboratory conditions, comparing model simulations with data from eye-tracking experiments. The comparison is qualitative, in terms of exploration trajectories, and quantitative, in terms of the statistical similarity of eye-movement amplitudes. The experiments were carried out with eye-tracking data both from a public dataset of human faces and text and from a new dataset of eye-tracking recordings and urban images containing text. The last part of this thesis studies the extent to which the proposed model can account for the deployment of attention in a complex environment. A mobile eye-tracking device was used, together with a methodology developed specifically to compare the simulated data with the recorded eye-tracking data. This setup makes it possible to test the model on a text search task that closely resembles a real search, under the condition of incomplete visual information.
The research presented in this thesis addresses the problem of Text Segmentation in Web images. Text is routinely created in image form (headers, banners etc.) on Web pages, as an attempt to overcome the stylistic limitations of HTML. This text, however, has a potentially high semantic value in terms of indexing and searching for the corresponding Web pages. As current search engine technology does not allow for text extraction and recognition in images, the text in image form is ignored. Moreover, it is desirable to obtain a uniform ...
Many scene text understanding methods approach the end-to-end recognition problem from a word-spotting perspective and benefit greatly from using small per-image lexicons. Such customized lexicons are normally assumed to be given, and their source is rarely discussed. In this paper we propose a method that generates contextualized lexicons for scene images using only visual information. For this, we exploit the correlation between visual and textual information in a dataset consisting of images and textual content associated with them. Using the topic modeling framework to discover a set of latent topics in such a dataset allows us to re-rank a fixed dictionary in a way that prioritizes the words that are more likely to appear in a given image. Moreover, we train a CNN that is able to reproduce those word rankings using only the raw image pixels as input. We demonstrate that the quality of the automatically obtained custom lexicons is superior to a generic frequency-based baseline.
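As a rough illustration of re-ranking a fixed dictionary with latent topics, the sketch below scores each word by summing, over topics, the topic's weight for the image times the word's probability under that topic. This scoring rule, the function name `rerank_lexicon`, and the toy probabilities are all assumptions made for illustration; the abstract does not specify how the ranking is computed.

```python
def rerank_lexicon(word_given_topic, topic_given_image, top_n=None):
    """Rank dictionary words by an assumed score: sum_k P(word | k) * P(k | image)."""
    scores = {}
    for topic, p_topic in topic_given_image.items():
        for word, p_word in word_given_topic[topic].items():
            scores[word] = scores.get(word, 0.0) + p_topic * p_word
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical toy data: two latent topics over a three-word dictionary.
word_given_topic = {
    "street": {"stop": 0.6, "exit": 0.3, "menu": 0.1},
    "food": {"stop": 0.1, "exit": 0.1, "menu": 0.8},
}
# Hypothetical topic distribution for one image (e.g. a restaurant front).
topic_given_image = {"street": 0.1, "food": 0.9}

lexicon = rerank_lexicon(word_given_topic, topic_given_image)
short_lexicon = rerank_lexicon(word_given_topic, topic_given_image, top_n=1)
```

With these toy numbers, "menu" moves to the top of the list because the image is dominated by the "food" topic; truncating with `top_n` yields a small per-image lexicon.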
We present a hybrid algorithm for detection and tracking of text in natural scenes that goes beyond full-detection approaches in terms of time performance optimization. A state-of-the-art scene text detection module based on Maximally Stable Extremal Regions (MSER) is used to detect text asynchronously, while on a separate thread detected text objects are tracked by MSER propagation. The cooperation of these two modules yields real-time video processing at high frame rates even on low-resource devices.
The importance of logos and trademarks in today's society is indisputable, variably seen in a positive light as a valuable service for consumers or in a negative one as a catalyst of ever-increasing consumerism. This chapter discusses the technical approaches for enabling machines to work with logos, looking into the latest methodologies for logo detection, localization, representation, recognition, retrieval, and spotting in a variety of media. This analysis is presented in the context of three different applications covering the complete depth and breadth of state-of-the-art techniques: trademark retrieval systems, logo recognition in document images, and logo detection and removal in images and videos. Due to the very nature of logos and trademarks, this chapter brings together various facets of document image analysis spanning graphical and textual content, and links document image analysis to other computer vision domains, especially in the analysis of real-scene videos and images.
With increasing computational power, the trend in unconstrained text recognition is moving towards whole-document processing. For this task, more sophisticated language models can be employed. One approach is to take advantage of the fact that the text of a document normally deals with a specific topic and hence the word occurrence probability is biased. Cache language models combine information about recent words, the cache, with a general statistical language model to increase the recognition rate. In this work we introduce a modified version of the cache language model to the task of handwriting recognition, where the n-best recognition output of the entire document is used to refine the language model for a subsequent recognition pass. An experimental evaluation on the IAM database demonstrates that the word error rate can be reduced with the proposed cache language model.
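To make the cache idea concrete, here is a minimal sketch of a textbook-style cache language model: a general unigram probability is linearly interpolated with the relative frequency of the word in a cache of recently seen words. It is not the modified model proposed in the paper; the function name `cache_lm_probability`, the interpolation weight `lam`, and the toy vocabulary are assumptions chosen for illustration.

```python
from collections import Counter


def cache_lm_probability(word, base_probs, cache_words, lam=0.2):
    """Interpolate a general unigram model with a cache of recent words.

    P(word) = (1 - lam) * P_base(word) + lam * P_cache(word)
    """
    base_p = base_probs.get(word, 0.0)
    if not cache_words:
        # With an empty cache there is nothing to interpolate with.
        return base_p
    cache_p = Counter(cache_words)[word] / len(cache_words)
    return (1.0 - lam) * base_p + lam * cache_p


# Hypothetical general model and a cache filled from a first recognition pass.
base = {"the": 0.5, "text": 0.1, "cat": 0.4}
cache = ["text", "text", "the", "text"]

p_text = cache_lm_probability("text", base, cache, lam=0.2)
p_text_no_cache = cache_lm_probability("text", base, [], lam=0.2)
```

In this toy example the word "text" occurs often in the cache, so its probability rises above the base value of 0.1, which is the kind of topic bias the abstract describes.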
