In this paper, we focus on the problem of determining the gender of the person described in a biographical text. Since support vector machine classifiers are well suited for text classification tasks, we present a new stopping criterion for support vector optimisation algorithms tailored to this problem. This new approach exploits the geometric properties of the vector representation of such content. An experiment on a set of English and Spanish biographical articles retrieved from Wikipedia illustrates this approach and compares it to other machine learning classification algorithms. The proposed method allows classification algorithms to be trained in real time. Moreover, these results confirm the advantage of leveraging additional gender information in strongly inflected languages, like Spanish, for this task.