Summary of On the Use of Labelled and Unlabelled Data to Improve Nearest Neighbor Classification

F. Vazquez, Ferrán Pla

This paper shows that the accuracy of nearest neighbor classifiers can be improved by incorporating large amounts of unlabelled patterns into a training set that contains a (possibly) small number of labelled instances. The semi-supervised learning algorithm introduced here is based primarily on a set of techniques closely related to the popular nearest neighbor classifier, chiefly techniques for filtering the training set. Experimental results on several benchmark data sets from the UCI Machine Learning Database Repository show that using unlabelled data can reduce classification error by up to 16%. Achieving this improvement requires processing the unlabelled patterns with an editing (filtering) technique. Otherwise, misclassified patterns could be incorporated into the training set and substantially degrade the final classification accuracy.
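The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: unlabelled patterns are pseudo-labelled with a 1-NN rule, and the enlarged training set is then filtered with Wilson editing, a standard nearest neighbor editing technique chosen here for illustration. The function names (`knn_predict`, `wilson_edit`, `self_train_with_editing`), the choice of k = 3 for editing, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=1, exclude_self=False):
    """Majority-vote k-NN with Euclidean distance.

    Labels are assumed to be non-negative integers (needed by np.bincount).
    With exclude_self=True, X_query must be X_train itself, and each point
    is left out of its own neighbourhood (leave-one-out).
    """
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    if exclude_self:
        np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def wilson_edit(X, y, k=3):
    """Wilson editing: keep only points whose label agrees with the
    majority label of their k nearest neighbours (leave-one-out)."""
    predicted = knn_predict(X, y, X, k=k, exclude_self=True)
    keep = predicted == y
    return X[keep], y[keep]


def self_train_with_editing(X_lab, y_lab, X_unlab, k_edit=3):
    """Assumed pipeline, not the paper's exact method:
    1. pseudo-label the unlabelled patterns with 1-NN,
    2. merge them with the labelled set,
    3. filter the merged set with Wilson editing."""
    pseudo = knn_predict(X_lab, y_lab, X_unlab, k=1)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    return wilson_edit(X_all, y_all, k=k_edit)


# Synthetic illustration: two Gaussian blobs, few labels, many unlabelled.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(6, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

X_new, y_new = self_train_with_editing(X_lab, y_lab, X_unlab)
print(len(X_lab), "labelled ->", len(X_new), "after pseudo-labelling and editing")
```

The editing step is what guards against the failure mode the abstract warns about: a pseudo-labelled pattern that disagrees with its neighbourhood is discarded instead of being added to the training set.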
