User models and recommender systems are so similar that they can be regarded as the same thing, differing only in how they are used. Both have their roots in several disciplines, including information retrieval and machine learning. Their impact has grown rapidly as data has become more important to systems and applications. Most large companies employ one or the other for purposes such as attracting customers, boosting sales or increasing revenue; well-known companies such as Amazon, eBay and Google use such models to improve their businesses. Indeed, as data becomes ever more important for companies, universities and individuals, user models are crucial for making decisions over large amounts of data. Although user models can provide accurate predictions on large populations, their application is not restricted to prediction: it extends to selecting dialogue strategies or detecting communities within complex domains. A thorough review of the existing literature revealed a lack of statistical user models based on experience; moreover, the existing models in the area are content-based and suffer from major problems such as scalability, cold start and the new-user problem. Furthermore, researchers in user modelling usually develop their own models and then perform ad hoc evaluations that are not replicable and therefore not comparable. The lack of a complete evaluation framework makes it very difficult to compare results across models and domains. There are two main approaches to building a user model or recommender system: the content-based approach, where predictions are based on the same user's past behaviour, and the collaborative approach, where predictions rely on like-minded people. Both approaches have advantages, but also downsides that have to be considered before building a model.
The main goal of this thesis is to develop a hybrid user model that combines both approaches, keeping their strengths and mitigating their downsides. The proposed hybrid model is based on an R-Tree structure. This structure was chosen because the rectangle tree is specifically designed to store and manipulate multidimensional data effectively. Introduced by Guttman in 1984, it is a height-balanced tree in which a search only needs to visit a few nodes. As a result, it can manage large populations efficiently, since only a few nodes are visited during inference. An R-Tree has two types of node: leaf nodes and non-leaf nodes. Leaf nodes contain the whole universe of users, while non-leaf nodes are to some extent redundant and contain summaries of their child nodes. Two statistical user models based on experience are proposed in this thesis. The first, a knowledge base user model (KLUM), is a classical approach that summarizes and removes data in order to keep performance within reasonable margins. The second, an R-Tree user model (RTUM), is an innovative model based on an R-Tree structure. This new model solves not only the data-removal problem but also the scalability problem, which is one of the major problems in the area of user modelling. Both models have been developed and tested with equivalent formulations so that comparisons between them are meaningful. Both models can build their own knowledge base from scratch, but they can also be fed with expert knowledge, thus alleviating another major problem in user modelling: the start-up problem. In summary, this thesis proposes two statistical user models, KLUM and RTUM.
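The R-Tree properties described above (leaf nodes holding users, non-leaf nodes summarizing their children, and searches that visit only a few nodes) can be illustrated with a minimal sketch. It is not the RTUM implementation: the names `Rect`, `Node`, `make_leaf`, `make_inner` and `search` are hypothetical, users are assumed to be points in a feature space, and insertion, node splitting and height balancing are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]  # assumption: a user is a point in feature space


@dataclass
class Rect:
    """Axis-aligned bounding rectangle given by its lower and upper corners."""
    lo: Point
    hi: Point

    def intersects(self, other: "Rect") -> bool:
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi
                   in zip(self.lo, self.hi, other.lo, other.hi))

    def contains(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


def bounding_rect(points: Sequence[Point]) -> Rect:
    """Smallest rectangle enclosing all the given points."""
    dims = list(zip(*points))
    return Rect(tuple(min(d) for d in dims), tuple(max(d) for d in dims))


@dataclass
class Node:
    mbr: Rect                                            # summary of the subtree
    children: List["Node"] = field(default_factory=list)  # non-leaf nodes only
    users: List[Point] = field(default_factory=list)      # leaf nodes only

    @property
    def is_leaf(self) -> bool:
        return not self.children


def make_leaf(users: Sequence[Point]) -> Node:
    return Node(mbr=bounding_rect(users), users=list(users))


def make_inner(children: Sequence[Node]) -> Node:
    corners = [c.mbr.lo for c in children] + [c.mbr.hi for c in children]
    return Node(mbr=bounding_rect(corners), children=list(children))


def search(node: Node, query: Rect,
           visited: Optional[List[Node]] = None) -> List[Point]:
    """Return users inside `query`, descending only into intersecting subtrees."""
    if visited is not None:
        visited.append(node)
    if node.is_leaf:
        return [u for u in node.users if query.contains(u)]
    found: List[Point] = []
    for child in node.children:
        if child.mbr.intersects(query):  # prune subtrees that cannot match
            found.extend(search(child, query, visited))
    return found


# Two leaves of made-up users under one non-leaf root.
leaf_a = make_leaf([(1.0, 1.0), (2.0, 3.0)])
leaf_b = make_leaf([(8.0, 8.0), (9.0, 7.0)])
root = make_inner([leaf_a, leaf_b])

visited: List[Node] = []
hits = search(root, Rect((0.0, 0.0), (3.0, 3.0)), visited)
```

In this toy tree the query rectangle overlaps only the first leaf, so the search touches the root and that leaf and never descends into the second one, which is the pruning behaviour that keeps inference cheap on large populations.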
In addition, a refinement of RTUM is proposed. Whereas RTUM partitions a node based on the centroids of the users in that node, the refinement partitions it based on privileged features, taking advantage of the most discriminatory features of the domain. This approach provides not only accurate inferences but also an excellent clustering that can be useful in many scenarios. For instance, the clustering can be used to detect communities within a social network, a hard task that has been a goal of many researchers in recent years. This thesis also provides a complete evaluation of the models across a wide range of parameterizations and domains. The models are tested in four different domains, and the evaluation shows that RTUM provides a substantial gain over classical user models such as KLUM. During the evaluation, RTUM reached success rates of 85%, while the analogous KLUM reached only 65%, a gain of 20 percentage points for the proposed model. The evaluation not only compares models and success rates but also analyses how each model parameter affects performance and studies database sizes and inference times. The main conclusion is that, across a wide diversity of parameters and domains, RTUM outperforms KLUM in every scenario tested. As previously mentioned, the literature review also revealed a lack of evaluation frameworks for user modelling. This thesis therefore provides a complete evaluation framework for user modelling, which fills a gap in the literature and makes evaluations replicable and therefore comparable.
Over the years, researchers and developers have found it difficult to compare evaluations and measure the quality of their models in different domains because of the lack of an evaluation standard. The evaluation framework presented in this thesis covers data samples, including training and test sets, and different sets of experiments, together with a statistical analysis of the domain and the confidence intervals and confidence levels needed to ensure that each experiment is statistically significant. The evaluation framework can be downloaded and used to carry out evaluations and cross-validate results across different models.
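As an illustration of how confidence intervals can support a comparison of success rates, the sketch below computes a Wilson score interval for a proportion. This is one common choice and is not necessarily the method used by the framework; the function name `success_rate_interval` and the sample size of 1000 test cases per model are assumptions made purely for the example.

```python
import math
from typing import Tuple


def success_rate_interval(successes: int, trials: int,
                          z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success rate (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# Hypothetical counts: 1000 test cases per model is an assumed sample size,
# chosen only so that the rates match the 85% and 65% quoted in the abstract.
rtum_low, rtum_high = success_rate_interval(850, 1000)
klum_low, klum_high = success_rate_interval(650, 1000)

# If the intervals do not overlap, the difference is unlikely to be noise.
intervals_disjoint = rtum_low > klum_high
```

With smaller test sets the intervals widen, so the same pair of success rates may no longer be distinguishable; this is why reporting the sample size and confidence level alongside each result makes evaluations comparable.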