Collective variables (CVs) are low-dimensional representations of the state of a complex system, which help us rationalize molecular conformations and sample free energy landscapes with molecular dynamics simulations. However, identifying a representative set of CVs for a given system is far from obvious, and most often relies on physical intuition or partial knowledge about the systems. An inappropriate choice of CVs is misleading and can lead to inefficient sampling. Thus, there is a need for systematic approaches to effectively identify CVs. In recent years, machine learning techniques, especially nonlinear dimensionality reduction (NLDR), have shown their ability to automatically identify the most important collective behavior of molecular systems. These methods have been widely used to visualize molecular trajectories. However, in general they do not provide a differentiable mapping from high-dimensional configuration space to their low-dimensional representation, as required in enhanced sampling methods, and they cannot deal with systems with inherently nontrivial conformational manifolds. In the fist part of this dissertation, we introduce a methodology that, starting from an ensemble representative of molecular flexibility, builds smooth and nonlinear data-driven collective variables (SandCV) from the output of nonlinear manifold learning algorithms. We demonstrate the method with a standard benchmark molecule and show how it can be non-intrusively combined with off-the-shelf enhanced sampling methods, here the adaptive biasing force method. SandCV identifies the system's conformational manifold, handles out-of-manifold conformations by a closest point projection, and exactly computes the Jacobian of the resulting CVs. We also illustrate how enhanced sampling simulations with SandCV can explore regions that were poorly sampled in the original molecular ensemble. We then demonstrate that NLDR methods face serious obstacles when the underlying CVs present periodicities, e.g.~arising from proper dihedral angles. As a result, NLDR methods collapse very distant configurations, thus leading to misinterpretations and inefficiencies in enhanced sampling. Here, we identify this largely overlooked problem, and discuss possible approaches to overcome it. Additionally, we characterize flexibility of alanine dipeptide molecule and show that it evolves around a flat torus in four-dimensional space. In the final part of this thesis, we propose a novel method, atlas of collective variables, that systematically overcomes topological obstacles, ameliorates the geometrical distortions and thus allows NLDR techniques to perform optimally in molecular simulations. This method automatically partitions the configuration space and treats each partition separately. Then, it connects these partitions from the statistical mechanics standpoint.
© 2008-2024 Fundación Dialnet · Todos los derechos reservados