Distance geometry and data science

Liberti, Leo

doi:10.1007/s11750-020-00563-0

Distance geometry and data science

Invited Paper
Published: 05 June 2020

Volume 28, pages 271–339, (2020)
Cite this article

TOP Aims and scope Submit manuscript

Leo Liberti¹

1833 Accesses
20 Citations
21 Altmetric
Explore all metrics

Abstract

Data are often represented as graphs. Many common tasks in data science are based on distances between entities. While some data science methodologies natively take graphs as their input, there are many more that take their input in vectorial form. In this survey, we discuss the fundamental problem of mapping graphs to vectors, and its relation with mathematical programming. We discuss applications, solution methods, dimensional reduction techniques, and some of their limits. We then present an application of some of these ideas to neural networks, showing that distance geometry techniques can give competitive performance with respect to more traditional graph-to-vector mappings.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similarity Between Points in Metric Measure Spaces

Network geometry

Article 29 January 2021

Introduction

Notes

Not to be confused with mathematical induction.
http://blogs.princeton.edu/imabandit/2013/02/19/orf523-ipms-for-lps-and-sdps/.
Also see http://math.stackexchange.com/questions/1882130/ for a compact derivation.
A young and unknown George Dantzig had just finished his presentation of LP to an audience of “big shots”, including Koopmans and Von Neumann. Harold Hotelling raised his hand, and stated: “but we all know that the world is nonlinear!”, thereby obliterating the simplex method as a mathematical curiosity. Luckily, Von Neumann answered on Dantzig's behalf and in his defence (Dantzig 1983).

References

Abadi M et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. http://tensorflow.org/
Achlioptas D (2003) Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J Comput Syst Sci 66:671–687
Article Google Scholar
Aggarwal C, Hinneburg A, Keim D (2001) On the surprising behavior of distance metrics in high dimensional space. In: den Bussche JV, Vianu V (eds) Proceedings of ICDT, LNCS, vol 1973. Springer, Berlin, pp 420–434
Ahmadi A, Majumdar A (2019) DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. SIAM J Appl Algebra Geom 3(2):193–230
Article Google Scholar
Ahmadi A, Jungers R, Parrilo P, Roozbehani M (2014) Joint spectral radius and path-complete graph Lyapunov functions. SIAM J Control Optim 52(1):687–717
Article Google Scholar
Ailon N, Chazelle B (2006) Approximate nearest neighbors and fast Johnson–Lindenstrauss lemma. In: Proceedings of the symposium on the theory of computing, STOC, vol. ’06. ACM, Seattle
Alfakih A, Khandani A, Wolkowicz H (1999) Solving Euclidean distance matrix completion problems via semidefinite programming. Comput Optim Appl 12:13–30
Article Google Scholar
Allen G (2012) Sparse higher-order principal components analysis. In: N. Lawrence, M. Girolami (eds) Proceedings of the international conference on Artificial intelligence and Statistics, vol 22, pp 27–36. PMLR, La Palma
Allen-Zhu Z, Gelashvili R, Micali S, Shavit N (2014) Sparse sign-consistent Johnson-Lindenstrauss matrices: Compression with neuroscience-based constraints. Proc Natl Acad Sci 111(47):16872–16876
Article Google Scholar
Aloise D, Cafieri S, Caporossi G, Hansen P, Perron S, Liberti L (2010) Column generation algorithms for exact modularity maximization in networks. Phys Rev E 82(4):046112
Article Google Scholar
Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program A 131:195–220
Article Google Scholar
Aloise D, Caporossi G, Hansen P, Liberti L, Perron S, Ruiz M (2013) Modularity maximization in networks by variable neighbourhood search. In: Bader D, Sanders P, Wagner D (eds) Graph partitioning and graph clustering, contemporary mathematics, vol 588. AMS, Providence, pp 113–127
Chapter Google Scholar
Amaldi E, Liberti L, Maffioli F, Maculan N (2009) Edge-swapping algorithms for the minimum fundamental cycle basis problem. Math Methods Oper Res 69:205–223
Article Google Scholar
Anderson J (1995) An introduction to neural networks. MIT Press, Cambridge
Book Google Scholar
Arriaga R, Vempala S (2006) An algorithmic theory of learning: Robust concepts and random projection. Mach Learn 63:161–182
Article Google Scholar
Asimow L, Roth B (1978) The rigidity of graphs. Trans AMS 245:279–289
Article Google Scholar
Bahr A, Leonard J, Fallon M (2009) Cooperative localization for autonomous underwater vehicles. Int J Robot Res 28(6):714–728
Article Google Scholar
Barker G, Carlson D (1975) Cones of diagonally dominant matrices. Pac J Math 57(1):15–32
Article Google Scholar
Barvinok A (2002) A course in convexity, No. 54 in graduate studies in mathematics. AMS, Providence
Barvinok A (1995) Problems of distance geometry and convex properties of quadratic maps. Discrete Comput Geom 13:189–202
Article Google Scholar
Barvinok A (1997) Measure concentration in optimization. Math Program 79:33–53
Google Scholar
Beeker N, Gaubert S, Glusa C, Liberti L (2013) Is the distance geometry problem in NP? In: Mucherino A., Lavor C., Liberti L., Maculan N. (eds) Distance geometry. Springer, New York, NY, pp 85–94
Belotti P, Lee J, Liberti L, Margot F, Wächter A (2009) Branching and bounds tightening techniques for non-convex MINLP. Optim Methods Softw 24(4):597–634
Article Google Scholar
ben Judah of Worms E (XII-XIII Century) Sodei Razayya
Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. Advances in neural information processing systems. NIPS, vol 19. MIT Press, Cambridge, pp 153–160
Google Scholar
Ben-Tal A, Ghaoui LE, Nemirovski A (2009) Robust optimization. Princeton University Press, Princeton
Book Google Scholar
Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1998) When is “nearest neighbor” meaningful? In: Beeri C, Buneman P (eds) Proceedings of ICDT, LNCS, vol 1540. Springer, Heidelberg, pp 217–235
Bird S, Klein E, Loper E (2009) Natural language processing with Python. O’Reilly, Cambridge
Google Scholar
Birge J, Louveaux F (2011) Introduction to stochastic programming. Springer, New York
Book Google Scholar
Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the k-means algorithm: a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering, LNCS, vol 9220. Springer, Cham, pp 81–116
Chapter Google Scholar
Blumenthal L (1953) Theory and applications of distance geometry. Oxford University Press, Oxford
Google Scholar
Böhm C, Jacopini G (1966) Flow diagrams, Turing machines and languages with only two formation rules. Commun ACM 9(5):366–371
Article Google Scholar
Bollobás B (1998) Modern graph theory. Springer, New York
Book Google Scholar
Borg I, Groenen P (2010) Modern multidimensional scaling, 2nd edn. Springer, New York
Google Scholar
Bottou L (2012) Stochastic gradient descent tricks. In: Montavon G et al (eds) Neural networks: tricks of the trade, LNCS, vol 7700. Springer, Berlin, pp 421–436
Chapter Google Scholar
Bourgain J (1985) On Lipschitz embeddings of finite metric spaces in Hilbert space. Isr J Math 52(1–2):46–52
Article Google Scholar
Boutsidis C, Zouzias A, Drineas P (2010) Random projections for \(k\)-means clustering. Advances in neural information processing systems. NIPS. NIPS Foundation, La Jolla, pp 298–306
Google Scholar
Brambilla A, Premoli A (2001) Rigorous event-driven (red) analysis of large-scale nonlinear rc circuits. IEEE Trans Circ Syst I Fundam Theory Appl 48(8):938–946
Article Google Scholar
Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172–188
Article Google Scholar
Cafieri S, Hansen P, Liberti L (2010) Loops and multiple edges in modularity maximization of networks. Phys Rev E 81(4):46102
Article Google Scholar
Cafieri S, Hansen P, Liberti L (2011) Locally optimal heuristic for modularity maximization of networks. Phys Rev E 83(056105):1–8
Google Scholar
Cafieri S, Hansen P, Liberti L (2014) Improving heuristics for network modularity maximization using an exact algorithm. Discrete Appl Math 163:65–72
Article Google Scholar
Cauchy AL (1813) Sur les polygones et les polyèdres. Journal de l’École Polytechnique 16(9):87–99
Google Scholar
Cayley A (1841) A theorem in the geometry of position. Camb Math J II:267–271
Google Scholar
Chollet F et al (2015) Keras. https://keras.io
Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge
Google Scholar
Choromanska A, Henaff M, Mathieu M, Arous GB, LeCun Y (2015) The loss surfaces of multilayer networks. In: Proceedings of the international conference on artificial intelligence and statistics, AISTATS, vol 18. JMLR, San Diego
COIN-OR (2006) Introduction to IPOPT: a tutorial for downloading, installing, and using IPOPT
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2461–2505
Google Scholar
Connelly R (1978) A counterexample to the rigidity conjecture for polyhedra. Publications Mathématiques de l’IHES 47:333–338
Article Google Scholar
Cox T, Cox M (2001) Multidimensional scaling. Chapman & Hall, Boca Raton
Google Scholar
D’Ambrosio C, Liberti L (2017) Distance geometry in linearizable norms. In: Nielsen F, Barbaresco F (eds) Geometric science of information, LNCS, vol 10589. Springer, Berlin, pp 830–838
Chapter Google Scholar
D’Ambrosio C, Liberti L, Poirion PL, Vu K (2019) Random projections for quadratic programming. Math Program B (in revision)
D’Ambrosio C, Liberti L, Poirion PL, Vu K (2019) Random projections for quadratic programming. Tech. Rep. 2019-7-7322, Optimization Online
Dantzig G (1983) Reminiscences about the origins of linear programming. In: Bachem A, Grötschel M, Korte B (eds) Mathematical programming: the state of the art. Springer, Berlin
Google Scholar
Dasgupta S, Gupta A (2002) An elementary proof of a theorem by Johnson and Lindenstrauss. Random Struct Algorithms 22:60–65
Article Google Scholar
D’Aspremont A, Bach F, Ghaoui LE (2014) Approximation bounds for sparse principal component analysis. Math Program B 148:89–110
Article Google Scholar
Dattorro J (2015) Convex optimization and Euclidean distance geometry. \({\cal M}\epsilon \beta oo\), Palo Alto
Dauphin Y, Pascanu R, Gulcehre C, Cho K, Ganguli S, Bengio Y (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Advances in neural information processing systems. NIPS. NIPS Foundation, La Jolla, pp 2933–2941
Google Scholar
Demartines P, Hérault J (1997) Curvilinear component analysis: a self-organizing neural network for nonlinear mapping of data sets. IEEE Trans Neural Netw 8(1):148–154
Article Google Scholar
Deo N, Prabhu G, Krishnamoorthy M (1982) Algorithms for generating fundamental cycles in a graph. ACM Trans Math Softw 8(1):26–42
Article Google Scholar
Dey S, Mazumder R, Molinaro M, Wang G (2017) Sparse principal component analysis and its \(\ell _1\)-relaxation. Tech. Rep. arXiv:1712.00800v1
Dias G, Liberti L (2016) Diagonally dominant programming in distance geometry. In: Cerulli R, Fujishige S, Mahjoub R (eds) International symposium in combinatorial optimization, LNCS, vol 9849. Springer, New York, pp 225–236
Chapter Google Scholar
Douven I (2017) Abduction. In: Zalta E (ed) The Stanford encyclopedia of philosophy. Stanford University, Stanford
Google Scholar
Durrant R, Kabán A (2009) When is ‘nearest neighbour’ meaningful: a converse theorem and implications. J Complex 25:385–397
Article Google Scholar
Eco U (1983) Horns, hooves, insteps. Some hypotheses on three kinds of abduction. In: Eco U, Sebeok T (eds) Dupin, Holmes. Peirce. The Sign of Three. Indiana University Press, Bloomington
Eco U (1984) Semiotics and the philosophy of language. Indiana University Press, Bloomington
Book Google Scholar
Eren T, Goldenberg D, Whiteley W, Yang Y, Morse A, Anderson B, Belhumeur P (2004) Rigidity, computation, and randomization in network localization. IEEE, pp 2673–2684
Euler L (1862) Continuatio fragmentorum ex adversariis mathematicis depromptorum: II Geometria, 97. In: Fuss P, Fuss N (eds) Opera postuma mathematica et physica anno 1844 detecta, vol I. Eggers & C, Petropolis, pp 494–496
Google Scholar
Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305
Google Scholar
Flexer A, Schnitzer D (2015) Choosing \(\ell _p\) norms in high-dimensional spaces based on hub analysis. Neurocomputing 169:281–287
Article Google Scholar
Floreano D (1996) Manuale sulle Reti Neurali Il. Mulino, Bologna
Google Scholar
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Article Google Scholar
François D, Wertz V, Verleysen M (2007) The concentration of fractional distances. IEEE Trans Knowl Data Eng 19(7):873–886
Article Google Scholar
Friedler F, Huang Y, Fan L (1992) Combinatorial algorithms for process synthesis. Comput Chem Eng 16(1):313–320
Article Google Scholar
Gayraud N (2017) Public remark. Le Monde des Mathématiques Industrielles at INRIA Sophia-Antipolis (MOMI17)
Gilbreth F, Gilbreth L (1921) Process charts: first steps in finding the one best way to do work. In: Proceedings of the annual meeting. American Society of Mechanical Engineers, New York
Gill P (2006) User’s guide for SNOPT version 7.2. Systems Optimization Laboratory, Stanford University, California
Gödel K (1986) On the isometric embeddability of quadruples of points of \(r_3\) in the surface of a sphere. In: Feferman S, Dawson J, Kleene S, Moore G, Solovay R, van Heijenoort J (eds) Kurt Gödel: collected works, vol I, pp (1933b) 276–279. Oxford University Press, Oxford
Google Scholar
Gonçalves D, Mucherino A, Lavor C, Liberti L (2017) Recent advances on the interval distance geometry problem. J Glob Optim 69:525–545
Article Google Scholar
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Google Scholar
Haeffele B, Vidal R (2017) Global optimality in neural network training. In: Proceedings of the conference in computer vision and pattern recognition, CVPR. IEEE, Piscataway, pp 4390–4398
Hagberg A, Schult D, Swart P (2008) Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J (eds) Proceedings of the 7th Python in science conference (SciPy2008), Pasadena, pp 11–15
Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79:191–215
Google Scholar
Henneberg L (1911) Die Graphische Statik der starren Systeme. Teubner, Leipzig
Google Scholar
Heron (50AD) Metrica, vol I. Alexandria
Hinneburg A, Aggarwal C, Keim D (2000) What is the nearest neighbor in high dimensional spaces? In: Proceedings of the conference on very large databases, VLDB, vol 26. Morgan Kaufman, San Francisco, pp. 506–515
Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417–441
Article Google Scholar
IBM (2017) ILOG CPLEX 12.8 User’s Manual. IBM
Indyk P (2001) Algorithmic applications of low-distortion geometric embeddings. Foundations of computer science. FOCS, vol 42. IEEE, Washington, DC, pp 10–33
Google Scholar
Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the symposium on the theory of computing, STOC, vol 30. ACM, New York, pp 604–613
Indyk P, Naor A (2007) Nearest neighbor preserving embeddings. ACM Trans Algorithms 3(3), Art. 31
Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Article Google Scholar
Johnson W, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. In: Hedlund G (ed) Conference in modern analysis and probability, contemporary mathematics, vol 26. AMS, Providence, pp 189–206
Chapter Google Scholar
Jolliffe I (2010) Principal component analysis, 2nd edn. Springer, Berlin
Google Scholar
Jordan M (1995) Why the logistic function? A tutorial discussion on probabilities and neural networks. Tech. Rep. Computational Cognitive Science TR 9503, MIT
Kane D, Nelson J (2014) Sparser Johnson–Lindenstrauss transforms. J ACM 61(1):4
Article Google Scholar
Kantor I, Matoušek J, Šámal R (2015) Mathematics++: selected topics beyond the basic courses. No. 75 in Student Mathematical Library. AMS, Providence
Khalife S, Liberti L, Vazirgiannis M (2019) Geometry and analogies: a study and propagation method for word representation. In: Statistical language and speech processing, SLSP, vol. 7
Kingma D, Ba J (2015)Adam: A method for stochastic optimization. In: Proceedings of ICLR. San Diego
Knuth D (1997) The art of computer programming, part I: fundamental algorithms, 3rd edn. Addison-Wesley, Reading
Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Article Google Scholar
Kuratowski C (1935) Quelques problèmes concernant les espaces métriques non-séparables. Fundam Math 25:534–545
Article Google Scholar
Lavor C, Liberti L, Maculan N (2006) Computational experience with the molecular distance geometry problem. In: Pintér J (ed) Global optimization: scientific and engineering case studies. Springer, Berlin, pp 213–225
Chapter Google Scholar
Lavor C, Liberti L, Maculan N, Mucherino A (2012) The discretizable molecular distance geometry problem. Comput Optim Appl 52:115–146
Article Google Scholar
Lavor C, Liberti L, Mucherino A (2013) The interval Branch-and-Prune algorithm for the discretizable molecular distance geometry problem with inexact distances. J Glob Optim 56:855–871
Article Google Scholar
Lavor C, Liberti L, Donald B, Worley B, Bardiaux B, Malliavin T, Nilges M (2019) Minimal NMR distance information for rigidity of protein graphs. Discrete Appl Math 256:91–104
Article Google Scholar
Lavor C, Souza M, Carvalho L, Liberti L (2019) On the polynomiality of finding \({}^K\)DMDGP re-orders. Discrete Appl Math 267:190–194
Article Google Scholar
Lehmann S, Hansen L (2007) Deterministic modularity optimization. Eur Phys J B 60:83–88
Article Google Scholar
Levine R, Mason T, Brown D (1995) Lex and Yacc, 2nd edn. O’Reilly, Cambridge
Google Scholar
Liberti L (2010) Software modelling and architecture: exercises. Ecole Polytechnique. https://www.lix.polytechnique.fr/~liberti/swarchex.pdf
Liberti L (2009) Reformulations in mathematical programming: definitions and systematics. RAIRO-RO 43(1):55–86
Article Google Scholar
Liberti L (2019) Undecidability and hardness in mixed-integer nonlinear programming. RAIRO-Oper Res 53:81–109
Article Google Scholar
Liberti L, Lavor C (2013) On a relationship between graph realizability and distance matrix completion. In: Migdalas A, Sifaleras A, Georgiadis C, Papathanaiou J, Stiakakis E (eds) Optimization theory, decision making, and operational research applications, proceedings in mathematics & statistics, vol 31. Springer, Berlin, pp 39–48
Chapter Google Scholar
Liberti L, Lavor C (2016) Six mathematical gems in the history of distance geometry. Int Trans Oper Res 23:897–920
Article Google Scholar
Liberti L, Lavor C (2017) Euclidean distance geometry: an introduction. Springer, New York
Book Google Scholar
Liberti L, Marinelli F (2014) Mathematical programming: Turing completeness and applications to software analysis. J Comb Optim 28(1):82–104
Article Google Scholar
Liberti L, Vu K (2018) Barvinok’s naive algorithm in distance geometry. Oper Res Lett 46:476–481
Article Google Scholar
Liberti L, Lavor C, Maculan N (2008) A branch-and-prune algorithm for the molecular distance geometry problem. Int Trans Oper Res 15:1–17
Article Google Scholar
Liberti L, Cafieri S, Tarissan F (2009) Reformulations in mathematical programming: a computational approach. In: Abraham A, Hassanien AE, Siarry P, Engelbrecht A (eds) Foundations of computational intelligence, vol 3. no 203 in Studies in Computational Intelligence. Springer, Berlin, pp 153–234
Google Scholar
Liberti L, Cafieri S, Savourey D (2010) Reformulation optimization software engine. In: Fukuda K, van der Hoeven J, Joswig M, Takayama N (eds) Mathematical software, LNCS, vol 6327. Springer, New York, pp 303–314
Google Scholar
Liberti L, Lavor C, Mucherino A, Maculan N (2010) Molecular distance geometry methods: from continuous to discrete. Int Trans Oper Res 18:33–51
Article Google Scholar
Liberti L, Lavor C, Alencar J, Abud G (2013) Counting the number of solutions of \({}^k\)DMDGP instances. In: Nielsen F, Barbaresco F (eds) Geometric science of information, LNCS, vol 8085. Springer, New York, pp 224–230
Chapter Google Scholar
Liberti L, Lavor C, Maculan N, Mucherino A (2014) Euclidean distance geometry and applications. SIAM Rev 56(1):3–69
Article Google Scholar
Liberti L, Masson B, Lavor C, Lee J, Mucherino A (2014) On the number of realizations of certain Henneberg graphs arising in protein conformation. Discrete Appl Math 165:213–232
Article Google Scholar
Liberti L, Swirszcz G, Lavor C (2016) Distance geometry on the sphere. In: Akiyama J et al (eds) JCDCG\({}^2\), LNCS, vol 9943. Springer, New York, pp 204–215
Google Scholar
Liberti L, D’Ambrosio C (2017) The Isomap algorithm in distance geometry. In: Iliopoulos C, Pissis S, Puglisi S, Raman R (eds) Proceedings of 16th international symposium on experimental algorithms (SEA), LIPICS, vol 75. Dagstuhl Publishing, Schloss Dagstuhl, pp 5:1–5:13
Liberti L, Lavor C, Mucherino A (2013) The discretizable molecular distance geometry problem seems easier on proteins. In: Mucherino A, Lavor C, Liberti L, Maculan N (eds) Distance geometry: theory, methods and applications. Springer, New York, pp 47–60
Chapter Google Scholar
Linial N, London E, Rabinovich Y (1995) The geometry of graphs and some of its algorithmic applications. Combinatorica 15(2):215–245
Article Google Scholar
Majumdar A, Ahmadi A, Tedrake R (2014) Control and verification of high-dimensional systems with dsos and sdsos programming. Conference on decision and control, vol 53. Piscataway, IEEE, pp 394–401
Chapter Google Scholar
Malliavin T, Mucherino A, Lavor C, Liberti L (2019) Systematic exploration of protein conformational space using a distance geometry approach. J Chem Inf Model 59:4486–4503
Article Google Scholar
Manning C, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
Google Scholar
Mansouri J, Khademi M (2015) Multiplicative distance: a method to alleviate distance instability for high-dimensional data. Knowl Inf Syst 45:783–805
Article Google Scholar
Matoušek J (2013) Lecture notes on metric embeddings. Tech. rep, ETH Zürich
Matoušek J (2008) On variants of the Johnson-Lindenstrauss lemma. Random Struct Algorithms 33:142–156
Article Google Scholar
Maxwell J (1864) On the calculation of the equilibrium and stiffness of frames. Philos Mag 27(182):294–299
Article Google Scholar
McCormick G (1976) Computability of global solutions to factorable nonconvex programs: Part I-Convex underestimating problems. Math Program 10:146–175
Article Google Scholar
McCulloch W (1961) What is a number, that a man may know it, and a man, that he may know a number? Gen Semant Bull 26–27:7–18
Google Scholar
Mencarelli L, Sahraoui Y, Liberti L (2017) A multiplicative weights update algorithm for MINLP. EURO J Comput Optim 5:31–86
Article Google Scholar
Menger K (1928) Untersuchungen über allgemeine Metrik. Math Ann 100:75–163
Article Google Scholar
Menger K (1931) New foundation of Euclidean geometry. Am J Math 53(4):721–745
Article Google Scholar
Merris R (1994) Laplacian matrices of graphs: a survey. Linear Algebra Appl 198:143–176
Article Google Scholar
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Burges C, Bottou L, Welling M, Ghahramani Z, Weinberger K (eds) Advances in neural information processing systems, NIPS, vol 26. NIPS Foundation, La Jolla, pp 3111–3119
Google Scholar
Miller G (1995) Wordnet: a lexical database for English. Commun ACM 38(11):39–41
Article Google Scholar
Milnor J (1964) On the Betti numbers of real varieties. Proc AMS 15:275–280
Article Google Scholar
Minsky M (1986) The society of mind. Simon & Schuster, New York
Google Scholar
Moitra A (2018) Algorithmic aspects of machine learning. CUP, Cambridge
Book Google Scholar
Moro A (2008) The boundaries of Babel. MIT Press, Cambridge
Book Google Scholar
Morris C (1946) Signs. Language and behavior. Prentice-Hall, New York
Book Google Scholar
Mucherino A, Lavor C, Liberti L (2012) Exploiting symmetry properties of the discretizable molecular distance geometry problem. J Bioinform Comput Biol 10(1–15):1242009
Article Google Scholar
Mucherino A, Lavor C, Liberti L, Maculan N (eds) (2013) Distance geometry: theory, methods, and applications. Springer, New York
Google Scholar
Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113
Article Google Scholar
Object Management Group (2005) Unified modelling language: superstructure, v. 2.0. Tech. Rep. formal/05-07-04, OMG
O’Donoghue B, Chu E, Parikh N, Boyd S (2016) Operator splitting for conic optimization via homogeneous self-dual embedding. J Optim Theory Appl 169(3):1042–1068
Article Google Scholar
Paton K (1969) An algorithm for finding a fundamental set of cycles of a graph. Commun ACM 12(9):514–518
Article Google Scholar
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Google Scholar
Peirce C (1878) Illustrations of the logic of science, part 6: induction, deduction, and hypothesis. Popul Sci Mon 13:470–482
Google Scholar
Penrose R (1989) The emperor’s new mind. Penguin, New York
Google Scholar
Pfeffer A (2016) Practical probabilistic programming. Manning Publications, Shelter Island
Google Scholar
Popper K (1968) The logic of scientific discovery. Hutchinson, London
Google Scholar
Potra F, Wright S (2000) Interior-point methods. J Comput Appl Math 124:281–302
Article Google Scholar
Proni G (2016) Is there abduction in Aristotle? Peirce, Eco, and some further remarks. Ocula 17:1–14
Google Scholar
Radovanović M, Nanopoulos A, Ivanović M (2010) Hubs in space: Popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531
Google Scholar
Rousseau F, Vazirgiannis M (2013) Graph-of-word and TW-IDF: new approach to ad hoc IR. In: Proceedings of CIKM. ACM, New York
Saerens M, Fouss F, Yen L, Dupont P (2004) The principal components analysis of a graph, and its relationships to spectral clustering. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Proceedings of the European conference in machine learning (ECML), LNAI, vol 3201. Springer, Berlin, pp 371–383
Salgado E, Scozzari A, Tardella F, Liberti L (2018) Alternating current optimal power flow with generator selection. In: Lee J, Rinaldi G, Mahjoub R (eds) Combinatorial optimization (Proceedings of ISCO 2018), LNCS, vol 10856, pp 364–375
Sánchez AB, Lavor C (2020) On the estimation of unknown distances for a class of Euclidean distance matrix completion problems with interval data. Linear Algebra Appl 592:287–305
Article Google Scholar
Saxe J (1979) Embeddability of weighted graphs in \(k\)-space is strongly NP-hard. In: Proceedings of 17th Allerton conference in communications, control and computing, pp 480–489
Schaeffer S (2007) Graph clustering. Comput Sci Rev 1:27–64
Article Google Scholar
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003. arXiv:1404.7828 [cs.NE]
Schoenberg I (1935) Remarks to Maurice Fréchet’s article Sur la définition axiomatique d’une classe d’espaces distanciés vectoriellement applicable sur l’espace de Hilbert. Ann Math 36(3):724–732
Article Google Scholar
Schumacher M, Roßner R, Vach W (1996) Neural networks and logistic regression: part I. Comput Stat Data Anal 21:661–682
Article Google Scholar
Seshu S, Reed M (1961) Linear graphs and electrical networks. Addison-Wesley, Reading
Singer A (2011) Angular synchronization by eigenvectors and semidefinite programming. Appl Comput Harmon Anal 30:20–36
Article Google Scholar
Smith E, Pantelides C (1999) A symbolic reformulation/spatial branch-and-bound algorithm for the global optimisation of nonconvex MINLPs. Comput Chem Eng 23:457–478
Article Google Scholar
Steinhaus H (1956) Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences Cl. III 4(12):801–804
Tabaghi P, Dokmanić I, Vetterli M (2019) On the move: localization with kinetic Euclidean distance matrices. In: International conference on acoustics, speech and signal processing (ICASSP). IEEE, Piscataway
Tawarmalani M, Sahinidis N (2004) Global optimization of mixed integer nonlinear programs: a theoretical and computational study. Math Program 99:563–591
Article Google Scholar
Tenenbaum J, de Silva V, Langford J (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2322
Article Google Scholar
Thoreau H (1849) Resistance to civil government. In: Peabody E (ed) Æsthetic papers. J. Wilson, Boston
van Rossum G et al (2019) Python language reference, version 3. Python Software Foundation
Vavasis S (1991) Nonlinear optimization: complexity issues. Oxford University Press, Oxford
Google Scholar
Vempala S (2004) The Random projection method. No. 65 in DIMACS series in discrete mathematics and theoretical computer science. AMS, Providence
Venkatasubramanian S, Wang Q (2011) The Johnson–Lindenstrauss transform: an empirical study. Algorithm engineering and experiments. ALENEX, vol 13. SIAM, Providence, pp 164–173
Google Scholar
Verboon A (2014) The medieval tree of Porphyry: an organic structure of logic. In: Worm A, Salonis P (eds) The Tree. Symbol, allegory and structural device in medieval art and thought, international medieval research, vol 20. Brepols, Turnhout, pp 83–101
Vershynin R (2018) High-dimensional probability. CUP, Cambridge
Book Google Scholar
Vidal R, Ma Y, Sastry S (2016) Generalized principal component analysis. Springer, New York
Book Google Scholar
Vu K, Poirion PL, Liberti L (2018) Random projections for linear programming. Math Oper Res 43(4):1051–1071
Article Google Scholar
Vu K, Poirion PL, D’Ambrosio C, Liberti L (2019) Random projections for quadratic programs over a Euclidean ball. In: Lodi A et al (eds) Integer programming and combinatorial optimization (IPCO), LNCS, vol 11480. Springer, New York, pp 442–452
Chapter Google Scholar
Vu K, Poirion PL, Liberti L (2019) Gaussian random projections for Euclidean membership problems. Discrete Appl Math 253:93–102
Article Google Scholar
Wikipedia: Civil disobedience (thoreau) (2019). http://en.wikipedia.org/wiki/Civil_Disobedience_(Thoreau). [Online; accessed 190804]
Wikipedia: Computational pragmatics (2019). http://en.wikipedia.org/wiki/Computational_pragmatics. [Online; accessed 190802]
Wikipedia: Diagonally dominant matrix (2019). http://en.wikipedia.org/wiki/Diagonally_dominant_matrix. [Online; accessed 190716]
Wikipedia: Flowchart (2019). http://en.wikipedia.org/wiki/Flochart. [Online; accessed 190802]
Wikipedia: Principal component analysis (2019). http://en.wikipedia.org/wiki/Principal_component_analysis. [Online; accessed 190726]
Wikipedia: Rectifier (neurl networks) (2019). http://en.wikipedia.org/wiki/Rectifier_(neural_networks). [Online; accessed 190807]
Wikipedia: Slutsky’s theorem (2019). http://en.wikipedia.org/wiki/Slutsky%27s_theorem. [Online; accessed 190802]
Williams H (1999) Model building in mathematical programming, 4th edn. Wiley, Chichester
Google Scholar
Woodruff D (2014) Sketching as a tool for linear algebra. Found Trends Theor Comput Sci 10(1–2):1–157
Article Google Scholar
Wüthrich K (1989) Protein structure determination in solution by nuclear magnetic resonance spectroscopy. Science 243:45–50
Article Google Scholar
Xu G, Tsoka S, Papageorgiou L (2007) Finding community structures in complex networks using mixed integer optimisation. Eur Phys J B 60:231–239
Article Google Scholar
Yemini Y (1978) The positioning problem—a draft of an intermediate summary. In: Proceedings of the conference on distributed sensor networks. Carnegie-Mellon University, Pittsburgh, pp 137–145
Yemini Y (1979) Some theoretical aspects of position-location problems. In: Proceedings of the 20th annual symposium on the foundations of computer science, pp. 1–8. IEEE, Piscataway
Yun C, Sra S, Jadbabaie A (2018) Global optimality conditions for deep neural networks. In: Proceedings of the 6th international conference on learning representations. ICLR, La Jolla, CA
Zhang L, Mahdavi M, Jin R, Yang T, Zhu S (2013) Recovering the optimal solution by dual random projection. In: Shalev-Shwartz S, Steinwart I (eds) Conference on learning theory (COLT), Proceedings of machine learning research, vol 30, pp 135–157. \(\langle\) http://mlr.org \(\rangle\)

Download references

Acknowledgements

I am grateful to J.J. Salazar, the Editor-in-Chief of TOP, for inviting me to write this survey. This work would not have been possible without the numerous co-authors with whom I pursued my investigations in distance geometry, among which I will single out the longest-standing: C. Lavor, N. Maculan, and A. Mucherino. I have first heard of concentration of measure as I passed by D. Malioutov’s office at the T.J. Watson IBM Research laboratory: the door was open, the Johnson-Lindenstrauss lemma was mentioned, and I could not refrain from interrupting the conversation and asking for clarification, as I thought that there must surely be a mistake; incredibly, the result was true, and I am grateful to Dr. Malioutov for hosting the conversation I eavesdropped on. I am very thankful to the co-authors who helped me investigate random projections, in particular P.L. Poirion and K. Vu, without whom none of our papers would have been possible. I learned about the existence of the distance instability result thanks to N. Gayraud, who was in the audience during a talk I gave, and suggested it to me as I expressed puzzlement at the poor quality of k-means clusterings. João Fontes Gonçalves, a student in my M.Sc. course, first made the remark following Eq. (6.1.3) (“why are you optimizing a constant?”). I am very grateful to S. Khalife, D. Gonçalves, and M. Escobar for reading the manuscript and making insightful comments. This research was partly funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement n. 764759 ETN “MINOA”.

Author information

Authors and Affiliations

LIX CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
Leo Liberti

Authors

Leo Liberti
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Leo Liberti.

Additional information

Dedicated to the memory of Mariano Bellasio (1943–2019).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This invited paper is discussed in the comments available at https://doi.org/10.1007/s11750-020-00560-3, https://doi.org/10.1007/s11750-020-00561-2, https://doi.org/10.1007/s11750-020-00562-1.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liberti, L. Distance geometry and data science. TOP 28, 271–339 (2020). https://doi.org/10.1007/s11750-020-00563-0

Download citation

Published: 05 June 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s11750-020-00563-0

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Distance geometry and data science

Abstract

Access this article

Similar content being viewed by others

Similarity Between Points in Metric Measure Spaces

Network geometry

Introduction

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Distance geometry and data science

Abstract

Access this article

Similar content being viewed by others

Similarity Between Points in Metric Measure Spaces

Network geometry

Introduction

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation