Skip to main content
Log in

Distance geometry and data science

  • Invited Paper
  • Published:
TOP Aims and scope Submit manuscript

Abstract

Data are often represented as graphs. Many common tasks in data science are based on distances between entities. While some data science methodologies natively take graphs as their input, there are many more that take their input in vectorial form. In this survey, we discuss the fundamental problem of mapping graphs to vectors, and its relation with mathematical programming. We discuss applications, solution methods, dimensional reduction techniques, and some of their limits. We then present an application of some of these ideas to neural networks, showing that distance geometry techniques can give competitive performance with respect to more traditional graph-to-vector mappings.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. Not to be confused with mathematical induction.

  2. http://blogs.princeton.edu/imabandit/2013/02/19/orf523-ipms-for-lps-and-sdps/.

  3. Also see http://math.stackexchange.com/questions/1882130/ for a compact derivation.

  4. A young and unknown George Dantzig had just finished his presentation of LP to an audience of “big shots”, including Koopmans and Von Neumann. Harold Hotelling raised his hand, and stated: “but we all know that the world is nonlinear!”, thereby obliterating the simplex method as a mathematical curiosity. Luckily, Von Neumann answered on Dantzig's behalf and in his defence (Dantzig 1983).

References

  • Abadi M et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. http://tensorflow.org/

  • Achlioptas D (2003) Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J Comput Syst Sci 66:671–687

    Article  Google Scholar 

  • Aggarwal C, Hinneburg A, Keim D (2001) On the surprising behavior of distance metrics in high dimensional space. In: den Bussche JV, Vianu V (eds) Proceedings of ICDT, LNCS, vol 1973. Springer, Berlin, pp 420–434

  • Ahmadi A, Majumdar A (2019) DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. SIAM J Appl Algebra Geom 3(2):193–230

    Article  Google Scholar 

  • Ahmadi A, Jungers R, Parrilo P, Roozbehani M (2014) Joint spectral radius and path-complete graph Lyapunov functions. SIAM J Control Optim 52(1):687–717

    Article  Google Scholar 

  • Ailon N, Chazelle B (2006) Approximate nearest neighbors and fast Johnson–Lindenstrauss lemma. In: Proceedings of the symposium on the theory of computing, STOC, vol. ’06. ACM, Seattle

  • Alfakih A, Khandani A, Wolkowicz H (1999) Solving Euclidean distance matrix completion problems via semidefinite programming. Comput Optim Appl 12:13–30

    Article  Google Scholar 

  • Allen G (2012) Sparse higher-order principal components analysis. In: N. Lawrence, M. Girolami (eds) Proceedings of the international conference on Artificial intelligence and Statistics, vol 22, pp 27–36. PMLR, La Palma

  • Allen-Zhu Z, Gelashvili R, Micali S, Shavit N (2014) Sparse sign-consistent Johnson-Lindenstrauss matrices: Compression with neuroscience-based constraints. Proc Natl Acad Sci 111(47):16872–16876

    Article  Google Scholar 

  • Aloise D, Cafieri S, Caporossi G, Hansen P, Perron S, Liberti L (2010) Column generation algorithms for exact modularity maximization in networks. Phys Rev E 82(4):046112

    Article  Google Scholar 

  • Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program A 131:195–220

    Article  Google Scholar 

  • Aloise D, Caporossi G, Hansen P, Liberti L, Perron S, Ruiz M (2013) Modularity maximization in networks by variable neighbourhood search. In: Bader D, Sanders P, Wagner D (eds) Graph partitioning and graph clustering, contemporary mathematics, vol 588. AMS, Providence, pp 113–127

    Chapter  Google Scholar 

  • Amaldi E, Liberti L, Maffioli F, Maculan N (2009) Edge-swapping algorithms for the minimum fundamental cycle basis problem. Math Methods Oper Res 69:205–223

    Article  Google Scholar 

  • Anderson J (1995) An introduction to neural networks. MIT Press, Cambridge

    Book  Google Scholar 

  • Arriaga R, Vempala S (2006) An algorithmic theory of learning: Robust concepts and random projection. Mach Learn 63:161–182

    Article  Google Scholar 

  • Asimow L, Roth B (1978) The rigidity of graphs. Trans AMS 245:279–289

    Article  Google Scholar 

  • Bahr A, Leonard J, Fallon M (2009) Cooperative localization for autonomous underwater vehicles. Int J Robot Res 28(6):714–728

    Article  Google Scholar 

  • Barker G, Carlson D (1975) Cones of diagonally dominant matrices. Pac J Math 57(1):15–32

    Article  Google Scholar 

  • Barvinok A (2002) A course in convexity, No. 54 in graduate studies in mathematics. AMS, Providence

  • Barvinok A (1995) Problems of distance geometry and convex properties of quadratic maps. Discrete Comput Geom 13:189–202

    Article  Google Scholar 

  • Barvinok A (1997) Measure concentration in optimization. Math Program 79:33–53

    Google Scholar 

  • Beeker N, Gaubert S, Glusa C, Liberti L (2013) Is the distance geometry problem in NP? In: Mucherino A., Lavor C., Liberti L., Maculan N. (eds) Distance geometry. Springer, New York, NY, pp 85–94

  • Belotti P, Lee J, Liberti L, Margot F, Wächter A (2009) Branching and bounds tightening techniques for non-convex MINLP. Optim Methods Softw 24(4):597–634

    Article  Google Scholar 

  • ben Judah of Worms E (XII-XIII Century) Sodei Razayya 

  • Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. Advances in neural information processing systems. NIPS, vol 19. MIT Press, Cambridge, pp 153–160

    Google Scholar 

  • Ben-Tal A, Ghaoui LE, Nemirovski A (2009) Robust optimization. Princeton University Press, Princeton

    Book  Google Scholar 

  • Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1998) When is “nearest neighbor” meaningful? In: Beeri C, Buneman P (eds) Proceedings of ICDT, LNCS, vol 1540. Springer, Heidelberg, pp 217–235

  • Bird S, Klein E, Loper E (2009) Natural language processing with Python. O’Reilly, Cambridge

    Google Scholar 

  • Birge J, Louveaux F (2011) Introduction to stochastic programming. Springer, New York

    Book  Google Scholar 

  • Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the k-means algorithm: a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering, LNCS, vol 9220. Springer, Cham, pp 81–116

    Chapter  Google Scholar 

  • Blumenthal L (1953) Theory and applications of distance geometry. Oxford University Press, Oxford

    Google Scholar 

  • Böhm C, Jacopini G (1966) Flow diagrams, Turing machines and languages with only two formation rules. Commun ACM 9(5):366–371

    Article  Google Scholar 

  • Bollobás B (1998) Modern graph theory. Springer, New York

    Book  Google Scholar 

  • Borg I, Groenen P (2010) Modern multidimensional scaling, 2nd edn. Springer, New York

    Google Scholar 

  • Bottou L (2012) Stochastic gradient descent tricks. In: Montavon G et al (eds) Neural networks: tricks of the trade, LNCS, vol 7700. Springer, Berlin, pp 421–436

    Chapter  Google Scholar 

  • Bourgain J (1985) On Lipschitz embeddings of finite metric spaces in Hilbert space. Isr J Math 52(1–2):46–52

    Article  Google Scholar 

  • Boutsidis C, Zouzias A, Drineas P (2010) Random projections for \(k\)-means clustering. Advances in neural information processing systems. NIPS. NIPS Foundation, La Jolla, pp 298–306

    Google Scholar 

  • Brambilla A, Premoli A (2001) Rigorous event-driven (red) analysis of large-scale nonlinear rc circuits. IEEE Trans Circ Syst I Fundam Theory Appl 48(8):938–946

    Article  Google Scholar 

  • Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172–188

    Article  Google Scholar 

  • Cafieri S, Hansen P, Liberti L (2010) Loops and multiple edges in modularity maximization of networks. Phys Rev E 81(4):46102

    Article  Google Scholar 

  • Cafieri S, Hansen P, Liberti L (2011) Locally optimal heuristic for modularity maximization of networks. Phys Rev E 83(056105):1–8

    Google Scholar 

  • Cafieri S, Hansen P, Liberti L (2014) Improving heuristics for network modularity maximization using an exact algorithm. Discrete Appl Math 163:65–72

    Article  Google Scholar 

  • Cauchy AL (1813) Sur les polygones et les polyèdres. Journal de l’École Polytechnique 16(9):87–99

    Google Scholar 

  • Cayley A (1841) A theorem in the geometry of position. Camb Math J II:267–271

    Google Scholar 

  • Chollet F et al (2015) Keras. https://keras.io

  • Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge

    Google Scholar 

  • Choromanska A, Henaff M, Mathieu M, Arous GB, LeCun Y (2015) The loss surfaces of multilayer networks. In: Proceedings of the international conference on artificial intelligence and statistics, AISTATS, vol 18. JMLR, San Diego

  • COIN-OR (2006) Introduction to IPOPT: a tutorial for downloading, installing, and using IPOPT

  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2461–2505

    Google Scholar 

  • Connelly R (1978) A counterexample to the rigidity conjecture for polyhedra. Publications Mathématiques de l’IHES 47:333–338

    Article  Google Scholar 

  • Cox T, Cox M (2001) Multidimensional scaling. Chapman & Hall, Boca Raton

    Google Scholar 

  • D’Ambrosio C, Liberti L (2017) Distance geometry in linearizable norms. In: Nielsen F, Barbaresco F (eds) Geometric science of information, LNCS, vol 10589. Springer, Berlin, pp 830–838

    Chapter  Google Scholar 

  • D’Ambrosio C, Liberti L, Poirion PL, Vu K (2019) Random projections for quadratic programming. Math Program B (in revision)

  • D’Ambrosio C, Liberti L, Poirion PL, Vu K (2019) Random projections for quadratic programming. Tech. Rep. 2019-7-7322, Optimization Online

  • Dantzig G (1983) Reminiscences about the origins of linear programming. In: Bachem A, Grötschel M, Korte B (eds) Mathematical programming: the state of the art. Springer, Berlin

    Google Scholar 

  • Dasgupta S, Gupta A (2002) An elementary proof of a theorem by Johnson and Lindenstrauss. Random Struct Algorithms 22:60–65

    Article  Google Scholar 

  • D’Aspremont A, Bach F, Ghaoui LE (2014) Approximation bounds for sparse principal component analysis. Math Program B 148:89–110

    Article  Google Scholar 

  • Dattorro J (2015) Convex optimization and Euclidean distance geometry. \({\cal M}\epsilon \beta oo\), Palo Alto

  • Dauphin Y, Pascanu R, Gulcehre C, Cho K, Ganguli S, Bengio Y (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Advances in neural information processing systems. NIPS. NIPS Foundation, La Jolla, pp 2933–2941

    Google Scholar 

  • Demartines P, Hérault J (1997) Curvilinear component analysis: a self-organizing neural network for nonlinear mapping of data sets. IEEE Trans Neural Netw 8(1):148–154

    Article  Google Scholar 

  • Deo N, Prabhu G, Krishnamoorthy M (1982) Algorithms for generating fundamental cycles in a graph. ACM Trans Math Softw 8(1):26–42

    Article  Google Scholar 

  • Dey S, Mazumder R, Molinaro M, Wang G (2017) Sparse principal component analysis and its \(\ell _1\)-relaxation. Tech. Rep. arXiv:1712.00800v1

  • Dias G, Liberti L (2016) Diagonally dominant programming in distance geometry. In: Cerulli R, Fujishige S, Mahjoub R (eds) International symposium in combinatorial optimization, LNCS, vol 9849. Springer, New York, pp 225–236

    Chapter  Google Scholar 

  • Douven I (2017) Abduction. In: Zalta E (ed) The Stanford encyclopedia of philosophy. Stanford University, Stanford

    Google Scholar 

  • Durrant R, Kabán A (2009) When is ‘nearest neighbour’ meaningful: a converse theorem and implications. J Complex 25:385–397

    Article  Google Scholar 

  • Eco U (1983) Horns, hooves, insteps. Some hypotheses on three kinds of abduction. In: Eco U, Sebeok T (eds) Dupin, Holmes. Peirce. The Sign of Three. Indiana University Press, Bloomington

  • Eco U (1984) Semiotics and the philosophy of language. Indiana University Press, Bloomington

    Book  Google Scholar 

  • Eren T, Goldenberg D, Whiteley W, Yang Y, Morse A, Anderson B, Belhumeur P (2004) Rigidity, computation, and randomization in network localization. IEEE, pp 2673–2684

  • Euler L (1862) Continuatio fragmentorum ex adversariis mathematicis depromptorum: II Geometria, 97. In: Fuss P, Fuss N (eds) Opera postuma mathematica et physica anno 1844 detecta, vol I. Eggers & C, Petropolis, pp 494–496

    Google Scholar 

  • Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305

    Google Scholar 

  • Flexer A, Schnitzer D (2015) Choosing \(\ell _p\) norms in high-dimensional spaces based on hub analysis. Neurocomputing 169:281–287

    Article  Google Scholar 

  • Floreano D (1996) Manuale sulle Reti Neurali Il. Mulino, Bologna

    Google Scholar 

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

    Article  Google Scholar 

  • François D, Wertz V, Verleysen M (2007) The concentration of fractional distances. IEEE Trans Knowl Data Eng 19(7):873–886

    Article  Google Scholar 

  • Friedler F, Huang Y, Fan L (1992) Combinatorial algorithms for process synthesis. Comput Chem Eng 16(1):313–320

    Article  Google Scholar 

  • Gayraud N (2017) Public remark. Le Monde des Mathématiques Industrielles at INRIA Sophia-Antipolis (MOMI17)

  • Gilbreth F, Gilbreth L (1921) Process charts: first steps in finding the one best way to do work. In: Proceedings of the annual meeting. American Society of Mechanical Engineers, New York

  • Gill P (2006) User’s guide for SNOPT version 7.2. Systems Optimization Laboratory, Stanford University, California

  • Gödel K (1986) On the isometric embeddability of quadruples of points of \(r_3\) in the surface of a sphere. In: Feferman S, Dawson J, Kleene S, Moore G, Solovay R, van Heijenoort J (eds) Kurt Gödel: collected works, vol I, pp (1933b) 276–279. Oxford University Press, Oxford

    Google Scholar 

  • Gonçalves D, Mucherino A, Lavor C, Liberti L (2017) Recent advances on the interval distance geometry problem. J Glob Optim 69:525–545

    Article  Google Scholar 

  • Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

    Google Scholar 

  • Haeffele B, Vidal R (2017) Global optimality in neural network training. In: Proceedings of the conference in computer vision and pattern recognition, CVPR. IEEE, Piscataway, pp 4390–4398

  • Hagberg A, Schult D, Swart P (2008) Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J (eds) Proceedings of the 7th Python in science conference (SciPy2008), Pasadena, pp 11–15

  • Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79:191–215

    Google Scholar 

  • Henneberg L (1911) Die Graphische Statik der starren Systeme. Teubner, Leipzig

    Google Scholar 

  • Heron (50AD) Metrica, vol I. Alexandria

  • Hinneburg A, Aggarwal C, Keim D (2000) What is the nearest neighbor in high dimensional spaces? In: Proceedings of the conference on very large databases, VLDB, vol 26. Morgan Kaufman, San Francisco, pp. 506–515

  • Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417–441

    Article  Google Scholar 

  • IBM (2017) ILOG CPLEX 12.8 User’s Manual. IBM

  • Indyk P (2001) Algorithmic applications of low-distortion geometric embeddings. Foundations of computer science. FOCS, vol 42. IEEE, Washington, DC, pp 10–33

    Google Scholar 

  • Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the symposium on the theory of computing, STOC, vol 30. ACM, New York, pp 604–613

  • Indyk P, Naor A (2007) Nearest neighbor preserving embeddings. ACM Trans Algorithms 3(3), Art. 31

  • Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323

    Article  Google Scholar 

  • Johnson W, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. In: Hedlund G (ed) Conference in modern analysis and probability, contemporary mathematics, vol 26. AMS, Providence, pp 189–206

    Chapter  Google Scholar 

  • Jolliffe I (2010) Principal component analysis, 2nd edn. Springer, Berlin

    Google Scholar 

  • Jordan M (1995) Why the logistic function? A tutorial discussion on probabilities and neural networks. Tech. Rep. Computational Cognitive Science TR 9503, MIT

  • Kane D, Nelson J (2014) Sparser Johnson–Lindenstrauss transforms. J ACM 61(1):4

    Article  Google Scholar 

  • Kantor I, Matoušek J, Šámal R (2015) Mathematics++: selected topics beyond the basic courses. No. 75 in Student Mathematical Library. AMS, Providence

  • Khalife S, Liberti L, Vazirgiannis M (2019) Geometry and analogies: a study and propagation method for word representation. In: Statistical language and speech processing, SLSP, vol. 7

  • Kingma D, Ba J (2015)Adam: A method for stochastic optimization. In: Proceedings of ICLR. San Diego

  • Knuth D (1997) The art of computer programming, part I: fundamental algorithms, 3rd edn. Addison-Wesley, Reading

  • Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

    Article  Google Scholar 

  • Kuratowski C (1935) Quelques problèmes concernant les espaces métriques non-séparables. Fundam Math 25:534–545

    Article  Google Scholar 

  • Lavor C, Liberti L, Maculan N (2006) Computational experience with the molecular distance geometry problem. In: Pintér J (ed) Global optimization: scientific and engineering case studies. Springer, Berlin, pp 213–225

    Chapter  Google Scholar 

  • Lavor C, Liberti L, Maculan N, Mucherino A (2012) The discretizable molecular distance geometry problem. Comput Optim Appl 52:115–146

    Article  Google Scholar 

  • Lavor C, Liberti L, Mucherino A (2013) The interval Branch-and-Prune algorithm for the discretizable molecular distance geometry problem with inexact distances. J Glob Optim 56:855–871

    Article  Google Scholar 

  • Lavor C, Liberti L, Donald B, Worley B, Bardiaux B, Malliavin T, Nilges M (2019) Minimal NMR distance information for rigidity of protein graphs. Discrete Appl Math 256:91–104

    Article  Google Scholar 

  • Lavor C, Souza M, Carvalho L, Liberti L (2019) On the polynomiality of finding \({}^K\)DMDGP re-orders. Discrete Appl Math 267:190–194

    Article  Google Scholar 

  • Lehmann S, Hansen L (2007) Deterministic modularity optimization. Eur Phys J B 60:83–88

    Article  Google Scholar 

  • Levine R, Mason T, Brown D (1995) Lex and Yacc, 2nd edn. O’Reilly, Cambridge

    Google Scholar 

  • Liberti L (2010) Software modelling and architecture: exercises. Ecole Polytechnique. https://www.lix.polytechnique.fr/~liberti/swarchex.pdf

  • Liberti L (2009) Reformulations in mathematical programming: definitions and systematics. RAIRO-RO 43(1):55–86

    Article  Google Scholar 

  • Liberti L (2019) Undecidability and hardness in mixed-integer nonlinear programming. RAIRO-Oper Res 53:81–109

    Article  Google Scholar 

  • Liberti L, Lavor C (2013) On a relationship between graph realizability and distance matrix completion. In: Migdalas A, Sifaleras A, Georgiadis C, Papathanaiou J, Stiakakis E (eds) Optimization theory, decision making, and operational research applications, proceedings in mathematics & statistics, vol 31. Springer, Berlin, pp 39–48

    Chapter  Google Scholar 

  • Liberti L, Lavor C (2016) Six mathematical gems in the history of distance geometry. Int Trans Oper Res 23:897–920

    Article  Google Scholar 

  • Liberti L, Lavor C (2017) Euclidean distance geometry: an introduction. Springer, New York

    Book  Google Scholar 

  • Liberti L, Marinelli F (2014) Mathematical programming: Turing completeness and applications to software analysis. J Comb Optim 28(1):82–104

    Article  Google Scholar 

  • Liberti L, Vu K (2018) Barvinok’s naive algorithm in distance geometry. Oper Res Lett 46:476–481

    Article  Google Scholar 

  • Liberti L, Lavor C, Maculan N (2008) A branch-and-prune algorithm for the molecular distance geometry problem. Int Trans Oper Res 15:1–17

    Article  Google Scholar 

  • Liberti L, Cafieri S, Tarissan F (2009) Reformulations in mathematical programming: a computational approach. In: Abraham A, Hassanien AE, Siarry P, Engelbrecht A (eds) Foundations of computational intelligence, vol 3. no 203 in Studies in Computational Intelligence. Springer, Berlin, pp 153–234

    Google Scholar 

  • Liberti L, Cafieri S, Savourey D (2010) Reformulation optimization software engine. In: Fukuda K, van der Hoeven J, Joswig M, Takayama N (eds) Mathematical software, LNCS, vol 6327. Springer, New York, pp 303–314

    Google Scholar 

  • Liberti L, Lavor C, Mucherino A, Maculan N (2010) Molecular distance geometry methods: from continuous to discrete. Int Trans Oper Res 18:33–51

    Article  Google Scholar 

  • Liberti L, Lavor C, Alencar J, Abud G (2013) Counting the number of solutions of \({}^k\)DMDGP instances. In: Nielsen F, Barbaresco F (eds) Geometric science of information, LNCS, vol 8085. Springer, New York, pp 224–230

    Chapter  Google Scholar 

  • Liberti L, Lavor C, Maculan N, Mucherino A (2014) Euclidean distance geometry and applications. SIAM Rev 56(1):3–69

    Article  Google Scholar 

  • Liberti L, Masson B, Lavor C, Lee J, Mucherino A (2014) On the number of realizations of certain Henneberg graphs arising in protein conformation. Discrete Appl Math 165:213–232

    Article  Google Scholar 

  • Liberti L, Swirszcz G, Lavor C (2016) Distance geometry on the sphere. In: Akiyama J et al (eds) JCDCG\({}^2\), LNCS, vol 9943. Springer, New York, pp 204–215

    Google Scholar 

  • Liberti L, D’Ambrosio C (2017) The Isomap algorithm in distance geometry. In: Iliopoulos C, Pissis S, Puglisi S, Raman R (eds) Proceedings of 16th international symposium on experimental algorithms (SEA), LIPICS, vol 75. Dagstuhl Publishing, Schloss Dagstuhl, pp 5:1–5:13

  • Liberti L, Lavor C, Mucherino A (2013) The discretizable molecular distance geometry problem seems easier on proteins. In: Mucherino A, Lavor C, Liberti L, Maculan N (eds) Distance geometry: theory, methods and applications. Springer, New York, pp 47–60

    Chapter  Google Scholar 

  • Linial N, London E, Rabinovich Y (1995) The geometry of graphs and some of its algorithmic applications. Combinatorica 15(2):215–245

    Article  Google Scholar 

  • Majumdar A, Ahmadi A, Tedrake R (2014) Control and verification of high-dimensional systems with dsos and sdsos programming. Conference on decision and control, vol 53. Piscataway, IEEE, pp 394–401

    Chapter  Google Scholar 

  • Malliavin T, Mucherino A, Lavor C, Liberti L (2019) Systematic exploration of protein conformational space using a distance geometry approach. J Chem Inf Model 59:4486–4503

    Article  Google Scholar 

  • Manning C, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge

    Google Scholar 

  • Mansouri J, Khademi M (2015) Multiplicative distance: a method to alleviate distance instability for high-dimensional data. Knowl Inf Syst 45:783–805

    Article  Google Scholar 

  • Matoušek J (2013) Lecture notes on metric embeddings. Tech. rep, ETH Zürich

  • Matoušek J (2008) On variants of the Johnson-Lindenstrauss lemma. Random Struct Algorithms 33:142–156

    Article  Google Scholar 

  • Maxwell J (1864) On the calculation of the equilibrium and stiffness of frames. Philos Mag 27(182):294–299

    Article  Google Scholar 

  • McCormick G (1976) Computability of global solutions to factorable nonconvex programs: Part I-Convex underestimating problems. Math Program 10:146–175

    Article  Google Scholar 

  • McCulloch W (1961) What is a number, that a man may know it, and a man, that he may know a number? Gen Semant Bull 26–27:7–18

    Google Scholar 

  • Mencarelli L, Sahraoui Y, Liberti L (2017) A multiplicative weights update algorithm for MINLP. EURO J Comput Optim 5:31–86

    Article  Google Scholar 

  • Menger K (1928) Untersuchungen über allgemeine Metrik. Math Ann 100:75–163

    Article  Google Scholar 

  • Menger K (1931) New foundation of Euclidean geometry. Am J Math 53(4):721–745

    Article  Google Scholar 

  • Merris R (1994) Laplacian matrices of graphs: a survey. Linear Algebra Appl 198:143–176

    Article  Google Scholar 

  • Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Burges C, Bottou L, Welling M, Ghahramani Z, Weinberger K (eds) Advances in neural information processing systems, NIPS, vol 26. NIPS Foundation, La Jolla, pp 3111–3119

    Google Scholar 

  • Miller G (1995) Wordnet: a lexical database for English. Commun ACM 38(11):39–41

    Article  Google Scholar 

  • Milnor J (1964) On the Betti numbers of real varieties. Proc AMS 15:275–280

    Article  Google Scholar 

  • Minsky M (1986) The society of mind. Simon & Schuster, New York

    Google Scholar 

  • Moitra A (2018) Algorithmic aspects of machine learning. CUP, Cambridge

    Book  Google Scholar 

  • Moro A (2008) The boundaries of Babel. MIT Press, Cambridge

    Book  Google Scholar 

  • Morris C (1946) Signs. Language and behavior. Prentice-Hall, New York

    Book  Google Scholar 

  • Mucherino A, Lavor C, Liberti L (2012) Exploiting symmetry properties of the discretizable molecular distance geometry problem. J Bioinform Comput Biol 10(1–15):1242009

    Article  Google Scholar 

  • Mucherino A, Lavor C, Liberti L, Maculan N (eds) (2013) Distance geometry: theory, methods, and applications. Springer, New York

    Google Scholar 

  • Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113

    Article  Google Scholar 

  • Object Management Group (2005) Unified modelling language: superstructure, v. 2.0. Tech. Rep. formal/05-07-04, OMG

  • O’Donoghue B, Chu E, Parikh N, Boyd S (2016) Operator splitting for conic optimization via homogeneous self-dual embedding. J Optim Theory Appl 169(3):1042–1068

    Article  Google Scholar 

  • Paton K (1969) An algorithm for finding a fundamental set of cycles of a graph. Commun ACM 12(9):514–518

    Article  Google Scholar 

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

    Google Scholar 

  • Peirce C (1878) Illustrations of the logic of science, part 6: induction, deduction, and hypothesis. Popul Sci Mon 13:470–482

    Google Scholar 

  • Penrose R (1989) The emperor’s new mind. Penguin, New York

    Google Scholar 

  • Pfeffer A (2016) Practical probabilistic programming. Manning Publications, Shelter Island

    Google Scholar 

  • Popper K (1968) The logic of scientific discovery. Hutchinson, London

    Google Scholar 

  • Potra F, Wright S (2000) Interior-point methods. J Comput Appl Math 124:281–302

    Article  Google Scholar 

  • Proni G (2016) Is there abduction in Aristotle? Peirce, Eco, and some further remarks. Ocula 17:1–14

    Google Scholar 

  • Radovanović M, Nanopoulos A, Ivanović M (2010) Hubs in space: Popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531

    Google Scholar 

  • Rousseau F, Vazirgiannis M (2013) Graph-of-word and TW-IDF: new approach to ad hoc IR. In: Proceedings of CIKM. ACM, New York

  • Saerens M, Fouss F, Yen L, Dupont P (2004) The principal components analysis of a graph, and its relationships to spectral clustering. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Proceedings of the European conference in machine learning (ECML), LNAI, vol 3201. Springer, Berlin, pp 371–383

  • Salgado E, Scozzari A, Tardella F, Liberti L (2018) Alternating current optimal power flow with generator selection. In: Lee J, Rinaldi G, Mahjoub R (eds) Combinatorial optimization (Proceedings of ISCO 2018), LNCS, vol 10856, pp 364–375

  • Sánchez AB, Lavor C (2020) On the estimation of unknown distances for a class of Euclidean distance matrix completion problems with interval data. Linear Algebra Appl 592:287–305

    Article  Google Scholar 

  • Saxe J (1979) Embeddability of weighted graphs in \(k\)-space is strongly NP-hard. In: Proceedings of 17th Allerton conference in communications, control and computing, pp 480–489

  • Schaeffer S (2007) Graph clustering. Comput Sci Rev 1:27–64

    Article  Google Scholar 

  • Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003. arXiv:1404.7828 [cs.NE]

  • Schoenberg I (1935) Remarks to Maurice Fréchet’s article Sur la définition axiomatique d’une classe d’espaces distanciés vectoriellement applicable sur l’espace de Hilbert. Ann Math 36(3):724–732

    Article  Google Scholar 

  • Schumacher M, Roßner R, Vach W (1996) Neural networks and logistic regression: part I. Comput Stat Data Anal 21:661–682

    Article  Google Scholar 

  • Seshu S, Reed M (1961) Linear graphs and electrical networks. Addison-Wesley, Reading

  • Singer A (2011) Angular synchronization by eigenvectors and semidefinite programming. Appl Comput Harmon Anal 30:20–36

    Article  Google Scholar 

  • Smith E, Pantelides C (1999) A symbolic reformulation/spatial branch-and-bound algorithm for the global optimisation of nonconvex MINLPs. Comput Chem Eng 23:457–478

    Article  Google Scholar 

  • Steinhaus H (1956) Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences Cl. III 4(12):801–804

  • Tabaghi P, Dokmanić I, Vetterli M (2019) On the move: localization with kinetic Euclidean distance matrices. In: International conference on acoustics, speech and signal processing (ICASSP). IEEE, Piscataway

  • Tawarmalani M, Sahinidis N (2004) Global optimization of mixed integer nonlinear programs: a theoretical and computational study. Math Program 99:563–591

    Article  Google Scholar 

  • Tenenbaum J, de Silva V, Langford J (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2322

    Article  Google Scholar 

  • Thoreau H (1849) Resistance to civil government. In: Peabody E (ed) Æsthetic papers. J. Wilson, Boston

  • van Rossum G et al (2019) Python language reference, version 3. Python Software Foundation

  • Vavasis S (1991) Nonlinear optimization: complexity issues. Oxford University Press, Oxford

    Google Scholar 

  • Vempala S (2004) The Random projection method. No. 65 in DIMACS series in discrete mathematics and theoretical computer science. AMS, Providence

  • Venkatasubramanian S, Wang Q (2011) The Johnson–Lindenstrauss transform: an empirical study. Algorithm engineering and experiments. ALENEX, vol 13. SIAM, Providence, pp 164–173

    Google Scholar 

  • Verboon A (2014) The medieval tree of Porphyry: an organic structure of logic. In: Worm A, Salonis P (eds) The Tree. Symbol, allegory and structural device in medieval art and thought, international medieval research, vol 20. Brepols, Turnhout, pp 83–101

  • Vershynin R (2018) High-dimensional probability. CUP, Cambridge

    Book  Google Scholar 

  • Vidal R, Ma Y, Sastry S (2016) Generalized principal component analysis. Springer, New York

    Book  Google Scholar 

  • Vu K, Poirion PL, Liberti L (2018) Random projections for linear programming. Math Oper Res 43(4):1051–1071

    Article  Google Scholar 

  • Vu K, Poirion PL, D’Ambrosio C, Liberti L (2019) Random projections for quadratic programs over a Euclidean ball. In: Lodi A et al (eds) Integer programming and combinatorial optimization (IPCO), LNCS, vol 11480. Springer, New York, pp 442–452

    Chapter  Google Scholar 

  • Vu K, Poirion PL, Liberti L (2019) Gaussian random projections for Euclidean membership problems. Discrete Appl Math 253:93–102

    Article  Google Scholar 

  • Wikipedia: Civil disobedience (thoreau) (2019). http://en.wikipedia.org/wiki/Civil_Disobedience_(Thoreau). [Online; accessed 190804]

  • Wikipedia: Computational pragmatics (2019). http://en.wikipedia.org/wiki/Computational_pragmatics. [Online; accessed 190802]

  • Wikipedia: Diagonally dominant matrix (2019). http://en.wikipedia.org/wiki/Diagonally_dominant_matrix. [Online; accessed 190716]

  • Wikipedia: Flowchart (2019). http://en.wikipedia.org/wiki/Flochart. [Online; accessed 190802]

  • Wikipedia: Principal component analysis (2019). http://en.wikipedia.org/wiki/Principal_component_analysis. [Online; accessed 190726]

  • Wikipedia: Rectifier (neurl networks) (2019). http://en.wikipedia.org/wiki/Rectifier_(neural_networks). [Online; accessed 190807]

  • Wikipedia: Slutsky’s theorem (2019). http://en.wikipedia.org/wiki/Slutsky%27s_theorem. [Online; accessed 190802]

  • Williams H (1999) Model building in mathematical programming, 4th edn. Wiley, Chichester

    Google Scholar 

  • Woodruff D (2014) Sketching as a tool for linear algebra. Found Trends Theor Comput Sci 10(1–2):1–157

    Article  Google Scholar 

  • Wüthrich K (1989) Protein structure determination in solution by nuclear magnetic resonance spectroscopy. Science 243:45–50

    Article  Google Scholar 

  • Xu G, Tsoka S, Papageorgiou L (2007) Finding community structures in complex networks using mixed integer optimisation. Eur Phys J B 60:231–239

    Article  Google Scholar 

  • Yemini Y (1978) The positioning problem—a draft of an intermediate summary. In: Proceedings of the conference on distributed sensor networks. Carnegie-Mellon University, Pittsburgh, pp 137–145

  • Yemini Y (1979) Some theoretical aspects of position-location problems. In: Proceedings of the 20th annual symposium on the foundations of computer science, pp. 1–8. IEEE, Piscataway

  • Yun C, Sra S, Jadbabaie A (2018) Global optimality conditions for deep neural networks. In: Proceedings of the 6th international conference on learning representations. ICLR, La Jolla, CA

  • Zhang L, Mahdavi M, Jin R, Yang T, Zhu S (2013) Recovering the optimal solution by dual random projection. In: Shalev-Shwartz S, Steinwart I (eds) Conference on learning theory (COLT), Proceedings of machine learning research, vol 30, pp 135–157. \(\langle\) http://mlr.org \(\rangle\)

Download references

Acknowledgements

I am grateful to J.J. Salazar, the Editor-in-Chief of TOP, for inviting me to write this survey. This work would not have been possible without the numerous co-authors with whom I pursued my investigations in distance geometry, among which I will single out the longest-standing: C. Lavor, N. Maculan, and A. Mucherino. I have first heard of concentration of measure as I passed by D. Malioutov’s office at the T.J. Watson IBM Research laboratory: the door was open, the Johnson-Lindenstrauss lemma was mentioned, and I could not refrain from interrupting the conversation and asking for clarification, as I thought that there must surely be a mistake; incredibly, the result was true, and I am grateful to Dr. Malioutov for hosting the conversation I eavesdropped on. I am very thankful to the co-authors who helped me investigate random projections, in particular P.L. Poirion and K. Vu, without whom none of our papers would have been possible. I learned about the existence of the distance instability result thanks to N. Gayraud, who was in the audience during a talk I gave, and suggested it to me as I expressed puzzlement at the poor quality of k-means clusterings. João Fontes Gonçalves, a student in my M.Sc. course, first made the remark following Eq. (6.1.3) (“why are you optimizing a constant?”). I am very grateful to S. Khalife, D. Gonçalves, and M. Escobar for reading the manuscript and making insightful comments. This research was partly funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement n. 764759 ETN “MINOA”.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Leo Liberti.

Additional information

Dedicated to the memory of Mariano Bellasio (1943–2019).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This invited paper is discussed in the comments available at https://doi.org/10.1007/s11750-020-00560-3, https://doi.org/10.1007/s11750-020-00561-2, https://doi.org/10.1007/s11750-020-00562-1.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liberti, L. Distance geometry and data science. TOP 28, 271–339 (2020). https://doi.org/10.1007/s11750-020-00563-0

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11750-020-00563-0

Keywords

Mathematics Subject Classification

Navigation