Abstract
Data are often represented as graphs. Many common tasks in data science are based on distances between entities. While some data science methodologies natively take graphs as their input, there are many more that take their input in vectorial form. In this survey, we discuss the fundamental problem of mapping graphs to vectors, and its relation with mathematical programming. We discuss applications, solution methods, dimensional reduction techniques, and some of their limits. We then present an application of some of these ideas to neural networks, showing that distance geometry techniques can give competitive performance with respect to more traditional graph-to-vector mappings.
Similar content being viewed by others
Notes
Not to be confused with mathematical induction.
Also see http://math.stackexchange.com/questions/1882130/ for a compact derivation.
A young and unknown George Dantzig had just finished his presentation of LP to an audience of “big shots”, including Koopmans and Von Neumann. Harold Hotelling raised his hand, and stated: “but we all know that the world is nonlinear!”, thereby obliterating the simplex method as a mathematical curiosity. Luckily, Von Neumann answered on Dantzig's behalf and in his defence (Dantzig 1983).
References
Abadi M et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. http://tensorflow.org/
Achlioptas D (2003) Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J Comput Syst Sci 66:671–687
Aggarwal C, Hinneburg A, Keim D (2001) On the surprising behavior of distance metrics in high dimensional space. In: den Bussche JV, Vianu V (eds) Proceedings of ICDT, LNCS, vol 1973. Springer, Berlin, pp 420–434
Ahmadi A, Majumdar A (2019) DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. SIAM J Appl Algebra Geom 3(2):193–230
Ahmadi A, Jungers R, Parrilo P, Roozbehani M (2014) Joint spectral radius and path-complete graph Lyapunov functions. SIAM J Control Optim 52(1):687–717
Ailon N, Chazelle B (2006) Approximate nearest neighbors and fast Johnson–Lindenstrauss lemma. In: Proceedings of the symposium on the theory of computing, STOC, vol. ’06. ACM, Seattle
Alfakih A, Khandani A, Wolkowicz H (1999) Solving Euclidean distance matrix completion problems via semidefinite programming. Comput Optim Appl 12:13–30
Allen G (2012) Sparse higher-order principal components analysis. In: N. Lawrence, M. Girolami (eds) Proceedings of the international conference on Artificial intelligence and Statistics, vol 22, pp 27–36. PMLR, La Palma
Allen-Zhu Z, Gelashvili R, Micali S, Shavit N (2014) Sparse sign-consistent Johnson-Lindenstrauss matrices: Compression with neuroscience-based constraints. Proc Natl Acad Sci 111(47):16872–16876
Aloise D, Cafieri S, Caporossi G, Hansen P, Perron S, Liberti L (2010) Column generation algorithms for exact modularity maximization in networks. Phys Rev E 82(4):046112
Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program A 131:195–220
Aloise D, Caporossi G, Hansen P, Liberti L, Perron S, Ruiz M (2013) Modularity maximization in networks by variable neighbourhood search. In: Bader D, Sanders P, Wagner D (eds) Graph partitioning and graph clustering, contemporary mathematics, vol 588. AMS, Providence, pp 113–127
Amaldi E, Liberti L, Maffioli F, Maculan N (2009) Edge-swapping algorithms for the minimum fundamental cycle basis problem. Math Methods Oper Res 69:205–223
Anderson J (1995) An introduction to neural networks. MIT Press, Cambridge
Arriaga R, Vempala S (2006) An algorithmic theory of learning: Robust concepts and random projection. Mach Learn 63:161–182
Asimow L, Roth B (1978) The rigidity of graphs. Trans AMS 245:279–289
Bahr A, Leonard J, Fallon M (2009) Cooperative localization for autonomous underwater vehicles. Int J Robot Res 28(6):714–728
Barker G, Carlson D (1975) Cones of diagonally dominant matrices. Pac J Math 57(1):15–32
Barvinok A (2002) A course in convexity, No. 54 in graduate studies in mathematics. AMS, Providence
Barvinok A (1995) Problems of distance geometry and convex properties of quadratic maps. Discrete Comput Geom 13:189–202
Barvinok A (1997) Measure concentration in optimization. Math Program 79:33–53
Beeker N, Gaubert S, Glusa C, Liberti L (2013) Is the distance geometry problem in NP? In: Mucherino A., Lavor C., Liberti L., Maculan N. (eds) Distance geometry. Springer, New York, NY, pp 85–94
Belotti P, Lee J, Liberti L, Margot F, Wächter A (2009) Branching and bounds tightening techniques for non-convex MINLP. Optim Methods Softw 24(4):597–634
ben Judah of Worms E (XII-XIII Century) Sodei Razayya
Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. Advances in neural information processing systems. NIPS, vol 19. MIT Press, Cambridge, pp 153–160
Ben-Tal A, Ghaoui LE, Nemirovski A (2009) Robust optimization. Princeton University Press, Princeton
Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1998) When is “nearest neighbor” meaningful? In: Beeri C, Buneman P (eds) Proceedings of ICDT, LNCS, vol 1540. Springer, Heidelberg, pp 217–235
Bird S, Klein E, Loper E (2009) Natural language processing with Python. O’Reilly, Cambridge
Birge J, Louveaux F (2011) Introduction to stochastic programming. Springer, New York
Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the k-means algorithm: a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering, LNCS, vol 9220. Springer, Cham, pp 81–116
Blumenthal L (1953) Theory and applications of distance geometry. Oxford University Press, Oxford
Böhm C, Jacopini G (1966) Flow diagrams, Turing machines and languages with only two formation rules. Commun ACM 9(5):366–371
Bollobás B (1998) Modern graph theory. Springer, New York
Borg I, Groenen P (2010) Modern multidimensional scaling, 2nd edn. Springer, New York
Bottou L (2012) Stochastic gradient descent tricks. In: Montavon G et al (eds) Neural networks: tricks of the trade, LNCS, vol 7700. Springer, Berlin, pp 421–436
Bourgain J (1985) On Lipschitz embeddings of finite metric spaces in Hilbert space. Isr J Math 52(1–2):46–52
Boutsidis C, Zouzias A, Drineas P (2010) Random projections for \(k\)-means clustering. Advances in neural information processing systems. NIPS. NIPS Foundation, La Jolla, pp 298–306
Brambilla A, Premoli A (2001) Rigorous event-driven (red) analysis of large-scale nonlinear rc circuits. IEEE Trans Circ Syst I Fundam Theory Appl 48(8):938–946
Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172–188
Cafieri S, Hansen P, Liberti L (2010) Loops and multiple edges in modularity maximization of networks. Phys Rev E 81(4):46102
Cafieri S, Hansen P, Liberti L (2011) Locally optimal heuristic for modularity maximization of networks. Phys Rev E 83(056105):1–8
Cafieri S, Hansen P, Liberti L (2014) Improving heuristics for network modularity maximization using an exact algorithm. Discrete Appl Math 163:65–72
Cauchy AL (1813) Sur les polygones et les polyèdres. Journal de l’École Polytechnique 16(9):87–99
Cayley A (1841) A theorem in the geometry of position. Camb Math J II:267–271
Chollet F et al (2015) Keras. https://keras.io
Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge
Choromanska A, Henaff M, Mathieu M, Arous GB, LeCun Y (2015) The loss surfaces of multilayer networks. In: Proceedings of the international conference on artificial intelligence and statistics, AISTATS, vol 18. JMLR, San Diego
COIN-OR (2006) Introduction to IPOPT: a tutorial for downloading, installing, and using IPOPT
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2461–2505
Connelly R (1978) A counterexample to the rigidity conjecture for polyhedra. Publications Mathématiques de l’IHES 47:333–338
Cox T, Cox M (2001) Multidimensional scaling. Chapman & Hall, Boca Raton
D’Ambrosio C, Liberti L (2017) Distance geometry in linearizable norms. In: Nielsen F, Barbaresco F (eds) Geometric science of information, LNCS, vol 10589. Springer, Berlin, pp 830–838
D’Ambrosio C, Liberti L, Poirion PL, Vu K (2019) Random projections for quadratic programming. Math Program B (in revision)
D’Ambrosio C, Liberti L, Poirion PL, Vu K (2019) Random projections for quadratic programming. Tech. Rep. 2019-7-7322, Optimization Online
Dantzig G (1983) Reminiscences about the origins of linear programming. In: Bachem A, Grötschel M, Korte B (eds) Mathematical programming: the state of the art. Springer, Berlin
Dasgupta S, Gupta A (2002) An elementary proof of a theorem by Johnson and Lindenstrauss. Random Struct Algorithms 22:60–65
D’Aspremont A, Bach F, Ghaoui LE (2014) Approximation bounds for sparse principal component analysis. Math Program B 148:89–110
Dattorro J (2015) Convex optimization and Euclidean distance geometry. \({\cal M}\epsilon \beta oo\), Palo Alto
Dauphin Y, Pascanu R, Gulcehre C, Cho K, Ganguli S, Bengio Y (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Advances in neural information processing systems. NIPS. NIPS Foundation, La Jolla, pp 2933–2941
Demartines P, Hérault J (1997) Curvilinear component analysis: a self-organizing neural network for nonlinear mapping of data sets. IEEE Trans Neural Netw 8(1):148–154
Deo N, Prabhu G, Krishnamoorthy M (1982) Algorithms for generating fundamental cycles in a graph. ACM Trans Math Softw 8(1):26–42
Dey S, Mazumder R, Molinaro M, Wang G (2017) Sparse principal component analysis and its \(\ell _1\)-relaxation. Tech. Rep. arXiv:1712.00800v1
Dias G, Liberti L (2016) Diagonally dominant programming in distance geometry. In: Cerulli R, Fujishige S, Mahjoub R (eds) International symposium in combinatorial optimization, LNCS, vol 9849. Springer, New York, pp 225–236
Douven I (2017) Abduction. In: Zalta E (ed) The Stanford encyclopedia of philosophy. Stanford University, Stanford
Durrant R, Kabán A (2009) When is ‘nearest neighbour’ meaningful: a converse theorem and implications. J Complex 25:385–397
Eco U (1983) Horns, hooves, insteps. Some hypotheses on three kinds of abduction. In: Eco U, Sebeok T (eds) Dupin, Holmes. Peirce. The Sign of Three. Indiana University Press, Bloomington
Eco U (1984) Semiotics and the philosophy of language. Indiana University Press, Bloomington
Eren T, Goldenberg D, Whiteley W, Yang Y, Morse A, Anderson B, Belhumeur P (2004) Rigidity, computation, and randomization in network localization. IEEE, pp 2673–2684
Euler L (1862) Continuatio fragmentorum ex adversariis mathematicis depromptorum: II Geometria, 97. In: Fuss P, Fuss N (eds) Opera postuma mathematica et physica anno 1844 detecta, vol I. Eggers & C, Petropolis, pp 494–496
Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305
Flexer A, Schnitzer D (2015) Choosing \(\ell _p\) norms in high-dimensional spaces based on hub analysis. Neurocomputing 169:281–287
Floreano D (1996) Manuale sulle Reti Neurali Il. Mulino, Bologna
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
François D, Wertz V, Verleysen M (2007) The concentration of fractional distances. IEEE Trans Knowl Data Eng 19(7):873–886
Friedler F, Huang Y, Fan L (1992) Combinatorial algorithms for process synthesis. Comput Chem Eng 16(1):313–320
Gayraud N (2017) Public remark. Le Monde des Mathématiques Industrielles at INRIA Sophia-Antipolis (MOMI17)
Gilbreth F, Gilbreth L (1921) Process charts: first steps in finding the one best way to do work. In: Proceedings of the annual meeting. American Society of Mechanical Engineers, New York
Gill P (2006) User’s guide for SNOPT version 7.2. Systems Optimization Laboratory, Stanford University, California
Gödel K (1986) On the isometric embeddability of quadruples of points of \(r_3\) in the surface of a sphere. In: Feferman S, Dawson J, Kleene S, Moore G, Solovay R, van Heijenoort J (eds) Kurt Gödel: collected works, vol I, pp (1933b) 276–279. Oxford University Press, Oxford
Gonçalves D, Mucherino A, Lavor C, Liberti L (2017) Recent advances on the interval distance geometry problem. J Glob Optim 69:525–545
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Haeffele B, Vidal R (2017) Global optimality in neural network training. In: Proceedings of the conference in computer vision and pattern recognition, CVPR. IEEE, Piscataway, pp 4390–4398
Hagberg A, Schult D, Swart P (2008) Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J (eds) Proceedings of the 7th Python in science conference (SciPy2008), Pasadena, pp 11–15
Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79:191–215
Henneberg L (1911) Die Graphische Statik der starren Systeme. Teubner, Leipzig
Heron (50AD) Metrica, vol I. Alexandria
Hinneburg A, Aggarwal C, Keim D (2000) What is the nearest neighbor in high dimensional spaces? In: Proceedings of the conference on very large databases, VLDB, vol 26. Morgan Kaufman, San Francisco, pp. 506–515
Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417–441
IBM (2017) ILOG CPLEX 12.8 User’s Manual. IBM
Indyk P (2001) Algorithmic applications of low-distortion geometric embeddings. Foundations of computer science. FOCS, vol 42. IEEE, Washington, DC, pp 10–33
Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the symposium on the theory of computing, STOC, vol 30. ACM, New York, pp 604–613
Indyk P, Naor A (2007) Nearest neighbor preserving embeddings. ACM Trans Algorithms 3(3), Art. 31
Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Johnson W, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. In: Hedlund G (ed) Conference in modern analysis and probability, contemporary mathematics, vol 26. AMS, Providence, pp 189–206
Jolliffe I (2010) Principal component analysis, 2nd edn. Springer, Berlin
Jordan M (1995) Why the logistic function? A tutorial discussion on probabilities and neural networks. Tech. Rep. Computational Cognitive Science TR 9503, MIT
Kane D, Nelson J (2014) Sparser Johnson–Lindenstrauss transforms. J ACM 61(1):4
Kantor I, Matoušek J, Šámal R (2015) Mathematics++: selected topics beyond the basic courses. No. 75 in Student Mathematical Library. AMS, Providence
Khalife S, Liberti L, Vazirgiannis M (2019) Geometry and analogies: a study and propagation method for word representation. In: Statistical language and speech processing, SLSP, vol. 7
Kingma D, Ba J (2015)Adam: A method for stochastic optimization. In: Proceedings of ICLR. San Diego
Knuth D (1997) The art of computer programming, part I: fundamental algorithms, 3rd edn. Addison-Wesley, Reading
Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Kuratowski C (1935) Quelques problèmes concernant les espaces métriques non-séparables. Fundam Math 25:534–545
Lavor C, Liberti L, Maculan N (2006) Computational experience with the molecular distance geometry problem. In: Pintér J (ed) Global optimization: scientific and engineering case studies. Springer, Berlin, pp 213–225
Lavor C, Liberti L, Maculan N, Mucherino A (2012) The discretizable molecular distance geometry problem. Comput Optim Appl 52:115–146
Lavor C, Liberti L, Mucherino A (2013) The interval Branch-and-Prune algorithm for the discretizable molecular distance geometry problem with inexact distances. J Glob Optim 56:855–871
Lavor C, Liberti L, Donald B, Worley B, Bardiaux B, Malliavin T, Nilges M (2019) Minimal NMR distance information for rigidity of protein graphs. Discrete Appl Math 256:91–104
Lavor C, Souza M, Carvalho L, Liberti L (2019) On the polynomiality of finding \({}^K\)DMDGP re-orders. Discrete Appl Math 267:190–194
Lehmann S, Hansen L (2007) Deterministic modularity optimization. Eur Phys J B 60:83–88
Levine R, Mason T, Brown D (1995) Lex and Yacc, 2nd edn. O’Reilly, Cambridge
Liberti L (2010) Software modelling and architecture: exercises. Ecole Polytechnique. https://www.lix.polytechnique.fr/~liberti/swarchex.pdf
Liberti L (2009) Reformulations in mathematical programming: definitions and systematics. RAIRO-RO 43(1):55–86
Liberti L (2019) Undecidability and hardness in mixed-integer nonlinear programming. RAIRO-Oper Res 53:81–109
Liberti L, Lavor C (2013) On a relationship between graph realizability and distance matrix completion. In: Migdalas A, Sifaleras A, Georgiadis C, Papathanaiou J, Stiakakis E (eds) Optimization theory, decision making, and operational research applications, proceedings in mathematics & statistics, vol 31. Springer, Berlin, pp 39–48
Liberti L, Lavor C (2016) Six mathematical gems in the history of distance geometry. Int Trans Oper Res 23:897–920
Liberti L, Lavor C (2017) Euclidean distance geometry: an introduction. Springer, New York
Liberti L, Marinelli F (2014) Mathematical programming: Turing completeness and applications to software analysis. J Comb Optim 28(1):82–104
Liberti L, Vu K (2018) Barvinok’s naive algorithm in distance geometry. Oper Res Lett 46:476–481
Liberti L, Lavor C, Maculan N (2008) A branch-and-prune algorithm for the molecular distance geometry problem. Int Trans Oper Res 15:1–17
Liberti L, Cafieri S, Tarissan F (2009) Reformulations in mathematical programming: a computational approach. In: Abraham A, Hassanien AE, Siarry P, Engelbrecht A (eds) Foundations of computational intelligence, vol 3. no 203 in Studies in Computational Intelligence. Springer, Berlin, pp 153–234
Liberti L, Cafieri S, Savourey D (2010) Reformulation optimization software engine. In: Fukuda K, van der Hoeven J, Joswig M, Takayama N (eds) Mathematical software, LNCS, vol 6327. Springer, New York, pp 303–314
Liberti L, Lavor C, Mucherino A, Maculan N (2010) Molecular distance geometry methods: from continuous to discrete. Int Trans Oper Res 18:33–51
Liberti L, Lavor C, Alencar J, Abud G (2013) Counting the number of solutions of \({}^k\)DMDGP instances. In: Nielsen F, Barbaresco F (eds) Geometric science of information, LNCS, vol 8085. Springer, New York, pp 224–230
Liberti L, Lavor C, Maculan N, Mucherino A (2014) Euclidean distance geometry and applications. SIAM Rev 56(1):3–69
Liberti L, Masson B, Lavor C, Lee J, Mucherino A (2014) On the number of realizations of certain Henneberg graphs arising in protein conformation. Discrete Appl Math 165:213–232
Liberti L, Swirszcz G, Lavor C (2016) Distance geometry on the sphere. In: Akiyama J et al (eds) JCDCG\({}^2\), LNCS, vol 9943. Springer, New York, pp 204–215
Liberti L, D’Ambrosio C (2017) The Isomap algorithm in distance geometry. In: Iliopoulos C, Pissis S, Puglisi S, Raman R (eds) Proceedings of 16th international symposium on experimental algorithms (SEA), LIPICS, vol 75. Dagstuhl Publishing, Schloss Dagstuhl, pp 5:1–5:13
Liberti L, Lavor C, Mucherino A (2013) The discretizable molecular distance geometry problem seems easier on proteins. In: Mucherino A, Lavor C, Liberti L, Maculan N (eds) Distance geometry: theory, methods and applications. Springer, New York, pp 47–60
Linial N, London E, Rabinovich Y (1995) The geometry of graphs and some of its algorithmic applications. Combinatorica 15(2):215–245
Majumdar A, Ahmadi A, Tedrake R (2014) Control and verification of high-dimensional systems with dsos and sdsos programming. Conference on decision and control, vol 53. Piscataway, IEEE, pp 394–401
Malliavin T, Mucherino A, Lavor C, Liberti L (2019) Systematic exploration of protein conformational space using a distance geometry approach. J Chem Inf Model 59:4486–4503
Manning C, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
Mansouri J, Khademi M (2015) Multiplicative distance: a method to alleviate distance instability for high-dimensional data. Knowl Inf Syst 45:783–805
Matoušek J (2013) Lecture notes on metric embeddings. Tech. rep, ETH Zürich
Matoušek J (2008) On variants of the Johnson-Lindenstrauss lemma. Random Struct Algorithms 33:142–156
Maxwell J (1864) On the calculation of the equilibrium and stiffness of frames. Philos Mag 27(182):294–299
McCormick G (1976) Computability of global solutions to factorable nonconvex programs: Part I-Convex underestimating problems. Math Program 10:146–175
McCulloch W (1961) What is a number, that a man may know it, and a man, that he may know a number? Gen Semant Bull 26–27:7–18
Mencarelli L, Sahraoui Y, Liberti L (2017) A multiplicative weights update algorithm for MINLP. EURO J Comput Optim 5:31–86
Menger K (1928) Untersuchungen über allgemeine Metrik. Math Ann 100:75–163
Menger K (1931) New foundation of Euclidean geometry. Am J Math 53(4):721–745
Merris R (1994) Laplacian matrices of graphs: a survey. Linear Algebra Appl 198:143–176
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Burges C, Bottou L, Welling M, Ghahramani Z, Weinberger K (eds) Advances in neural information processing systems, NIPS, vol 26. NIPS Foundation, La Jolla, pp 3111–3119
Miller G (1995) Wordnet: a lexical database for English. Commun ACM 38(11):39–41
Milnor J (1964) On the Betti numbers of real varieties. Proc AMS 15:275–280
Minsky M (1986) The society of mind. Simon & Schuster, New York
Moitra A (2018) Algorithmic aspects of machine learning. CUP, Cambridge
Moro A (2008) The boundaries of Babel. MIT Press, Cambridge
Morris C (1946) Signs. Language and behavior. Prentice-Hall, New York
Mucherino A, Lavor C, Liberti L (2012) Exploiting symmetry properties of the discretizable molecular distance geometry problem. J Bioinform Comput Biol 10(1–15):1242009
Mucherino A, Lavor C, Liberti L, Maculan N (eds) (2013) Distance geometry: theory, methods, and applications. Springer, New York
Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113
Object Management Group (2005) Unified modelling language: superstructure, v. 2.0. Tech. Rep. formal/05-07-04, OMG
O’Donoghue B, Chu E, Parikh N, Boyd S (2016) Operator splitting for conic optimization via homogeneous self-dual embedding. J Optim Theory Appl 169(3):1042–1068
Paton K (1969) An algorithm for finding a fundamental set of cycles of a graph. Commun ACM 12(9):514–518
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Peirce C (1878) Illustrations of the logic of science, part 6: induction, deduction, and hypothesis. Popul Sci Mon 13:470–482
Penrose R (1989) The emperor’s new mind. Penguin, New York
Pfeffer A (2016) Practical probabilistic programming. Manning Publications, Shelter Island
Popper K (1968) The logic of scientific discovery. Hutchinson, London
Potra F, Wright S (2000) Interior-point methods. J Comput Appl Math 124:281–302
Proni G (2016) Is there abduction in Aristotle? Peirce, Eco, and some further remarks. Ocula 17:1–14
Radovanović M, Nanopoulos A, Ivanović M (2010) Hubs in space: Popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531
Rousseau F, Vazirgiannis M (2013) Graph-of-word and TW-IDF: new approach to ad hoc IR. In: Proceedings of CIKM. ACM, New York
Saerens M, Fouss F, Yen L, Dupont P (2004) The principal components analysis of a graph, and its relationships to spectral clustering. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Proceedings of the European conference in machine learning (ECML), LNAI, vol 3201. Springer, Berlin, pp 371–383
Salgado E, Scozzari A, Tardella F, Liberti L (2018) Alternating current optimal power flow with generator selection. In: Lee J, Rinaldi G, Mahjoub R (eds) Combinatorial optimization (Proceedings of ISCO 2018), LNCS, vol 10856, pp 364–375
Sánchez AB, Lavor C (2020) On the estimation of unknown distances for a class of Euclidean distance matrix completion problems with interval data. Linear Algebra Appl 592:287–305
Saxe J (1979) Embeddability of weighted graphs in \(k\)-space is strongly NP-hard. In: Proceedings of 17th Allerton conference in communications, control and computing, pp 480–489
Schaeffer S (2007) Graph clustering. Comput Sci Rev 1:27–64
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003. arXiv:1404.7828 [cs.NE]
Schoenberg I (1935) Remarks to Maurice Fréchet’s article Sur la définition axiomatique d’une classe d’espaces distanciés vectoriellement applicable sur l’espace de Hilbert. Ann Math 36(3):724–732
Schumacher M, Roßner R, Vach W (1996) Neural networks and logistic regression: part I. Comput Stat Data Anal 21:661–682
Seshu S, Reed M (1961) Linear graphs and electrical networks. Addison-Wesley, Reading
Singer A (2011) Angular synchronization by eigenvectors and semidefinite programming. Appl Comput Harmon Anal 30:20–36
Smith E, Pantelides C (1999) A symbolic reformulation/spatial branch-and-bound algorithm for the global optimisation of nonconvex MINLPs. Comput Chem Eng 23:457–478
Steinhaus H (1956) Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences Cl. III 4(12):801–804
Tabaghi P, Dokmanić I, Vetterli M (2019) On the move: localization with kinetic Euclidean distance matrices. In: International conference on acoustics, speech and signal processing (ICASSP). IEEE, Piscataway
Tawarmalani M, Sahinidis N (2004) Global optimization of mixed integer nonlinear programs: a theoretical and computational study. Math Program 99:563–591
Tenenbaum J, de Silva V, Langford J (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2322
Thoreau H (1849) Resistance to civil government. In: Peabody E (ed) Æsthetic papers. J. Wilson, Boston
van Rossum G et al (2019) Python language reference, version 3. Python Software Foundation
Vavasis S (1991) Nonlinear optimization: complexity issues. Oxford University Press, Oxford
Vempala S (2004) The Random projection method. No. 65 in DIMACS series in discrete mathematics and theoretical computer science. AMS, Providence
Venkatasubramanian S, Wang Q (2011) The Johnson–Lindenstrauss transform: an empirical study. Algorithm engineering and experiments. ALENEX, vol 13. SIAM, Providence, pp 164–173
Verboon A (2014) The medieval tree of Porphyry: an organic structure of logic. In: Worm A, Salonis P (eds) The Tree. Symbol, allegory and structural device in medieval art and thought, international medieval research, vol 20. Brepols, Turnhout, pp 83–101
Vershynin R (2018) High-dimensional probability. CUP, Cambridge
Vidal R, Ma Y, Sastry S (2016) Generalized principal component analysis. Springer, New York
Vu K, Poirion PL, Liberti L (2018) Random projections for linear programming. Math Oper Res 43(4):1051–1071
Vu K, Poirion PL, D’Ambrosio C, Liberti L (2019) Random projections for quadratic programs over a Euclidean ball. In: Lodi A et al (eds) Integer programming and combinatorial optimization (IPCO), LNCS, vol 11480. Springer, New York, pp 442–452
Vu K, Poirion PL, Liberti L (2019) Gaussian random projections for Euclidean membership problems. Discrete Appl Math 253:93–102
Wikipedia: Civil disobedience (thoreau) (2019). http://en.wikipedia.org/wiki/Civil_Disobedience_(Thoreau). [Online; accessed 190804]
Wikipedia: Computational pragmatics (2019). http://en.wikipedia.org/wiki/Computational_pragmatics. [Online; accessed 190802]
Wikipedia: Diagonally dominant matrix (2019). http://en.wikipedia.org/wiki/Diagonally_dominant_matrix. [Online; accessed 190716]
Wikipedia: Flowchart (2019). http://en.wikipedia.org/wiki/Flochart. [Online; accessed 190802]
Wikipedia: Principal component analysis (2019). http://en.wikipedia.org/wiki/Principal_component_analysis. [Online; accessed 190726]
Wikipedia: Rectifier (neurl networks) (2019). http://en.wikipedia.org/wiki/Rectifier_(neural_networks). [Online; accessed 190807]
Wikipedia: Slutsky’s theorem (2019). http://en.wikipedia.org/wiki/Slutsky%27s_theorem. [Online; accessed 190802]
Williams H (1999) Model building in mathematical programming, 4th edn. Wiley, Chichester
Woodruff D (2014) Sketching as a tool for linear algebra. Found Trends Theor Comput Sci 10(1–2):1–157
Wüthrich K (1989) Protein structure determination in solution by nuclear magnetic resonance spectroscopy. Science 243:45–50
Xu G, Tsoka S, Papageorgiou L (2007) Finding community structures in complex networks using mixed integer optimisation. Eur Phys J B 60:231–239
Yemini Y (1978) The positioning problem—a draft of an intermediate summary. In: Proceedings of the conference on distributed sensor networks. Carnegie-Mellon University, Pittsburgh, pp 137–145
Yemini Y (1979) Some theoretical aspects of position-location problems. In: Proceedings of the 20th annual symposium on the foundations of computer science, pp. 1–8. IEEE, Piscataway
Yun C, Sra S, Jadbabaie A (2018) Global optimality conditions for deep neural networks. In: Proceedings of the 6th international conference on learning representations. ICLR, La Jolla, CA
Zhang L, Mahdavi M, Jin R, Yang T, Zhu S (2013) Recovering the optimal solution by dual random projection. In: Shalev-Shwartz S, Steinwart I (eds) Conference on learning theory (COLT), Proceedings of machine learning research, vol 30, pp 135–157. \(\langle\) http://mlr.org \(\rangle\)
Acknowledgements
I am grateful to J.J. Salazar, the Editor-in-Chief of TOP, for inviting me to write this survey. This work would not have been possible without the numerous co-authors with whom I pursued my investigations in distance geometry, among which I will single out the longest-standing: C. Lavor, N. Maculan, and A. Mucherino. I have first heard of concentration of measure as I passed by D. Malioutov’s office at the T.J. Watson IBM Research laboratory: the door was open, the Johnson-Lindenstrauss lemma was mentioned, and I could not refrain from interrupting the conversation and asking for clarification, as I thought that there must surely be a mistake; incredibly, the result was true, and I am grateful to Dr. Malioutov for hosting the conversation I eavesdropped on. I am very thankful to the co-authors who helped me investigate random projections, in particular P.L. Poirion and K. Vu, without whom none of our papers would have been possible. I learned about the existence of the distance instability result thanks to N. Gayraud, who was in the audience during a talk I gave, and suggested it to me as I expressed puzzlement at the poor quality of k-means clusterings. João Fontes Gonçalves, a student in my M.Sc. course, first made the remark following Eq. (6.1.3) (“why are you optimizing a constant?”). I am very grateful to S. Khalife, D. Gonçalves, and M. Escobar for reading the manuscript and making insightful comments. This research was partly funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement n. 764759 ETN “MINOA”.
Author information
Authors and Affiliations
Corresponding author
Additional information
Dedicated to the memory of Mariano Bellasio (1943–2019).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This invited paper is discussed in the comments available at https://doi.org/10.1007/s11750-020-00560-3, https://doi.org/10.1007/s11750-020-00561-2, https://doi.org/10.1007/s11750-020-00562-1.
Rights and permissions
About this article
Cite this article
Liberti, L. Distance geometry and data science. TOP 28, 271–339 (2020). https://doi.org/10.1007/s11750-020-00563-0
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11750-020-00563-0
Keywords
- Euclidean distance
- Isometric embedding
- Random projection
- Mathematical programming
- Machine learning
- Artificial neural networks