Bolstering stochastic gradient descent with model building

  • Ş. İlker Birbil [1]; Özgür Martin [2]; Gönenç Onay [3]; Figen Öztoprak [4]
    1. [1] University of Amsterdam, Netherlands
    2. [2] Mimar Sinan Güzel Sanatlar Üniversitesi, Turkey
    3. [3] Galatasaray University, Turkey
    4. [4] Gebze Technical University, Turkey

  • Published in: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 32, No. Extra 3, 2024 (Issue devoted to: Mathematical Optimization and Machine Learning), pp. 517-536
  • Language: English
  • DOI: 10.1007/s11750-024-00673-z
  • Abstract
    • The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are fine-tuned for the application at hand. Although this tuning process can require large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the step length. We propose an alternative approach to stochastic line search by using a new algorithm based on forward step model building. This model building step incorporates second-order information that allows adjusting not only the step length but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide a convergence rate analysis, and experimentally show that the proposed algorithm achieves faster convergence and better generalization on well-known test problems. More precisely, the proposed method, SMB, requires less tuning and shows performance comparable to other adaptive methods.
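
    The abstract describes forward-step model building: take a trial SGD step on a mini-batch and, when that step is not acceptable, use the gradients at the current and trial points to build a local model that adjusts the step, separately for each parameter group (layer of tensors). Below is a minimal, illustrative sketch of that idea in plain NumPy, assuming a simple one-dimensional quadratic model along each group's trial step, so it only rescales the step length per group rather than also correcting the direction as the full method does. The names smb_step, groups, and loss_grad_fn, as well as the fallback constants, are hypothetical and not taken from the paper.

        import numpy as np

        def smb_step(groups, loss_grad_fn, lr=0.1):
            """One SGD step with a per-group model-building correction (sketch).

            groups       : list of np.ndarray parameter groups (e.g., layers)
            loss_grad_fn : callable(groups) -> (loss, [grad per group]) on one mini-batch
            lr           : trial step length
            """
            loss0, grads0 = loss_grad_fn(groups)
            trial = [w - lr * g for w, g in zip(groups, grads0)]      # forward (trial) step
            loss1, grads1 = loss_grad_fn(trial)

            if loss1 <= loss0:                    # trial step already decreases the loss: accept it
                return trial, loss1

            new_groups = []
            for w, wt, g0, g1 in zip(groups, trial, grads0, grads1):
                s = wt - w                        # step taken by this group
                y = g1 - g0                       # gradient change along the step (curvature information)
                curv = float(y.ravel() @ s.ravel())
                if curv > 1e-12:
                    # Minimize the 1-D quadratic model m(t) = loss0 + t*(g0.s) + 0.5*t^2*curv over t,
                    # then clip t to [0, 1] so the corrected step stays within the trial step.
                    t = -float(g0.ravel() @ s.ravel()) / curv
                    t = min(max(t, 0.0), 1.0)
                else:
                    t = 0.5                       # non-positive curvature: fall back to a shorter step
                new_groups.append(w + t * s)      # rescaled step for this group
            return new_groups, loss_grad_fn(new_groups)[0]

    Because the curvature estimate and the resulting scaling factor t are computed per group, each layer ends up with its own effective step length, which is the adaptive, diagonalization-like behaviour mentioned in the abstract; the method proposed in the paper builds a richer model from the same trial-point information and can therefore also change the search direction, not just its length.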

