Balancing Efficiency and Performance in NLP: a Cross-Comparison of Shallow Machine Learning and Large Language Models via AutoML

  • Authors: Ernesto Luis Estevanell Valladares, Yoan Gutiérrez Vázquez, Andrés Montoyo Guijarro, Rafael Muñoz Guillena, Yudivián Almeida Cruz
  • Published in: Procesamiento del Lenguaje Natural, ISSN 1135-5948, No. 73, 2024, pp. 221-233
  • Language: English
  • Parallel titles:
    • Equilibrando eficiencia y rendimiento en PLN: comparación cruzada de Machine Learning Tradicional y Grandes Modelos de Lenguaje mediante AutoML
  • Abstract
    • Spanish

      Este estudio analiza críticamente la eficiencia de recursos y el rendimiento de los métodos de Aprendizaje Automático Superficial (SML) frente a los Grandes Modelos de Lenguaje (LLM) en tareas de clasificación de texto, explorando el equilibrio entre precisión y sostenibilidad medioambiental. Se introduce una novedosa estrategia de optimización que prioriza la eficiencia computacional y el impacto ecológico junto con las métricas de rendimiento tradicionales, aprovechando el Aprendizaje Automático Automatizado (AutoML). El análisis revela que, si bien los pipelines desarrollados no superan a los modelos del estado del arte (SOTA) en cuanto a rendimiento bruto, reducen significativamente la huella de carbono. Se descubrieron pipelines óptimos de SML con un rendimiento competitivo y hasta 70 veces menos emisiones de carbono que pipelines híbridos o totalmente basados en LLM, como las variantes estándar de BERT y DistilBERT. Del mismo modo, se obtienen pipelines híbridos (que incorporan SML y LLM) con entre un 20% y un 50% menos de emisiones de carbono en comparación con las alternativas ajustadas (fine-tuned) y solo una disminución marginal del rendimiento. Esta investigación pone en cuestión la dependencia predominante de los LLM de alta carga computacional para tareas de PLN y subraya el potencial sin explotar de AutoML para esculpir la próxima oleada de modelos de IA con conciencia medioambiental.

    • English

      This study critically examines the resource efficiency and performance of Shallow Machine Learning (SML) methods versus Large Language Models (LLMs) in text classification tasks by exploring the balance between accuracy and environmental sustainability. We introduce a novel optimization strategy that prioritizes computational efficiency and ecological impact alongside traditional performance metrics, leveraging Automated Machine Learning (AutoML). Our analysis reveals that while the pipelines we developed do not surpass state-of-the-art (SOTA) models in raw performance, they offer a significantly reduced carbon footprint. We discovered optimal SML pipelines with competitive performance and up to 70 times lower carbon emissions than hybrid or fully LLM pipelines, such as standard BERT and DistilBERT variants. Similarly, we obtain hybrid pipelines (combining SML and LLMs) with 20% to 50% lower carbon emissions than fine-tuned alternatives and only a marginal decrease in performance. This research challenges the prevailing reliance on computationally intensive LLMs for NLP tasks and underscores the untapped potential of AutoML in sculpting the next wave of environmentally conscious AI models.
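
      To make the idea of an efficiency-aware objective concrete, the Python sketch below shows one possible way to score a candidate text-classification pipeline by macro-F1 while penalizing the CO2 estimated during training. It is purely illustrative and not the authors' actual AutoML strategy: the codecarbon EmissionsTracker, the TF-IDF + logistic-regression candidate, the helper name efficiency_aware_score, and the trade-off weight alpha are all assumptions made for this example.

        # Illustrative sketch only: not the pipelines, search procedure, or weighting
        # actually used in the paper.
        from codecarbon import EmissionsTracker
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import f1_score
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline


        def efficiency_aware_score(texts, labels, alpha=0.5):
            """Score a candidate pipeline by macro-F1 minus a weighted CO2 penalty."""
            X_train, X_test, y_train, y_test = train_test_split(
                texts, labels, test_size=0.2, random_state=0
            )
            # One shallow candidate; an AutoML search would swap in other candidates.
            pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

            tracker = EmissionsTracker()  # converts measured energy use to kg CO2-eq
            tracker.start()
            pipeline.fit(X_train, y_train)
            predictions = pipeline.predict(X_test)
            emissions_kg = tracker.stop() or 0.0

            macro_f1 = f1_score(y_test, predictions, average="macro")
            # Higher is better: performance rewarded, estimated emissions penalized.
            return macro_f1 - alpha * emissions_kg

      An AutoML driver (e.g. grid, Bayesian, or evolutionary search) could then maximize such a score over many candidate pipelines, which is one way a marginal loss in F1 could be traded for a large reduction in estimated emissions.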

