Abstract
Transfer learning has become a key technique for improving the accuracy of neural networks in low-resource, low-data settings. This study presents a quantitative comparison of pre-trained models (ResNet50, VGG16, BERT, and GPT) against baseline CNN and LSTM models across three application areas: computer vision, natural language processing (NLP), and medical imaging. Five benchmark datasets were used: ImageNet, CIFAR-10, SST-2, IMDB, and Chest X-Ray. All experiments used the same preprocessing pipeline and evaluation metrics (accuracy, F1 score, precision, recall, and ROC-AUC). The pre-trained models consistently outperformed the baselines in all domains, with accuracy gains of 9 to 20 percentage points and F1-score gains of 0.09 to 0.16. ResNet50 achieved 92% accuracy on CIFAR-10, compared to 72% for the CNN baseline, and BERT reached 92% on SST-2, compared to 80% for the LSTM baseline. VGG16 improved Chest X-Ray classification accuracy from 78% to 87% and reduced training time by up to 60%. A few instances of minor overfitting and domain mismatch were observed, underscoring the need for adaptive fine-tuning strategies. The results indicate that transfer learning significantly improves convergence speed, generalization, and computational efficiency, making it a promising approach for AI applications in domains such as healthcare, NLP, and autonomous systems.
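The reported accuracy gains can be read either as absolute differences (percentage points) or as relative improvements over the baseline. The short sketch below recomputes both from the accuracies quoted in the abstract. The dictionary keys and the `gains` helper are illustrative names, not taken from the paper, and the Chest X-Ray baseline model is not named in the abstract, so it is labeled generically.

```python
# Accuracies (in percent) as quoted in the abstract: (baseline, pre-trained).
# The labels are illustrative; the Chest X-Ray baseline model is not named
# in the abstract, so it is written generically as "baseline".
reported = {
    "CIFAR-10 (CNN -> ResNet50)": (72, 92),
    "SST-2 (LSTM -> BERT)": (80, 92),
    "Chest X-Ray (baseline -> VGG16)": (78, 87),
}


def gains(baseline, pretrained):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = pretrained - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


summary = {name: gains(b, p) for name, (b, p) in reported.items()}

for name, (absolute, relative) in summary.items():
    print(f"{name}: +{absolute} points ({relative:.1f}% relative)")
```

Under this reading, the absolute gains span 9 to 20 points, which matches the range stated in the abstract; the relative improvements are somewhat larger because they are measured against the lower baseline accuracy.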