# Model Card for FLAN-T5 Small

## Table of Contents

- [TL;DR](#tldr)
- [Model Details](#model-details)
- [Usage](#usage)
- [Uses](#uses)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Environmental Impact](#environmental-impact)
- [Citation](#citation)
- [Model Card Authors](#model-card-authors)

## TL;DR

If you're familiar with T5, FLAN-T5 provides enhanced performance across tasks. For the same parameter count, these models have been fine-tuned on over 1,000 additional tasks, supporting more languages and achieving strong few-shot performance. FLAN-PaLM 540B, for example, achieves state-of-the-art performance on benchmarks like five-shot MMLU at 75.2%. Instruction finetuning has proven effective for usability across a wide range of tasks.

---

## Model Details

### Model Description

- **Model Type**: Language Model
- **Language(s)**: English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian.
- **License**: Apache 2.0
- **Related Models**: All FLAN-T5 Checkpoints
- **Original Checkpoints**: Available for all original FLAN-T5 models

## Uses

### Direct Use and Downstream Use

FLAN-T5 Small is intended for:

- **Language Understanding & Generation**: Handling various NLP tasks, including translation, summarization, and question answering.
- **Multilingual Support**: Effective across multiple languages with expanded vocabulary.
- **Research on Language Models**: Suitable for tasks in zero-shot and few-shot learning, reasoning, and question answering.

## Bias, Risks, and Limitations

### Ethical Considerations and Risks

According to Rae et al. (2021), language models like FLAN-T5 can potentially be used for harmful language generation. FLAN-T5 should be assessed for safety and fairness concerns before deployment in sensitive applications.

### Known Limitations

FLAN-T5 has not been rigorously tested in real-world applications and may generate inappropriate or biased content based on the training data.

### Sensitive Use

FLAN-T5 should not be used for unacceptable use cases, such as generating harmful or abusive speech.

---

## Training Details

### Training Data

The model was fine-tuned on a diverse set of tasks, enhancing zero-shot and few-shot performance across multiple languages and NLP tasks. Refer to the research paper for a complete list of tasks.

### Training Procedure

The model was trained on TPU v3 or TPU v4 pods using the `t5x` codebase and `jax`. Each FLAN model corresponds to a fine-tuned variant of the T5 model at different scales.

---

## Evaluation

### Testing Data, Factors & Metrics

The authors evaluated FLAN-T5 Small on 1,836 tasks across multiple languages. For more detailed quantitative evaluation, see the research paper’s Table 3.

### Results

FLAN-T5 Small achieved competitive performance across tasks, with benchmarks showing improvement over baseline T5 models.

---

## Environmental Impact

### Carbon Emissions

Carbon emissions are estimated using the Machine Learning Impact calculator by Lacoste et al. (2019).

- **Hardware Type**: TPU v3 or v4 pods
- **Cloud Provider**: Google Cloud Platform (GCP)
- **Estimated Carbon Emissions**: Further details required for an exact estimate.

---

## Citation

Please cite FLAN-T5 Small as follows:
```bibtex
@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi = {10.48550/ARXIV.2210.11416},
  url = {https://arxiv.org/abs/2210.11416},
  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Scaling Instruction-Finetuned Language Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```

## Model Card Authors

This model card was auto-generated, incorporating details from the official FLAN-T5 model card.
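
## Usage

The zero-shot and few-shot uses described under Uses can be sketched as follows. This is a minimal illustration, not an official recipe. It assumes the Hugging Face `transformers` library with a PyTorch backend and the Hub checkpoint id `google/flan-t5-small`, neither of which is stated in this card. The helper names `build_few_shot_prompt` and `generate` are hypothetical, and the `Input:`/`Output:` layout is an arbitrary choice, not the template used during FLAN fine-tuning.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str,
                          examples: List[Tuple[str, str]],
                          query: str) -> str:
    """Join an instruction, worked examples, and a new query into one prompt.

    With an empty `examples` list this reduces to a zero-shot prompt.
    The Input/Output layout is an illustrative assumption.
    """
    lines = [instruction.strip(), ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


def generate(prompt: str,
             checkpoint: str = "google/flan-t5-small",  # assumed Hub id
             max_new_tokens: int = 32) -> str:
    """Run one prompt through a seq2seq checkpoint and return the decoded text.

    Downloads the weights on first use, so it needs network access.
    """
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Build a one-shot translation prompt; no model is loaded at this point.
prompt = build_few_shot_prompt(
    "Translate English to German.",
    [("Good morning.", "Guten Morgen.")],
    "How old are you?",
)
```

Calling `generate(prompt)` would then return the model's completion as a string. Output quality from a checkpoint of this size can vary considerably, so results should be checked before being relied on.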