google/flan-t5-xl


Model Card for FLAN-T5 XL

TL;DR

If you're familiar with T5, FLAN-T5 offers better performance across the board. For the same number of parameters, these models have been fine-tuned on more than 1,000 additional tasks, cover more languages, and achieve strong few-shot performance. FLAN-PaLM 540B, for example, achieves state-of-the-art results on several benchmarks, such as 75.2% on five-shot MMLU. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.


Model Details

Model Description

  • Model Type: Language Model
  • Language(s): English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian.
  • License: Apache 2.0
  • Related Models: All FLAN-T5 Checkpoints
  • Original Checkpoints: Available for all original FLAN-T5 models

Uses

Direct Use and Downstream Use

FLAN-T5 XL is intended for the following uses (a minimal inference sketch follows this list):

  • Language Understanding & Generation: Handling various NLP tasks, including translation, summarization, and question answering.
  • Multilingual Support: Effective across multiple languages with expanded vocabulary.
  • Research on Language Models: Suitable for tasks in zero-shot and few-shot learning, reasoning, and question answering.
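As a minimal sketch of running the model, assuming the transformers and sentencepiece packages are installed (the prompt below is illustrative; any instruction-style input works):

# Minimal zero-shot inference sketch for google/flan-t5-xl.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

# FLAN-T5 is prompted with plain natural-language instructions.
prompt = "translate English to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))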

Out-of-Scope Use

More information is needed on out-of-scope applications.


Bias, Risks, and Limitations

Ethical Considerations and Risks

According to Rae et al. (2021), language models like FLAN-T5 can potentially be used for harmful language generation. FLAN-T5 should be assessed for safety and fairness concerns before deployment in sensitive applications.

Known Limitations

FLAN-T5 has not been rigorously tested in real-world applications and may generate inappropriate or biased content based on the training data.

Sensitive Use

FLAN-T5 should not be applied to any unacceptable use case, such as generating abusive or harmful speech.


Training Details

Training Data

The model was fine-tuned on a diverse set of tasks, enhancing zero-shot and few-shot performance across multiple languages and NLP tasks. Refer to the research paper for a complete list of tasks.

Training Procedure

The model was trained on TPU v3 or TPU v4 pods using the t5x codebase together with jax. Each FLAN-T5 checkpoint is a fine-tuned variant of the corresponding T5 model at the same scale.
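
Because the released checkpoints are standard T5 weights, they load like any other seq2seq model in transformers. A sketch of loading the XL checkpoint in half precision on a GPU (assumes torch and accelerate are installed; the memory figure is a rough estimate, not from the source):

# Sketch: loading google/flan-t5-xl in fp16 on a CUDA GPU.
# The ~3B-parameter model needs roughly 6 GB of GPU memory in fp16 (estimate).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl",
    torch_dtype=torch.float16,  # half precision
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("Please answer: what is the boiling point of water?",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))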


Evaluation

Testing Data, Factors & Metrics

The authors evaluated FLAN-T5 XL on 1,836 tasks across multiple languages. For more detailed quantitative evaluation, see the research paper’s Table 3.

Results

FLAN-T5 XL achieved competitive performance across tasks, often surpassing the baseline T5 models in various benchmarks.
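
The few-shot numbers come from prompting the model with in-context exemplars before the query. A sketch of what such a prompt looks like (the exemplars below are invented for illustration, not drawn from any benchmark):

# Sketch of few-shot prompting: prepend worked exemplars before the query.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

few_shot_prompt = (
    "Q: What is the capital of France?\nA: Paris\n\n"
    "Q: What is the capital of Japan?\nA: Tokyo\n\n"
    "Q: What is the capital of Canada?\nA:"
)
inputs = tokenizer(few_shot_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))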


Citation

Please cite FLAN-T5 XL as follows:

@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi = {10.48550/ARXIV.2210.11416},
  url = {https://arxiv.org/abs/2210.11416},
  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Scaling Instruction-Finetuned Language Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Model Card Authors

This model card was auto-generated, incorporating details from the official FLAN-T5 model card.