
Model Card for FLAN-T5 Base

TL;DR

If you already know T5, FLAN-T5 is better at essentially everything: for the same number of parameters, these models have been fine-tuned on more than 1,000 additional tasks, covering more languages and achieving strong few-shot performance. FLAN-PaLM 540B, for example, reaches state-of-the-art results on several benchmarks, including 75.2% on five-shot MMLU. Instruction finetuning is a general method for improving the performance and usability of pretrained language models across a wide range of tasks.


Model Details

Model Description

  • Model Type: Language Model
  • Language(s): English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian.
  • License: Apache 2.0
  • Related Models: All FLAN-T5 Checkpoints
  • Original Checkpoints: original checkpoints are available for all released FLAN-T5 model sizes

Uses

Direct Use and Downstream Use

FLAN-T5 Base is intended for:

  • Language Understanding & Generation: handling a range of NLP tasks, including translation, summarization, and question answering (see the usage sketch after this list).
  • Multilingual Support: Effective across multiple languages with expanded vocabulary.
  • Research on Language Models: Suitable for tasks in zero-shot and few-shot learning, reasoning, and question answering.
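
The checkpoint can be loaded with the Hugging Face transformers library. Below is a minimal sketch of zero-shot, instruction-style prompting; the prompts themselves are illustrative and are not taken from the original card.

# Minimal sketch: zero-shot prompting of FLAN-T5 Base via transformers.
# The prompts are illustrative examples, not from the original model card.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompts = [
    "translate English to German: How old are you?",        # translation
    "Answer the question: What is the capital of France?",  # question answering
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))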

Bias, Risks, and Limitations

Ethical Considerations and Risks

According to Rae et al. (2021), language models like FLAN-T5 can potentially be used for harmful language generation. FLAN-T5 should be assessed for safety and fairness concerns before deployment in sensitive applications.

Known Limitations

FLAN-T5 has not been rigorously tested in real-world applications and may generate inappropriate or biased content, reflecting biases present in its training data.

Sensitive Use

FLAN-T5 should not be applied to any unacceptable use case, such as the generation of abusive or harmful speech.


Training Details

Training Data

The model was fine-tuned on a diverse set of tasks, enhancing zero-shot and few-shot performance across multiple languages and NLP tasks. Refer to the research paper for a complete list of tasks.

Training Procedure

The model was trained on TPU v3 or TPU v4 pods using the t5x codebase together with jax. Each FLAN model corresponds to a fine-tuned variant of a T5 model of the same scale.
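
Because the checkpoints originate from the t5x/jax stack, the model can also be run natively in JAX through the Flax classes in transformers. A minimal sketch follows; the prompt is illustrative.

# Minimal sketch: running FLAN-T5 Base in JAX via transformers' Flax classes.
# The prompt is an illustrative example, not from the original model card.
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = FlaxT5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

inputs = tokenizer(
    "translate English to French: The weather is nice today.",
    return_tensors="np",
)
outputs = model.generate(**inputs, max_length=48)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))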


Evaluation

Testing Data, Factors & Metrics

The authors evaluated FLAN-T5 Base on 1,836 tasks across multiple languages. For more detailed quantitative evaluation, see the research paper’s Table 3.
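
The paper's full evaluation protocol is not reproduced here. As a rough illustration only, zero-shot behavior can be probed with a small exact-match check; the mini test set below is hypothetical and is not drawn from the paper's benchmark suite.

# Rough zero-shot exact-match probe on a hypothetical mini test set.
# This does NOT reproduce the paper's evaluation protocol or benchmarks.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

examples = [
    ("Answer the question: What is the capital of France?", "paris"),
    ("Answer the question: What color is the sky on a clear day?", "blue"),
]

correct = 0
for prompt, target in examples:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=8)
    prediction = tokenizer.decode(output[0], skip_special_tokens=True)
    correct += int(prediction.strip().lower() == target)

print(f"exact match: {correct}/{len(examples)}")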

Results

FLAN-T5 Base achieves competitive results, with a reported benchmark score of 77.98 versus 68.82 for the baseline model google/t5-v1_1-base.


Citation

Please cite FLAN-T5 Base as follows:

@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi = {10.48550/ARXIV.2210.11416},
  url = {https://arxiv.org/abs/2210.11416},
  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Scaling Instruction-Finetuned Language Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Model Card Authors

This model card is auto-generated from the original model card on Hugging Face.