Model Card for Meta Llama 3.1
TL;DR
Llama 3.1 by Meta is a collection of multilingual large language models (LLMs) designed for dialogue and other natural language processing tasks. Available in 8B, 70B, and 405B parameter sizes, these models are aligned through supervised instruction tuning and reinforcement learning from human feedback (RLHF), with multilingual support across diverse use cases.
Model Details
Model Information
- Model Type: Multilingual, generative language model (text in/text out)
- Sizes: 8B, 70B, 405B parameters
- Language Support: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Context Length: Up to 128k tokens
- Release Date: July 23, 2024
- License: Llama 3.1 Community License
Model Developer
Developed by Meta.
Intended Use
Use Cases
Llama 3.1 is designed for:
- Assistant-like Chat: Instruction-tuned models excel in conversational use.
- Multilingual Natural Language Processing: Supports multiple languages for text generation and comprehension tasks.
- Synthetic Data Generation: Facilitates synthetic data production for model training and distillation.
Usage
Downstream Use Cases
Llama 3.1 supports a wide range of downstream use cases, including:
- Customer Support Automation: Conversational AI solutions that provide efficient support across multiple languages.
- Multilingual Content Creation: Generating creative or informative content, such as blog posts, articles, and summaries, in supported languages.
- Educational Assistance: Acting as a tutor or learning assistant for educational content in supported languages.
- Language Translation and Summarization: Translating text between supported languages or summarizing lengthy documents.
- Data Augmentation: Creating synthetic datasets for training smaller models or enhancing dataset diversity.
- Coding Assistance: Providing code suggestions, explanations, and debugging help.
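For assistant-like chat, the instruction-tuned variants consume a structured chat prompt built from special header tokens. As a minimal sketch (in practice a library such as `transformers` would assemble this via its chat-template utilities; the exact token strings below follow the documented Llama 3 instruct format and should be verified against the official tokenizer):

```python
def build_llama31_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3.1 instruct prompt by hand.

    Each turn is wrapped in <|start_header_id|>role<|end_header_id|>
    and terminated with <|eot_id|>; the prompt ends with an open
    assistant header so the model generates the reply.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama31_prompt(
    "You are a helpful coding assistant.",
    "Explain recursion in one sentence.",
)
print(prompt)
```

When using the Hugging Face ecosystem, `tokenizer.apply_chat_template` performs this assembly automatically and is the preferred path.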
Model Architecture
Llama 3.1 is built on an optimized transformer architecture with the following key elements:
- Auto-regressive Model: Designed for sequential token prediction, generating text one token at a time.
- Grouped-Query Attention (GQA): Shares each key/value head across a group of query heads, shrinking the key/value cache at inference time. This improves inference scalability and makes long context lengths (up to 128k tokens) practical, suiting Llama 3.1 to long-form text generation and extended dialogues.
- Fine-Tuning Techniques: Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) enhance model alignment with human preferences, improving its usefulness and safety for assistant-like interactions.
- Multilingual Support: Through extensive pretraining on diverse data sources, Llama 3.1 achieves robust performance across English and seven other languages, enabling more inclusive applications.
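The grouped-query attention mechanism above can be sketched in a few lines of NumPy. This is an illustrative toy (head counts and dimensions are made up for the example, not the model's actual configuration): each key/value head is broadcast across its group of query heads before standard scaled dot-product attention.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).

    Each group of n_q_heads // n_kv_heads query heads shares
    one key/value head, so the k/v cache is n_kv_heads wide.
    """
    n_q_heads, seq_len, d = q.shape
    group = n_q_heads // n_kv_heads
    # Broadcast each k/v head across its group of query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))   # 8 query heads (hypothetical)
k = rng.normal(size=(2, 4, 16))   # only 2 key/value heads
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v, n_kv_heads=2)
print(out.shape)
```

With 8 query heads but only 2 key/value heads, the cache stored during generation is a quarter the size of full multi-head attention, which is the scalability win GQA targets.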
Training Details
Training Data
Llama 3.1 models are pretrained on approximately 15 trillion tokens of data from publicly available sources. Fine-tuning used a combination of human-annotated and synthetically generated data (over 25 million examples).
Hardware and Environmental Impact
Training ran on Meta's custom GPU clusters using H100-80GB hardware, for a cumulative 39.3 million GPU hours across the three model sizes. Because Meta matches its electricity use with renewable energy, market-based emissions for training are reported as 0 tons CO2eq; the location-based estimates are shown below.
| Model | GPU Hours | Emissions (tons CO2eq) |
|---|---|---|
| 8B | 1.46M | 420 |
| 70B | 7.0M | 2,040 |
| 405B | 30.84M | 8,930 |
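As a quick arithmetic check, the per-model figures are consistent with the stated cumulative total of 39.3 million GPU hours:

```python
# Per-model training GPU hours and location-based emissions from the table
hours = {"8B": 1.46e6, "70B": 7.0e6, "405B": 30.84e6}
emissions = {"8B": 420, "70B": 2040, "405B": 8930}

print(sum(hours.values()))      # cumulative GPU hours: 39.3 million
print(sum(emissions.values()))  # total location-based tons CO2eq
```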
Evaluation
Benchmark Results
Llama 3.1 models outperform many open-source and closed-source chat models on standard benchmarks.
| Benchmark | Metric | 8B | 70B | 405B |
|---|---|---|---|---|
| MMLU | Accuracy | 66.7 | 79.3 | 85.2 |
| ARC-Challenge | Accuracy | 79.7 | 92.9 | 96.1 |
| CommonSenseQA | Accuracy | 75.0 | 84.1 | 85.8 |
| HumanEval | Pass@1 | 72.6 | 80.5 | 89.0 |
| SQuAD | EM Score | 77.0 | 81.8 | 89.3 |
Red Teaming and Safety Evaluations
Meta conducted adversarial testing to mitigate risks related to child safety, cybersecurity, and social engineering, refining the model through iterative feedback.
Ethics and Safety
Responsible Deployment
Meta provides tools like Llama Guard 3 and Prompt Guard to enable safe deployment. Llama models are intended for use within systems with tailored safeguards to manage risks.
Risk Areas
Focused mitigation efforts were directed at:
- Chemical/Biological Risk: Uplift testing to assess whether the model could meaningfully aid the creation of chemical or biological weapons.
- Child Safety: Assessments to prevent unsafe outputs in child-related scenarios.
- Cybersecurity: Evaluations of Llama’s potential to aid malicious actors in cyber attacks.
Meta engages with the community to foster safe and beneficial AI use through initiatives such as:
- Llama Impact Grants: Supporting impactful applications in education, climate, and innovation.
- Purple Llama Tools: Open-sourced tools for community safety assessments.
- Collaboration: Active participation in AI Alliance, Partnership on AI, and MLCommons.
Ethical Considerations and Limitations
Llama 3.1 aims for inclusivity and openness, supporting diverse applications and user autonomy. However, as with any LLM, there are risks, such as potential biases and inaccurate responses. Developers should conduct safety testing and follow Meta’s guidelines for responsible use.
Citation
If you use Llama 3.1 in your research, please cite:
@misc{llama3herd2024,
  title = {The Llama 3 Herd of Models},
  author = {{Llama Team, AI @ Meta}},
  year = {2024},
  eprint = {2407.21783},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2407.21783}
}