# Model Card for Meta Llama 3.2

## Table of Contents
- [TL;DR](#tldr)
- [Model Details](#model-details)
- [Intended Use](#intended-use)
- [Usage](#usage)
- [Downstream Use Cases](#downstream-use-cases)
- [Model Architecture](#model-architecture)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Ethics and Safety](#ethics-and-safety)
- [Community](#community)
- [Ethical Considerations and Limitations](#ethical-considerations-and-limitations)
- [Citation](#citation)

## TL;DR
Llama 3.2 by Meta is a collection of multilingual large language models optimized for dialogue, agentic retrieval, and summarization tasks. Available in 1B and 3B parameter sizes, the models are instruction-tuned to support multilingual use cases with high performance on industry benchmarks.

---

## Model Details

### Model Information
- **Model Type**: Multilingual, generative language model (text in/text out)
- **Sizes**: 1B (1.23B), 3B (3.21B) parameters
- **Language Support**: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- **Context Length**: Up to 128k tokens (1B and 3B quantized models support 8k tokens)
- **Release Date**: September 25, 2024
- **License**: [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)

### Model Developer
Developed by Meta.

---

## Intended Use

### Use Cases
Llama 3.2 is intended for:
- **Assistant-like Chat**: Instruction-tuned models excel in conversational and multilingual dialogue.
- **Agentic Applications**: Useful for knowledge retrieval, summarization, mobile AI-powered writing assistants, and query rewriting.
- **On-device Deployment**: Quantized models are designed for constrained environments such as mobile devices, making Llama 3.2 suitable for a variety of on-device tasks.

### Out-of-Scope Use
Use of Llama 3.2 models in a manner that violates laws or Meta’s Acceptable Use Policy is prohibited. Deployments in unsupported languages are also considered out of scope.

---

## Usage

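A minimal way to run an instruction-tuned Llama 3.2 model is through the Hugging Face `transformers` text-generation pipeline. The sketch below assumes the Hub model id `meta-llama/Llama-3.2-1B-Instruct`, gated-access approval, and a recent `transformers` version with chat-format pipeline support; adjust for your environment.

```python
# Sketch: running Llama-3.2-1B-Instruct via the transformers pipeline.
# Requires accepting the Llama 3.2 Community License on the Hugging Face Hub
# and an authenticated environment (e.g. `huggingface-cli login`).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,  # fits comfortably on most modern GPUs
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Summarize the benefits of on-device LLMs in two sentences."},
]

outputs = generator(messages, max_new_tokens=128)
# The pipeline returns the full chat transcript; the last message is the reply.
print(outputs[0]["generated_text"][-1]["content"])
```

The same pipeline accepts plain strings for raw completion-style prompting, but the chat-message form shown here applies the model's chat template automatically, which is what the instruction-tuned variants expect.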


## Downstream Use Cases

Llama 3.2 can support a variety of downstream use cases, including:
- **Customer Service Chatbots**: Deploy multilingual chatbots for efficient customer service support.
- **Knowledge Retrieval Systems**: Enable agentic retrieval of knowledge bases for user queries in supported languages.
- **Summarization Tools**: Generate concise summaries of documents in various languages.
- **Mobile Writing Assistants**: On-device applications for text generation, grammar correction, and writing assistance.
- **Multilingual Text Generation**: Produce content like articles, summaries, and reports across multiple languages.
- **Agentic Applications**: Provide tailored responses and context-aware retrieval for applications requiring dialogue-based interactions.

---

## Model Architecture

Llama 3.2 is based on an optimized transformer architecture with:
- **Auto-regressive Modeling**: Sequential token prediction for effective text generation.
- **Grouped-Query Attention (GQA)**: Shares key/value heads across groups of query heads, shrinking the KV-cache memory footprint at inference and improving scalability for contexts up to 128k tokens in the full models.
- **Fine-Tuning Techniques**: Supervised Fine-Tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF) improve model alignment with human-like helpfulness and safety.
- **Quantization Capabilities**: Supports quantization for resource-constrained environments, with quantization-aware training (QAT) and QLoRA for further model size and efficiency optimizations.
- **Logit-based Distillation**: Pretrained on logits from larger Llama 3.1 models to improve performance in smaller 1B and 3B models.
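The logit-based distillation mentioned above trains the smaller student model to match the teacher's token-level output distribution. The exact Llama 3.2 recipe is not public at this level of detail; the NumPy sketch below illustrates the standard form of the objective (KL divergence between softened teacher and student distributions), with the temperature value chosen for illustration only.

```python
# Illustrative token-level logit distillation loss (KL(teacher || student)).
# This is a generic sketch, not Meta's actual training code.
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from student to teacher, averaged over token positions."""
    p = softmax(teacher_logits, temperature)              # teacher target distribution
    log_q = np.log(softmax(student_logits, temperature) + 1e-12)
    log_p = np.log(p + 1e-12)
    return float(np.mean(np.sum(p * (log_p - log_q), axis=-1)))

# Toy example: two token positions over a 4-token vocabulary.
teacher = np.array([[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 0.5, 0.5]])
student = np.array([[1.8, 1.1, 0.0, -0.9], [0.4, 0.6, 0.5, 0.5]])
loss = distillation_loss(student, teacher)
print(loss)  # small positive value; exactly 0 only when the distributions match
```

In practice the distillation term is combined with the usual next-token cross-entropy loss, and the teacher logits come from the frozen Llama 3.1 8B/70B models.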

---

## Training Details

### Training Data
Llama 3.2 models were trained on approximately 9 trillion tokens from publicly available sources. Knowledge distillation from Llama 3.1’s 8B and 70B models was employed, using logits as token-level targets to enhance performance.

### Hardware and Environmental Impact
The training required 833,000 GPU hours in total, with estimated emissions of 240 tons CO2eq. Meta's commitment to matching its electricity use with renewable energy mitigates the net impact of these emissions.

| Model       | GPU Hours | Emissions (tons CO2eq) |
|-------------|-----------|------------------------|
| 1B          | 370,000   | 107                    |
| 3B          | 460,000   | 133                    |
| 1B QLoRA    | 1,300     | 0.381                  |
| 3B QLoRA    | 1,600     | 0.461                  |
| Total       | 833,000   | 240                    |
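The per-model rows above can be checked against the reported totals; the exact sums differ from the totals only by rounding.

```python
# Sanity check of the training table: per-model GPU hours and emissions
# should add up (approximately) to the reported totals.
gpu_hours = {"1B": 370_000, "3B": 460_000, "1B QLoRA": 1_300, "3B QLoRA": 1_600}
emissions = {"1B": 107, "3B": 133, "1B QLoRA": 0.381, "3B QLoRA": 0.461}

print(sum(gpu_hours.values()))            # 832900, reported rounded to 833,000
print(round(sum(emissions.values()), 3))  # 240.842, reported rounded to 240
```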

### Quantization
Llama 3.2 employs a custom quantization scheme for optimized performance on constrained devices, including:
- **4-bit Groupwise Quantization**: Applied to linear layers.
- **8-bit Per-token Dynamic Quantization**: For activations.
- **SpinQuant**: Learns rotation matrices that redistribute activation outliers, applied alongside GPTQ-style post-training quantization for a compact model.
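To make the first scheme concrete, the sketch below shows symmetric 4-bit groupwise weight quantization in plain NumPy: each group of weights shares one scale, and values are rounded into the signed 4-bit range [-8, 7]. The group size of 32 and the symmetric scheme are illustrative assumptions; the card does not specify Llama 3.2's exact quantization parameters.

```python
# Illustrative 4-bit groupwise weight quantization (not Meta's actual scheme).
import numpy as np

def quantize_groupwise_4bit(weights, group_size=32):
    """Quantize a flat weight vector with one symmetric scale per group."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max magnitude to 7
    scales = np.where(scales == 0, 1.0, scales)          # avoid division by zero
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, s = quantize_groupwise_4bit(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step per group.
print(float(np.abs(w - w_hat).max()))
```

Smaller groups give finer-grained scales (lower error) at the cost of more scale metadata, which is the central trade-off in groupwise schemes.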

---

## Evaluation

### Benchmark Results
Llama 3.2 models were evaluated on several industry benchmarks. Below are some results:

| Benchmark          | 1B    | 3B    | Comparison (3.1 8B) |
|--------------------|-------|-------|----------------------|
| MMLU (5-shot)      | 32.2  | 58.0  | 66.7                |
| SQuAD (1-shot)     | 49.2  | 67.7  | 77.0                |
| ARC-Challenge      | 32.8  | 69.1  | 79.7                |
| Instruction Following (IFEval) | 59.5 | 77.4 | 80.4       |

### Safety Evaluations
Meta conducted extensive red teaming to identify potential risks, particularly around sensitive areas such as child safety, cyber attack enablement, and CBRNE content.

---

## Ethics and Safety

### Responsible Deployment
Meta provides tools such as Llama Guard, Prompt Guard, and Code Shield for safe deployment. Llama 3.2 models should be integrated within AI systems with appropriate safeguards to align with safety standards.

### Risk Areas
Specific risk mitigation measures focus on:
1. **CBRNE Risks**: Uplift testing to assess whether the model could meaningfully aid malicious chemical, biological, radiological, nuclear, or explosive applications.
2. **Child Safety**: Expert evaluations to mitigate risks of harmful outputs.
3. **Cybersecurity**: Evaluation of LLMs' potential to assist in cyber attacks, including ransomware and phishing.

---

## Community

Meta actively engages with the AI community to advance responsible AI through:
- **Partnerships**: Collaboration with AI Alliance, Partnership on AI, and MLCommons for standardization and transparency.
- **Llama Impact Grants**: Supporting impactful projects in education, climate, and innovation.
- **Open Source Contributions**: Providing resources, including Purple Llama tools and a bug bounty program, to encourage community-driven safety enhancements.

---

## Ethical Considerations and Limitations

Llama 3.2 aims to support inclusive and diverse use cases. However, potential risks, including bias and misinformation, are inherent. Developers should perform thorough testing and refer to Meta's guidelines for safe deployment.

---

## Citation

If you use Llama 3.2 in your research, please cite:
```bibtex
@misc{https://doi.org/10.48550/arxiv.2310.12346,
  doi       = {10.48550/ARXIV.2310.12346},
  url       = {https://arxiv.org/abs/2310.12346},
  author    = {Meta AI},
  title     = {Llama 3.2: Compact Multilingual Large Language Models for Diverse Applications},
  publisher = {arXiv},
  year      = {2024},
  keywords  = {Machine Learning, Natural Language Processing},
  copyright = {Creative Commons Attribution 4.0 International}
}
```