Model Card for Gemma 2B Instruct
TL;DR
Gemma is a family of lightweight, state-of-the-art open language models from Google, built from the same research and technology used to create the Gemini models. Gemma models are well suited to a variety of text-to-text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes them deployable in resource-constrained environments, and their openly released weights help democratize access to advanced AI.
Model Details
Model Information
- Model Type: Text-to-text, decoder-only large language model
- Language(s): English
- License: Terms of Use available on the Gemma model page
- Related Models: 2B base, 7B base, and 7B instruct models
Usage
Inputs and Outputs
- Input: Text string (e.g., question, prompt, document to summarize)
- Output: Generated English text (e.g., answer, summary)
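As a quick-start illustration, here is a minimal inference sketch, assuming the Hugging Face transformers library and the google/gemma-2b-it checkpoint name; adjust device and dtype settings for your hardware.

```python
# Minimal text-in/text-out sketch (assumes the Hugging Face transformers
# library and the google/gemma-2b-it checkpoint; illustrative, not official).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # assumed Hugging Face model identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Input: a text string (here, a question); output: generated English text.
messages = [{"role": "user", "content": "Summarize the water cycle in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```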
Hardware Compatibility
Gemma 2B Instruct is designed for environments with limited resources: it can run on laptops and desktops or be deployed on custom cloud infrastructure.
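For the most memory-constrained setups, one common option is to load the weights with 4-bit quantization. The sketch below assumes the bitsandbytes integration in transformers and the same google/gemma-2b-it checkpoint; it is one possible deployment approach, not the only one.

```python
# Sketch: loading with 4-bit quantization to reduce memory footprint
# (assumes transformers with the bitsandbytes integration installed;
# actual savings depend on hardware and configuration).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b-it"  # assumed checkpoint name
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```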
Uses
Direct Use and Downstream Use
Gemma is suitable for:
- Text Generation: Creating content such as poems, scripts, and email drafts.
- Question Answering and Summarization: Providing concise answers or summaries of complex texts.
- Conversational AI: Supporting chatbots and virtual assistants for customer service.
- Code Generation: Understanding programming-related queries and generating code snippets.
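For conversational use, the instruct variant expects dialogue wrapped in explicit turn markers. The sketch below builds a multi-turn prompt with the tokenizer's chat template, which applies Gemma's <start_of_turn>/<end_of_turn> formatting; the checkpoint name and the example dialogue are illustrative assumptions.

```python
# Sketch: multi-turn conversational prompting via the chat template
# (assumes the transformers tokenizer for the google/gemma-2b-it checkpoint).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")  # assumed checkpoint

chat = [
    {"role": "user", "content": "Write a short poem about the sea."},
    {"role": "assistant", "content": "Waves fold and unfold, salt on the evening air..."},
    {"role": "user", "content": "Now rewrite it as a haiku."},
]

# Render the conversation into Gemma's turn-marker format and append the
# generation prompt so the model continues as the assistant.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(prompt)
```

The resulting string can then be tokenized and passed to the model's generate method, as in the quick-start sketch above.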
Bias, Risks, and Limitations
Ethical Considerations and Risks
Large language models like Gemma can generate inappropriate or biased content, reflecting socio-cultural biases present in their training data. This model card outlines potential ethical concerns, including risks related to misinformation, bias, and harmful content.
Known Limitations
- Training Data Biases: The model’s responses may reflect biases present in its training data.
- Complex Task Handling: Gemma performs best with clear prompts; open-ended tasks might yield suboptimal responses.
- Factual Inaccuracy: Responses are based on patterns in the training data and may contain outdated or incorrect information.
Training Details
Training Dataset
Gemma models were trained on a text dataset of up to 6 trillion tokens (approximately 6T tokens for the 7B models and 2T for the 2B models), drawn from various sources, including:
- Web Documents: Diverse English-language content for broad topic coverage.
- Code: Exposure to programming syntax for code generation tasks.
- Mathematical Texts: Training on logic and symbolic reasoning for mathematical queries.
Data Preprocessing
- CSAM Filtering: Rigorous filtering to exclude illegal content.
- Sensitive Data Filtering: Exclusion of personal information and other sensitive data.
- Content Quality Filtering: Additional filtering to ensure data quality and adherence to policy.
Hardware and Software
- Hardware: Google TPUv5e (Tensor Processing Unit) accelerators, chosen for their high-bandwidth memory, scalability, and performance.
- Software: JAX and ML Pathways, used for efficient large-scale training and orchestration.
Evaluation
Benchmark Results
Gemma models were evaluated on various datasets covering aspects like commonsense reasoning, question answering, and code generation.
| Benchmark     | Metric        | 2B Params | 7B Params |
|---------------|---------------|-----------|-----------|
| MMLU          | 5-shot, top-1 | 42.3      | 64.3      |
| HellaSwag     | 0-shot        | 71.4      | 81.2      |
| PIQA          | 0-shot        | 77.3      | 81.2      |
| TriviaQA      | 5-shot        | 53.2      | 63.4      |
| CommonsenseQA | 7-shot        | 65.3      | 71.3      |
| GSM8K         | maj@1         | 17.7      | 46.4      |
| Average       |               | 45.0      | 56.9      |
Ethics and Safety
Evaluation Approach
Gemma underwent structured evaluations and red-teaming tests for content safety, representational harms, and memorization risks.
Evaluation Results
Evaluation on datasets like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA showed Gemma meets internal safety thresholds.
| Benchmark    | Metric  | 2B Params | 7B Params |
|--------------|---------|-----------|-----------|
| RealToxicity | Average | 6.86      | 7.90      |
| Winogender   | Top-1   | 51.25     | 54.17     |
| TruthfulQA   | Average | 44.84     | 31.81     |
| Toxigen      | Top-1   | 29.77     | 39.59     |
Intended Usage and Limitations
Intended Usage
- Content Creation: Supports creative tasks, chatbots, summarization, and more.
- Research and Education: Assists NLP researchers in developing new algorithms and educational tools.
Limitations
- Data Bias: Biases in training data may influence model responses.
- Complexity and Nuance: Model may struggle with open-ended tasks or nuanced language.
- Factual Accuracy: Responses may contain outdated or incorrect information.
Benefits
Gemma provides high-performance, openly available language models, promoting responsible AI development and democratized access to LLM technology. Relative to similarly sized open models, Gemma performs strongly on standard academic benchmarks, making it a competitive choice for developers and researchers.
Citation
If you use Gemma in your research, please cite:
```bibtex
@misc{gemmateam2024gemma,
  title     = {Gemma: Open Models Based on Gemini Research and Technology},
  author    = {{Gemma Team, Google DeepMind}},
  year      = {2024},
  publisher = {arXiv},
  doi       = {10.48550/arXiv.2403.08295},
  url       = {https://arxiv.org/abs/2403.08295}
}
```