Gemma 7B Instruct is part of Google’s open model family, optimized for following instructions across a variety of natural language tasks. Built to handle tasks such as question answering, summarization, and reasoning, this 7B-parameter model delivers strong performance with a focus on ease of deployment across environments ranging from personal devices to cloud infrastructure.
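As a quick orientation, below is a minimal inference sketch using the Hugging Face `transformers` chat-template API; the repo id `google/gemma-7b-it`, the prompt, and the generation settings are illustrative assumptions rather than part of this card.

```python
# Minimal chat-style inference sketch (repo id and settings are assumptions;
# adjust dtype/device for your hardware).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the prompt in Gemma's chat format via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize the water cycle in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Strip the prompt tokens and print only the model's reply.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```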
Fine-tuning scripts are provided in the Gemma repository. Users can adapt these scripts to fine-tune the 7B Instruct model on specific datasets, such as UltraChat, and run them in environments like Google Colab for further customization.
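The official scripts live in the Gemma repository; purely as an illustration, the sketch below shows a LoRA-style fine-tune built on the Hugging Face stack rather than those scripts. The repo id `google/gemma-7b-it`, the dataset id `HuggingFaceH4/ultrachat_200k`, and every hyperparameter here are assumptions to adapt for your own setup.

```python
# Hedged LoRA fine-tuning sketch on an UltraChat slice (all ids and
# hyperparameters are assumptions, not the official recipe).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "google/gemma-7b-it"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Train only small low-rank adapters on the attention projections.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# A small slice of UltraChat; each example carries a list of chat messages.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:1%]")

def tokenize(example):
    # Render the multi-turn conversation with Gemma's chat template; the
    # template already inserts special tokens, so skip adding them again.
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return tokenizer(text, truncation=True, max_length=1024,
                     add_special_tokens=False)

tokenized = ds.map(tokenize, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gemma-7b-it-ultrachat",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Causal-LM collator: pads each batch and copies input_ids to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```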
Gemma 7B Instruct can be deployed across a range of hardware, including CPUs and single- or multi-GPU setups, with quantization options to reduce memory and compute requirements in constrained environments.
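For constrained single-GPU environments, one common option is 4-bit loading via bitsandbytes; the sketch below is an assumption-laden example (repo id and quantization settings included), not an officially supported configuration.

```python
# Hedged 4-bit loading sketch via bitsandbytes (ids and settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-7b-it"  # assumed Hugging Face repo id

# 4-bit NF4 weights with bfloat16 compute keeps the 7B model within a
# single consumer GPU's memory budget.
quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",  # place layers on available GPUs, offloading if needed
)
```

Loading in 4-bit roughly quarters the weight memory relative to 16-bit precision, at some cost in accuracy and throughput; on CPU-only hosts, the same `from_pretrained` call without a quantization config remains the fallback, with slower generation.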
Gemma 7B Instruct is tailored for instruction-following use cases such as question answering, summarization, and reasoning.
Language models such as Gemma 7B Instruct can inherit biases from training data, posing risks of unintended or inappropriate outputs. Misuse of the model could result in misinformation or harmful text generation.
The Gemma models are trained on a text corpus totaling 6 trillion tokens, drawn from diverse sources including web documents, code, and mathematical text.
Data preparation included rigorous filtering and quality control, such as removing sensitive, unsafe, and low-quality content before training.
Gemma models were tested on a variety of benchmarks to evaluate performance in text generation, factual accuracy, and reasoning tasks.
| Benchmark | Metric | Gemma 2B Instruct | Gemma 7B Instruct |
|---|---|---|---|
| MMLU | 5-shot, top-1 | 42.3 | 64.3 |
| HellaSwag | 0-shot | 71.4 | 81.2 |
| PIQA | 0-shot | 77.3 | 81.2 |
| TriviaQA | 5-shot | 53.2 | 63.4 |
| CommonsenseQA | 7-shot | 65.3 | 71.3 |
| GSM8K | maj@1 | 17.7 | 46.4 |
| Average | | 45.0 | 56.9 |
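Readers who want to sanity-check a subset of these numbers can use EleutherAI's lm-evaluation-harness; the sketch below assumes its v0.4-style Python API and the repo id `google/gemma-7b-it`, and exact scores will vary with prompt formatting, few-shot sampling, and harness version.

```python
# Hedged reproduction sketch with lm-evaluation-harness (v0.4-style API).
# Task names and reported metrics are assumptions; results will not match
# the card exactly.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=google/gemma-7b-it,dtype=bfloat16",
    tasks=["hellaswag", "piqa"],  # 0-shot tasks from the table above
    num_fewshot=0,
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```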
Gemma models underwent structured evaluations, including internal red-teaming and human evaluations on content safety. Tests addressed categories such as text-to-text content safety, representational harms, and memorization risks.
Gemma 7B Instruct performed within acceptable thresholds on several key safety and ethics benchmarks relevant to safe deployment, summarized below.
| Benchmark | Metric | Gemma 2B Instruct | Gemma 7B Instruct |
|---|---|---|---|
| RealToxicity | Average | 6.86 | 7.90 |
| BBQ Ambig | 1-shot, top-1 | 62.58 | 92.54 |
| Winogender | Top-1 | 51.25 | 54.17 |
| TruthfulQA | Average | 44.84 | 31.81 |
| Toxigen | Top-1 | 29.77 | 39.59 |
Gemma models support a range of applications, including question answering, summarization, conversational assistants, and other open-ended text generation.
Gemma models provide high-performance language generation capabilities, with an emphasis on responsible AI. They offer competitive performance relative to other open models, making advanced AI accessible to a broader audience and fostering innovation.
If you use Gemma in your research, please cite:
```bibtex
@misc{gemmateam2024gemma,
  title         = {Gemma: Open Models Based on Gemini Research and Technology},
  author        = {{Gemma Team} and {Google DeepMind}},
  year          = {2024},
  eprint        = {2403.08295},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  doi           = {10.48550/arXiv.2403.08295},
  url           = {https://arxiv.org/abs/2403.08295}
}
```