Gemma 2B Base is part of Google’s lightweight and high-performance open model family. Designed to handle various text generation tasks efficiently, the Gemma models provide accessible large language model capabilities even for resource-limited devices. This 2B parameter model is optimized for tasks such as question answering, summarization, and reasoning with a context length of 8192 tokens.
Preconfigured fine-tuning scripts are available in the Gemma repository. Users can adapt these for the 2B base model to optimize performance on specific datasets, such as UltraChat, and to run in different environments like Google Colab.
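Before tokenization, chat-style datasets such as UltraChat are typically flattened into plain training strings. The sketch below illustrates that preprocessing step; the field names (`messages`, `role`, `content`) follow the common chat-dataset convention and are assumptions here, not the exact schema used by the Gemma fine-tuning scripts.

```python
# Toy sketch: flatten an UltraChat-style dialogue into one training string.
# Field names ("messages", "role", "content") are assumed, not taken from
# the official Gemma fine-tuning scripts.

def format_dialogue(example: dict) -> str:
    """Join alternating user/assistant turns into a single string."""
    parts = []
    for turn in example["messages"]:
        prefix = "User:" if turn["role"] == "user" else "Assistant:"
        parts.append(f"{prefix} {turn['content']}")
    return "\n".join(parts)

sample = {
    "messages": [
        {"role": "user", "content": "What is Gemma?"},
        {"role": "assistant", "content": "An open model family from Google."},
    ]
}
print(format_dialogue(sample))
```

A mapping function like this would be applied over the whole dataset before passing the text to the tokenizer.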
Gemma 2B Base is deployable on various devices, including CPUs and single- or multi-GPU setups, with quantization options available to optimize performance for constrained environments.
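To illustrate why quantization helps in constrained environments, the toy sketch below shows the round-trip arithmetic behind symmetric 8-bit weight quantization. Real deployments rely on library support (e.g. bitsandbytes or GGUF-style formats); this is only the core idea, not Gemma's actual quantization path.

```python
# Toy sketch of symmetric int8 quantization: store weights as 8-bit
# integers plus one float scale, cutting memory roughly 4x vs. float32.

def quantize_int8(weights):
    """Map floats to int8 values with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.52, -1.27, 0.003, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-trip error is bounded by half the quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The same per-tensor (or, in practice, per-channel) scheme applied to every weight matrix is what lets a 2B-parameter model fit on modest hardware.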
Gemma 2B Base is designed for general-purpose text generation tasks such as question answering, summarization, and reasoning. The model may not be suitable for highly sensitive applications or for tasks that require guaranteed factual accuracy or unbiased outputs.
Language models like Gemma 2B Base may inherit biases from training data, potentially leading to biased or inappropriate outputs. Misuse of the model could propagate misinformation or harmful language if not appropriately managed.
The training data comprises a diverse set of text sources totaling 6 trillion tokens, with web documents, code, and mathematical text as the major components.
Gemma models were tested on various benchmarks to assess their performance in text generation, reasoning, and factual accuracy.
| Benchmark | Metric | Gemma 2B Base | Gemma 7B Base |
|---|---|---|---|
| MMLU | 5-shot, top-1 | 42.3 | 64.3 |
| HellaSwag | 0-shot | 71.4 | 81.2 |
| PIQA | 0-shot | 77.3 | 81.2 |
| TriviaQA | 5-shot | 53.2 | 63.4 |
| CommonsenseQA | 7-shot | 65.3 | 71.3 |
| GSM8K | maj@1 | 17.7 | 46.4 |
| Average | — | 45.0 | 56.9 |
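GSM8K above is scored with maj@1, the k = 1 case of the maj@k metric: sample k answers per question and score the most common one. The toy sketch below shows only the voting step, not Gemma's actual evaluation harness.

```python
# Toy sketch of the maj@k voting step used for GSM8K-style scoring.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among k sampled completions."""
    return Counter(answers).most_common(1)[0][0]

# With k = 1 (maj@1), the vote trivially returns the single sample,
# so the metric reduces to accuracy on one completion per question.
assert majority_vote(["42"]) == "42"
assert majority_vote(["42", "41", "42"]) == "42"
```

Larger k typically improves reported accuracy on reasoning benchmarks, since self-consistent answers outvote one-off sampling errors.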
The Gemma models underwent structured evaluations, including internal red-teaming and human evaluations on content safety. Tests covered categories like text-to-text content safety, representational harms, and memorization risks.
Gemma 2B Base scored within the project's internal policy thresholds on these safety evaluations. Notable benchmarks include:
| Benchmark | Metric | Gemma 2B Base | Gemma 7B Base |
|---|---|---|---|
| RealToxicity | Average | 6.86 | 7.90 |
| BBQ Ambig | 1-shot, top-1 | 62.58 | 92.54 |
| Winogender | Top-1 | 51.25 | 54.17 |
| TruthfulQA | Average | 31.81 | 44.84 |
| Toxigen | Top-1 | 29.77 | 39.59 |
Gemma provides accessible large language models for a broad range of text generation applications while supporting responsible AI practices. These models balance performance and ethical considerations, making them a competitive alternative to similarly sized open models, with an emphasis on fostering innovation and democratizing AI technology.
If you use Gemma in your research, please cite:
```bibtex
@misc{gemmateam2024gemma,
  title         = {Gemma: Open Models Based on Gemini Research and Technology},
  author        = {{Gemma Team, Google DeepMind}},
  year          = {2024},
  eprint        = {2403.08295},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2403.08295}
}
```