google/gemma-2b


Model Card for Gemma 2B Base

TL;DR

Gemma 2B Base is part of Google’s family of lightweight, high-performance open models. Designed to handle a variety of text generation tasks efficiently, the Gemma models make large language model capabilities accessible even on resource-limited devices. This 2B-parameter model is optimized for tasks such as question answering, summarization, and reasoning, and supports a context length of 8192 tokens.


Model Details

Model Information

  • Model Type: Text-to-text, decoder-only large language model
  • Language(s): Primarily English
  • License: Terms of Use available on the Gemma model page
  • Related Models: Gemma 2B Instruct, Gemma 7B Base, Gemma 7B Instruct

Usage

Inputs and Outputs

  • Input: Text string (e.g., a question, prompt, or document to summarize)
  • Output: Generated English text in response to the input, such as an answer or summary
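
As a sketch of basic usage, the snippet below loads the model with Hugging Face Transformers and generates a completion from a text prompt. The `truncate_to_context` helper that trims inputs to the 8192-token window is illustrative, not part of the official API.

```python
# Minimal inference sketch using Hugging Face Transformers.
# truncate_to_context is an illustrative helper, not part of the library.

MODEL_ID = "google/gemma-2b"
MAX_CONTEXT = 8192  # Gemma's context length in tokens


def truncate_to_context(token_ids, max_len=MAX_CONTEXT):
    """Keep only the most recent tokens that fit in the context window."""
    return token_ids[-max_len:]


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imported lazily so the pure helper above has no dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    input_ids = torch.tensor([truncate_to_context(tokenizer.encode(prompt))])
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Note that running this requires downloading the model weights and accepting the Terms of Use on the model page.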

Fine-tuning

Preconfigured fine-tuning scripts are available in the Gemma repository. Users can adapt these for the 2B base model to optimize performance on specific datasets, such as UltraChat, and to run in different environments like Google Colab.
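
As a hedged illustration of data preparation for such fine-tuning, the snippet below flattens chat-style records (as in UltraChat) into plain training strings. The `{"messages": [...]}` record layout and the `<role>:` markers are assumptions for illustration, not the official Gemma fine-tuning format.

```python
# Sketch: flatten chat records into training text for supervised fine-tuning.
# The {"messages": [{"role", "content"}, ...]} layout and the "<role>:" markers
# are assumptions, not the format used by the official Gemma scripts.

def format_example(record):
    """Join a list of chat turns into one newline-separated training string."""
    return "\n".join(
        f"<{turn['role']}>: {turn['content']}" for turn in record["messages"]
    )
```

A formatter like this would typically be mapped over the dataset before tokenization.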

Hardware Compatibility

Gemma 2B Base can be deployed across a range of hardware, from CPUs to single- and multi-GPU setups, and supports quantization to reduce its memory footprint in constrained environments.
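
One common way to fit the model on constrained hardware is 4-bit quantization via bitsandbytes. The configuration sketch below shows one such setup; the specific settings are illustrative defaults, not official recommendations.

```python
# Configuration sketch: load the model in 4-bit NF4 precision.
# Requires `transformers`, `torch`, and `bitsandbytes`; settings are
# illustrative, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # normalized-float 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available devices
)
```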


Uses

Direct Use and Downstream Use

Gemma 2B Base is designed for:

  • Text Generation: Generating creative text formats such as poems, scripts, and marketing copy.
  • Question Answering: Responding to inquiries with relevant information.
  • Summarization: Condensing long documents or texts into concise summaries.
  • Conversational AI: Serving as a foundational model for chatbots and virtual assistants.

Out-of-Scope Use

The model may not be suitable for highly sensitive applications or tasks requiring guaranteed factual accuracy and unbiased outputs.


Bias, Risks, and Limitations

Ethical Considerations and Risks

Language models like Gemma 2B Base may inherit biases from training data, potentially leading to biased or inappropriate outputs. Misuse of the model could propagate misinformation or harmful language if not appropriately managed.

Known Limitations

  • Data Bias: The model may reflect biases present in the training data.
  • Complex Task Handling: Performance may decline with open-ended or highly nuanced tasks.
  • Factual Reliability: The model’s responses may sometimes be outdated or factually incorrect.

Training Details

Training Dataset

The training data includes a diverse set of text sources totaling 6 trillion tokens. Major components:

  • Web Documents: Ensures exposure to various linguistic styles and topics.
  • Code: Enhances model understanding of programming language syntax.
  • Mathematics: Supports logical reasoning and handling of mathematical queries.

Data Preprocessing

  • CSAM Filtering: Excludes harmful and illegal content.
  • Sensitive Data Filtering: Removes personal information and sensitive data.
  • Quality and Safety Filtering: Ensures high-quality data input and alignment with safety policies.
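
These stages can be pictured as a chain of predicate filters applied to each document. The predicates below are trivial placeholders for illustration, not Google's actual (unpublished) filtering logic.

```python
# Sketch: compose document filters into one keep/drop decision.
# The predicates are toy stand-ins for the real preprocessing stages.

def make_pipeline(*predicates):
    """A document is kept only if every predicate accepts it."""
    def keep(doc: str) -> bool:
        return all(p(doc) for p in predicates)
    return keep


def no_email(doc: str) -> bool:  # crude sensitive-data proxy
    return "@" not in doc


def long_enough(doc: str) -> bool:  # crude quality proxy
    return len(doc.split()) >= 3


keep = make_pipeline(no_email, long_enough)
```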

Hardware and Software

  • Hardware: Trained on TPUv5e, optimized for efficient handling of large datasets.
  • Software: JAX and ML Pathways facilitate streamlined training across TPUs, allowing scalability and simplified development processes.

Evaluation

Benchmark Results

Gemma models were tested on various benchmarks to assess their performance in text generation, reasoning, and factual accuracy.

| Benchmark     | Metric        | Gemma 2B Base | Gemma 7B Base |
|---------------|---------------|---------------|---------------|
| MMLU          | 5-shot, top-1 | 42.3          | 64.3          |
| HellaSwag     | 0-shot        | 71.4          | 81.2          |
| PIQA          | 0-shot        | 77.3          | 81.2          |
| TriviaQA      | 5-shot        | 53.2          | 63.4          |
| CommonsenseQA | 7-shot        | 65.3          | 71.3          |
| GSM8K         | maj@1         | 17.7          | 46.4          |
| Average       |               | 45.0          | 56.9          |
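
The k-shot metrics above refer to how many solved examples are placed in the prompt before the target question. A minimal sketch of building such a prompt follows; the exact format is illustrative, not the harness used to produce these scores.

```python
# Sketch: build an MMLU-style k-shot prompt from solved (question, answer)
# pairs. The "Q:"/"A:" layout is illustrative, not the official eval format.

def build_few_shot_prompt(examples, question, k=5):
    """Prepend k solved (question, answer) pairs to the target question."""
    shots = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    shots.append(f"Q: {question}\nA:")
    return "\n\n".join(shots)
```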

Ethics and Safety

Evaluation Approach

The Gemma models underwent structured evaluations, including internal red-teaming and human evaluations on content safety. Tests covered categories like text-to-text content safety, representational harms, and memorization risks.

Evaluation Results

Gemma 2B Base achieved scores within acceptable thresholds for ethical standards. Notable benchmarks include:

| Benchmark    | Metric        | Gemma 2B Base | Gemma 7B Base |
|--------------|---------------|---------------|---------------|
| RealToxicity | Average       | 6.86          | 7.90          |
| BBQ Ambig    | 1-shot, top-1 | 62.58         | 92.54         |
| Winogender   | Top-1         | 51.25         | 54.17         |
| TruthfulQA   | Average       | 31.81         | 44.84         |
| Toxigen      | Top-1         | 29.77         | 39.59         |

Intended Usage and Limitations

Intended Usage

Gemma models have various applications:

  • Content Creation: Generating poems, scripts, and marketing content.
  • Customer Service and Chatbots: Enhancing conversational AI with responsive dialogue.
  • Educational Tools: Assisting in language learning, grammar correction, and exploratory research.

Limitations

  • Bias: Potential for socio-cultural biases inherited from the training data.
  • Context Limitations: Very long or highly complex prompts may degrade performance.
  • Accuracy: Outputs may contain outdated or incorrect information, impacting use in factual scenarios.

Benefits

Gemma provides accessible large language models that support responsible AI practices. These models balance performance and ethical considerations, making them a competitive alternative to similarly sized open models, with an emphasis on fostering innovation and democratizing AI technology.


Citation

If you use Gemma in your research, please cite:

```bibtex
@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi       = {10.48550/ARXIV.2210.11416},
  url       = {https://arxiv.org/abs/2210.11416},
  author    = {Google AI},
  title     = {Gemma 2B: Efficient and Open Language Model for Responsible AI},
  publisher = {arXiv},
  year      = {2023},
  keywords  = {Machine Learning (cs.LG), Computation and Language (cs.CL)},
  copyright = {Creative Commons Attribution 4.0 International}
}
```