google/gemma-1.1-2b-it

Model Card for Gemma 1.1 2B Instruct

TL;DR

Gemma is a family of lightweight, open large language models from Google, designed for efficient deployment on resource-limited devices. Version 1.1 was trained with Reinforcement Learning from Human Feedback (RLHF), improving quality, factuality, instruction following, and multi-turn conversation. The model is suitable for tasks such as question answering, summarization, and code generation.


Model Details

Model Information

  • Model Type: Text-to-text, decoder-only large language model
  • Language(s): English
  • License: Terms of Use available on the Gemma model page
  • Related Models: 2B and 7B base, 2B and 7B instruct versions

Usage

Inputs and Outputs

  • Input: Text string (e.g., a question, prompt, or document to summarize)
  • Output: Generated English text (e.g., an answer, summary, or code snippet); a minimal inference sketch follows this list
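
Putting these together: a minimal inference sketch, assuming the Hugging Face transformers library as the runtime (this card does not prescribe one); the prompt, dtype, and device settings are illustrative.

```python
# Minimal sketch: single-prompt inference with Hugging Face transformers.
# The prompt, dtype, and device placement below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-1.1-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 halves memory vs. fp32
    device_map="auto",           # uses a GPU if available (needs `accelerate`)
)

prompt = "Summarize in one sentence why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```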

Hardware Compatibility

Gemma 1.1 2B Instruct can be deployed in resource-constrained environments such as laptops, desktops, or private cloud infrastructure, making it widely accessible for practical applications.
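
For the tightest memory budgets, the weights can also be quantized at load time. A sketch using the transformers integration with bitsandbytes; the 4-bit settings below are illustrative assumptions, not recommendations from this card:

```python
# Sketch: loading in 4-bit precision via the transformers/bitsandbytes
# integration to fit small GPUs. These quantization settings are
# illustrative assumptions, not recommendations from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-1.1-2b-it"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store 4-bit, compute in bf16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```

At roughly half a byte per weight, 4-bit quantization brings the 2B checkpoint's weights down to under 2 GB of accelerator memory.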


Uses

Direct Use and Downstream Use

Gemma 1.1 2B Instruct is well-suited for:

  • Text Generation: Creating various text formats like poems, scripts, marketing copy, and drafts.
  • Question Answering and Summarization: Generating responses to queries or providing concise summaries.
  • Conversational AI: Powering chatbots, virtual assistants, or interactive applications (see the chat-template sketch after this list).
  • Code Generation: Understanding programming-related prompts and generating code.
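
For the conversational and code-generation uses above, the instruction-tuned tokenizer ships a chat template that wraps each turn in Gemma's control tokens. A minimal multi-turn sketch, again assuming the transformers library; the conversation itself is illustrative:

```python
# Sketch: multi-turn conversation using the tokenizer's built-in chat
# template, which wraps each turn in Gemma's <start_of_turn>/<end_of_turn>
# markers. The conversation content is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-1.1-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
    {"role": "assistant", "content": "def reverse(s):\n    return s[::-1]"},
    {"role": "user", "content": "Now make it skip whitespace."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated turn, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```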

Out-of-Scope Use

The model may not perform well on tasks outside its training scope. Exercise particular caution in applications that require high factual accuracy or carry significant ethical risk.


Bias, Risks, and Limitations

Ethical Considerations and Risks

Large language models like Gemma 1.1 2B Instruct may reflect socio-cultural biases present in the training data, potentially resulting in biased or inappropriate outputs. The model may also propagate misinformation if not used responsibly.

Known Limitations

  • Training Data Bias: Biases from training data may impact the model's responses.
  • Handling of Complex Tasks: Gemma 1.1 performs best with specific, structured prompts and may struggle with open-ended tasks.
  • Factual Accuracy: The model’s responses may contain outdated or incorrect information.

Training Details

Training Dataset

Gemma models were trained on a vast dataset comprising 6 trillion tokens from diverse sources:

  • Web Documents: Exposure to a broad range of linguistic styles, topics, and vocabulary.
  • Code: Inclusion of programming syntax for improved code generation abilities.
  • Mathematical Texts: Data for logical reasoning and mathematical problem-solving.

Data Preprocessing

  • CSAM Filtering: Exclusion of harmful content.
  • Sensitive Data Filtering: Removal of personally identifiable information and other sensitive data.
  • Quality Filtering: Additional steps to ensure high-quality and safe content.

Hardware and Software

  • Hardware: Trained on TPUv5e, which provides the performance and memory capacity needed for large-scale training.
  • Software: Trained with JAX and ML Pathways, which enable efficient training of large foundation models.

Evaluation

Benchmark Results

Gemma models were evaluated on a range of tasks to assess performance across text generation, reasoning, and factuality.

| Benchmark     | Metric        | Gemma 1.1 IT 2B | Gemma 1.1 IT 7B |
|---------------|---------------|-----------------|-----------------|
| MMLU          | 5-shot, top-1 | 42.3            | 64.3            |
| HellaSwag     | 0-shot        | 71.4            | 81.2            |
| PIQA          | 0-shot        | 77.3            | 81.2            |
| TriviaQA      | 5-shot        | 53.2            | 63.4            |
| CommonsenseQA | 7-shot        | 65.3            | 71.3            |
| GSM8K         | maj@1         | 17.7            | 46.4            |
| Average       |               | 45.0            | 56.9            |

Ethics and Safety

Evaluation Approach

Ethics and safety evaluations include structured tests, red-teaming, and human assessments. Key areas include content safety, representational harms, and memorization risks.

Evaluation Results

The Gemma model family meets internal standards on metrics like content safety, representational harms, and large-scale harm prevention. Key results:

| Benchmark    | Metric        | Gemma 1.1 IT 2B | Gemma 1.1 IT 7B |
|--------------|---------------|-----------------|-----------------|
| RealToxicity | Average       | 7.03            | 8.04            |
| BBQ Ambig    | 1-shot, top-1 | 58.97           | 86.06           |
| Winogender   | Top-1         | 50.14           | 57.64           |
| TruthfulQA   | Average       | 44.24           | 45.34           |
| Toxigen      | Top-1         | 29.64           | 38.75           |

Intended Usage and Limitations

Intended Usage

  • Content Creation and Communication: Useful for generating text, chatbots, and summarization.
  • Research and Education: Supports NLP research, interactive language learning, and exploration of knowledge domains.

Limitations

  • Data Bias: Biases in training data may influence responses.
  • Complexity: The model may struggle with ambiguous or nuanced prompts.
  • Factual Reliability: Responses may not always be factually accurate.

Benefits

Gemma provides high-performance, open-access language models with responsible AI features, aiming to democratize AI technology for developers and researchers. The model's performance is competitive with other open models, making it a valuable tool for various applications.


Citation

If you use Gemma in your research, please cite:

```bibtex
@misc{gemmateam2024gemma,
  title         = {Gemma: Open Models Based on Gemini Research and Technology},
  author        = {{Gemma Team}},
  year          = {2024},
  eprint        = {2403.08295},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2403.08295},
  doi           = {10.48550/arXiv.2403.08295}
}
```