# Model Card for Mistral-7B-Instruct-v0.1

## Table of Contents
- [TL;DR](#tldr)
- [Model Details](#model-details)
- [Intended Use](#intended-use)
- [Model Architecture](#model-architecture)
- [Training Details](#training-details)
- [Limitations](#limitations)
- [Community](#community)
- [Citation](#citation)

## TL;DR
Mistral-7B-Instruct-v0.1 is an instruction-tuned model based on Mistral-7B-v0.1, designed for conversational AI and assistant tasks, with robust handling of instruction-following prompts in dialogue. The model can be run with both Mistral's own inference library and the Hugging Face Transformers library.
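
As a quick illustration of the instruction format, the sketch below wraps a single user message in `[INST] ... [/INST]` tags by hand. In practice the tokenizer's chat template (shown in the Inference section below) handles this, including special-token placement, so the helper here is purely illustrative.

```python
def build_prompt(user_message: str) -> str:
    # Single-turn instruct prompt: the user message is wrapped in [INST] ... [/INST].
    # Special tokens (BOS/EOS) are intentionally omitted; the tokenizer adds them.
    return f"[INST] {user_message} [/INST]"

print(build_prompt("Summarize grouped-query attention in one sentence."))
# -> [INST] Summarize grouped-query attention in one sentence. [/INST]
```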

---

## Model Details

### Model Information
- **Model Type**: Instruction-tuned large language model
- **Model Size**: 7.24B parameters
- **Tensor Type**: BF16
- **Supported Context Length**: Up to 8k tokens
- **License**: Apache 2.0
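
As a rough back-of-the-envelope check on the figures above, the parameter count and BF16 tensor type imply about 2 bytes per parameter for the weights alone; a minimal sketch:

```python
# Rough weight-memory estimate from the parameter count and BF16 dtype listed above.
# Weights only: the KV cache and activations add further memory at inference time.
params = 7.24e9        # parameter count
bytes_per_param = 2    # BF16 stores 2 bytes per parameter

weight_bytes = params * bytes_per_param
print(f"approx. weight memory: {weight_bytes / 1024**3:.1f} GiB")  # ~13.5 GiB
```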

### Model Developers
Developed by Mistral AI, with contributions from a diverse team including Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, and others.

---

## Intended Use

### Use Cases
Mistral-7B-Instruct-v0.1 is suitable for:
- **Conversational AI**: Structured, instruction-following dialogue generation.
- **Educational Assistants**: Providing brief explanations or answering questions.
- **Customer Support**: Assisting users with general queries in a conversational format.
- **Knowledge Retrieval**: Generating concise responses for various inquiries.


---

## Model Architecture

Mistral-7B-Instruct-v0.1 is based on the Mistral-7B architecture, with the following design features (see the configuration sketch after this list):
- **Grouped-Query Attention**: Shares key/value heads across groups of query heads, reducing key/value-cache memory and speeding up inference.
- **Sliding-Window Attention**: Restricts each layer's attention to a fixed window of recent tokens, improving efficiency on long input sequences.
- **Byte-fallback BPE tokenizer**: Falls back to raw bytes for characters outside the vocabulary, giving robust multilingual tokenization.
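
These choices are visible in the model's Hugging Face configuration. Below is a minimal sketch, assuming the `transformers` library is installed and the Hub id `mistralai/Mistral-7B-Instruct-v0.1` is reachable; the field names (`num_key_value_heads`, `sliding_window`) are those used by Transformers' Mistral configuration class.

```python
from transformers import AutoConfig

# Load only the configuration; no model weights are downloaded.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# Grouped-query attention: fewer key/value heads than query heads.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-window attention: per-layer attention span, in tokens.
print("sliding window:", config.sliding_window)
```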

---

## Training Details

### Training Data
Mistral-7B-Instruct-v0.1 is fine-tuned using various publicly available datasets, curated to enhance the model’s instruction-following and dialogue generation capabilities. For comprehensive details, refer to the release paper and blog post.

### Inference
Inference can be performed with both Mistral's own inference library and Hugging Face Transformers; Mistral's stack is geared toward low-latency deployments. A minimal Transformers example is sketched below.
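
The following is a minimal sketch rather than a recommended configuration: it assumes a GPU with enough memory for the 7B weights in bfloat16 and a reasonably recent Transformers release with chat-template support. The sampling parameters are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The chat template wraps the user turn in [INST] ... [/INST] for us.
messages = [{"role": "user", "content": "Explain sliding-window attention briefly."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Illustrative sampling settings; tune them for your application.
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```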

---

## Limitations

The model currently does not include moderation mechanisms. Users should exercise caution when deploying Mistral-7B-Instruct-v0.1 in applications that require strict content safety and moderation.

---

## Community

Mistral AI invites community contributions, particularly for enhancing the alignment between Mistral’s tokenizer and Transformers. Contributions, including pull requests to refine the model, are encouraged.

### Known Issues and Troubleshooting
- **Transformers Compatibility**: Older releases of the Transformers library raise a `KeyError: 'mistral'` when loading the model. Updating to `transformers` v4.33.4 or later should resolve this.
- **Tokenizer Alignment**: Contributions to improve tokenizer consistency between Mistral and Transformers are welcome.

---

## Citation

If you use Mistral-7B-Instruct-v0.1 in your research, please cite:
```bibtex
@misc{mistralai2024mistral7b,
  author    = {Mistral AI},
  title     = {Mistral-7B-Instruct-v0.1: Fine-tuned Large Language Model for Instruction Following},
  year      = {2024},
  url       = {https://github.com/mistralai/mistral-models},
  publisher = {Mistral AI}
}
```