# Model Card for Mistral-7B-Instruct-v0.1

## Table of Contents

- [TL;DR](#tldr)
- [Model Details](#model-details)
- [Intended Use](#intended-use)
- [Usage](#usage)
- [Model Architecture](#model-architecture)
- [Training Details](#training-details)
- [Limitations](#limitations)
- [Community](#community)
- [Citation](#citation)

## TL;DR

Mistral-7B-Instruct-v0.1 is an instruction-tuned model based on Mistral-7B-v0.1, designed for improved performance in conversational AI and assistant tasks. It provides robust handling of instruction-following prompts in dialogue. The model is accessible through both the Mistral and Hugging Face Transformers libraries.

---

## Model Details

### Model Information

- **Model Type**: Instruction-tuned large language model
- **Model Size**: 7.24B parameters
- **Tensor Type**: BF16
- **Supported Context Length**: Up to 8k tokens
- **License**: Open for community engagement and contributions

### Model Developers

Developed by Mistral AI, with contributions from a diverse team including Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, and others.

---

## Intended Use

### Use Cases

Mistral-7B-Instruct-v0.1 is suitable for:

- **Conversational AI**: Structured, instruction-following dialogue generation.
- **Educational Assistants**: Providing brief explanations or answering questions.
- **Customer Support**: Assisting users with general queries in a conversational format.
- **Knowledge Retrieval**: Generating concise responses for various inquiries.

---

## Model Architecture

Mistral-7B-Instruct-v0.1 is based on the Mistral-7B architecture, with the following design features:

- **Grouped-query attention**: Shares key/value heads across groups of query heads, reducing the memory and compute cost of attention.
- **Sliding-window attention**: Improves handling of long input sequences by limiting each token's attention to a fixed window of preceding tokens.
- **Byte-fallback BPE tokenizer**: Optimized for robust multilingual tokenization.

A short snippet at the end of this card shows how these settings appear in the released model configuration.

---

## Training Details

### Training Data

Mistral-7B-Instruct-v0.1 is fine-tuned on a variety of publicly available datasets curated to strengthen the model's instruction-following and dialogue generation capabilities. For comprehensive details, refer to the release paper and blog post.

### Inference

Inference can be performed with both Mistral's own library and Hugging Face Transformers; Mistral's framework targets low-latency deployments. A minimal Transformers example is sketched at the end of this card.

---

## Limitations

The model does not include moderation mechanisms. Users should exercise caution when deploying Mistral-7B-Instruct-v0.1 in applications that require strict content safety and moderation.

---

## Community

Mistral AI invites community contributions, particularly for improving the alignment between Mistral's tokenizer and the Transformers tokenizer. Contributions, including pull requests to refine the model, are encouraged.

### Known Issues and Troubleshooting

- **Transformers compatibility**: Users may encounter a `KeyError: 'mistral'` when loading the model with older versions of the Transformers library. Updating to `transformers-v4.33.4` or later should resolve this.
- **Tokenizer alignment**: Contributions to improve tokenizer consistency between Mistral and Transformers are welcome.

---

## Citation

If you use Mistral-7B-Instruct-v0.1 in your research, please cite:
```bibtex
@misc{mistralai2024mistral7b,
  author    = {Mistral AI},
  title     = {Mistral-7B-Instruct-v0.1: Fine-tuned Large Language Model for Instruction Following},
  year      = {2024},
  url       = {https://github.com/mistralai/mistral-models},
  publisher = {Mistral AI}
}
```
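---

The snippet below is a minimal sketch of chat-style inference with Hugging Face Transformers, referenced from the Inference section above. It assumes the public checkpoint ID `mistralai/Mistral-7B-Instruct-v0.1`, a Transformers release newer than `transformers-v4.33.4` (see Known Issues), the `accelerate` package for `device_map="auto"`, and a GPU with enough memory for the BF16 weights; details such as sampling settings are illustrative, not prescriptive.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed public Hugging Face checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed above
    device_map="auto",           # requires the accelerate package
)

# The chat template wraps the conversation in the model's instruction format.
messages = [
    {"role": "user", "content": "Explain grouped-query attention in two sentences."},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Generate a short instruction-following reply and strip the prompt tokens.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```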
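As a quick way to see the architectural features listed above reflected in the released checkpoint, the sketch below reads the model configuration with Transformers' `AutoConfig`. The attribute names are standard `MistralConfig` fields; the interpretation comments are informal.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# Grouped-query attention: several query heads share each key/value head,
# so num_key_value_heads is smaller than num_attention_heads.
print("query heads:    ", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-window attention: each token attends to at most this many preceding tokens.
print("sliding window: ", config.sliding_window)

# Size of the byte-fallback BPE vocabulary.
print("vocab size:     ", config.vocab_size)
```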