TheBloke/Mistral-7B-Instruct-v0.2-GPTQ

Mistral 7B Instruct v0.2 - GPTQ

Model creator: Mistral AI
Original model: Mistral 7B Instruct v0.2

Description

This repository contains GPTQ model files for Mistral AI's Mistral 7B Instruct v0.2. Multiple GPTQ parameter permutations are provided; see the table below for details of the options, their parameters, and the software used to create them. These files were quantised using hardware kindly provided by Massed Compute.

Provided Files and GPTQ Parameters

| Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Description |
| ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ----------- |
| main | 4 | 128 | Yes | 0.1 | VMware Open Instruct | 4096 | 4.16 GB | Yes | 4-bit, with Act Order and group size 128g. Uses less VRAM than 64g, but with slightly lower accuracy. |
| gptq-4bit-32g-actorder_True | 4 | 32 | Yes | 0.1 | VMware Open Instruct | 4096 | 4.57 GB | Yes | 4-bit, with Act Order and group size 32g. Highest possible inference quality, with maximum VRAM usage. |
| gptq-8bit--1g-actorder_True | 8 | None | Yes | 0.1 | VMware Open Instruct | 4096 | 7.52 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
| gptq-8bit-128g-actorder_True | 8 | 128 | Yes | 0.1 | VMware Open Instruct | 4096 | 7.68 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
| gptq-8bit-32g-actorder_True | 8 | 32 | Yes | 0.1 | VMware Open Instruct | 4096 | 8.17 GB | No | 8-bit, with group size 32g and Act Order for maximum inference quality. |
| gptq-4bit-64g-actorder_True | 4 | 64 | Yes | 0.1 | VMware Open Instruct | 4096 | 4.29 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
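
Each branch can be loaded directly from Python by passing the branch name as the `revision` argument. Below is a minimal loading sketch, assuming a recent transformers with the optimum and auto-gptq packages installed (accelerate is needed for `device_map="auto"`); the chosen branch is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"

# `revision` selects one of the GPTQ branches from the table above;
# omit it (or pass "main") for the default 4-bit / 128g files.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    revision="gptq-4bit-32g-actorder_True",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```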

Best Use Cases for Mistral 7B Instruct v0.2

The Mistral 7B Instruct v0.2 model is optimised for a wide range of natural language understanding and generation tasks, and GPTQ quantisation reduces the memory and compute needed to serve it. This makes it well suited for:

  • Interactive Applications: Building chatbots and virtual assistants.
  • Content Generation: Assisting in drafting text content like articles, reports, and stories.
  • Customer Support: Providing automated responses to user queries.
  • Educational Tools: Powering tutoring and question answering in educational platforms.

The model's architecture and training make it well-suited for tasks requiring understanding and generating human-like text, especially in interactive and real-time applications.
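
For the interactive use cases above, prompts should follow Mistral's `[INST] ... [/INST]` instruction format, which the tokenizer's built-in chat template applies automatically. A short sketch of a single chat turn (the prompt and sampling settings are illustrative only):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# apply_chat_template wraps the user turn in [INST] ... [/INST]
# and returns the prompt as input ids.
messages = [{"role": "user", "content": "Draft a short, friendly reply about a delayed order."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```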

Model Architecture

Mistral 7B Instruct v0.2 is a decoder-only model based on an optimised transformer architecture. It uses Grouped-Query Attention (GQA) to cut the memory cost of attention at inference time, and a byte-fallback BPE tokenizer that can encode arbitrary text without out-of-vocabulary failures. Unlike v0.1, v0.2 drops Sliding-Window Attention in favour of a full 32k-token context window. The model has been fine-tuned for instruction following, making it well suited to generating informative, contextually relevant responses.
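
The GQA layout can be read straight from the model config. For Mistral 7B the config is expected to report 32 query heads sharing 8 key/value heads (the values in the comments below are expectations, not guarantees), which shrinks the KV cache by that 4:1 ratio:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.2-GPTQ")

# With GQA, several query heads share each key/value head,
# cutting KV-cache memory by the ratio of the two counts.
print(cfg.num_attention_heads)    # expected: 32 query heads
print(cfg.num_key_value_heads)    # expected: 8 shared key/value heads
print(cfg.num_attention_heads // cfg.num_key_value_heads)  # query heads per KV head
```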

About GPTQ

GPTQ is a post-training quantisation method for large language models, introduced in the paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers". It reduces the precision of a model's weights (here to 4 or 8 bits), quantising one layer at a time while minimising the resulting error in that layer's output. This shrinks the model files and the computational resources required for inference, enabling faster and cheaper deployment without significantly compromising accuracy. The multiple parameter permutations in this repository let users choose the configuration that best fits their hardware and quality requirements.
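
As a toy illustration of the underlying idea (GPTQ's actual algorithm goes further, adjusting the remaining weights after each rounding step to minimise the layer's output error), here is a NumPy sketch of naive round-to-nearest 4-bit quantisation of a single 128-weight group, matching the GS=128 setting in the table above:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=128)  # one group of 128 weights (GS = 128)

# Symmetric 4-bit quantisation: integer codes in [-8, 7], one scale per group.
scale = np.abs(w).max() / 7
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # 4-bit codes (packed in practice)
w_hat = q * scale                                        # dequantised weights used at inference

print("max abs error:", np.abs(w - w_hat).max())
```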

Credit

This model was quantised by TheBloke.