TheBloke/Mistral-7B-Instruct-v0.2-AWQ

Mistral 7B Instruct v0.2 - AWQ

Model creator: Mistral AI
Original model: Mistral 7B Instruct v0.2

Description

This repository contains AWQ model files for Mistral AI's Mistral 7B Instruct v0.2.

About AWQ

AWQ is an efficient, accurate, and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings. AWQ models are currently supported on Linux and Windows with NVIDIA GPUs; macOS users should use GGUF models instead.
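
One minimal way to load these files, assuming the autoawq and transformers Python packages are installed and a CUDA-capable NVIDIA GPU is available, is the AutoAWQ library. The snippet below is a sketch under those assumptions, not an officially documented recipe:

```python
# Sketch: load the quantised model with AutoAWQ.
# Assumes the `autoawq` and `transformers` packages and an NVIDIA GPU.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# fuse_layers=True enables AutoAWQ's fused kernels for faster inference.
model = AutoAWQForCausalLM.from_quantized(model_name, fuse_layers=True)
```

Recent Transformers releases can also load AWQ checkpoints directly through AutoModelForCausalLM.from_pretrained when autoawq is installed.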

Provided Files and AWQ Parameters

Currently, only 128g GEMM models are released. The addition of group_size 32 models and GEMV kernel models is being actively considered. Models are released as sharded safetensors files.
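
As a sketch of how one branch of the repository can be fetched, the snippet below uses snapshot_download from the huggingface_hub package; the revision argument names the branch (only main currently exists, per the table below):

```python
# Sketch: download the sharded safetensors from a single branch.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    revision="main",  # branch name; see the table of provided files
)
print(local_dir)  # local cache directory holding the shards
```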

Explanation of AWQ Parameters:

  • Bits: The bit width of the quantised weights.
  • GS: The AWQ group size used during quantisation.
  • AWQ Dataset: The calibration dataset used during quantisation.
  • Seq Len: The sequence length used during quantisation.

Branch  Bits  GS   AWQ Dataset           Seq Len  Size
main    4     128  VMware Open Instruct  4096     4.15 GB

Best Use Cases for Mistral 7B Instruct v0.2 Model

Mistral 7B Instruct v0.2 is a decoder-only model optimised for natural language understanding and generation tasks. AWQ quantization reduces its memory footprint and speeds up inference, making it suitable for:

  • Interactive Applications: Building chatbots and virtual assistants.
  • Content Generation: Assisting in drafting text content like articles, reports, and stories.
  • Customer Support: Providing automated responses to user queries.
  • Educational Tools: Offering tutoring and answering questions in educational platforms.

The model's architecture and training make it well-suited for tasks requiring understanding and generating human-like text, especially in interactive and real-time applications.
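
As an illustrative sketch of such an interactive application, continuing from the loading snippet above: the tokenizer's chat template wraps a user message in Mistral's [INST] ... [/INST] instruction format before generation (the example prompt and sampling settings here are arbitrary choices, not recommendations from this card):

```python
# Sketch: single-turn chat generation, reusing `model` and `tokenizer`
# from the earlier loading snippet (model resident on a CUDA device).
messages = [{"role": "user", "content": "Draft a two-sentence product update."}]

# apply_chat_template emits Mistral's "[INST] ... [/INST]" prompt format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

output = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```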

Credit

This model was quantised by TheBloke.