Model creator: Mistral AI
Original model: Mistral 7B Instruct v0.2
This repository contains AWQ model files for Mistral AI's Mistral 7B Instruct v0.2.
AWQ is an efficient, accurate, and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings. AWQ models are supported on Linux and Windows with NVIDIA GPUs; macOS users should use GGUF models instead.
Currently, only 128g GEMM models are released. The addition of group_size 32 models and GEMV kernel models is being actively considered. Models are released as sharded safetensors files.
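Below is a minimal sketch of loading these AWQ files with Transformers (which requires the `autoawq` package and an NVIDIA GPU, per the notes above). The repository id used here is an assumption based on this model card; substitute the actual repo name if it differs.

```python
# Minimal sketch: loading an AWQ-quantized model with Transformers.
# Requires: transformers >= 4.35, autoawq, accelerate, and a CUDA-capable GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # place the 4-bit weights on the available GPU(s)
    low_cpu_mem_usage=True,
)

inputs = tokenizer("Explain AWQ quantization in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```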
Explanation of AWQ Parameters:
| Branch | Bits | GS (Group Size) | AWQ Dataset | Seq Len | Size |
| ------ | ---- | --------------- | ----------- | ------- | ---- |
| main | 4 | 128 | VMware Open Instruct | 4096 | 4.15 GB |
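If you want to pin a specific quantisation branch, the `revision` argument selects it when downloading. Only `main` (4-bit, group size 128) exists at the time of writing; any future group_size 32 or GEMV branches could be selected the same way. This is a sketch using `huggingface_hub`, again assuming the repository id above.

```python
# Minimal sketch: downloading the sharded safetensors files for one branch.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # assumed repository id
    revision="main",                                  # branch from the table above
)
print(local_dir)  # local path containing the sharded safetensors files
```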
Mistral 7B Instruct v0.2 is a decoder-only model optimised for a broad range of natural language understanding and generation tasks, and the AWQ quantization reduces its memory footprint and speeds up inference. Its architecture and instruction tuning make it well suited to tasks that require understanding and generating human-like text, especially in interactive and real-time applications.
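For interactive use, the tokenizer's built-in chat template produces the Mistral Instruct `[INST] ... [/INST]` format. The sketch below assumes `model` and `tokenizer` were loaded as in the earlier example.

```python
# Minimal sketch: one chat turn using the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Write a haiku about quantization."},
]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(prompt_ids, max_new_tokens=64, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```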
This model is quantised by TheBloke.