TheBloke/Llama-2-13B-chat-GPTQ

Llama 2 13B Chat - GPTQ

Model creator: Meta
Original model: Llama 2 13B Chat

Description

This repository contains GPTQ model files for Meta's Llama 2 13B Chat. Multiple GPTQ parameter permutations are provided; see the table below for details of each option, its parameters, and the software used to create it.

GPTQ

GPTQ is a post-training quantisation technique that reduces the precision of a model's weights (here, to 4 or 8 bits). Quantisation shrinks the model's size on disk and in VRAM and lowers the computational resources required for inference, enabling faster and more efficient deployment without significantly compromising the model's accuracy. By providing multiple quantisation parameter options, this repository lets users choose the configuration that best fits their hardware and specific needs.
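
As an example, here is a minimal sketch of loading one of these quantisations with the Hugging Face transformers library. It assumes a transformers version with GPTQ support plus the optimum and auto-gptq packages installed, and a CUDA GPU with enough VRAM for the chosen branch; exact package requirements may differ for your setup:

```python
# A minimal sketch of loading this repository with transformers.
# Assumes: pip install transformers optimum auto-gptq (versions that
# support GPTQ loading), plus a CUDA GPU with enough VRAM for the branch.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# revision selects which quantisation branch to use (see the table below);
# device_map="auto" places the quantised weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    revision="main",
)
```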

Provided Files and GPTQ Parameters

Multiple quantisation parameters are provided to allow you to choose the best one for your hardware and requirements.

Explanation of GPTQ Parameters:

  • Bits: The bit width of the quantised weights.
  • GS: GPTQ group size. Higher numbers use less VRAM but give slightly lower quantisation accuracy; "None" is the lowest possible value.
  • Act Order: Also known as desc_act. True generally results in better quantisation accuracy.
  • Damp %: A GPTQ parameter that affects how samples are processed during quantisation; 0.01 is the default.
  • GPTQ Dataset: The calibration dataset used for quantisation.
  • Seq Len: The sequence length of the calibration samples.
  • Size: The download size of the model files for that branch.
  • ExLlama: Whether the files can be loaded with the ExLlama loader, which currently supports only 4-bit Llama models.

| Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
| ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
| main | 4 | 128 | No | 0.01 | wikitext | 4096 | 7.26 GB | Yes | 4-bit, without Act Order and group size 128g. |
| gptq-4bit-32g-actorder_True | 4 | 32 | Yes | 0.01 | wikitext | 4096 | 8.00 GB | Yes | 4-bit, with Act Order and group size 32g. Highest inference quality. |
| gptq-4bit-64g-actorder_True | 4 | 64 | Yes | 0.01 | wikitext | 4096 | 7.51 GB | Yes | 4-bit, with Act Order and group size 64g. Less VRAM, slightly lower accuracy. |
| gptq-4bit-128g-actorder_True | 4 | 128 | Yes | 0.01 | wikitext | 4096 | 7.26 GB | Yes | 4-bit, with Act Order and group size 128g. Less VRAM, slightly lower accuracy. |
| gptq-8bit-128g-actorder_True | 8 | 128 | Yes | 0.01 | wikitext | 4096 | 13.65 GB | No | 8-bit, with group size 128g for higher inference quality and Act Order for even higher accuracy. |
| gptq-8bit-64g-actorder_True | 8 | 64 | Yes | 0.01 | wikitext | 4096 | 13.95 GB | No | 8-bit, with group size 64g and Act Order for even higher inference quality. |
| gptq-8bit-128g-actorder_False | 8 | 128 | No | 0.01 | wikitext | 4096 | 13.65 GB | No | 8-bit, with group size 128g for higher inference quality, without Act Order to improve AutoGPTQ speed. |
| gptq-8bit--1g-actorder_True | 8 | None | Yes | 0.01 | wikitext | 4096 | 13.36 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
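
Each branch in the table is a git revision of this repository, so a specific quantisation can be fetched on its own. Here is a minimal sketch using huggingface_hub; the branch name comes from the table above, while the local directory name is an arbitrary example:

```python
# A minimal sketch of downloading a single quantisation branch with
# huggingface_hub; the branch name comes from the table above, while
# the local directory name is an arbitrary example.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Llama-2-13B-chat-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # any branch from the table
    local_dir="llama-2-13b-chat-gptq-4bit-32g",
)
```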

Best Use Cases for Llama 2 13B Chat Model

Llama 2 models are decoder-only transformers that excel at generating coherent, contextually relevant text. The 13B Chat variant is fine-tuned specifically for dialogue and conversational AI applications. Key use cases include:

  • Customer Support: Automated responses to common customer queries.
  • Virtual Assistants: Providing helpful and contextually aware information.
  • Content Creation: Assisting in drafting articles, emails, and other text content.
  • Educational Tools: Offering tutoring and answering questions in educational platforms.

The model's architecture and training make it well suited to tasks that require understanding and generating human-like text, especially in interactive and real-time applications. In Meta's evaluations, the fine-tuned chat models outperform many open-source chat models and are competitive with some popular closed-source models in terms of helpfulness and safety.
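
As a sketch of what such an interactive exchange looks like in practice, the snippet below builds a single-turn prompt in the Llama 2 chat template (the [INST] / <<SYS>> markers) and generates a reply. It reuses the model and tokenizer from the loading example above; the system prompt, question, and sampling settings are illustrative choices, not values defined by this repository:

```python
# A minimal sketch of a single-turn exchange using the Llama 2 chat
# prompt template. Reuses `model` and `tokenizer` from the loading
# example above; the system prompt, question, and sampling settings
# are illustrative choices, not values defined by this repository.
system_prompt = "You are a helpful, respectful and honest assistant."
user_message = "How do I reset a forgotten email password?"

# Llama 2 chat models were trained with this [INST] / <<SYS>> format.
prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```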

Credit

This model was quantised by TheBloke.