Model creator: Qwen Team, Alibaba Cloud
Original model: Qwen2 Series (0.5B to 72B parameters)
Qwen2 is a new series of large language models, available in sizes ranging from 0.5B to 72B parameters, including a Mixture-of-Experts (MoE) model. This repository contains the Qwen2-1.5B-Instruct model, which is fine-tuned for instruction-based tasks.
Compared to other open-source language models, including the previous Qwen1.5 series, Qwen2 models demonstrate superior performance on benchmarks covering language understanding, generation, multilingual capability, coding, mathematics, and reasoning, and are competitive with proprietary models on many of them.
| Version | Description |
|---|---|
| v0.1 | Initial release of Qwen2 models, including multiple base and instruction-tuned models across different sizes. The models are pretrained and then post-trained with supervised fine-tuning and Direct Preference Optimization (DPO). |
Qwen2-1.5B-Instruct is optimized for a variety of use cases, including:

- instruction following and chat-style assistance
- multilingual understanding and generation
- coding
- mathematics and reasoning

These models are fine-tuned to follow instructions closely, making them suitable for AI assistants and other instruction-based applications.
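As a minimal sketch of instruction-following use, the snippet below loads the model with the Hugging Face `transformers` library and generates a reply through the chat template. The hub id `Qwen/Qwen2-1.5B-Instruct`, the prompt, and the generation settings are illustrative assumptions, not fixed requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-1.5B-Instruct"

# device_map="auto" requires the `accelerate` package; drop it to load on CPU
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Render the conversation with the model's chat template before tokenizing
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens so only the model's reply is decoded
reply_ids = generated[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```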
Qwen2 models are based on the Transformer architecture and feature improvements such as SwiGLU activation, attention QKV bias, and grouped-query attention (GQA). They also employ an improved tokenizer that adapts to multiple natural languages and handles both text and code efficiently.
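To illustrate the tokenizer's coverage of multiple languages and code, here is a small sketch comparing token counts across different inputs. The hub id is assumed for illustration; the sample strings are arbitrary.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Chinese": "通义千问是阿里云推出的大语言模型。",
    "Python":  "def add(a, b):\n    return a + b",
}

# The same tokenizer covers natural language and code; compare token counts
for label, text in samples.items():
    ids = tokenizer.encode(text)
    print(f"{label}: {len(ids)} tokens")
```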