Qwen/Qwen2-1.5B-Instruct

  • Modal Card
  • Files & version

Qwen2-1.5B-Instruct - Model Summary

Model creator: Qwen AI
Original model: Qwen2 Series (0.5B to 72B parameters)

Description

Qwen2 is a new series of large language models, available in sizes ranging from 0.5B to 72B parameters, including a Mixture-of-Experts (MoE) model. This repository contains the Qwen2-1.5B-Instruct model, which is fine-tuned for instruction-based tasks.

Compared to other open-source language models, including the previous Qwen1.5 models, Qwen2 models demonstrate superior performance in benchmarks related to language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, often outperforming proprietary models.

Changelog

VersionDescription
v0.1Initial release of Qwen2 models, including multiple base and instruction-tuned models across different sizes. The models are pretrained and post-trained with supervised fine-tuning and Direct Preference Optimization (DPO) techniques.

Best Use Cases for Qwen2-1.5B-Instruct Model

Qwen2-1.5B-Instruct is optimized for a variety of use cases, including:

  • Language Understanding & Generation: Handling general NLP tasks and responding accurately to instructions.
  • Multilingual Support: Performing in multiple languages, with an improved tokenizer for handling natural language and code.
  • Coding and Reasoning: Excels in code generation and mathematical reasoning tasks, suitable for technical and logical challenges.
  • Interactive Applications: Effective for chatbots, virtual assistants, and educational tools that require language comprehension.

These models are fine-tuned to follow instructions closely, making them suitable for AI assistants and other instruction-based applications.

Model Architecture

Qwen2 models are based on the Transformer architecture and feature various improvements such as SwiGLU activation, attention QKV bias, and grouped query attention (GQA). The models also employ an advanced tokenizer adaptive to multiple languages and capable of handling both natural language and code with high efficiency.