prajjwal1/bert-mini

BERT Mini - PyTorch Pre-trained Model

Introduction

This is a PyTorch pre-trained model obtained by converting the TensorFlow checkpoint found in the official Google BERT repository. It is one of the smaller pre-trained BERT variants, alongside bert-small and bert-medium. These models were introduced in the study "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models" and ported to Hugging Face for the study "Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics". They are intended to be fine-tuned on a downstream task.
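As a quick sketch (assuming the Hugging Face transformers and torch packages are installed), the checkpoint named in this card can be loaded and run like any other BERT encoder:

  # Minimal sketch: load the converted PyTorch checkpoint with transformers.
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-mini")
  model = AutoModel.from_pretrained("prajjwal1/bert-mini")

  inputs = tokenizer("BERT Mini is a compact encoder.", return_tensors="pt")
  outputs = model(**inputs)

  # last_hidden_state has shape (batch_size, sequence_length, hidden_size=256)
  print(outputs.last_hidden_state.shape)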

Citation

If you use the model, please consider citing both papers:

@misc{bhargava2021generalization,
      title={Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics}, 
      author={Prajjwal Bhargava and Aleksandr Drozd and Anna Rogers},
      year={2021},
      eprint={2110.01518},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@article{DBLP:journals/corr/abs-1908-08962,
  author    = {Iulia Turc and Ming{-}Wei Chang and Kenton Lee and Kristina Toutanova},
  title     = {Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation},
  journal   = {CoRR},
  volume    = {abs/1908.08962},
  year      = {2019},
  url       = {http://arxiv.org/abs/1908.08962},
  eprinttype = {arXiv},
  eprint    = {1908.08962},
  timestamp = {Thu, 29 Aug 2019 16:32:34 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1908-08962.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Model Architecture

The BERT Mini model is based on the transformer architecture. Specifically, it is a smaller variant of the original BERT model, using 4 layers (L=4) and 256 hidden units (H=256). This compact design allows for efficient training and inference while maintaining good performance on various natural language processing tasks. BERT models are encoder-only models suited to capturing contextual meaning in text.
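As a sanity check, these dimensions can be read back from the published configuration; this sketch assumes transformers is installed:

  # Sketch: inspect the compact architecture (L=4 layers, H=256 hidden units)
  # directly from the checkpoint's configuration.
  from transformers import AutoConfig

  config = AutoConfig.from_pretrained("prajjwal1/bert-mini")
  print(config.num_hidden_layers)  # expected: 4
  print(config.hidden_size)        # expected: 256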

Best Use Cases for BERT Mini

BERT Mini is intended to be fine-tuned on natural language understanding tasks. Its compact size makes it suitable for the following (a minimal fine-tuning sketch follows the list):

  • Text Classification: Classifying texts into predefined categories.
  • Named Entity Recognition (NER): Identifying and classifying named entities in text.
  • Question Answering: Building systems that can answer questions based on context.
  • Sentence Pair Classification: Tasks like natural language inference (NLI) where the relationship between two sentences is determined.
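The sketch below prepares BERT Mini for one such downstream task, binary text classification; the label count and example sentences are illustrative assumptions, not part of this card:

  # Hedged sketch: attach a classification head to BERT Mini for fine-tuning.
  import torch
  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-mini")
  model = AutoModelForSequenceClassification.from_pretrained(
      "prajjwal1/bert-mini", num_labels=2  # num_labels is an illustrative choice
  )

  batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
  labels = torch.tensor([1, 0])

  # The classification head is randomly initialized; this loss is what a
  # fine-tuning loop would minimize with an optimizer of your choice.
  outputs = model(**batch, labels=labels)
  print(outputs.loss, outputs.logits.shape)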

Credit

For more information, visit the GitHub repository. Follow @prajjwal_1 on Twitter for updates.