How to use allenai/scibert_scivocab_cased with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("allenai/scibert_scivocab_cased", dtype="auto")
SciBERT
This is the pretrained model presented in SciBERT: A Pretrained Language Model for Scientific Text, which is a BERT model trained on scientific text.
The training corpus consists of papers taken from Semantic Scholar: 1.14M papers, 3.1B tokens in total. We use the full text of the papers in training, not just abstracts.
SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
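To illustrate why a corpus-matched wordpiece vocabulary can matter, here is a minimal sketch of greedy longest-match-first WordPiece splitting, the scheme BERT-style tokenizers use. The function name `wordpiece_tokenize` and both toy vocabularies are hypothetical and chosen only for illustration; they are not actual entries from scivocab or from the original BERT vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies (assumptions), NOT real BERT or scivocab entries.
general_vocab = {"im", "##mun", "##oh", "##isto", "##chemistry", "the", "cell"}
science_vocab = {"immuno", "##histo", "##chemistry", "the", "cell"}

word = "immunohistochemistry"
general_pieces = wordpiece_tokenize(word, general_vocab)
science_pieces = wordpiece_tokenize(word, science_vocab)
print(len(general_pieces), general_pieces)
print(len(science_pieces), science_pieces)
```

In this toy setting the science-oriented vocabulary splits the term into fewer, more meaningful pieces. To see how the real vocabulary behaves, load the tokenizer that ships with the model and inspect its output on your own text.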
Available models include:
- scibert_scivocab_cased
- scibert_scivocab_uncased
The original repo can be found at https://github.com/allenai/scibert.
If using these models, please cite the following paper:
@inproceedings{beltagy-etal-2019-scibert,
title = "SciBERT: A Pretrained Language Model for Scientific Text",
author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
booktitle = "EMNLP",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1371"
}