---
language:
- en
tags:
- toxic comments classification
licenses:
- cc-by-nc-sa
---
Toxicity Classification Model
This model is trained for the toxicity classification task. The training data is a merge of the English portions of three Jigsaw datasets (Jigsaw 2018, Jigsaw 2019 and Jigsaw 2020), totalling around 2 million examples. We split it into two parts and fine-tune a RoBERTa model (RoBERTa: A Robustly Optimized BERT Pretraining Approach) on it. The resulting classifiers perform similarly on the test set of the first Jigsaw competition, reaching an AUC-ROC of 0.98 and an F1-score of 0.76.
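The card reports AUC-ROC and F1 without describing how they were computed. As a hedged illustration of what the two numbers measure, here is a minimal pure-Python sketch on made-up toy labels and scores (not the Jigsaw test data). AUC-ROC is computed as the probability that a random toxic example scores higher than a random non-toxic one; the 0.5 decision threshold used for F1 is an assumption, not something the card states.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1-score after turning scores into hard predictions (assumed threshold 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 1 = toxic, 0 = non-toxic; scores stand in for P(toxic)
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
print(roc_auc(toy_labels, toy_scores), f1_at_threshold(toy_labels, toy_scores))
```

On real evaluation data you would typically use scikit-learn's `roc_auc_score` and `f1_score` instead of hand-written loops; the sketch only makes the definitions explicit.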
How to use
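import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification
# load tokenizer and model weights
tokenizer = RobertaTokenizer.from_pretrained('pt-sk/roberta_toxic_classifier')
model = RobertaForSequenceClassification.from_pretrained('pt-sk/roberta_toxic_classifier')
model.eval()  # disable dropout for inference
# prepare the input: a [1, seq_len] tensor of token ids
batch = tokenizer.encode('you are amazing', return_tensors='pt')
# inference: the model returns one raw logit per class
with torch.no_grad():
    logits = model(batch).logits
# convert logits to probabilities and look up the predicted label name
probs = torch.softmax(logits, dim=-1)
print(model.config.id2label[int(probs.argmax(dim=-1))], probs.tolist())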
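The model output holds raw logits rather than probabilities. Below is a minimal post-processing sketch in plain Python, assuming a two-class head where index 1 is the toxic class; the label names `neutral`/`toxic`, the helper names and the 0.5 threshold are all hypothetical, so check `model.config.id2label` for the real mapping. To feed it, convert one row of logits to a list first, e.g. `model(batch).logits[0].tolist()`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(
    logits: Sequence[float],
    id2label: Dict[int, str],
    toxic_index: int = 1,      # assumption: class 1 is "toxic"
    threshold: float = 0.5,    # assumption: flag when P(toxic) >= 0.5
) -> Tuple[Dict[str, float], bool]:
    """Map logits to per-label probabilities and a hypothetical toxic/non-toxic flag."""
    probs = softmax(logits)
    scores = {id2label[i]: p for i, p in enumerate(probs)}
    return scores, probs[toxic_index] >= threshold


# Hypothetical label mapping; read the real one from model.config.id2label
labels = {0: "neutral", 1: "toxic"}
scores, flagged = interpret([2.0, -1.0], labels)
print(scores, flagged)
```

Subtracting the maximum logit does not change the softmax result but avoids overflow in `math.exp` for large logits.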
Licensing Information
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
