---
license: mit
tags:
- sentiment-analysis
- text-classification
- openai-embeddings
- pytorch
pipeline_tag: text-classification
library_name: transformers
---

# TextEmbedding3SmallSentimentHead

A lightweight sentiment classifier head that sits on top of embeddings from OpenAI's `text-embedding-3-small` model.

## Model Description

- **What this is**: A compact PyTorch classifier head trained on top of `text-embedding-3-small` (1536-dim) embeddings to predict sentiment: negative, neutral, or positive.
- **Data**: Preprocessed from the [Kaggle Sentiment Analysis Dataset](https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset).
- **Metrics (val)**: **F1 macro ≈ 0.89**, **accuracy ≈ 0.89** on a held-out validation split.
- **Architecture**: Simple MLP head (256 hidden units, dropout 0.2), trained for 5 epochs with Adam.

## Input/Output

- **Input**: Float32 tensor of shape `[batch, 1536]` (OpenAI `text-embedding-3-small` embeddings).
- **Output**: Logits over 3 classes. Argmax → {0: negative, 1: neutral, 2: positive}.
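The argmax-to-label mapping above can be sketched with a small decoding helper. This is an illustrative sketch, not part of the checkpoint: the `ID2LABEL` dict and `decode` function are assumptions introduced here, and the logits are hand-picked example values rather than real model output.

```python
import torch

# Illustrative label mapping matching the Input/Output section
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def decode(logits: torch.Tensor) -> list:
    """Map [batch, 3] logits to sentiment labels via argmax."""
    return [ID2LABEL[i] for i in logits.argmax(dim=1).tolist()]

# Hand-picked example logits for a batch of two texts
logits = torch.tensor([[2.1, 0.3, -1.0],
                       [-0.5, 0.2, 1.7]])
print(decode(logits))  # ['negative', 'positive']
```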
## Usage

```python
from transformers import AutoModel
import torch

# Load the classifier head (custom model code lives in the repo, hence trust_remote_code)
model = AutoModel.from_pretrained(
    "marcovise/TextEmbedding3SmallSentimentHead",
    trust_remote_code=True
).eval()

# Your 1536-dim OpenAI embeddings (random stand-in here)
embeddings = torch.randn(4, 1536)  # batch of 4 examples

# Predict sentiment
with torch.no_grad():
    logits = model(inputs_embeds=embeddings)["logits"]  # [batch, 3]
    predictions = logits.argmax(dim=1)                  # [batch]; 0=negative, 1=neutral, 2=positive

print(predictions)  # e.g. tensor([1, 0, 2, 1]); actual values depend on the input
```

## Training Details

- **Training data**: Kaggle Sentiment Analysis Dataset
- **Preprocessing**: Text → OpenAI embeddings; scores {0.0, 0.5, 1.0} mapped to 3-class labels {negative, neutral, positive}
- **Architecture**: 1536 → 256 → ReLU → Dropout(0.2) → 3 classes
- **Optimizer**: Adam (lr=1e-3, weight_decay=1e-4)
- **Loss**: CrossEntropyLoss with label smoothing (0.05)
- **Epochs**: 5

## Intended Use

- Quick, lightweight sentiment classification for short text once embeddings are available.
- Suited to general sentiment analysis tasks similar to the training distribution.

## Limitations

- Trained on a single sentiment dataset; may carry domain bias.
- Requires OpenAI `text-embedding-3-small` embeddings as input; embeddings from other models will not work.
- Not evaluated for safety-critical use; validate before deploying to production.
- May reflect biases present in the training data.

## License

MIT
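For reference, the head described in Training Details (1536 → 256 → ReLU → Dropout(0.2) → 3) can be sketched in plain PyTorch. This is a hypothetical reconstruction for illustration only; the class and attribute names below are assumptions and the actual module in the repo may differ.

```python
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    """Hypothetical sketch of the MLP head from Training Details."""

    def __init__(self, embed_dim=1536, hidden=256, num_classes=3, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),  # 1536 -> 256
            nn.ReLU(),
            nn.Dropout(dropout),           # p = 0.2
            nn.Linear(hidden, num_classes) # 256 -> 3
        )

    def forward(self, inputs_embeds):
        return self.net(inputs_embeds)

# Shape check with random embeddings (untrained weights, so logits are meaningless)
head = SentimentHead().eval()
with torch.no_grad():
    logits = head(torch.randn(4, 1536))
print(logits.shape)  # torch.Size([4, 3])
```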