---
license: mit
tags:
- sentiment-analysis
- text-classification
- openai-embeddings
- pytorch
pipeline_tag: text-classification
library_name: transformers
---

# TextEmbedding3SmallSentimentHead

A lightweight sentiment classifier head that sits on top of embeddings from OpenAI's `text-embedding-3-small` model.

## Model Description

- **What this is**: A compact PyTorch classifier head trained on top of `text-embedding-3-small` (1536-dim) embeddings to predict sentiment: negative, neutral, or positive.
- **Data**: Preprocessed from the [Kaggle Sentiment Analysis Dataset](https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset).
- **Metrics (val)**: **F1 macro ≈ 0.89**, **accuracy ≈ 0.89** on a held-out validation split.
- **Architecture**: Simple MLP head (256 hidden units, dropout 0.2), trained for 5 epochs with Adam.

## Input/Output

- **Input**: Float32 tensor of shape `[batch, 1536]` (OpenAI `text-embedding-3-small` embeddings).
- **Output**: Logits over 3 classes. Argmax → {0: negative, 1: neutral, 2: positive}.
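The argmax-to-label mapping above can be sketched with a small decoding helper. This is an illustrative sketch, not part of the checkpoint: the `ID2LABEL` dict and `decode` function are assumptions introduced here, and the logits are hand-picked example values rather than real model output.

```python
import torch

# Illustrative label mapping matching the Input/Output section
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def decode(logits: torch.Tensor) -> list:
    """Map [batch, 3] logits to sentiment labels via argmax."""
    return [ID2LABEL[i] for i in logits.argmax(dim=1).tolist()]

# Hand-picked example logits for a batch of two texts
logits = torch.tensor([[2.1, 0.3, -1.0],
                       [-0.5, 0.2, 1.7]])
print(decode(logits))  # ['negative', 'positive']
```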
## Usage

```python
from transformers import AutoModel
import torch

# Load the classifier head (custom model code lives in the repo, hence trust_remote_code)
model = AutoModel.from_pretrained(
    "marcovise/TextEmbedding3SmallSentimentHead",
    trust_remote_code=True
).eval()

# Your 1536-dim OpenAI embeddings (random stand-in here)
embeddings = torch.randn(4, 1536)  # batch of 4 examples

# Predict sentiment
with torch.no_grad():
    logits = model(inputs_embeds=embeddings)["logits"]  # [batch, 3]
    predictions = logits.argmax(dim=1)                  # [batch]; 0=negative, 1=neutral, 2=positive

print(predictions)  # e.g. tensor([1, 0, 2, 1]); actual values depend on the input
```

## Training Details

- **Training data**: Kaggle Sentiment Analysis Dataset
- **Preprocessing**: Text → OpenAI embeddings; scores {0.0, 0.5, 1.0} mapped to 3-class labels {negative, neutral, positive}
- **Architecture**: 1536 → 256 → ReLU → Dropout(0.2) → 3 classes
- **Optimizer**: Adam (lr=1e-3, weight_decay=1e-4)
- **Loss**: CrossEntropyLoss with label smoothing (0.05)
- **Epochs**: 5

## Intended Use

- Quick, lightweight sentiment classification for short text once embeddings are available.
- Suited to general sentiment analysis tasks similar to the training distribution.

## Limitations

- Trained on a single sentiment dataset; may carry domain bias.
- Requires OpenAI `text-embedding-3-small` embeddings as input; embeddings from other models will not work.
- Not evaluated for safety-critical use; validate before deploying to production.
- May reflect biases present in the training data.

## License

MIT
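For reference, the head described in Training Details (1536 → 256 → ReLU → Dropout(0.2) → 3) can be sketched in plain PyTorch. This is a hypothetical reconstruction for illustration only; the class and attribute names below are assumptions and the actual module in the repo may differ.

```python
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    """Hypothetical sketch of the MLP head from Training Details."""

    def __init__(self, embed_dim=1536, hidden=256, num_classes=3, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),  # 1536 -> 256
            nn.ReLU(),
            nn.Dropout(dropout),           # p = 0.2
            nn.Linear(hidden, num_classes) # 256 -> 3
        )

    def forward(self, inputs_embeds):
        return self.net(inputs_embeds)

# Shape check with random embeddings (untrained weights, so logits are meaningless)
head = SentimentHead().eval()
with torch.no_grad():
    logits = head(torch.randn(4, 1536))
print(logits.shape)  # torch.Size([4, 3])
```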