# Attune Song Tower v1

A contrastive MLP that embeds Spotify tracks into a shared 128-d L2-normalized space for two-tower music retrieval.
## Architecture
| Layer | Size |
|---|---|
| Input | 26-d (12 key one-hot + 14 audio scalars) |
| Hidden | 256-d + ReLU + Dropout(0.1) |
| Output | 128-d + L2-norm |
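The table above corresponds to a small two-layer MLP. A minimal PyTorch sketch (the class and attribute names are illustrative, not the checkpoint's actual module names):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SongTower(nn.Module):
    """26-d track features -> 128-d L2-normalized embedding."""

    def __init__(self, in_dim: int = 26, hidden: int = 256, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project to unit length so dot product == cosine similarity
        return F.normalize(self.net(x), dim=-1)
```

In practice you would load the shipped TorchScript export rather than re-instantiate this class; the sketch is only to make the table concrete.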
Loss: symmetric InfoNCE with in-batch negatives (batch size 512, τ = 0.07).
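Symmetric InfoNCE with in-batch negatives can be sketched as follows (a generic formulation of the loss, not the repository's training code):

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """a, b: (B, D) L2-normalized embeddings of matched pairs.

    Row i of `a` matches row i of `b`; every other row in the batch
    serves as an in-batch negative.
    """
    logits = a @ b.t() / tau  # (B, B) cosine similarities scaled by temperature
    targets = torch.arange(a.size(0), device=a.device)
    # Cross-entropy in both directions (a->b and b->a), averaged
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```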
Dataset: maharshipandya/spotify-tracks-dataset (~114k tracks).
## Files
| File | Description |
|---|---|
| `song_tower_v1.pt` | TorchScript export — use for inference/RunPod |
| `song_tower_best.pt` | Full training checkpoint — resume training |
| `song_embeddings_v1.npy` | Pre-computed 128-d embeddings for all tracks |
| `song_ids_v1.npy` | Parallel Spotify track ID array |
| `faiss_song_v1.index` | FAISS `IndexFlatIP` — ANN search index |
| `config.json` | Model metadata |
## Quick start
```python
import torch

model = torch.jit.load("song_tower_v1.pt")
model.eval()

# 26-d feature vector (see config.json for spec)
x = torch.zeros(1, 26)
with torch.no_grad():
    emb = model(x)  # (1, 128), L2-normalized
```
## Feature vector spec (26-d)
- `[0–11]` key one-hot (C=0 … B=11; -1/missing → all zeros)
- `[12]` danceability
- `[13]` energy
- `[14]` speechiness
- `[15]` acousticness
- `[16]` instrumentalness
- `[17]` liveness
- `[18]` valence
- `[19]` norm_loudness = clamp((loudness + 60) / 60, 0, 1)
- `[20]` tempo_norm = clamp(tempo / 240, 0, 1)
- `[21]` mode (0 or 1)
- `[22]` explicit (0 or 1)
- `[23]` popularity_norm = popularity / 100
- `[24]` duration_norm = min(duration_ms / 330000, 1)
- `[25]` time_signature_norm = clamp(time_signature / 7, 0, 1)
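Putting the spec together, the input vector can be assembled from a track record like this (field names follow the Spotify audio-features schema; the helper function itself is illustrative, not part of the release):

```python
import numpy as np

def build_features(t: dict) -> np.ndarray:
    """Assemble the 26-d input vector per the spec above."""
    x = np.zeros(26, dtype=np.float32)
    key = t.get("key", -1)
    if 0 <= key <= 11:
        x[key] = 1.0  # [0-11] key one-hot; -1/missing stays all zeros
    x[12] = t["danceability"]
    x[13] = t["energy"]
    x[14] = t["speechiness"]
    x[15] = t["acousticness"]
    x[16] = t["instrumentalness"]
    x[17] = t["liveness"]
    x[18] = t["valence"]
    x[19] = np.clip((t["loudness"] + 60) / 60, 0, 1)
    x[20] = np.clip(t["tempo"] / 240, 0, 1)
    x[21] = float(t["mode"])
    x[22] = float(t["explicit"])
    x[23] = t["popularity"] / 100
    x[24] = min(t["duration_ms"] / 330000, 1.0)
    x[25] = np.clip(t["time_signature"] / 7, 0, 1)
    return x
```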