# Attune Song Tower v1

A contrastive MLP that embeds Spotify tracks into a shared 128-d L2-normalized space for two-tower music retrieval.
## Architecture
| Layer | Size |
|---|---|
| Input | 26-d (12 key one-hot + 14 audio scalars) |
| Hidden | 256-d + ReLU + Dropout(0.1) |
| Output | 128-d + L2-norm |
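The table above corresponds to a small two-layer MLP. A minimal PyTorch sketch (the class and attribute names are illustrative, not the checkpoint's actual module names):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SongTower(nn.Module):
    """26-d track features -> 128-d L2-normalized embedding."""

    def __init__(self, in_dim: int = 26, hidden: int = 256, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project to unit length so dot product == cosine similarity
        return F.normalize(self.net(x), dim=-1)
```

In practice you would load the shipped TorchScript export rather than re-instantiate this class; the sketch is only to make the table concrete.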
Loss: symmetric InfoNCE with in-batch negatives (batch size 512, τ = 0.07).
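Symmetric InfoNCE with in-batch negatives can be sketched as follows (a generic formulation of the loss, not the repository's training code):

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """a, b: (B, D) L2-normalized embeddings of matched pairs.

    Row i of `a` matches row i of `b`; every other row in the batch
    serves as an in-batch negative.
    """
    logits = a @ b.t() / tau  # (B, B) cosine similarities scaled by temperature
    targets = torch.arange(a.size(0), device=a.device)
    # Cross-entropy in both directions (a->b and b->a), averaged
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```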
Dataset: maharshipandya/spotify-tracks-dataset (~114k tracks).
## Files
| File | Description |
|---|---|
| `song_tower_v1.pt` | TorchScript export — use for inference/RunPod |
| `song_tower_best.pt` | Full training checkpoint — resume training |
| `song_embeddings_v1.npy` | Pre-computed 128-d embeddings for all tracks |
| `song_ids_v1.npy` | Parallel Spotify track ID array |
| `faiss_song_v1.index` | FAISS `IndexFlatIP` — ANN search index |
| `config.json` | Model metadata |
## Quick start
```python
import torch

model = torch.jit.load("song_tower_v1.pt")
model.eval()

# 26-d feature vector (see config.json for spec)
x = torch.zeros(1, 26)
with torch.no_grad():
    emb = model(x)  # (1, 128), L2-normalized
```
## Feature vector spec (26-d)
- `[0–11]` key one-hot (C=0 … B=11; -1/missing → all zeros)
- `[12]` danceability
- `[13]` energy
- `[14]` speechiness
- `[15]` acousticness
- `[16]` instrumentalness
- `[17]` liveness
- `[18]` valence
- `[19]` norm_loudness = clamp((loudness + 60) / 60, 0, 1)
- `[20]` tempo_norm = clamp(tempo / 240, 0, 1)
- `[21]` mode (0 or 1)
- `[22]` explicit (0 or 1)
- `[23]` popularity_norm = popularity / 100
- `[24]` duration_norm = min(duration_ms / 330000, 1)
- `[25]` time_signature_norm = clamp(time_signature / 7, 0, 1)
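Putting the spec together, the input vector can be assembled from a track record like this (field names follow the Spotify audio-features schema; the helper function itself is illustrative, not part of the release):

```python
import numpy as np

def build_features(t: dict) -> np.ndarray:
    """Assemble the 26-d input vector per the spec above."""
    x = np.zeros(26, dtype=np.float32)
    key = t.get("key", -1)
    if 0 <= key <= 11:
        x[key] = 1.0  # [0-11] key one-hot; -1/missing stays all zeros
    x[12] = t["danceability"]
    x[13] = t["energy"]
    x[14] = t["speechiness"]
    x[15] = t["acousticness"]
    x[16] = t["instrumentalness"]
    x[17] = t["liveness"]
    x[18] = t["valence"]
    x[19] = np.clip((t["loudness"] + 60) / 60, 0, 1)
    x[20] = np.clip(t["tempo"] / 240, 0, 1)
    x[21] = float(t["mode"])
    x[22] = float(t["explicit"])
    x[23] = t["popularity"] / 100
    x[24] = min(t["duration_ms"] / 330000, 1.0)
    x[25] = np.clip(t["time_signature"] / 7, 0, 1)
    return x
```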