# Attune Song Tower v1

Contrastive MLP that embeds Spotify tracks into a shared 128-d L2-normalized space for two-tower music retrieval.

## Architecture

| Layer  | Size |
|--------|------|
| Input  | 26-d (12-d key one-hot + 14 audio scalars) |
| Hidden | 256-d + ReLU + Dropout(0.1) |
| Output | 128-d + L2 normalization |

Loss: InfoNCE (symmetric) with in-batch negatives (batch size 512, τ = 0.07).
Dataset: maharshipandya/spotify-tracks-dataset (~114k tracks).
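The training code is not included in this repo, but the symmetric InfoNCE objective above can be sketched as follows (NumPy, for illustration only): each row of one tower's batch is a positive for the same row of the other tower, and every other row in the batch serves as a negative.

```python
import numpy as np

def symmetric_info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE with in-batch negatives (illustrative sketch).

    a, b: (B, D) L2-normalized embeddings of paired items; row i of `a`
    matches row i of `b`, and off-diagonal rows act as negatives.
    """
    logits = a @ b.T / tau                       # (B, B) similarity matrix
    idx = np.arange(len(a))                      # positives on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()       # pick diagonal targets

    # Average the a->b and b->a directions (the "symmetric" part).
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Lower loss means the matched pairs dominate the in-batch negatives; the released checkpoint was trained with this objective at batch 512.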

## Files

| File | Description |
|------|-------------|
| song_tower_v1.pt | TorchScript export; use for inference/RunPod |
| song_tower_best.pt | Full training checkpoint; resume training |
| song_embeddings_v1.npy | Pre-computed 128-d embeddings for all tracks |
| song_ids_v1.npy | Parallel Spotify track ID array |
| faiss_song_v1.index | FAISS IndexFlatIP; ANN search index |
| config.json | Model metadata |
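Because the stored embeddings are L2-normalized, the FAISS IndexFlatIP inner-product search is equivalent to cosine similarity, and a brute-force NumPy lookup gives the same ranking. A minimal sketch (the `top_k` helper is illustrative, not part of the released code; swap in `faiss.read_index("faiss_song_v1.index")` and `index.search(...)` for the real index):

```python
import numpy as np

def top_k(query, embeddings, ids, k=5):
    """Return the k track IDs whose embeddings best match `query`.

    query:      (D,) L2-normalized query embedding
    embeddings: (N, D) matrix, e.g. np.load("song_embeddings_v1.npy")
    ids:        (N,) parallel array, e.g. np.load("song_ids_v1.npy")
    """
    scores = embeddings @ query        # inner product == cosine similarity here
    order = np.argsort(-scores)[:k]    # highest similarity first
    return ids[order], scores[order]
```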

## Quick start

```python
import torch

model = torch.jit.load("song_tower_v1.pt")
model.eval()

# 26-d feature vector (see config.json and the spec below)
x = torch.zeros(1, 26)
with torch.no_grad():
    emb = model(x)  # (1, 128), L2-normalized
```

## Feature vector spec (26-d)

```
[0-11]  key one-hot (C=0 … B=11; key = -1/missing → all zeros)
[12]    danceability
[13]    energy
[14]    speechiness
[15]    acousticness
[16]    instrumentalness
[17]    liveness
[18]    valence
[19]    norm_loudness = clamp((loudness + 60) / 60, 0, 1)
[20]    tempo_norm = clamp(tempo / 240, 0, 1)
[21]    mode (0 or 1)
[22]    explicit (0 or 1)
[23]    popularity_norm = popularity / 100
[24]    duration_norm = min(duration_ms / 330000, 1)
[25]    time_signature_norm = clamp(time_signature / 7, 0, 1)
```
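The spec above can be assembled with a small helper. This `build_features` function is an illustrative sketch (it assumes the raw track fields arrive in a plain dict keyed by the dataset's column names; it is not part of the released code):

```python
import numpy as np

def build_features(track):
    """Assemble the 26-d input vector from a dict of raw track fields.

    `track` is a hypothetical dict keyed by the field names in the spec
    (danceability, energy, ..., time_signature).
    """
    x = np.zeros(26, dtype=np.float32)
    key = track.get("key", -1)
    if 0 <= key <= 11:                   # key = -1/missing stays all zeros
        x[key] = 1.0
    x[12] = track["danceability"]
    x[13] = track["energy"]
    x[14] = track["speechiness"]
    x[15] = track["acousticness"]
    x[16] = track["instrumentalness"]
    x[17] = track["liveness"]
    x[18] = track["valence"]
    x[19] = np.clip((track["loudness"] + 60) / 60, 0, 1)   # norm_loudness
    x[20] = np.clip(track["tempo"] / 240, 0, 1)            # tempo_norm
    x[21] = float(track["mode"])
    x[22] = float(track["explicit"])
    x[23] = track["popularity"] / 100                      # popularity_norm
    x[24] = min(track["duration_ms"] / 330000, 1)          # duration_norm
    x[25] = np.clip(track["time_signature"] / 7, 0, 1)     # time_signature_norm
    return x
```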