SI-206 · Strategic Data Investigation · Winter 2024 · Team of 4 (Coding Queens)

Three platforms, one song, three different stories of "popular."

We built a Python pipeline pulling chart data from Spotify, Billboard, and Deezer into a single SQLite database — then used it to answer a question artists, A&R teams, and product managers all quietly disagree about: what does popular actually mean?

My role

Team build · Insight synthesis · Strategic interpretation

Team

Coding Queens — 4 collaborators across data ingestion, schema, and analysis

Stack

Python · BeautifulSoup · REST APIs · SQLite · matplotlib

↓ THE INVESTIGATION

If three platforms each call a different song "#1," whose chart is the truth?

Chart data isn't neutral. Each platform encodes its business model into how it ranks songs — Spotify rewards stream velocity, Billboard weighs sales and radio, Deezer surfaces what its (largely European) listeners replay. We built the pipeline to make those biases visible, then asked the harder question a product team would care about: what should you do with that?

Spotify

Web scraping · kworb.net

Stream-driven. Reflects active listening, weighted toward younger US/global audiences.

Billboard Hot 100

Billboard API

Industry-standard. Blends sales, radio, and streams — slower-moving, more durable.

Deezer

Deezer REST API

EU-skewed listener base. Reveals what global audiences (outside US algorithms) play.

3

Data sources unified

300+

Chart entries normalized

1

SQLite schema, 6 tables

3

Strategic findings

↓ THE METHOD

Build the pipeline. Then ignore it. The point was never the code.

The technical work was a means: a clean, joinable dataset across three platforms with no repeating strings and no duplicate IDs. Once that existed, the real work began — sitting with the data and asking what a label exec, an artist's manager, or a Spotify PM would want it to tell them.

01

Pivots in ingestion

Spotipy didn't expose chart data → moved to BeautifulSoup against kworb.net. YouTube Music API capped at 40 songs → swapped for Deezer to hit the 100-row floor.
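
A minimal sketch of the kind of normalization step an API-based source needs before its rows can sit beside scraped ones. The payload shape here (a `data` list whose items carry `position`, `title`, and a nested `artist` name) is an assumption for illustration, not a confirmed copy of any platform's response, and `normalize_chart` is a hypothetical helper. The song and artist values are placeholders.

```python
import json

# Hypothetical payload: the field names are assumptions about the response
# shape, and the values are placeholders, not real chart data.
SAMPLE_PAYLOAD = json.loads("""
{
  "data": [
    {"position": 2, "title": "Song B", "artist": {"name": "Artist Y"}},
    {"position": 1, "title": "Song A", "artist": {"name": "Artist X"}},
    {"position": 3, "title": "  Song C ", "artist": {"name": "Artist X"}}
  ]
}
""")


def normalize_chart(payload, platform):
    """Flatten a chart payload into (platform, rank, title, artist) rows."""
    rows = []
    for item in payload.get("data", []):
        rows.append(
            (
                platform,
                int(item["position"]),
                item["title"].strip(),           # trim stray whitespace
                item["artist"]["name"].strip(),
            )
        )
    # Sort by rank so later database inserts happen in a stable order.
    return sorted(rows, key=lambda row: row[1])


rows = normalize_chart(SAMPLE_PAYLOAD, "deezer")
for row in rows:
    print(row)
```

Producing the same tuple shape for every source is what lets a scraped chart and an API chart land in one table.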

02

Schema as a normalizer

Six tables, including artist-ID lookup tables for each platform — so "Taylor Swift" could be one entity across Spotify, Billboard, and Deezer joins.
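
The lookup idea can be sketched with Python's built-in `sqlite3` module. This is a simplified three-table illustration, not the project's actual six-table schema: the table names, column names, and the `artist_id_for` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: names are stored once and referenced by ID.
conn.executescript("""
CREATE TABLE artists (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE songs (
    song_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    artist_id INTEGER NOT NULL REFERENCES artists(artist_id),
    UNIQUE (title, artist_id)
);
CREATE TABLE chart_entries (
    platform   TEXT NOT NULL,
    song_id    INTEGER NOT NULL REFERENCES songs(song_id),
    chart_rank INTEGER NOT NULL,
    PRIMARY KEY (platform, song_id)
);
""")


def artist_id_for(conn, name):
    """Return the single artist_id for a name, inserting it on first sight."""
    clean = name.strip()
    # The UNIQUE constraint makes a repeat insert a no-op.
    conn.execute("INSERT OR IGNORE INTO artists (name) VALUES (?)", (clean,))
    row = conn.execute(
        "SELECT artist_id FROM artists WHERE name = ?", (clean,)
    ).fetchone()
    return row[0]


# The same artist arriving from three sources, with stray whitespace.
seen = {
    "spotify": artist_id_for(conn, "Taylor Swift"),
    "billboard": artist_id_for(conn, "Taylor Swift "),
    "deezer": artist_id_for(conn, " Taylor Swift"),
}
artist_rows = conn.execute("SELECT COUNT(*) FROM artists").fetchone()[0]
print(seen, artist_rows)
```

All three platforms resolve to one ID and the `artists` table holds a single row, which is what keeps later joins from splitting one artist into several.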

03

From SQL to story

Three calculations: artists by song count, top-10 cross-platform position, cumulative weeks on Billboard. Each one was chosen to surface a different definition of "popular."
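
Two of these calculations can be sketched as plain SQL aggregates. The flat `entries` table below is a stand-in for brevity, its rows are placeholders rather than project data, and counting one row per chart entry (rather than per distinct song) is an assumption about how "song count" was defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "platform TEXT, artist TEXT, title TEXT, weeks_on_chart INTEGER)"
)

# Placeholder rows: only Billboard rows carry a weeks-on-chart value here.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("spotify", "Artist X", "Song A", None),
        ("spotify", "Artist X", "Song B", None),
        ("deezer", "Artist X", "Song A", None),
        ("billboard", "Artist X", "Song A", 3),
        ("billboard", "Artist Y", "Song C", 40),
        ("spotify", "Artist Y", "Song C", None),
    ],
)

# Definition 1 (visibility): chart entries per artist across all platforms.
song_counts = conn.execute("""
    SELECT artist, COUNT(*) AS entries
    FROM entries
    GROUP BY artist
    ORDER BY entries DESC, artist
""").fetchall()

# Definition 2 (endurance): cumulative weeks on Billboard per artist.
total_weeks = conn.execute("""
    SELECT artist, SUM(weeks_on_chart) AS weeks
    FROM entries
    WHERE platform = 'billboard'
    GROUP BY artist
    ORDER BY weeks DESC, artist
""").fetchall()

print(song_counts)
print(total_weeks)
```

Even in this toy data the two orderings disagree: Artist X leads on entries while Artist Y leads on weeks.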

↓ FINDING 01 · VISIBILITY ≠ ENDURANCE

Taylor Swift looks dominant. Morgan Wallen actually is.

On a snapshot of "songs currently charting," Taylor Swift had 34 entries to Morgan Wallen's 9, a nearly 4× lead. But measured by cumulative weeks on Billboard, Wallen had 462 weeks to Swift's 157: a near 3× reversal. The same data, two completely different stories about who's "winning."

Snapshot — songs on charts now

Spotify + Billboard + Deezer, current week

Taylor Swift · 34
Simeon Views · 10
Morgan Wallen · 9
SZA · 6
Chappell Roan · 6
Benson Boone · 5

Endurance — cumulative weeks on Billboard

Across all charting songs by artist

Morgan Wallen · 462
Zach Bryan · 216
SZA · 159
Taylor Swift · 157
Tyler, The Creator · 147
Noah Kahan · 107

So what?

Recency metrics flatter pop. Endurance metrics reward country and rap.

If you're a label deciding who to re-sign, a streaming PM ranking artists for Wrapped, or a brand picking a partner — "songs on the chart this week" will systematically over-credit pop release cycles and under-credit catalog artists. The metric you choose is a strategic decision, not a neutral one.

↓ FINDING 02 · POSITION IS PLATFORM-DEPENDENT

Spotify's top 10 mostly didn't even crack Billboard's or Deezer's top 10.

We took Spotify's top 10 songs (the week of the Tortured Poets Department release) and looked up where they sat on the other two charts. Most ranked 11th or lower (outside the visible top 10) on Billboard and Deezer. The same songs, ranked by different audiences and different formulas, produced rankings that barely overlapped.
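
The lookup described above can be sketched as a self-join in SQLite. The `chart` table, its columns, and the rows are placeholders for illustration. Collapsing any missing or lower rank to 11 mirrors the "11 = outside the top 10" convention used in the chart for this finding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chart (platform TEXT, title TEXT, chart_rank INTEGER)")

# Placeholder rows, not real chart positions.
conn.executemany(
    "INSERT INTO chart VALUES (?, ?, ?)",
    [
        ("spotify", "Song A", 1),
        ("spotify", "Song B", 2),
        ("billboard", "Song A", 4),
        ("billboard", "Song B", 37),
        ("deezer", "Song A", 58),
        # "Song B" is absent from the Deezer chart entirely.
    ],
)

# LEFT JOIN keeps every Spotify top-10 song even when another platform has
# no row for it; the CASE collapses "ranked below 10" and "missing" to 11.
QUERY = """
SELECT s.title,
       s.chart_rank AS spotify_rank,
       CASE WHEN b.chart_rank <= 10 THEN b.chart_rank ELSE 11 END AS billboard_rank,
       CASE WHEN d.chart_rank <= 10 THEN d.chart_rank ELSE 11 END AS deezer_rank
FROM chart AS s
LEFT JOIN chart AS b ON b.title = s.title AND b.platform = 'billboard'
LEFT JOIN chart AS d ON d.title = s.title AND d.platform = 'deezer'
WHERE s.platform = 'spotify' AND s.chart_rank <= 10
ORDER BY s.chart_rank
"""

comparison = conn.execute(QUERY).fetchall()
for row in comparison:
    print(row)
```

Joining on title alone is a shortcut for the sketch; a real join would go through normalized song and artist IDs so that two different songs with the same title stay separate.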

Spotify's top 10 — where they ranked elsewhere

Lower is better · 11 = outside the top 10

Rank axis: 1, 2, 4, 6, 8, 10, 11+ · Series: Spotify, Billboard, Deezer

Fortnight
Down Bad
I Can Do It With a Broken Heart
So Long, London
Tortured Poets Department
Espresso
Million Dollar Baby
Beautiful Things

So what?

A 'chart-topper' in one universe is a mid-tier song in another.

For an A&R team scouting talent: don't extrapolate from a single platform. For a product team building discovery: cross-platform signals reveal artists with durable global appeal vs. ones riding a single ecosystem's algorithm. For an artist negotiating a deal: which chart is in your contract changes what "success" earns you.

↓ FINDING 03 · THE LONG TAIL HAS A SHORT NECK

10 artists hold ~40% of the top-100 chart real estate across three platforms.

When we counted every song on every chart and grouped by artist, a small handful of names occupied a disproportionate share of available spots. The "discovery" promise of streaming, on this snapshot, looked more like consolidation.
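
The per-artist shares in this section's chart can be reproduced from the counts it reports. The sketch below uses only those ten counts. It shows that the percentages are each artist's share of the 90 songs held by the top 10 artists, not a share of every chart entry.

```python
# Chart appearances for the top 10 artists, as reported in this section.
counts = {
    "Taylor Swift": 34,
    "Simeon Views": 10,
    "Morgan Wallen": 9,
    "SZA": 6,
    "Chappell Roan": 6,
    "Benson Boone": 5,
    "Ariana Grande": 5,
    "Zach Bryan": 5,
    "Luke Combs": 5,
    "Olivia Rodrigo": 5,
}

# Denominator: all songs held by these ten artists.
total = sum(counts.values())

# Each artist's share of that total, rounded to a whole percent.
shares = {artist: round(100 * n / total) for artist, n in counts.items()}

print(total)
for artist, pct in shares.items():
    print(f"{artist}: {counts[artist]} · {pct}%")
```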

90 SONGS

Top 10 artists by chart appearances

Across Spotify · Billboard · Deezer · current week

Taylor Swift · 34 · 38%
Simeon Views · 10 · 11%
Morgan Wallen · 9 · 10%
SZA · 6 · 7%
Chappell Roan · 6 · 7%
Benson Boone · 5 · 6%
Ariana Grande · 5 · 6%
Zach Bryan · 5 · 6%
Luke Combs · 5 · 6%
Olivia Rodrigo · 5 · 6%

So what?

Chart visibility is a winner-take-most market. Design accordingly.

For a streaming product: if your homepage mirrors the charts, you're reinforcing concentration — not solving for discovery. For a label: shelf space is more contested than catalog size suggests. For a policy or research lens: "democratized distribution" hasn't (yet) produced democratized attention.

↓ STRATEGIC IMPLICATIONS

What a product or strategy team should actually do with this.

For product managers

Stop shipping single-source rankings.

Whatever "trending" surface you build will inherit the bias of the chart you pull from. Blend at least two definitions — velocity + endurance — and let users toggle. Transparency is the feature.
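
One way such a toggle could work, as a sketch under stated assumptions: scale each metric by its maximum, then mix the two with a user-controlled weight. The `blended_ranking` function and the weighting scheme are hypothetical, and the snapshot song count stands in for "velocity" here. The numbers are the Finding 01 figures for the three artists who appear in both lists.

```python
# Figures from Finding 01 for artists present in both lists.
SNAPSHOT = {"Taylor Swift": 34, "Morgan Wallen": 9, "SZA": 6}      # songs charting now
WEEKS = {"Taylor Swift": 157, "Morgan Wallen": 462, "SZA": 159}    # cumulative Billboard weeks


def blended_ranking(velocity, endurance, weight):
    """Order artists by weight * velocity + (1 - weight) * endurance.

    Each metric is divided by its maximum first so the two are comparable.
    weight=1.0 ranks on velocity alone; weight=0.0 ranks on endurance alone.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    v_max = max(velocity.values())
    e_max = max(endurance.values())
    scores = {
        artist: weight * velocity[artist] / v_max
        + (1 - weight) * endurance[artist] / e_max
        for artist in velocity
    }
    return sorted(scores, key=scores.get, reverse=True)


velocity_only = blended_ranking(SNAPSHOT, WEEKS, 1.0)
endurance_only = blended_ranking(SNAPSHOT, WEEKS, 0.0)
print(velocity_only)
print(endurance_only)
```

Exposing `weight` to the user is the transparency argument in miniature: the two extremes give different leaders, and the interface shows which definition produced the list.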

For strategists & A&R

Treat 'chart position' as a metric family, not a number.

Ask which chart, which window, which weighting. A snapshot win and a 200-week catalog win signal completely different artist trajectories — and warrant completely different investment.

For researchers & analysts

Pipeline first, then narrative.

The unglamorous work — normalized IDs, joinable schemas, deduped strings — is what made the 'so what?' possible. Without it, you're comparing three apples that turn out to be three different fruits.

↓ REFLECTION

The brief said "collect data." The work was learning to make it mean something.

What I took from SI-206 wasn't the Python — it was the discipline of refusing to stop at "here is a chart." The visualization is the easy part. The hard part is sitting with three numbers that disagree and figuring out which decision they should change. That's the muscle I bring to every research project now.