[This post is courtesy of Ryan Smith, dot net user @ryansmith534, a data scientist formerly at Spotify. Thank you, Ryan! -Ed.]
Every Phish fan undoubtedly has their own answer to this question – but is there a universal truth across all fans? Using setlist data and user ratings from Phish.net, we can attempt to answer this question empirically.
To do this, we can borrow methodology from basketball and hockey analytics, specifically the concept of RAPM (regularized adjusted plus-minus). This metric attempts to quantify an answer to the question: how much does the presence of a given player on the court contribute to a team’s point differential? In our case, the question becomes: how much does the presence of a given song in a setlist contribute to a show’s rating on Phish.net?
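In RAPM, the "regularized" part is typically ridge regression. Each show's rating is modeled as a baseline plus the sum of coefficients for the songs played, and an L2 penalty shrinks the coefficients of rarely played songs toward zero so a single great show doesn't crown a one-off bustout. Below is a minimal pure-Python sketch of that idea. It assumes a 0/1 show-by-song matrix `X`, a rating vector `y`, and a penalty strength `lam`. The function names and the value of `lam` are hypothetical, since the post does not say how its model was fit.

```python
def ridge_fit(X, y, lam):
    """Fit y ~ b0 + sum_j beta_j * X[:, j] with an L2 penalty on beta (not b0).

    X: list of rows (one per show), each a list of 0/1 song indicators.
    y: list of show ratings. lam: penalty strength (hypothetical choice).
    Returns (b0, beta), where beta[j] is song j's estimated rating contribution.
    """
    n, p = len(X), len(X[0])
    # Center columns and target so the intercept is left unpenalized.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Ridge normal equations: (Xc^T Xc + lam * I) beta = Xc^T yc
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    beta = solve(A, rhs)
    # Recover the intercept on the original (uncentered) scale.
    b0 = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return b0, beta


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


# Toy, made-up data: 4 shows x 3 songs (not real Phish.net ratings).
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y = [4.5, 3.9, 4.2, 3.6]
b0, beta = ridge_fit(X, y, lam=1.0)
print(round(b0, 3), [round(b, 3) for b in beta])
```

Larger values of `lam` pull every song's coefficient closer to zero; in the basketball setting this is what keeps bench players with few minutes from getting extreme ratings. In practice a library such as scikit-learn's `Ridge` (whose `alpha` parameter plays the role of `lam`) would replace the hand-rolled solver, with `alpha` chosen by cross-validation.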
We first need to gather the necessary data, a process made much easier by the Phish.net API. After a fair amount of cleaning and manipulation, we get a dataset that looks like this:
We have one row for every show, a column with the show’s rating, and a column for every song in Phish’s repertoire – with a 0 or 1 value representing whether the song was played at a given show.
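The reshaping step can be sketched as follows. It assumes the setlists have already been pulled into a hypothetical `{show: [song, ...]}` dictionary and the ratings into a matching `{show: rating}` dictionary; the real Phish.net API responses would need their own parsing first. Unlike the post's dataset, this sketch only creates columns for songs that appear in the supplied setlists, not for Phish's entire repertoire.

```python
def build_dataset(setlists, ratings):
    """Turn {show: [song, ...]} and {show: rating} into model inputs.

    Returns (shows, songs, X, y): X[i][j] == 1 if songs[j] was played at
    shows[i], else 0; y[i] is the rating of shows[i]. A song played twice
    in one show (e.g. a reprise under the same name) still counts as 1.
    """
    shows = sorted(setlists)
    songs = sorted({song for played in setlists.values() for song in played})
    col = {song: j for j, song in enumerate(songs)}  # song -> column index
    X = []
    for show in shows:
        row = [0] * len(songs)
        for song in setlists[show]:
            row[col[song]] = 1
        X.append(row)
    y = [ratings[show] for show in shows]
    return shows, songs, X, y


# Made-up example: placeholder show IDs and ratings, not real Phish.net data.
setlists = {
    "show_1": ["Tweezer", "Harry Hood", "Tweezer Reprise"],
    "show_2": ["Harry Hood", "Wilson"],
}
ratings = {"show_1": 4.4, "show_2": 3.9}
shows, songs, X, y = build_dataset(setlists, ratings)
print(songs)  # ['Harry Hood', 'Tweezer', 'Tweezer Reprise', 'Wilson']
print(X)      # [[1, 1, 1, 0], [1, 0, 0, 1]]
print(y)      # [4.4, 3.9]
```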