Velocity vs. Endurance in Music: A Quantitative Analysis

The visualization above illustrates the relationship between a track’s maximum weekly velocity and the number of days it spends on Spotify’s weekly Top 200 chart. The pattern is quite interesting, and there are a lot of insights to unpack in just one graph; to avoid rabbit holes, I will focus on the group of songs that show negative growth.

According to Chartmetric, velocity is the change in a track’s rank today versus 7 days ago, divided by 7; in other words, it’s the average number of chart positions the track gained per day over the past week. Velocity is an important KPI for measuring a track’s growth. When an unsigned artist releases a track that goes viral, major players in the music industry naturally rush in to try to be the first to sign them. It can be an exciting time if you’re one of these artists: 347 Aiden is the latest breakout artist who eventually signed a deal with Columbia Records.
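To make the definition concrete, here’s a minimal Python sketch. It assumes that a positive velocity means a track is climbing (its rank number is getting smaller), which matches how this post treats negative velocity as negative growth. The function name `velocity` and the sample ranks are hypothetical; this isn’t Chartmetric’s API or real chart data.

```python
def velocity(rank_today: int, rank_7_days_ago: int) -> float:
    """Average daily rank change over the past week.

    Assumption: positive means climbing, so moving from #50 to #29
    (a smaller rank number) counts as growth.
    """
    return (rank_7_days_ago - rank_today) / 7


climbing = velocity(rank_today=29, rank_7_days_ago=50)  # gained 21 places in 7 days
falling = velocity(rank_today=60, rank_7_days_ago=46)   # lost 14 places in 7 days
print(climbing, falling)  # 3.0 -2.0
```

A velocity of 3.0 means the track gained about three chart positions per day over that week; -2.0 means it lost about two per day.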

But why do some songs that make it to the Top 200 start slowing down immediately, as is the case with tracks that have a negative peak velocity?

One theory I have is that when big artists release a new record, it instantly lands on the chart because they have a large fanbase that will listen to anything they release. Imagine Eminem drops a new single: his most devoted fans (31.8 million followers on Spotify) would probably trip the alarms in the algorithm and push the record up the chart as the single sees a surge of streams in its early days. However, if the song isn’t good, for one reason or another, it will lose people’s interest almost immediately. Speak of the devil: Eminem released “Unaccommodating” on January 17th, 2020; 6 days later, it landed at #20 on Spotify’s Top 200 with nearly 6 million streams in one week! Unfortunately, the track spent only 2 days on the chart before it quickly faded from the mainstream.
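A track like this can end up with a negative peak velocity because it debuts at (or near) its best rank and only slides afterwards, so every 7-day window shows a loss. The sketch below computes peak velocity as the maximum 7-day velocity over a run of consecutive daily ranks, using the same positive-means-climbing assumption as before. `peak_velocity` is a hypothetical helper, and both rank series are made up for illustration; they are not Eminem’s (or anyone’s) actual chart history.

```python
def peak_velocity(daily_ranks: list[int]) -> float:
    """Maximum 7-day velocity over consecutive daily chart ranks.

    Assumption: velocity = (rank 7 days ago - rank today) / 7,
    so positive values mean the track is climbing.
    """
    if len(daily_ranks) < 8:
        raise ValueError("need at least 8 consecutive daily ranks")
    return max(
        (daily_ranks[i - 7] - daily_ranks[i]) / 7
        for i in range(7, len(daily_ranks))
    )


# Hypothetical: debuts at #20 and only slides afterwards.
front_loaded = [20, 27, 34, 41, 48, 55, 62, 69, 76, 83]
# Hypothetical: debuts at #180 and keeps climbing.
slow_burn = [180, 170, 160, 150, 140, 130, 120, 110, 100, 90]

print(peak_velocity(front_loaded))  # -7.0: even its best week was a decline
print(peak_velocity(slow_burn))     # 10.0
```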

One of the interesting insights shown above is that it’s easier for an artist to achieve a higher ranking if they have more listeners, regardless of the song quality.
Regardless of the size of your listenership, however, your loyal fans can’t keep you on the chart for very long. The one exception is Panic! At The Disco’s “High Hopes,” which is finishing out its wave of success from 2019.

In order to test this theory, we need to define our two groups. Group A will be artists at or below the 25th percentile of listeners in January 2020 (≤ 3.2 million); Group B will be the more established artists above that threshold (3.2 million to 65 million listeners). Now that we have defined our groups, let’s clearly state our null hypothesis so we know what we are testing: the median peak weekly rank of Group A is equal to the median peak weekly rank of Group B.
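Here’s a minimal sketch of how that split could be done with NumPy, assuming one entry per track holding the artist’s monthly listeners (in millions) and the track’s peak weekly rank. The numbers are hypothetical, so the 25th-percentile cutoff comes out at about 2.55 million here rather than the 3.2 million from the real data.

```python
import numpy as np

# Hypothetical sample: artist monthly listeners (millions) and the track's
# peak weekly rank (lower = better). Not the data behind this post.
listeners = np.array([0.8, 1.5, 2.9, 3.1, 5.0, 12.0, 30.0, 64.0])
peak_rank = np.array([120, 95, 88, 140, 40, 25, 10, 3])

cutoff = np.percentile(listeners, 25)       # 25th-percentile listener count
group_a = peak_rank[listeners <= cutoff]    # smaller artists
group_b = peak_rank[listeners > cutoff]     # more established artists

print(f"cutoff: {cutoff:.2f}M listeners")
print("Group A median peak rank:", np.median(group_a))  # 107.5
print("Group B median peak rank:", np.median(group_b))  # 32.5
```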

Traditionally, 5% is considered the standard significance level, so we’ll use that as our threshold. If the p-value is less than .05, it means that, assuming the null hypothesis is true, there is less than a 5% chance of observing a difference at least as large as ours, so we reject the null hypothesis. However, if the p-value is greater than .05, we fail to reject the null hypothesis. I must emphasize that the latter doesn’t necessarily mean the two groups are equal; it just means there isn’t enough evidence to claim that they differ. Further analysis would require more data and possibly additional features to identify the root cause, but that’s beyond the scope of this post.
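To make “by chance” concrete, here’s a sketch of a permutation test. This is a different technique from the t-test used later in this post: it shuffles the group labels many times and counts how often a difference in medians at least as large as the observed one shows up purely by chance. That fraction is the p-value, which then gets compared against the .05 threshold. The function name, the seed, the number of shuffles, and the sample ranks are all illustrative assumptions.

```python
import numpy as np

ALPHA = 0.05  # significance level used in this post


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in medians.

    H0: group labels don't matter, so shuffling them should produce
    differences as large as the observed one fairly often.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(np.median(a) - np.median(b))
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(np.median(shuffled[: len(a)]) - np.median(shuffled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical peak ranks for two clearly separated groups.
p = permutation_p_value([150, 160, 170, 180, 190], [5, 10, 15, 20, 25])
print("reject H0" if p < ALPHA else "fail to reject H0")
```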

Group A’s Median Peak Rank: 94

Group B’s Median Peak Rank: 33


Statistical Tests

Deciding which statistical test to use depends on a variety of factors, such as distribution type, sample size, and the number of groups you’re comparing. In this case, we will use the two-sample t-test because the data has the following properties:

  • Data is parametric: the peak rank distribution is treated as approximately normal (for a normality test, this assumption holds when the p-value is greater than the significance level)
  • We have a categorical independent variable with two categories
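One common way to check the normality assumption is the Shapiro-Wilk test in SciPy (`scipy.stats.shapiro`). Its null hypothesis is that the data come from a normal distribution, so a p-value *above* the significance level is what you’d want to see before running a t-test. In practice you’d run it on each group separately; the ranks below are hypothetical.

```python
from scipy import stats

ALPHA = 0.05

# Hypothetical peak weekly ranks for one group; not the post's data.
peak_ranks = [12, 25, 33, 41, 47, 52, 58, 66, 74, 90, 105, 131]

# Shapiro-Wilk: H0 = the sample comes from a normal distribution.
result = stats.shapiro(peak_ranks)
# Failing to reject H0 is consistent with the t-test's normality assumption.
normal_enough = result.pvalue > ALPHA
print(f"W = {result.statistic:.3f}, p = {result.pvalue:.3f}, treat as normal: {normal_enough}")
```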

SciPy is a very useful scientific computing library for Python. One of its many modules is scipy.stats, from which we’ll import the function that runs the t-test.
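Here’s a minimal sketch of the two-sample t-test with `scipy.stats.ttest_ind`. The peak ranks are hypothetical placeholders rather than the post’s 56 observations, so the output says nothing about the real data. Two hedges: a t-test compares group *means*, so it only approximates the median-based null hypothesis stated earlier (`scipy.stats.mannwhitneyu` is the usual rank-based alternative); and `equal_var=False` runs Welch’s version, which doesn’t assume equal variances. That’s my choice for the sketch, not necessarily the setting behind this post’s result.

```python
from scipy import stats

ALPHA = 0.05

# Hypothetical peak weekly ranks (lower = better); not the post's data.
group_a = [94, 120, 88, 150, 76, 101, 133, 90]  # at or below the 25th percentile of listeners
group_b = [33, 12, 45, 60, 8, 27, 51, 39]       # above the 25th percentile

result = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
print("reject H0" if result.pvalue < ALPHA else "fail to reject H0")
```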

Since the p-value is greater than .05, we fail to reject the null hypothesis. However, considering that our sample is made up of only 56 observations and the p-value is very close to the significance level, I’m going to err on the side of caution and say that we need more data to confirm the results.


Insights

As this post is my first deep analysis published on behalf of Bull Analytics, I want to take this opportunity to set expectations for each article moving forward. I realize that there are many well-respected blogs out there that serve a similar purpose, so in order not to waste your time as a reader (and as someone who has stuff to do), I will strive to end each post with key takeaways that you can apply to your own work.

  • Artists with higher listenership are more likely to chart higher, regardless of the song being “good”
  • Records with high growth rates tend to spend less time on the charts than records with a peak velocity between 0 and 10
  • Even though the size of your fanbase can help push your record toward the top, it can’t hold you up forever. This insight reaffirms the notion that you need to grow your audience beyond the so-called “loyalists” in order to increase streams.
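If you want to check the second takeaway against your own chart data, one minimal approach is to bucket tracks by peak velocity and compare the median days on chart per bucket. The (peak velocity, days on chart) pairs below are hypothetical, `median_days` is a hypothetical helper, and reading “between 0 and 10” as the half-open range [0, 10) is my assumption.

```python
from statistics import median

# Hypothetical (peak_velocity, days_on_chart) pairs; not the post's data.
tracks = [(-4.0, 2), (-1.5, 6), (3.0, 40), (7.5, 55),
          (9.0, 30), (14.0, 12), (22.0, 9), (35.0, 5)]


def median_days(tracks, low, high):
    """Median days on chart for tracks with low <= peak velocity < high."""
    days = [d for v, d in tracks if low <= v < high]
    return median(days) if days else None


negative = median_days(tracks, float("-inf"), 0)
moderate = median_days(tracks, 0, 10)
high = median_days(tracks, 10, float("inf"))
print(negative, moderate, high)
```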

Future Analysis

Listenership [predictor variable] isn’t the only independent variable that has an effect on time spent on the chart [outcome variable]. Paraphrasing one of my favorite science influencers, Hank Green, you can say that about most events in life, because very few outcomes operate on a binary scale. If you actually want to understand the world, you can’t think in yes or no; you have to think in likely or unlikely, hence the importance of using statistics to draw logical conclusions from your data.

Now that we’ve analyzed one variable’s relationship with time spent on the chart, I’ll leave you with this question: what other factors can contribute to our dependent variable? Here are a few ideas of my own:

  • # of social media accounts
  • Social media engagement
  • Average Listener-Share Per User (ALPU)
  • Demographic distribution (which demographic is most engaged? Do artists with a younger engaged audience perform better than artists with an older engaged audience?)
  • Money spent on advertising

I could go on and on with this list, but I’d love to hear your ideas in the comment section below!


If you found this type of quantitative analysis interesting, make sure you subscribe to my newsletter. I’ll be updating this blog on a weekly basis with similar content.

You can also follow me on Twitter for bite-sized insights.
