How to Leverage Spotify API + Genius Lyrics for Data Science Tasks in Python

Maaz Khan
Published in The Startup
Jan 21, 2021

Spotify has grown into the most popular music streaming platform in the world, surpassing the likes of Apple Music, Pandora, and Tidal. Its rise in popularity stems from a large catalog of artists combined with a friendly UI and UX. With the platform also pushing podcasts in 2021, the sky seems to be the limit for what Spotify can achieve.

Spotify offers a Web API that lets users access certain metadata from the platform or build applications on top of Spotify’s infrastructure. In Python, the API is most easily reached through Spotipy, a client library that wraps it. The data available from the API includes information about songs, albums, playlists, and more. Akin to “Spotify Wrapped”, an analysis can be performed on a user’s top artists, songs, albums, and genres.

This tutorial focuses on three objectives: connecting to the Spotify API, extracting album data from the API into a pandas dataframe, and attaching song lyrics scraped from Genius to that dataframe.

This tutorial was written with Python 3.8.7.

Step 0: Install Libraries

Below are the libraries needed for this tutorial. If any of them are not already installed, try “conda install package_name” if you are working in an Anaconda environment, or “pip install package_name” otherwise.
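As a quick sanity check before going further, here is a minimal sketch that reports which packages still need to be installed. The dependency list is an assumption based on what this tutorial uses (Spotipy, pandas, requests, and Beautiful Soup), and `missing_packages` is a hypothetical helper name, so adjust both to your own setup.

```python
from importlib.util import find_spec

# Assumed dependency list (pip name -> import name); adjust to your setup.
REQUIRED = {
    "spotipy": "spotipy",
    "pandas": "pandas",
    "requests": "requests",
    "beautifulsoup4": "bs4",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```

Note that the pip name and the import name can differ, as with beautifulsoup4, which is imported as bs4.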

Step 1: Connecting To Spotify API

Documentation for Spotipy, the Python library we will use to talk to the Spotify API, can be found in the docs here. This tutorial highlights some of the features Spotipy offers, such as retrieving all the songs within an album along with their audio features.

First, we need to register for a Spotify developer account, which can be created for free here. After registering, you will receive the credentials that allow Spotipy to connect to the API: a client ID and a client secret. Do not share these credentials with others, as they are meant to be kept private. Below is the code necessary to establish a connection through Spotipy.
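One way to set this up is sketched below, using the client credentials flow. Reading the credentials from environment variables is my own suggestion for keeping them out of the notebook, and `load_credentials` and `make_client` are hypothetical helper names. SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are the variable names Spotipy itself looks for.

```python
import os


def load_credentials(env=os.environ):
    """Read the client ID and client secret from environment variables."""
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first."
        )
    return client_id, client_secret


def make_client(client_id, client_secret):
    """Build a Spotipy client authenticated with the client credentials flow."""
    # Imported here so load_credentials works even without Spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(
        client_id=client_id, client_secret=client_secret
    )
    return spotipy.Spotify(auth_manager=auth)


# Typical usage (requires real credentials and network access):
# sp = make_client(*load_credentials())
```

The client credentials flow is enough for public catalog data such as albums and tracks. Anything tied to a specific user account would need a different authorization flow.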

Step 2: Extracting Data From Spotify API

Once we have established a connection through Spotipy, we need a function that takes all the songs from a given album and inserts the relevant information into a pandas dataframe. All we need is the album URI, which can be found by clicking on the three dots next to the album in Spotify.
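A minimal sketch of such a function follows. It is not necessarily identical to the original implementation: `album_track_rows` is a hypothetical name, and `sp` is assumed to be the Spotipy client created in Step 1. The function returns plain dictionaries so that the logic is easy to inspect before turning it into a dataframe.

```python
def album_track_rows(sp, album_uri):
    """Collect one dictionary per track on the given album.

    `sp` is assumed to be a spotipy.Spotify client; any object with a
    compatible album_tracks method works as well.
    """
    rows = []
    offset = 0
    while True:
        page = sp.album_tracks(album_uri, limit=50, offset=offset)
        items = page["items"]
        for item in items:
            rows.append(
                {
                    "uri": item["uri"],
                    "name": item["name"],
                    "duration_ms": item["duration_ms"],
                    "explicit": item["explicit"],
                    "track_number": item["track_number"],
                }
            )
        # Stop when there are no further pages of results.
        if not items or page.get("next") is None:
            break
        offset += len(items)
    return rows
```

Passing the result to pandas, for example `pd.DataFrame(album_track_rows(sp, album_uri))`, gives one row per track with the columns named above.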

Here is where an album URI is located on Spotify’s desktop application.

Below is the output of the function above. Notice how we now have the URI for each track within the album along with the track name, duration (milliseconds), explicit (boolean), and track number.

Running the above function on ‘Blonde’ by Frank Ocean.

Next, we need a function that takes the dataframe of all the songs in the album we want to analyze (the output of the above function) and attaches features such as danceability, energy, key, and loudness to each track. This is done by passing the URI of each song to the API.
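Here is one possible shape for that step, as a sketch rather than the exact get_track_info function. `track_feature_rows` is a hypothetical name, `sp` is again assumed to be a Spotipy client, and the code assumes the audio features endpoint is available to your Spotify app.

```python
FEATURES = ["danceability", "energy", "key", "loudness"]


def track_feature_rows(sp, uris, features=FEATURES):
    """Return one dictionary of audio features per track URI."""
    rows = []
    # The audio features endpoint accepts up to 100 tracks per request.
    for start in range(0, len(uris), 100):
        batch = uris[start:start + 100]
        for uri, feats in zip(batch, sp.audio_features(batch)):
            if feats is None:  # no audio analysis available for this track
                continue
            row = {"uri": uri}
            row.update({name: feats[name] for name in features})
            rows.append(row)
    return rows
```

With the album dataframe from the previous step, a call such as `track_feature_rows(sp, list(df["uri"]))` would produce rows that can be turned into a second dataframe keyed on the track URI.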

Output of running the get_track_info function.

Next, we need to merge the two dataframes. This can easily be done with the function below, although the merge can also be performed manually without a custom function.
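A sketch of the merge step, assuming both dataframes share a "uri" column as in the previous steps. `merge_frames` is a hypothetical name, and the left join is my own choice so that tracks without audio features are kept rather than dropped.

```python
def merge_frames(tracks_df, features_df, key="uri"):
    """Left-join the audio features onto the album's track list."""
    for name, frame in (("tracks_df", tracks_df), ("features_df", features_df)):
        if key not in frame.columns:
            raise KeyError(f"{name} has no '{key}' column")
    # how="left" keeps every track, even if its features are missing.
    return tracks_df.merge(features_df, on=key, how="left")
```

Doing this manually is a one-liner: `tracks_df.merge(features_df, on="uri", how="left")`.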

Step 3: Attaching Song Lyrics From Genius.com

Lastly, we will scrape Genius to attach the lyrics of the songs to our dataframe, using the Beautiful Soup library. Because this is a tutorial on the Spotify API more so than on web scraping, I have attached a video tutorial here that goes more in depth on Beautiful Soup. It is a broad topic that deserves a tutorial of its own. Luckily, we only need a few lines to capture our song lyrics.

Function one (scrape_lyrics) is called inside function two (lyrics_onto_frame), so as long as you have run the code defining both, you only need to call the second function.
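The two functions could look roughly like the sketch below. This is a guess at their shape, not the original code. The URL pattern (artist and title joined by hyphens, followed by "-lyrics"), the CSS selector for the lyrics container, and the "name" column are all assumptions, and Genius may change its page layout at any time. `genius_url` is a hypothetical helper.

```python
import re


def genius_url(artist, title):
    """Build the assumed Genius URL for a song."""
    text = f"{artist} {title}".lower()
    text = re.sub(r"[^a-z0-9\s-]", "", text)  # drop special characters
    slug = re.sub(r"[\s-]+", "-", text).strip("-")
    return f"https://genius.com/{slug}-lyrics"


def scrape_lyrics(artist, title):
    """Return the lyrics as text, or None if the page could not be parsed."""
    # Imported here so genius_url can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(genius_url(artist, title), timeout=10)
    if response.status_code != 200:
        return None
    soup = BeautifulSoup(response.text, "html.parser")
    # Assumed selector; inspect the page source if this returns nothing.
    containers = soup.select('div[data-lyrics-container="true"]')
    if not containers:
        return None
    return "\n".join(c.get_text(separator="\n") for c in containers)


def lyrics_onto_frame(df, artist):
    """Return a copy of df with a 'lyrics' column scraped per track name."""
    df = df.copy()
    df["lyrics"] = [scrape_lyrics(artist, title) for title in df["name"]]
    return df
```

For example, `genius_url("Frank Ocean", "Nikes")` produces `https://genius.com/frank-ocean-nikes-lyrics`. Stripping special characters is only a partial workaround, so titles containing them may still fail to resolve.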

Note: Song titles and artist names with special characters (+, -, *, ~, etc.) will not be properly scraped.

Summary

Here is a summary of all the functions we defined above in a working example. It is not as daunting as it looks. To recap: we ran a function to pull all the songs from our desired album into a dataframe via the Spotify URI. Then, we created a second function to attach the audio features of each song to that dataframe. Lastly, we created a function to scrape the lyrics of all the songs in the album and attached them to our pandas dataframe.

Step 4: Song Popularity (Extra Credit)

If you want the popularity of the songs, I have provided an extra function to do just that. Unfortunately, I have not found a way to get popularity from the calls used earlier, so it requires a separate request. If you do run into an alternative solution, please let me know!
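As a rough sketch of what this can look like: full track objects returned by the API carry a popularity score between 0 and 100, so we can look the tracks up by URI and keep just that field. `track_popularity` is a hypothetical name, and `sp` is assumed to be a Spotipy client.

```python
def track_popularity(sp, uris, batch_size=50):
    """Map each track URI to its popularity score (0 to 100)."""
    scores = {}
    # The tracks endpoint accepts up to 50 tracks per request.
    for start in range(0, len(uris), batch_size):
        batch = uris[start:start + batch_size]
        for uri, track in zip(batch, sp.tracks(batch)["tracks"]):
            scores[uri] = track["popularity"] if track else None
    return scores
```

The result can be attached to the dataframe with `df["popularity"] = df["uri"].map(scores)`, where `scores` is the dictionary returned above.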

Analysis Ideas

An example of analysis that can be performed given the data set we created.

This dataset is versatile in terms of analysis. The first few ideas that come to mind are natural language processing (NLP) on the song lyrics and a machine learning model to predict popularity. Another possibility is comparing the audio features of different albums by the same artist. In our case, we can compare Frank Ocean’s first studio album, Channel Orange, with his second studio album, Blonde. Let me know if you can think of other creative ways to extract insights from this dataset. I’d love to hear all about it!
