Introduction

Motivated by the wish to work with a sufficiently large dataset and by a passion for music, we decided to investigate the songs on the weekly Billboard Top 100 chart between 1970 and 2018.

Based on more than 18,000 songs, we developed two different language models that learn from the data and then generate new songs. The models take the structure of each song in the dataset into account, and through statistical analysis we managed to generate reasonably convincing songs, in both structure and logic.

Moreover, the models learn from the data in a weighted way: the more weeks a song appeared on the Billboard chart, the more likely its words are to appear in the generated song's lyrics.

This approach increases the odds of generating the next great hit!
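One possible reading of this weighting is that each word occurrence counts once per week its song spent on the chart. The sketch below illustrates that idea only; the `(lyrics, weeks_on_chart)` pair format and the function name `weighted_word_counts` are assumptions for illustration, not the project's actual data layout or code.

```python
from collections import Counter

def weighted_word_counts(songs):
    """Count words, weighting each song by its weeks on the chart.

    `songs` is a hypothetical list of (lyrics, weeks_on_chart) pairs.
    """
    counts = Counter()
    for lyrics, weeks in songs:
        for word in lyrics.lower().split():
            # A word from a song with 10 chart weeks counts 10 times.
            counts[word] += weeks
    return counts

songs = [("love me do", 10), ("let it be", 2)]
counts = weighted_word_counts(songs)
```

With counts like these, words from long-running hits dominate the sampling distribution, which is the effect the paragraph above describes.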

Two Types of Language Models:

Trigram-Model: Given a sequence of two words, the model predicts the third word to add to the generated song, based on a probabilistic analysis of the Billboard database.
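A minimal sketch of how such a trigram model could work, assuming whitespace tokenization and plain frequency counts. The function names are hypothetical, and the chart-week weighting and song-structure handling described elsewhere on this page are left out for brevity.

```python
import random
from collections import Counter, defaultdict

def train_trigrams(songs):
    """Map each pair of consecutive words to a Counter of following words."""
    model = defaultdict(Counter)
    for lyrics in songs:
        words = lyrics.split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)][w3] += 1
    return model

def next_word(model, w1, w2, rng=random):
    """Sample the third word in proportion to how often it followed (w1, w2)."""
    followers = model.get((w1, w2))
    if not followers:
        return None  # this word pair was never seen in training
    candidates = list(followers)
    weights = [followers[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def generate(model, w1, w2, max_words=20, rng=random):
    """Extend the two seed words until max_words or an unseen pair is reached."""
    out = [w1, w2]
    while len(out) < max_words:
        w3 = next_word(model, out[-2], out[-1], rng)
        if w3 is None:
            break
        out.append(w3)
    return " ".join(out)
```

For example, `generate(train_trigrams(["a b c", "a b d"]), "a", "b")` returns either `"a b c"` or `"a b d"`, since both continuations were observed equally often.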

 

Chars-Model: Given a history of n-1 characters, the model predicts the n-th character to append to the generated song. In this project n=11, since a string of this length spans about two words.
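The character-level idea can be sketched in the same way. This is an illustrative sketch under assumptions, not the project's implementation: the function names are hypothetical, the training text is treated as one long string, and no chart-week weighting is applied.

```python
import random
from collections import Counter, defaultdict

N = 11  # as in the text: 10 characters of history predict the 11th

def train_chars(text, n=N):
    """Map each (n-1)-character history to a Counter of next characters."""
    model = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        history = text[i:i + n - 1]
        nxt = text[i + n - 1]
        model[history][nxt] += 1
    return model

def generate_chars(model, seed, length=200, n=N, rng=random):
    """Extend `seed` (at least n-1 characters) one sampled character at a time."""
    out = seed
    while len(out) < length:
        followers = model.get(out[-(n - 1):])
        if not followers:
            break  # this history never occurred in the training text
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out += rng.choices(chars, weights=weights, k=1)[0]
    return out
```

A longer history makes the output more faithful to the training lyrics but also more likely to copy them verbatim; a shorter one produces more novel but less coherent text.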

Goals:

1) Conduct a limited experiment with these two models to answer the question: which model produces songs whose structure is most similar to that of the original songs in the Billboard database?

2) Analyze the content of the generated lyrics, for example through common words and sentiment analysis.

3) As a bonus, we gain a tool that generates new hits.

      To show what it can do, we will choose the best generated song in terms of both structure and content.
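The common-words part of goal 2 can be approximated with a simple frequency count. The sketch below assumes lowercase matching on letters and apostrophes; the function name `top_words` is hypothetical, and sentiment analysis is not covered here.

```python
import re
from collections import Counter

def top_words(lyrics, k=10):
    """Return the k most frequent words in a lyrics string with their counts."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(words).most_common(k)

result = top_words("Love love me do, you know I love you", k=2)
```

Running the same count on the original and the generated lyrics gives two rankings that can be compared side by side.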

This project was created as part of the Digital Humanities course supervised by Dr. Yael Netzer, Ben Gurion University, Beer Sheva.

Spring 2020.

Developed By

Omer Savion  &   Eden Afori

For Further Information Visit

Git Project