STEAM PRODUCTS
DATA ANALYSIS

January 19, 2021

Steam data and its analysis

Steam is a software and videogame platform with more than 30,000 videogames and applications in its store. The platform already has more than 95 million users, which produces a massive amount of data that is only practical to process with tools designed for large-scale workloads.

In this project we present the results of our studies and the process we developed, using PySpark and Hadoop, to handle these large amounts of data.

RESULTS OF THE STUDY

STUDIES OF PRODUCTS ACCORDING TO THEIR GENRE

The following diagrams come from our study of the genres of the products in the Steam library, comparing them on parameters such as the time users spend on them or the average price of each genre.

Number of Genre Appearance

This image shows the most common genres in the Steam store. The study shows that the most common genre according to the Steam data is "Indie", which makes sense because Steam is the biggest platform for indie games.
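As an illustration of how such a genre count can be computed, here is a minimal sketch in plain Python. It is not the PySpark job used for this study. The sample rows and the semicolon-separated genre field are assumptions made for the example. In PySpark the same idea would typically be expressed by splitting the genre column, then applying `groupBy` and `count`.

```python
from collections import Counter

# Hypothetical sample rows: (product name, semicolon-separated genres).
# The real dataset layout is an assumption here.
sample_products = [
    ("Game A", "Indie;Action"),
    ("Game B", "Indie;Casual"),
    ("Game C", "Action"),
    ("Game D", "Indie"),
]


def count_genres(products):
    """Count how many products list each genre."""
    counts = Counter()
    for _name, genres in products:
        for genre in genres.split(";"):
            genre = genre.strip()
            if genre:  # skip empty entries such as a trailing ";"
                counts[genre] += 1
    return counts


genre_counts = count_genres(sample_products)
top_genres = genre_counts.most_common(3)  # genres sorted by frequency
print(top_genres)
```

A product with several genres is counted once for each of them, so the totals can exceed the number of products.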

Most recommended genres


After the study, we found that history-related genres are the most liked by the community. Users who like this type of genre seem more inclined to recommend and upvote these games. The 10 genres shown in the image above are the most recommended by players.

Average playtime per genre

This result did not surprise us: we expected genres such as MMO and MOBA to be the ones in which users spend the most time. Software and programming applications also appear in this ranking, due to the time needed to develop projects in these apps.
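The average playtime per genre can be sketched as a grouped mean. The example below uses plain Python with made-up rows. The field layout (name, semicolon-separated genres, average playtime in hours) is an assumption for illustration and not the exact schema used in the study.

```python
from collections import defaultdict

# Hypothetical sample rows: (name, genres, average playtime in hours).
sample_playtimes = [
    ("Game A", "MMO", 300.0),
    ("Game B", "MMO;RPG", 100.0),
    ("Game C", "RPG", 50.0),
    ("Game D", "Indie", 10.0),
]


def average_playtime_per_genre(products):
    """Return the mean playtime of the products listed under each genre."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _name, genres, playtime in products:
        for genre in genres.split(";"):
            genre = genre.strip()
            if genre:
                totals[genre] += playtime
                counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


playtime_by_genre = average_playtime_per_genre(sample_playtimes)
# Sort genres from highest to lowest average playtime.
ranking = sorted(playtime_by_genre.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```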

Product sales

The following diagrams show different relationships involving product sales. In this part we have analysed and compared parameters such as sales against price and sales throughout the year.

Sales and average hours spent by users

Through this study, we found that there is a relationship between sales and the playtime users spend in an application. Note that only a few games fall in the highest sales ranges, and these are the most famous games in the Steam library, which is why their average playtime is so high compared with other results such as the average playtime per genre.

Sales and price

We might see a correlation between sales and price, but we do not have enough information to reach a detailed conclusion. One curious finding is that two sales ranges contain only a few games, and every one of them is free to play.
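A comparison of price across sales ranges can be sketched as follows. This is plain Python with invented sample values. The range labels and the (sales range, price) layout are assumptions for the example, not the actual data or code of the study.

```python
from collections import defaultdict

# Hypothetical sample rows: (sales range label, price in dollars).
sample_sales = [
    ("0-20000", 9.99),
    ("0-20000", 4.99),
    ("20000-50000", 14.99),
    ("100000000-200000000", 0.0),
]


def price_summary_by_sales_range(rows):
    """Group prices by sales range and summarise each group."""
    grouped = defaultdict(list)
    for sales_range, price in rows:
        grouped[sales_range].append(price)
    return {
        sales_range: {
            "games": len(prices),
            "average_price": sum(prices) / len(prices),
            # True when every game in the range is free to play.
            "all_free": all(price == 0 for price in prices),
        }
        for sales_range, prices in grouped.items()
    }


summary = price_summary_by_sales_range(sample_sales)
for sales_range, stats in summary.items():
    print(sales_range, stats)
```

Reporting the number of games next to the average makes it easier to spot ranges that hold only a handful of titles, where the average says little.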

Games and software releases per month

This is the count of Steam game releases for each month of the year. The graphic shows that companies prefer to release their games in mid-spring and mid-fall.
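Counting releases per month amounts to extracting the month from each release date and tallying it. The sketch below uses plain Python. The dates are invented and the `YYYY-MM-DD` format is an assumption about how release dates are stored.

```python
from collections import Counter
from datetime import datetime

# Hypothetical release dates, assumed to be stored as YYYY-MM-DD strings.
sample_release_dates = [
    "2018-04-12",
    "2019-04-03",
    "2017-10-20",
    "2016-10-05",
    "2018-10-31",
    "2015-01-15",
]


def count_releases_per_month(dates):
    """Count releases per calendar month (1 = January, 12 = December)."""
    months = Counter()
    for text in dates:
        release = datetime.strptime(text, "%Y-%m-%d")
        months[release.month] += 1
    return months


releases_per_month = count_releases_per_month(sample_release_dates)
busiest_month = releases_per_month.most_common(1)[0]
print(busiest_month)
```

Grouping on `release.year` instead of `release.month` would give a per-year count in the same way.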

Games and software releases per year

This is the evolution of the number of games on Steam over the years, from the release of Valve's first game to the summer of 2019. The number has clearly grown over the years, and it will likely keep growing in the near future.

Developers and price

We have also studied prices by developer. This information is not very conclusive by itself, and there is not much we can conclude from it. The developer with the highest price is a Finnish company that accidentally set its game's price to $400. More info here.

Top played videogames

We have also collected the most played applications in the Steam library. From this we learned that Valve-developed games are the most played on the platform. Valve is the owner of Steam and developed several games some years ago. Today, its games are still the most influential on its own site.