January 19, 2021
Steam is a software and video game platform with more than 30,000 video games
and applications in its store. The platform already has more than 95 million users,
which produces a massive amount of data that can only be processed with highly demanding
programs.
In this project we present the results of our study and the process we built with
PySpark and Hadoop to process these large amounts of data.
This image shows the most common genres in the Steam store.
The study showed that the most common genre according to the Steam data
is "Indie", which makes sense because Steam is the biggest platform for indie games.
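A minimal sketch of how such a genre count could be computed. It uses plain Python as a small stand-in for the distributed job; the `genres` field name, the `;` separator, and the sample rows are assumptions made for illustration, not details of the actual dataset.

```python
from collections import Counter

# Hypothetical sample rows; the `genres` field and ';' separator are assumptions.
games = [
    {"name": "Game A", "genres": "Indie;Action"},
    {"name": "Game B", "genres": "Indie;Casual"},
    {"name": "Game C", "genres": "Action"},
    {"name": "Game D", "genres": "Indie"},
]

def count_genres(rows):
    """Count how many games list each genre."""
    counts = Counter()
    for row in rows:
        # A game can carry several genres, so split before counting.
        counts.update(g for g in row["genres"].split(";") if g)
    return counts

genre_counts = count_genres(games)
print(genre_counts.most_common(3))
```

Because one game can belong to several genres, the counts add up to more than the number of games.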
The study showed that history-related genres are the most liked by the community.
Users who like this type of genre seem more inclined to recommend and upvote these games.
The 10 genres shown in the image above are the most recommended by players.
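One possible way to rank genres by how well they are received is the share of positive ratings per genre. The sketch below assumes that metric, and also assumes `positive_ratings` and `negative_ratings` fields; the metric actually used in the study may differ, and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; field names and values are assumptions.
games = [
    {"genres": "History;Strategy", "positive_ratings": 90, "negative_ratings": 10},
    {"genres": "Strategy", "positive_ratings": 50, "negative_ratings": 50},
    {"genres": "Casual", "positive_ratings": 30, "negative_ratings": 70},
]

def positive_share_by_genre(rows):
    """Return genre -> positive votes / all votes, pooled over the genre's games."""
    totals = defaultdict(lambda: [0, 0])  # genre -> [positive votes, all votes]
    for row in rows:
        pos, neg = row["positive_ratings"], row["negative_ratings"]
        for genre in row["genres"].split(";"):
            totals[genre][0] += pos
            totals[genre][1] += pos + neg
    # Skip genres without any votes to avoid dividing by zero.
    return {g: p / n for g, (p, n) in totals.items() if n > 0}

shares = positive_share_by_genre(games)
top = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
print(top)
```

Pooling votes per genre weights popular games more heavily; averaging per-game ratios instead would give every game the same weight.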
We have also compared the average price of each genre. The results show that software for
3D modeling and character design is the most expensive category of product according to Steam tags.
This ranking did not surprise us: we expected genres such as MMO and MOBA to be the ones in which users
spend the most time. Software and programming applications also appear in this top list, due to the time
needed to develop projects in these apps.
Through this study, we found that there is a relationship between sales and the playtime players spend
in an application. Note that only a few games sit in the highest sales ranges,
and they are the most famous games in the Steam library, which is why their average playtime is so high
compared with other results such as the average playtime per genre.
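A small sketch of how playtime could be averaged per sales range while keeping the number of games in each range visible, since ranges with very few games can dominate the picture. The `owners` field (a `low-high` string) and the `average_playtime` field are assumptions, and the rows are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; field names and the "low-high" format are assumptions.
games = [
    {"owners": "0-20000", "average_playtime": 10},
    {"owners": "0-20000", "average_playtime": 30},
    {"owners": "10000000-20000000", "average_playtime": 900},
]

def range_lower_bound(owners):
    """Numeric lower bound of a 'low-high' range, used for sorting."""
    return int(owners.split("-")[0])

def playtime_by_owner_range(rows):
    """Return (range, number of games, mean playtime), ordered by range."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["owners"]].append(row["average_playtime"])
    ordered = sorted(buckets.items(), key=lambda kv: range_lower_bound(kv[0]))
    return [(rng, len(times), mean(times)) for rng, times in ordered]

summary = playtime_by_owner_range(games)
for rng, n_games, avg in summary:
    print(rng, n_games, avg)
```

Reporting the group size next to each mean makes it easier to see when a high average rests on only a handful of titles.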
There may be a correlation between sales and price, but we do not have enough information
to reach a detailed conclusion. One curious finding is that two sales ranges contain only a
few games, and every one of them is free to play.
This is the number of Steam game releases in each calendar month. With this graph
we can see that companies prefer to release their games in mid-spring and
mid-fall.
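The monthly totals can be sketched as a simple count over release dates. This assumes the dates are available as ISO `YYYY-MM-DD` strings; the sample dates below are invented.

```python
from collections import Counter
from datetime import date

# Hypothetical release dates in ISO format (an assumption about the data).
release_dates = ["2018-03-14", "2018-10-02", "2019-10-25", "2017-04-01"]

def releases_per_month(dates):
    """Return a 12-item list: releases in January, February, ..., December."""
    counts = Counter(date.fromisoformat(d).month for d in dates)
    # Months with no releases are reported as 0 rather than omitted.
    return [counts.get(month, 0) for month in range(1, 13)]

per_month = releases_per_month(release_dates)
print(per_month)
```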
This is the evolution of the number of games on Steam over the years, from
the release of Valve's first game to the summer of 2019. The number has clearly
grown over the years, and it will likely keep growing in the
near future.
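A growth curve like this one can be derived as a running total of releases per year. The sketch below assumes only a list of release years; the values are invented.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical release years, one entry per game.
release_years = [1998, 2004, 2004, 2017, 2018, 2018, 2018]

def cumulative_catalogue(years):
    """Return year -> total number of games released up to and including that year."""
    per_year = Counter(years)
    ordered = sorted(per_year)
    running = accumulate(per_year[y] for y in ordered)
    return dict(zip(ordered, running))

catalogue = cumulative_catalogue(release_years)
print(catalogue)
```

Years with no releases do not appear as keys; a plot would carry the previous total forward across them.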
We have also studied prices by developer. This information is not very
conclusive by itself, and there is not much we can get from it. The developer with
the highest price is a Finnish company that accidentally set the price of its game to $400.
More info here.
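A ranking by average price is sensitive to developers with a single, unusually priced game. One way to reduce that effect, sketched below, is to require a minimum number of games per developer before ranking. The developer names, prices, and the `min_games` parameter are hypothetical.

```python
from statistics import mean

# Hypothetical prices per developer.
prices_by_dev = {
    "Studio A": [9.99, 14.99],
    "Studio B": [400.00],  # a single outlier price
    "Studio C": [19.99, 29.99, 59.99],
}

def rank_developers(prices, min_games=1):
    """Rank developers by mean price, skipping those with fewer than min_games games."""
    ranked = [
        (dev, mean(values))
        for dev, values in prices.items()
        if len(values) >= min_games
    ]
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)

unfiltered = rank_developers(prices_by_dev)
filtered = rank_developers(prices_by_dev, min_games=2)
print(unfiltered[0][0], filtered[0][0])
```

With no threshold the single-game outlier tops the list; with `min_games=2` it drops out and the ranking reflects developers with a broader catalogue.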
We have also collected the most played applications in the Steam library.
Thanks to this, we learned that Valve-developed games are the most played on Steam.
Valve is the owner of Steam and developed several games some years ago. Today, its games are
still the most influential on its platform.