Cluster and visualise the physical and social-media-based audience behaviour at Roskilde Festival
Rio to Roskilde: "A big music festival is just like a city." You need safety and citizen services, and you generate waste.
Cluster and visualise the physical and social-media-based audience behaviour at Roskilde Festival 2015, taking into account the 180 concerts, the music genres, the 8 stages, the 21 camps, the 8 festival days, the social networks used, and the geospatial data points collected from approx. 38K guests.
Hint: Make clever use of dimensionality reduction, bridge the physical and social media worlds, evaluate data quality, and use time and location data to your advantage.
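As one possible starting point for the hint on dimensionality reduction, the sketch below projects a guest-by-stage matrix onto two principal components and clusters the result with a plain k-means. Everything in it is an assumption made for illustration: the data is synthetic, the feature (minutes spent near each of the 8 stages) is hypothetical, and the helper names `pca_project`, `farthest_first_init` and `kmeans` are not part of the challenge material. On Bluemix, scikit-learn's `PCA` and `KMeans` would be the more usual choice; NumPy alone is used here only to keep the example self-contained.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def farthest_first_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly take the point
    farthest from all centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic stand-in (NOT festival data): 300 guests x 8 stages,
# minutes spent near each stage, with three artificial audience types.
rng = np.random.default_rng(42)
n_stages = 8
profiles = np.zeros((3, n_stages))
profiles[0, 0] = 200.0  # type A stays mostly at stage 0
profiles[1, 3] = 200.0  # type B stays mostly at stage 3
profiles[2, 6] = 200.0  # type C stays mostly at stage 6
X = np.vstack([p + rng.normal(0.0, 10.0, size=(100, n_stages)) for p in profiles])

X2 = pca_project(X, n_components=2)   # 8 dimensions -> 2, ready for a scatter plot
labels, centroids = kmeans(X2, k=3)
print(X2.shape, np.bincount(labels, minlength=3))
```

The two-dimensional coordinates in `X2` can be passed directly to a scatter plot coloured by `labels`. With real data the clusters will be far less cleanly separated than in this toy setup, so the number of clusters and the choice of features would need to be evaluated rather than assumed.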
The picture below shows an example of the data.
Environment: IBM Bluemix
If necessary for the challenge, Bluemix can be supplemented with open-source components, but not with commercial tools.
Among other things, Bluemix hosts Jupyter-based notebooks running on Apache Spark 1.6. Students will thus be able to work with scikit-learn, NetworkX, NumPy, Pandas, etc., plus Spark modules such as MLlib and GraphX. Also available are RStudio, Object Storage, SQL and NoSQL databases, a graph database compatible with Apache TinkerPop, and about 25 REST-API-based cognitive Watson services, such as Tone Analyzer, Image Recognition, and Personality Insights.
You will get access to anonymised geospatial data recorded within the Roskilde Festival area during the festival period.
You are strongly discouraged from attempting to link the data to individual festival guests or to private addresses outside the festival. Describe and visualise your clusters with respect for privacy.
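One way to respect privacy when visualising location data is to show only aggregates. The sketch below bins positions into grid cells, counts distinct guests per cell, and drops any cell with too few guests to be shown safely. It is an illustration under stated assumptions, not a prescribed method: the function name `aggregate_cells`, the input layout `(guest_id, x_m, y_m)` in metres, the 50 m cell size and the threshold of 5 guests are all hypothetical choices, and the sample points are made up.

```python
from collections import defaultdict


def aggregate_cells(points, cell_size_m=50.0, min_guests=5):
    """Count distinct guests per grid cell and suppress sparse cells.

    points: iterable of (guest_id, x_m, y_m), with coordinates in metres
            in some local projected system (an assumption of this sketch).
    Returns {(cell_x, cell_y): guest_count} for cells with >= min_guests.
    """
    guests_in_cell = defaultdict(set)
    for guest_id, x_m, y_m in points:
        cell = (int(x_m // cell_size_m), int(y_m // cell_size_m))
        guests_in_cell[cell].add(guest_id)
    # Cells with very few guests could single out individuals, so drop them.
    return {
        cell: len(guests)
        for cell, guests in guests_in_cell.items()
        if len(guests) >= min_guests
    }


# Made-up sample: six guests close together, one guest alone far away.
points = [(f"g{i}", 10.0 + i, 20.0) for i in range(6)] + [("g99", 400.0, 400.0)]
cells = aggregate_cells(points)
print(cells)  # {(0, 0): 6}
```

The lone guest's cell is left out of the result, so a heat map drawn from `cells` would never show a single person's position. The appropriate cell size and threshold depend on the data and should be justified as part of the solution.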
How to access:
- All students who would like access to Bluemix should apply for a Bluemix student code as described in: 2016 Bluemix Student Code.pdf
- Download the PDF document here.
- All students who would like to take part in the IBM Challenge must have applied for an IBM student code before the kickoff on Friday 15.4.