This repository holds code snippets from the group assignments (done together with https://github.com/cjcarvajal) for the Big Data Analysis module (MINE-4102) at Universidad de Los Andes in the Winter term 2017. They cover various big data and analytics topics and technologies:
- Crawling the university intranet and subscribing to RSS feeds to extract and aggregate information using regex and XQuery
- Mining the Wikipedia XML dump for Person --> Place relations using Hadoop/MapReduce and visualizing them using Alchemy
- Twitter sentiment and polarity analysis using NLTK and scikit-learn
- Integrating data from Kaggle, IMDB (RSS), and MovieStackexchange (RSS) and enriching it using named-entity recognition (NER) and DBpedia
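
As an illustration of the first topic (extracting and aggregating information from RSS feeds with regex), here is a minimal sketch using only the Python standard library. It is not the code from the assignments: the sample feed, the date pattern, and the function names `extract_items` and `items_per_month` are all assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample feed, invented for this sketch.
# A real crawler would download the feed text from an RSS URL instead.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Seminar on Hadoop</title><description>Room A, 2017-03-14</description></item>
    <item><title>Talk on MapReduce</title><description>Room B, 2017-03-21</description></item>
    <item><title>NLTK workshop</title><description>Date to be announced</description></item>
  </channel>
</rss>"""

# Assumed date format (YYYY-MM-DD) inside the item description.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_items(rss_text):
    """Return a list of {'title', 'date'} dicts, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        match = DATE_RE.search(description)
        items.append({"title": title, "date": match.group(0) if match else None})
    return items


def items_per_month(items):
    """Aggregate dated items into a Counter keyed by 'YYYY-MM'."""
    counts = Counter()
    for entry in items:
        if entry["date"] is not None:
            counts[entry["date"][:7]] += 1
    return counts


if __name__ == "__main__":
    parsed = extract_items(SAMPLE_RSS)
    for entry in parsed:
        print(entry["title"], entry["date"])
    print(dict(items_per_month(parsed)))
```

Items without a recognizable date are kept with `date` set to `None`, so the aggregation step can skip them without losing the titles.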