
Four assignments covering various big data/analytics topics and technology.


Illuminae/bigdata_coursework


bigdata_coursework

This repository holds code snippets from the group assignments (completed with https://github.com/cjcarvajal) for the Big Data Analysis module (MINE-4102) at Universidad de Los Andes in the Winter term 2017. They cover the following big data/analytics topics and technologies:

Assignment 1 - Information discovery

  • Crawling the university intranet and subscribing to RSS feeds to extract and aggregate information using regex and XQuery
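A minimal sketch of the extract-and-aggregate step, using only the Python standard library. It makes several assumptions. ElementTree's limited path queries stand in for XQuery. The course-code regex, the feed contents and the example.org links are all hypothetical sample data. Duplicate items are detected by their link.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: course codes such as "MINE-4102" (four capitals, dash, four digits)
COURSE_CODE = re.compile(r"\b[A-Z]{4}-\d{4}\b")

def parse_feed(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items

def aggregate(feeds):
    """Merge items from several feeds, skip repeated links, index titles by course code."""
    seen = set()
    by_code = {}
    for rss_text in feeds:
        for title, link in parse_feed(rss_text):
            if link in seen:
                continue  # same item already aggregated from another feed
            seen.add(link)
            for code in COURSE_CODE.findall(title):
                by_code.setdefault(code, []).append(title)
    return by_code

# Hypothetical sample feeds
FEED_A = """<rss version="2.0"><channel>
<item><title>MINE-4102 lab moved to Friday</title><link>http://example.org/1</link></item>
<item><title>Library hours</title><link>http://example.org/2</link></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel>
<item><title>MINE-4102 lab moved to Friday</title><link>http://example.org/1</link></item>
<item><title>MINE-4101 and MINE-4102 grades posted</title><link>http://example.org/3</link></item>
</channel></rss>"""

print(aggregate([FEED_A, FEED_B]))
```

A real XQuery engine would let the filtering and grouping be written as a single FLWOR expression over the feeds. The Python version keeps the same two stages: parse each feed, then merge the results.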

Assignment 2 - Scalable processing of semistructured data

  • Mining the Wikipedia XML dump for Person --> Place relations using Hadoop/MapReduce and visualizing them using Alchemy
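The sketch below shows the MapReduce shape of this job, simulated on one machine in plain Python. It is a sketch rather than the assignment's code. The extraction rule is an assumption: the page title is taken as the Person and the first wiki link in an infobox `birth_place` field as the Place. The sample pages are illustrative.

```python
import re
from itertools import groupby

# Hypothetical extraction rule: first [[link]] target in an infobox "birth_place" field
BIRTH_PLACE = re.compile(r"\|\s*birth_place\s*=\s*\[\[([^\]|]+)")

def mapper(page_title, wikitext):
    """Map phase: emit ((person, place), 1) for each relation found in one page."""
    for place in BIRTH_PLACE.findall(wikitext):
        yield (page_title, place.strip()), 1

def reducer(key, counts):
    """Reduce phase: sum the counts emitted for one (person, place) key."""
    return key, sum(counts)

def run_job(pages):
    """Simulate map -> shuffle/sort -> reduce without a Hadoop cluster."""
    mapped = [kv for title, text in pages for kv in mapper(title, text)]
    mapped.sort(key=lambda kv: kv[0])  # shuffle/sort: bring identical keys together
    return [reducer(key, (count for _, count in group))
            for key, group in groupby(mapped, key=lambda kv: kv[0])]

# Illustrative input pages (title, wikitext)
pages = [
    ("Gabriel García Márquez", "{{Infobox writer | birth_place = [[Aracataca]], Colombia }}"),
    ("Shakira", "{{Infobox musical artist | birth_place = [[Barranquilla|Barranquilla, Colombia]] }}"),
]
for (person, place), n in run_job(pages):
    print(f"{person} --> {place}\t{n}")
```

On Hadoop, the framework performs the sort-and-group step between the two phases. `mapper` and `reducer` would then run as separate tasks, for example as Hadoop Streaming scripts that read and write tab-separated lines.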

Assignment 3 - Twitter sentiment analysis using MongoDB

  • Twitter sentiment & polarity analysis using nltk and scikit-learn
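As a rough illustration of the polarity-classification idea, here is a multinomial Naive Bayes classifier with add-one smoothing. The approach is swapped in for illustration: it is a hand-rolled stand-in for scikit-learn's `MultinomialNB` so that it runs on the standard library alone. The regex tokenizer replaces an nltk tokenizer, and the tiny labelled training set is hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens; a crude stand-in for an nltk tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            # Words never seen in training are ignored
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(text) if w in self.vocab)
        return max(self.classes, key=log_posterior)

# Hypothetical labelled tweets
train = ["I love this phone", "what a great day", "this is awful", "I hate waiting in line"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train, labels)
print(model.predict("great phone, love it"))
```

With scikit-learn, the same pipeline would typically pair a `CountVectorizer` with `MultinomialNB`. The hand-written version makes the smoothing and log-probability arithmetic visible.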

Assignment 4 - Information integration and named entity recognition

  • Integrating data from Kaggle, IMDB (RSS), and MovieStackexchange (RSS) and enriching them using NER and DBpedia.
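A small sketch of the integration step follows. It rests on several assumptions. Records from each source are plain dicts with a `title` field. Movies are matched on a normalized title (accents, a trailing "(year)", punctuation and case removed). The first source to supply a field wins. `dbpedia_uri` shows only the DBpedia resource naming convention (`http://dbpedia.org/resource/` plus the label with spaces turned into underscores) for an entity that an NER step has already found. It performs no lookup, and the sample records are hypothetical.

```python
import re
import unicodedata
from urllib.parse import quote

def normalize_title(title):
    """Match key: strip accents, a trailing '(year)', punctuation, case and extra spaces."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    ascii_title = re.sub(r"\(\d{4}\)\s*$", "", ascii_title)
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", ascii_title.lower()).split())

def integrate(*sources):
    """Merge (source_name, records) pairs into one dict per movie, keyed by normalized title."""
    merged = {}
    for source_name, records in sources:
        for record in records:
            entry = merged.setdefault(normalize_title(record["title"]), {"sources": []})
            entry["sources"].append(source_name)
            for field, value in record.items():
                entry.setdefault(field, value)  # first source to supply a field wins
    return merged

def dbpedia_uri(entity):
    """DBpedia resource URI for an entity label (spaces become underscores)."""
    return "http://dbpedia.org/resource/" + quote(entity.replace(" ", "_"))

# Hypothetical records from the three sources
kaggle = [{"title": "Amélie (2001)", "year": 2001}]
imdb = [{"title": "Amelie", "link": "http://example.org/imdb/1"}]
stack = [{"title": "AMÉLIE", "question": "Why is the gnome travelling?"}]
merged = integrate(("kaggle", kaggle), ("imdb", imdb), ("stackexchange", stack))
print(merged)
print(dbpedia_uri("Jean-Pierre Jeunet"))
```

Title matching this simple will merge distinct films that share a name. A real integration would likely also compare release year or director before merging.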
