Labs

Installing and configuring all the software needed for this course on your own machine can be tedious. We have prepared a virtual machine (VM) with most of the tools preinstalled, which you can download and use. You can download the machine from here (size: ~7.46 GB). The username/password is csdeptucy.

We kindly ask you to bring your own laptop (with the VM installed on it) to the lab.

The schedule shown below is tentative and subject to change.

Week 2: Introduction to Apache Hadoop
Material: Lab01.pdf | Lab01-english-version.pdf, Source Code, Dataset

Week 3: Programming with Apache Hadoop
Material: Lab02.pdf | Lab02-english-version.pdf, Source code, Solution

Week 4: Programming with Apache Hadoop
Material: Lab03.pdf | Lab03-english-version.pdf, Dataset, Source code, Solution

Week 5: Introduction to Python; Implementation of Similarity/Distance Measures
Material: Lab04.pdf, Lab4.py, Lab4.csv, Solution

Week 6: Classification and Clustering in Python
Material: Lab05.pdf, iris_data.csv, fleet_data.csv, wine_data.csv, LAB05_Classification-Task1.py, Task1 Solution, LAB05_Clustering.py, Task2 Solution

Week 7: Getting Started with Apache Mahout; Item-based and User-based Recommendation
Material: Lab06 | Lab06-english-version.pdf, Source Code, ml-1m.zip, Solution

Week 8: Introduction to Apache Spark
Material: Lab07, kmeans-example.py, kmeans-fleet.py, Solution

Week 9: Text Clustering and Classification in Python
Material: Lab08, Lab8-description.pdf, labeledTrainData.tsv, Solution

Week 10: Data Preparation and Cross Validation
Material: Lab09, remove-outliers.py, scaling.py

Week 11: Dimensionality Reduction: Feature Selection and Extraction
Material: Lab10

Week 12: Timeseries Forecasting
Material: Lab11, Lab11_Timeseries_forecasting.py, Holt-Winters-algorithms.py, airline-passengers.csv, Lab11_DTW.py, dtw_functions.py, dtw_train.csv, dtw_test.csv

Week 13: No Lab