Labs

Installing and configuring all the software needed for this course on your own machine can be tedious. We have therefore prepared a virtual machine (VM) with most of the required tools preinstalled, which you can download and use. You can download the VM from here (size: ~7.46 GB). The username and password are both csdeptucy.

Please bring your own laptop, with the VM installed on it, to the lab.

The schedule shown below is tentative and subject to change.

Week | Description | Useful Links and Material
2 | Introduction to Apache Hadoop | Lab01.pdf, Lab01-english-version.pdf, Source code, Dataset
3 | Programming with Apache Hadoop | Lab02.pdf, Lab02-english-version.pdf, Source code
4 | Programming with Apache Hadoop | Lab03.pdf, Dataset, Source code
5 | Introduction to Python; Implementation of Similarity/Distance Measures | Lab04.pdf, Lab4.py, Lab4.csv
6 | Classification and Clustering in Python | Lab05.pdf, iris_data.csv, fleet_data.csv, wine_data.csv, LAB05_Classification-Task1.py, LAB05_Clustering.py
7 | Getting Started with Apache Mahout; Item-based and User-based Recommendation | Lab06, Lab06-english-version.pdf, Source code, ml-1m.zip
8 | Introduction to Apache Spark | Lab07, kmeans-example.py, kmeans-fleet.py
9 | Text Clustering and Classification in Python | Lab08, Lab8-description.pdf, labeledTrainData.tsv
10 | Text Clustering in Apache Mahout (Java) using k-means | Lab09, Source code, reuters21578.tar.gz
11 | Data Preparation & Dimensionality Reduction | Lab10, remove-outliers.py, scaling.py
12 | Time Series Forecasting | Lab11, Lab11_Timeseries_forecasting.py, Holt-Winters-algorithms.py, airline-passengers.csv, Lab11_DTW.py, dtw_functions.py, dtw_train.csv, dtw_test.csv
13 | No lab |