Labs

Installing and configuring all the software needed for this course on your own machine can be tedious. We have prepared a virtual machine (VM) that you can download and use instead. You can download the VM from here (size: ~7.46GB). The username and password are both csdeptucy.

We kindly ask you to bring your own laptop, with the VM installed on it, to the lab.

Week | Description | Material
2 | Introduction to Apache Lucene | LAB01.pdf, dataset.zip
3 | Apache Solr | LAB02.pdf
4 | Apache Hadoop | LAB03.pdf, Source Code, Dataset
5 | Apache Hadoop | LAB04.pdf, WordCount
6 | Apache Hadoop | Lab05.pdf, Source Code, Dataset, Solution
7 | Apache Nutch | LAB06.pdf
8 | Not held due to low attendance | -
9 | Apache Tika | LAB07.pdf, LAB07.zip
10 | Text Clustering in Apache Mahout (Java), using k-means | Lab08, Source Code, reuters21578.tar.gz
11 | Text Clustering and Classification in Python | Lab09, Lab9-description.pdf, labeledTrainData.tsv
12 | Assignment 2 Demonstration | All students are kindly requested to be present.
13 | No lab | -