Schedule

Date   Description   Bibliography   Slides
04/09/2018   Introduction and Boolean Retrieval

Chapter 1, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • A. Moffat, J. Zobel, D. Hawking, Recommended reading for IR research students, ACM SIGIR Forum, vol. 39, no. 2, pp. 3-14, 2005.
  • See Sergey Brin speaking on "Search, Google and Life", UC Berkeley, Oct. 2005.

11/09/2018   Text encoding: tokenization, stemming, lemmatization, stop words, phrases

Chapter 2, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Bahle, D., Williams, H. E., and Zobel, J. 2002. Efficient phrase querying with an auxiliary index. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Tampere, Finland, August 11-15, 2002).

11/09/2018   Dictionaries & Tolerant retrieval

Chapter 3, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • J. Zobel and P. Dart. Finding approximate matches in large lexicons. Software: Practice and Experience 25(3), March 1995.
  • K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), Dec 1992.
18/09/2018  

Google on Campus

Tuesday, September 18th 2018, 14:00 – 17:00

Room 010, Social Activities Centre, University of Cyprus

Speaker: Christos Karamanos 

2pm - 3pm: Talk "Life of a Google Software Engineer":
Want to hear about what it’s like to work at Google and some of the cool stuff that our full-time engineers and interns work on? Come learn firsthand from a Google Software Engineer! We’ll also share info about some of the opportunities we have for technical students.

3:30pm - 5pm: CV/Interview Workshop:
Come learn the ins and outs of technical interviews so you can nail your next one! Google software engineers will pull back the curtain on the interview process and give awesome tips & practice questions too.

Event Overview

Google is coming to campus and we can’t wait to see all of you again.

What: On-campus outreach can consist of workshops and presentations held with student organizations, classes, or a general assembly.

Who’s Invited: Any tech-related majors (Computer Science, Computer Engineering, Electrical Engineering, Math, Physics, etc.) are welcome.

Why: Learn more about Google products, projects, job/internship opportunities, culture, and more, directly from a Googler.

Speaker/Trainer:

Christos Karamanos received a Master of Engineering (MEng) degree in Computing (Software Engineering) from Imperial College London in 2015. Between the third and fourth years of the degree, he completed a 6-month internship with Google in Site Reliability Engineering. After graduating, he joined Google London as a full-time Software Engineer in Site Reliability Engineering. Today he is a Senior Software Engineer working on the production system for Google Ads (AdWords, AdSense and other advertiser products).

25/09/2018   Index construction

Chapter 4, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Shanks, V. R. and Williams, H. E. 2003. Index construction for linear categorisation. In Proceedings of the Twelfth International Conference on Information and Knowledge Management (New Orleans, LA, USA, November 3-8, 2003).
  • Dean, J. and Ghemawat, S. 2004. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design & Implementation (San Francisco, CA, December 6-8, 2004).
  • See the video of Jeff Dean's (Google Inc.) colloquium "Google: A Behind-the-Scenes Look" at the University of Washington, October 2004; it covers aspects of MapReduce and the systems behind the search engine.

02/10/2018  

Index construction

Index compression

Chapters 4, 5, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Büttcher, S. and Clarke, C. L. 2007. Index compression is good, especially for random access. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (Lisbon, Portugal, November 6-10, 2007).

09/10/2018

Vector Space Retrieval & Computing Scores in a complete search system

Chapters 6, 7, 8, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Zobel, J. and Moffat, A. 1998. Exploring the similarity space. SIGIR Forum 32, 1 (Apr. 1998).
16/10/2018  

Relevance Feedback & Query Expansion

XML retrieval 

Chapters 8, 9, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapter 10, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Anh, V. N., de Kretser, O., and Moffat, A. 2001. Vector-space ranking with effective early termination. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, United States).

23/10/2018

Data classification / Data clustering

Chapters 13, 14, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapters 16, 17, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: a review. ACM Comput. Surv. 31, 3 (Sep. 1999), 264-323
  • See the video of Ulrike von Luxburg's (Max Planck Institute for Biological Cybernetics) colloquium "Lectures on Clustering" at the PASCAL Bootcamp in Machine Learning.
  • See the video of Yee Whye Teh's (University College London) colloquium "Hierarchical Clustering" at the EPSRC Winter School in Mathematics for Data Modelling.
30/10/2018   Midterm
  • Topics: Chapters 1-9, Manning, Raghavan, Schütze.
  • The midterm exam will last 120 minutes.

06/11/2018

Data classification / Data clustering

Web search basics

Chapter 20. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Crawling Techniques (Chapter 6, Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003.)

Georgios John Fakas, Zhi Cai, Nikos Mamoulis: Diverse and proportional size-l object summaries using pairwise relevance. VLDB J. 25(6): 791-816 (2016)

Georgios John Fakas, Zhi Cai, Nikos Mamoulis: Diverse and Proportional Size-l Object Summaries for Keyword Search. SIGMOD Conference 2015: 363-375

Optional reading:

  • How search works
  • Search Engine Users: Internet searchers are confident, satisfied and trusting -- but they are also unaware and naive, by Deborah Fallows, Pew Internet Research report, January 23, 2005.
  • Kobayashi, M. and Takeda, K. 2000. Information retrieval on the web. ACM Comput. Surv. 32, 2 (Jun. 2000), 144-173.
  • Dikaiakos, M. D., Stassopoulou, A., and Papageorgiou, L. 2005. An Investigation of Web Crawler Behavior: Characterization and Metrics. Computer Communications 28, 8 (May 2005), 880-897, Elsevier (available online through Elsevier's portal; locally in PDF).
  • Baeza-Yates, R. and Castillo, C. 2007. Crawling the Infinite Web. Journal of Web Engineering 6, 1 (Feb. 2007), 49-72.

13/11/2018

Crawling and Indexing

Chapter 20. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Crawling Techniques (Chapter 6, Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003.)

Optional reading:

  • How search works
  • Search Engine Users: Internet searchers are confident, satisfied and trusting -- but they are also unaware and naive, by Deborah Fallows, Pew Internet Research report, January 23, 2005.
  • Kobayashi, M. and Takeda, K. 2000. Information retrieval on the web. ACM Comput. Surv. 32, 2 (Jun. 2000), 144-173.
  • Dikaiakos, M. D., Stassopoulou, A., and Papageorgiou, L. 2005. An Investigation of Web Crawler Behavior: Characterization and Metrics. Computer Communications 28, 8 (May 2005), 880-897, Elsevier (available online through Elsevier's portal; locally in PDF).
  • Baeza-Yates, R. and Castillo, C. 2007. Crawling the Infinite Web. Journal of Web Engineering 6, 1 (Feb. 2007), 49-72.

20/11/2018   Link Analysis

Chapter 21, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapter 5, Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press, 2011

Link Analysis (Chapter 5, Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003.)

Optional reading:  

27/11/2018   Projects Presentation