Schedule

Date   Description   Bibliography   Slides
16/01/2017   Introduction and Boolean Retrieval

Chapter 1, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • A. Moffat, J. Zobel, D. Hawking, Recommended reading for IR research students, ACM SIGIR Forum, vol. 39, no. 2, pp. 3-14, 2005.
  • See Sergey Brin, speaking on Search, Google and Life, UC Berkeley, Oct. 2005.

 

23/01/2017   Text encoding: tokenization, stemming, lemmatization, stop words, phrases

Chapter 2, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Bahle, D., Williams, H. E., and Zobel, J. 2002. Efficient phrase querying with an auxiliary index. In Proceedings of the 25th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Tampere, Finland, August 11 - 15, 2002).
30/01/2017   Dictionaries & Tolerant retrieval

Chapter 3, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • J. Zobel and P. Dart. Finding approximate matches in large lexicons. Software - practice and experience 25(3), March 1995.
  • K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), Dec 1992.
30/01/2017   Index construction

Chapter 4, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Shanks, V. R. and Williams, H. E. 2003. Index construction for linear categorisation. In Proceedings of the Twelfth international Conference on information and Knowledge Management (New Orleans, LA, USA, November 03 - 08, 2003).
  • Dean, J. and Ghemawat, S. 2004. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (San Francisco, CA, December 06 - 08, 2004).
  • See the video of Jeff Dean's (Google Inc) colloquium Google: A Behind-the-Scenes Look at the University of Washington, October 2004; covers aspects of MapReduce and the systems behind the search engine.
06/02/2017  

Index compression

Vector Space Retrieval

Chapters 5, 6, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Büttcher, S. and Clarke, C. L. 2007. Index compression is good, especially for random access. In Proceedings of the Sixteenth ACM Conference on Conference on information and Knowledge Management (Lisbon, Portugal, November 06 - 10, 2007).
13/02/2017  

Vector Space Retrieval & Computing Scores in a Complete Search System

 

Chapters 6, 7, 8, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Zobel, J. and Moffat, A. 1998. Exploring the similarity space. SIGIR Forum 32, 1 (Apr. 1998).
20/02/2017   Relevance Feedback & Query Expansion

Chapters 8, 9. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapter 10, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze.

Optional reading:

  • Anh, V. N., de Kretser, O., and Moffat, A. 2001. Vector-space ranking with effective early termination. In Proceedings of the 24th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (New Orleans, Louisiana, United States).
06/03/2017   Midterm
  • Topics: Chapters 1-9, Manning, Raghavan, Schütze.
  • The midterm exam will last 120 minutes.
13/03/2017  

XML retrieval and Querying the Data Web

Software retrieval on Clouds

M. Jarrar and M. D. Dikaiakos. "Querying the Data Web - The MashQL approach." IEEE Internet Computing, 2010.

M. D. Dikaiakos, A. Katsifodimos, G. Pallis. Minersoft: Software Retrieval in Grid and Cloud Computing Infrastructures, ACM Transactions on Internet Technology (ACM TOIT), 2012.

Optional reading:

  • Overview of XML, XPATH, Semistructured data.
  • G. Pallis: Cloud Computing: The New Frontier of Internet Computing, IEEE Internet Computing, 14(5):70-73, Sep. 2010.
  • S. Amer-Yahia, M. Lalmas, "XML Search: Languages, INEX, and Scoring." SIGMOD Record, Vol. 35, No. 4, December 2006.
20/03/2017   Data classification / Data clustering

Chapters 13, 14. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapters 16, 17. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: a review. ACM Comput. Surv. 31, 3 (Sep. 1999), 264-323
  • See the video of Ulrike von Luxburg's (Max Planck Institute for Biological Cybernetics) colloquium Lectures on Clustering at the PASCAL Bootcamp in Machine Learning.
  • See the video of Yee Whye Teh's (University College London) colloquium Hierarchical Clustering at the EPSRC Winter School in Mathematics for Data Modelling.
27/03/2017  

Web search Basics, Crawling and Indexing

 

Chapter 20. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Crawling Techniques (Chapter 6, Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003.)

Optional reading:

  • How search works
  • Search Engine Users: Internet searchers are confident, satisfied and trusting -- but they are also unaware and naive, by Deborah Fallows, Pew Internet Research report, January 23, 2005.
  • Kobayashi, M. and Takeda, K. 2000. Information retrieval on the web. ACM Comput. Surv. 32, 2 (Jun. 2000), 144-173.
  • An Investigation of Web Crawler behavior: Characterization and Metrics. M. D. Dikaiakos, A. Stassopoulou, L. Papageorgiou. Computer Communications, May 2005. Vol. 28, Issue 8, pp. 880-897, Elsevier (available online through Elsevier's portal; locally in pdf).
  • Baeza-Yates, R. and Castillo, C. 2007. Crawling the infinite Web. Journal of Web Engineering 6, 1 (Feb. 2007), 49-72.
03/04/2017   Crawling and Indexing / Link Analysis

 

Chapter 21, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapter 5, Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press, 2011

Link Analysis (Chapter 5, Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003.)

Optional reading:  

24/04/2017   Project Presentations