Welcome to Minersoft's software retrieval evaluation dataset page

Here, you can find the dataset used to evaluate Minersoft's performance in Software Retrieval.

A prototype of our search engine, which uses a limited set of indexes, is available at the following link:
Minersoft Prototype

Indexes

The files provided are inverted indexes built with Apache Lucene.
Navigate to the indexes directory for a list of available files.
We provide 18 inverted indexes. Two more are too large to serve through a web server, but we can provide them upon request. The indexes contain many different zones (a.k.a. Lucene fields). You can use the Luke package to get familiar with the different fields. You will need some Lucene code to query the indexes.

Relevance Judgements

We provide the relevance judgements that we used to evaluate Minersoft's software retrieval performance. We used 219 queries and a relevance value ranging from 0 to 2 (0: not relevant, 1: relevant, 2: more relevant). We used the NDCG and NCG information retrieval metrics. The zip file contains two Excel spreadsheets: one for the queries using the stemmed fields and one for the non-stemmed fields.
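
For readers who want to recompute scores from the judgements, the following is a minimal Python sketch of NDCG and NCG over graded relevance values (0, 1, 2). It is an illustration only: the function names are hypothetical, and the choice of a linear gain with a log2 rank discount is an assumption, since this page does not state which variant of the metrics was used.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains, judged_gains, k):
    """NDCG@k: DCG of the returned ranking divided by the DCG of the ideal ranking.

    ranked_gains: relevance values (0-2) of the results, in the order returned.
    judged_gains: relevance values of all judged documents for the query.
    """
    ideal = sorted(judged_gains, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # no relevant document was judged for this query
    return dcg(ranked_gains[:k]) / ideal_dcg


def ncg_at_k(ranked_gains, judged_gains, k):
    """NCG@k: undiscounted gain in the top k divided by the best achievable gain."""
    ideal_cg = sum(sorted(judged_gains, reverse=True)[:k])
    if ideal_cg == 0:
        return 0.0
    return sum(ranked_gains[:k]) / ideal_cg


if __name__ == "__main__":
    # Hypothetical query: three results judged 2, 0 and 1, in returned order.
    returned = [2, 0, 1]
    judged = [2, 1, 0]
    print("NDCG@3:", ndcg_at_k(returned, judged, 3))
    print("NCG@3:", ncg_at_k(returned, judged, 3))
```

NCG ignores the order of results inside the top k, while NDCG penalizes relevant results that appear lower in the ranking, so the two values can differ for the same result list.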

Queries Used

We used 220 queries for this evaluation, drawn from 3 different categories.
  • General Queries

    file converter;terminal emulator;log analyzer;rendering text;linear algebra package;fftw library;earthquake analysis;java virtual machine;statistical analysis software;ftp client;regular expression;sigmoid function;histogram plot;binary tree;zip deflater;pdf reader;ray tracing;web server;torrent client;xml parser;audio compression;video compression;access control;ajax javascript;applet;authentication token;certificate retrieval;compression algorithm;configuration stores;cpp opengl;cpp testing framework;development ide;flow analysis;gis;globus token;graph analysis;graph api;graph compression;graph transformation;graph visualization software;image transformation package;inverted index;java profiler;license extraction;linear optimization;manager desktop gnome;matrix transformation;network simulation;neural network;osgi;parallel r;photo editing;python library;python rest xml;routing algorithm implementation;r project;security authentication;spatial analysis;speech recognition;sql server;statistics package;support vector machines;text analysis software;video streaming analysis;weather information;web services;window manager;file backup software;html editing;partitioning software;network scanners;traffic sniffers;

  • Navigational Queries

    imagemagick;octave;numerical;computations;lapack;library;gsl;library;glite;data;management;xerces;xml;subversion;client;gcc;fortran;thrudb;lucene;jboss; rails;ruby;mpich;autodock;docking;atlas;software;mysql;client;lighttpd;wordpress;redmine;libapr;apache2;aegis;ajax;alice;amga;ant;arc;atlas;biomed;bison;bzr; library;compchem;cosmo;cpan;cpp;eclipse;equinox;esr;fortran;fusion;ganga;gaussian;geclipse;gfortran;glite;globus;gnuplot;gtk;java;jquery;kalman;filter;lapack; lhcb;libc;make;matlab;matlab;plot;maven;nagios;monitoring;tool;nsc;openmp;randomkit.h;sqlite;svn;tcl;tex;texlive;tomcat;unicore;wget;workflows;sasl;openssl; yum;wine;ffmpeg;nmap;uxterm;xterm;automake;perl;gzip;gawk;groff;bzip2;vorbis;curl;gdbm;kudzu;uuid;valgrind;mime;gamin;rsync;cscope;sysfsutils;setserial;ntp; usbutils;guile;libogg;gphoto2;libwnck;ctags;elfutils;libxklavier;openldap;logwatch;lftp;audiofile;libao;dos2unix;xmlsec1;krb5;log4j;pyorbit;orbit2;

  • Versional Queries

    distcache version:1.*;atk version:1.*;vorbis version:0.*;xmlsec1 version:1.2.*;netpbm version:10.*;spread library version:2.*;libncurses version:5.4;libxml version:2.*;libcrypto version:0.9*;ruby version:1.8.*;perl version:5.10.*;openssl version:0.9.*;gdbm version:2.0.*;uuid version:1.2;gamin version:0.1.7;xdelta version:2.*;sysfs version:2.0.*;ogg version:0.*;vorbis version:0.*;seaudit version:1.2.6;mikmod version:2.*; cdio version:7.1.*;vte version:4.4.*;sasl version:7.1.11;pcre version:0.*;plot version:2.2.1;ccid version:1.*;torrent version:9.2.*;expat version:0.*;acl version:1.1.*;
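
The query lists above are semicolon-separated, and the versional queries carry a "version:" pattern after the software name. As a rough sketch of how they might be loaded into a program, the Python below splits a list into individual queries and separates the name from the version pattern. The function names are hypothetical, and the assumption that " version:" always separates the name from the pattern comes only from the examples shown on this page.

```python
def parse_queries(line):
    """Split a semicolon-separated query list, dropping empty entries and stray spaces."""
    return [q.strip() for q in line.split(";") if q.strip()]


def split_version(query):
    """Return (name, version_pattern); version_pattern is None for non-versional queries."""
    name, sep, pattern = query.partition(" version:")
    return (name.strip(), pattern.strip() if sep else None)


if __name__ == "__main__":
    sample = "distcache version:1.*;spread library version:2.*; cdio version:7.1.*;"
    for query in parse_queries(sample):
        print(split_version(query))
```

How the version pattern is then matched against the index fields depends on the Lucene query code used, which is not covered here.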

Enquiries

Please contact Asterios Katsifodimos (asteriosk@cs.ucy.ac.cy) for further information, details, or feedback.