In this assignment, you will run an experiment to study the effects of relevance feedback on the recall, precision and Mean Average Precision (MAP) values of an IR system. The IR system will use a vector space model with cosine similarity (tf-idf weighting). You will run the study on the TIME dataset, provided along with the assignment.
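As a starting point, the vector space model described above can be sketched as follows. This is a minimal illustration, not the required design: the function names (`build_index`, `tfidf_vector`, `cosine`, `rank`), the raw-count tf, and the `log(N / df)` idf variant are all assumptions, and the input is assumed to be already tokenized.

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf):
    """Sparse tf-idf vector: raw term count times idf. Terms without an idf are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def build_index(docs):
    """docs maps doc_id -> list of tokens. Returns (idf, doc_vectors)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    # Assumed idf variant: log(N / df). Other smoothing choices are equally valid.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    doc_vectors = {doc_id: tfidf_vector(tokens, idf) for doc_id, tokens in docs.items()}
    return idf, doc_vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vectors, k):
    """Return the k doc ids with the highest cosine score (ties broken by id)."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

Storing vectors as dicts keeps only non-zero weights, which also makes it easy to print the terms and weights of a query later on.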

Part A: Cosine Similarity and Rocchio’s algorithm [40pts]

  • Implement a cosine similarity measure with tf-idf weighting. Your index should contain the information needed to calculate the cosine similarity, such as tf and idf values. You may reuse code from the previous assignments as needed.
  • Implement the Rocchio algorithm for query refinement. Your system should display results and then prompt the user to provide positive and negative feedback. Use α = 1, β = 0.75, and γ = 0.15 as parameters for Rocchio’s algorithm.
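The query update itself can be sketched as below, using the three weights 1, 0.75 and 0.15 given above (named `alpha`, `beta` and `gamma` here, following the usual Rocchio formulation). The function names, the dict-based sparse vectors, and the decision to drop terms whose weight becomes non-positive are assumptions for illustration; the interactive prompt for feedback is left out.

```python
from collections import defaultdict

# Weights from the assignment text.
ALPHA, BETA, GAMMA = 1.0, 0.75, 0.15


def centroid(vectors):
    """Average of a list of sparse vectors (dicts); empty input gives an empty vector."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()} if n else {}


def rocchio(query_vec, relevant, non_relevant, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """q_new = alpha * q + beta * centroid(relevant) - gamma * centroid(non_relevant)."""
    new_query = defaultdict(float)
    for term, weight in query_vec.items():
        new_query[term] += alpha * weight
    for term, weight in centroid(relevant).items():
        new_query[term] += beta * weight
    for term, weight in centroid(non_relevant).items():
        new_query[term] -= gamma * weight
    # Assumption: terms with non-positive weight are dropped from the new query.
    return {term: weight for term, weight in new_query.items() if weight > 0.0}
```

The returned dict is the new query, so its items can be printed directly when reporting the terms and weights after each feedback round.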

Part B: Experimental study [35pts]

  • Run your system for at least 3 queries from the testbed. Pick queries that have 5 or more relevant documents (see the TIME.REL file). For each query, perform 5 rounds of relevance feedback and plot the change in precision, recall and MAP.
  • You will prepare a report on the experimental study where you will provide at least the following details for each of the queries:

o Query text and ID (provided in the testbed)
o Precision, recall and MAP values of the query
o IDs of the documents given as positive and negative feedback for each query during each iteration of the Rocchio algorithm
o For each iteration of the Rocchio algorithm, the terms of the new query and their weights

  • Your report will have 3 plots (precision vs Rocchio iteration, recall vs Rocchio iteration, and MAP measure vs Rocchio iteration) that depict the progressive change in the performance values over the iterations of the Rocchio algorithm, for the three queries.
  • Also discuss any query drift that you may observe in your results.
  • Note: The queries provided in the testbed have varying number of relevant documents (see TIME.REL file). This can be a problem when calculating the performance values, if k is kept constant during the retrieval. For the experimental study, assume that the number of relevant documents is provided to the system along with the query. In other words, the value of k will change with the query.
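The performance values for one retrieval run can be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, `ranked` is the top-k list returned by the system with k set to the number of relevant documents for the query, and average precision is normalized by the total number of relevant documents (one common convention; others exist).

```python
def precision_recall(ranked, relevant):
    """Precision and recall of a ranked list against the set of relevant doc ids."""
    relevant = set(relevant)
    hits = sum(1 for doc_id in ranked if doc_id in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears.

    Assumption: normalized by the total number of relevant documents.
    """
    relevant = set(relevant)
    hits = 0
    total = 0.0
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

Because k equals the number of relevant documents, precision and recall share the same numerator and denominator and will coincide for a given run; the two plots are still worth producing as the assignment asks.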

Part C: Pseudo Relevance Feedback [25pts] 
