// Copyright 2015, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Elmar Haussmann.

// NOTE: this file should be seen as an addition to the TIP file
// of sheet 1. Only the additional methods are shown here.
// The solution of sheet 2 also gives you methods for evaluating
// a benchmark.

// Class for a simple inverted index.
class InvertedIndex

  // This method first computes the sparse term-document matrix using the
  // (already built) inverted index. For LSI it also performs SVD using
  // dimensionality k and only the m most frequent terms. Intermediate
  // results are stored as members of this class.
  void preprocessing_vsm(int k, int m)

  // Execute the query using the (full) term-document matrix in the VSM.
  // You will need to create a vector from the query and use the
  // term-document matrix from the function above. The result should be
  // exactly the same as when using the inverted index! (This is easy to
  // verify.)
  void process_query_vsm(String query)

  // Execute the query by mapping the query vector to latent space.
  // The term-document matrix in latent space should be precomputed in
  // preprocessing_vsm. Once you have implemented interpolation between
  // original BM25 and LSI, lambda controls the combination.
  void process_query_lsi(String query, float lambda)

  // Compute the term-term association matrix T. Write the 50 term pairs
  // with the highest values to a file "term_pairs.txt". Only consider
  // unique pairs of different terms.
  void related_term_pairs()

// Main program:
//
// 1. Arguments:
// 2. Construct inverted index from given file
// 3. Perform VSM preprocessing
// 4. Run benchmark without LSI (only BM25). Print the results.
// 5. Run benchmark only with LSI. Print the results.
// 6. Run benchmark interpolated via lambda with LSI and BM25. Print the results.
void main
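
// Illustration for process_query_vsm (a minimal sketch, not the reference
// solution). It is written in Python, and it assumes that the inverted
// index maps each term to a list of (doc id, score) pairs with precomputed
// scores such as BM25. The helper names build_term_document_matrix,
// process_query_index and rank are hypothetical and only serve this example.
// The sketch stores the sparse term-document matrix A as one dictionary per
// term row, builds the query vector q from term counts, and computes
// q^T * A. For queries without repeated terms this performs the same
// additions as merging the inverted lists, so both rankings agree.

```python
from collections import defaultdict


def build_term_document_matrix(inverted_index):
    """Store the sparse matrix A as {term: {doc_id: score}} (one row per term).

    Assumption: inverted_index is {term: [(doc_id, score), ...]}.
    """
    return {term: dict(postings) for term, postings in inverted_index.items()}


def rank(scores):
    """Sort (doc_id, score) pairs by descending score, ties by doc id."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def process_query_index(inverted_index, query):
    """Baseline: add up the scores found in the inverted lists of the query terms."""
    scores = defaultdict(float)
    for term in query.split():
        for doc_id, score in inverted_index.get(term, []):
            scores[doc_id] += score
    return rank(scores)


def process_query_vsm(matrix, query):
    """VSM: build the query vector q from term counts and compute q^T * A."""
    q = defaultdict(int)
    for term in query.split():
        q[term] += 1
    scores = defaultdict(float)
    for term, weight in q.items():
        # Only the rows of A that belong to query terms contribute.
        for doc_id, entry in matrix.get(term, {}).items():
            scores[doc_id] += weight * entry
    return rank(scores)


# Toy index with made-up scores, for illustration only.
index = {
    "lsi": [(1, 1.5), (3, 0.5)],
    "svd": [(1, 0.25), (2, 2.0)],
}
matrix = build_term_document_matrix(index)
print(process_query_vsm(matrix, "lsi svd"))  # [(2, 2.0), (1, 1.75), (3, 0.5)]
```

// With repeated query terms or arbitrary floating-point scores, the two
// variants may add the same numbers in a different order, so a comparison
// between them should allow for small rounding differences.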