1 Compression — VBC, Gamma, Gap Encoding ★★★★ Very High (4/4)
WS24/25
Given: d₁: phone expensive buy d₄: computer buy d₉: expensive buy
  a) Provide the GAP encoding for the postings list of "expensive". Compress using Variable Byte Code.
  b) Is this compression lossy or lossless? Explain the difference.

Solution

Step 1 — Build Inverted Index

  • buy → [1, 4, 9]
  • expensive → [1, 9]
  • phone → [1]
  • computer → [4]

Step 2 — Gap Encode "expensive"

Postings [1, 9] → gaps [1, 8] (9−1=8)

Step 3 — VBC Encode

1 → 7-bit chunk 0000001 → 10000001 (continuation bit set to 1 marks the final byte)
8 → 7-bit chunk 0001000 → 10001000
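These encodings can be reproduced with a short Python sketch, assuming the usual convention of a 7-bit payload per byte with the continuation bit set only on the last byte:

```python
def vbc_encode(n: int) -> str:
    """Variable Byte Code: 7-bit payload per byte; the high (continuation)
    bit is 0 on all bytes except the last, where it is 1."""
    chunks = []
    while True:
        chunks.insert(0, n % 128)   # low 7 bits, prepended
        if n < 128:
            break
        n //= 128
    chunks[-1] += 128               # set continuation bit on final byte
    return " ".join(f"{c:08b}" for c in chunks)

gaps = [1, 8]  # gap-encoded postings of "expensive"
print([vbc_encode(g) for g in gaps])  # ['10000001', '10001000']
```

Larger gaps simply spill into several bytes, e.g. 824 encodes as two bytes.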

Part b — Lossy vs Lossless

VBC is lossless — you can decode it back to the exact original gap values and reconstruct the posting list perfectly.

Lossless: Original data can be perfectly reconstructed (VBC, Gamma, Gap encoding)

Lossy: Some information is permanently lost (case folding, stemming, stop word removal)

1B Compression — Gamma Code ★★★★ Very High (4/4)
WS19/20
Given: d₁: phone expensive buy d₃: expensive buy d₁₃: buy
Write the gap sequence for "expensive" in gamma code.

Solution

Step 1 — Postings for "expensive"

expensive → [1, 3]

Step 2 — Gap Encode

Postings [1, 3] → gaps [1, 2] (3−1=2)

Step 3 — Gamma Encode

1 → binary "1" → length 1 → prefix: 0 ones + terminating zero = "0" → offset: (empty) → 0
2 → binary "10" → length 2 → prefix: 1 one + terminating zero = "10" → offset: "0" → 100

How Gamma Works:

  1. Write number in binary (e.g., 6 = 110)
  2. Count bits = length (e.g., 3)
  3. Write (length − 1) ones then a zero as prefix
  4. Append binary WITHOUT the leading 1
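The four steps above, as a minimal Python sketch of Elias gamma:

```python
def gamma_encode(n: int) -> str:
    """Elias gamma: (length-1) ones + a zero, then binary(n) without its leading 1."""
    assert n >= 1, "gamma only codes positive integers"
    length = n.bit_length()   # number of bits in binary(n)
    offset = bin(n)[3:]       # binary representation minus the leading '1'
    return "1" * (length - 1) + "0" + offset

print([gamma_encode(g) for g in [1, 2]])  # ['0', '100']
print(gamma_encode(6))                    # '11010' (prefix 110, offset 10)
```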
2 Inverted Index Construction ★★★ High (3/4)
WS24/25
Given: d₁: phone expensive buy d₄: computer buy d₉: expensive buy
Create the Inverted Index. No normalization required.

Solution

Term df Posting List
buy 3 1 → 4 → 9
expensive 2 1 → 9
phone 1 1
computer 1 4

That's it. No Boolean queries, no term-doc matrix. ~1 minute.

2B Inverted Index Construction ★★★ High (3/4)
WS19/20
Given: d₁: phone expensive buy d₃: expensive buy d₁₃: buy
Create a full inverted index.

Solution

Term      df  Posting List
buy       3   1 → 3 → 13
expensive 2   1 → 3
phone     1   1

Same format, different doc IDs. On the exam this feeds directly into compression.

3 Edit Distance (Damerau-Levenshtein) ★★★ High (3/4)
WS24/25
Calculate the Damerau-Levenshtein distance of "bca" and "cba". Show the matrix and explain each step.

Solution — Distance = 1 (single transposition)

        c   b   a
    0   1   2   3
b   1   1   1   2
c   2   1   1★  2
a   3   2   2   1

★ Transposition fires at d[2][2]: source "bc" matches target "cb" (adjacent swap) → d[0][0]+1 = 1. Final cell inherits the 1 since s[3]=t[3]='a' (match).

Operations allowed in Damerau-Levenshtein:

  • Insertion
  • Deletion
  • Substitution
  • Transposition (swapping adjacent characters)
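The matrix fill can be scripted to double-check hand calculations. A sketch of the restricted variant (optimal string alignment) used in these exercises:

```python
def damerau_levenshtein(s: str, t: str) -> int:
    """Restricted Damerau-Levenshtein: insertion, deletion, substitution,
    plus transposition of adjacent characters."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(damerau_levenshtein("bca", "cba"))  # 1
print(damerau_levenshtein("abc", "acb"))  # 1
```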
3B Edit Distance (Damerau-Levenshtein) ★★★ High (3/4)
WS19/20
Calculate the Damerau-Levenshtein distance between "abc" and "acb". Fill in a matrix. Explain.

Solution — Distance = 1 (single transposition)

        a   c   b
    0   1   2   3
a   1   0   1   2
b   2   1   1   1
c   3   2   1   1★

★ Transposition fires at d[3][3]: source "bc" matches target "cb" → d[1][1]+1 = 0+1 = 1.

Exam tip: Both exam versions are distance=1 with one transposition. The exam tests whether you can fill the matrix and narrate the transposition step.

4 Permuterm Index ★★ Medium (2/4) Not in coursework
WS24/25
Given: d₁: to two d₂: too is
  a) What does the permuterm index look like?
  b) Given query t*o — what terms would it result in? Explain.

Solution

How Permuterm Works:

  1. Append $ to each term: "hello" → "hello$"
  2. Generate ALL rotations: hello$, ello$h, llo$he, lo$hel, o$hell, $hello
  3. Store each rotation pointing to the original term in a B-tree
  4. To search wildcard: rotate query so * is at the END, then prefix lookup
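The steps above as a small Python sketch. The dict stands in for the B-tree, which is a simplification: a real permuterm index would use a tree for fast prefix lookup rather than scanning all keys.

```python
def rotations(term: str):
    """All rotations of term + '$' end marker."""
    t = term + "$"
    return [t[i:] + t[:i] for i in range(len(t))]

def permuterm_lookup(query: str, index: dict):
    """Rotate the wildcard query so '*' lands at the end, then prefix-match."""
    q = query + "$"
    star = q.index("*")
    rotated = q[star + 1:] + q[:star]   # e.g. t*o$ -> o$t
    return {term for rot, term in index.items() if rot.startswith(rotated)}

# rotation -> original term, for every vocabulary term
index = {rot: term for term in ["to", "two", "too", "is"]
         for rot in rotations(term)}
print(sorted(permuterm_lookup("t*o", index)))  # ['to', 'too', 'two']
```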

Permuterm Rotations:

  • "to" → to$: to$, o$t, $to
  • "two" → two$: two$, wo$t, o$tw, $two
  • "too" → too$: too$, oo$t, o$to, $too
  • "is" → is$: is$, s$i, $is

Query t*o:

Append $: t*o$ → Rotate so * is at end: o$t* → Search for prefix o$t

Matches: o$t (from "to"), o$tw (from "two"), o$to (from "too")

Result: to, two, too

4B Permuterm Index ★★ Medium (2/4) Not in coursework
WS21/22
Given: d₁: he is d₂: she is
  a) How do you represent the terms in the permuterm index?
  b) How do you look up *he and which elements are retrieved?

Solution

Permuterm Rotations:

  • "he" → he$: he$, e$h, $he
  • "is" → is$: is$, s$i, $is
  • "she" → she$: she$, he$s, e$sh, $she

Query *he:

Append $: *he$ → Rotate so * is at end: he$* → Search for prefix he$

Matches: he$ (from "he"), he$s (from "she")

Result: he, she

1 Vector Space Plotting ★★★★ Very High (4/4)
WS24/25
Documents: d₁: yes yes no d₂: yes yes yes yes no d₃: yes yes no no no no no (Grid provided 0–5 on both axes)
Draw the documents into the vector space. No weighting or log-scaling required.

Solution

How to Solve:

  1. Count raw term frequencies for each term dimension
  2. Plot as points: (tf_term1, tf_term2)
  3. No weighting, no normalization, no log scaling

Coordinates:

  • d₁ = (yes=2, no=1) → point (2, 1)
  • d₂ = (yes=4, no=1) → point (4, 1)
  • d₃ = (yes=2, no=5) → point (2, 5)

This question always leads into cosine similarity or HAC clustering in the next subtask.

1B Vector Space Plotting ★★★★ Very High (4/4)
WS21/22
Documents: A: yes yes B: no no no no Q: no no no no yes yes (Axes: x = "yes", y = "no")
Plot the documents and query into the vector space.

Solution

Coordinates:

  • A = (yes=2, no=0) → point (2, 0)
  • B = (yes=0, no=4) → point (0, 4)
  • Q = (yes=2, no=4) → point (2, 4)

A sits on the x-axis, B on the y-axis, Q is in the upper-right quadrant. This setup feeds directly into cosine similarity.

2 Cosine Similarity ★★★ High (3/4)
WS21/22
Calculate cosine similarity between A=(2,0) and Q=(2,4), and between B=(0,4) and Q=(2,4). Which is more similar?

Solution

Formula:

cos(q, d) = (q⃗ · d⃗) / (|q⃗| × |d⃗|)

Calculations:

cos(A, Q) = (2×2 + 0×4) / (√4 × √20) = 4 / (2 × 4.47) = 4/8.94 = 0.447

cos(B, Q) = (0×2 + 4×4) / (√16 × √20) = 16 / (4 × 4.47) = 16/17.89 = 0.894

Answer:

B is more similar (despite Q containing both "yes" and "no", Q's "no" count dominates)
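The formula, as a quick sanity-check script:

```python
import math

def cosine(u, v):
    """cos(u, v) = (u . v) / (|u| |v|)"""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

A, B, Q = (2, 0), (0, 4), (2, 4)
print(round(cosine(A, Q), 3))  # 0.447
print(round(cosine(B, Q), 3))  # 0.894
```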

2B Cosine Similarity ★★★ High (3/4)
WS19/20
Documents: A: read read → (2, 0) B: write → (0, 1) Q: read read write → (2, 1)
Calculate cosine similarity between each document and Q. Which is more similar?

Solution

Formula:

cos(q, d) = (q⃗ · d⃗) / (|q⃗| × |d⃗|)

Calculations:

cos(A, Q) = (2×2 + 0×1) / (√4 × √5) = 4 / (2 × 2.236) = 4/4.472 = 0.894

cos(B, Q) = (0×2 + 1×1) / (√1 × √5) = 1 / 2.236 = 0.447

Answer:

A is more similar to Q. ~1 minute with calculator. No TF-IDF, no log scaling — just raw counts.

1 Precision-Recall Curves ★★ Medium (2/4)
WS24/25
Given: Result = (1, 3, 5, 9, 2) Correct = (1, 2, 4, 8, 9)
Draw the corresponding Precision-Recall graph (without interpolation). Give the P-R coordinates.

Solution

Rank  Doc  Relevant?  P@k          R@k
1     1    ✓          1/1 = 1.000  1/5 = 0.200
2     3    ✗          1/2 = 0.500  1/5 = 0.200
3     5    ✗          1/3 = 0.333  1/5 = 0.200
4     9    ✓          2/4 = 0.500  2/5 = 0.400
5     2    ✓          3/5 = 0.600  3/5 = 0.600

P-R Coordinates (recall, precision):

(0.2, 1.0), (0.2, 0.5), (0.2, 0.333), (0.4, 0.5), (0.6, 0.6)

Docs 4, 8 never retrieved → recall never reaches 1.0
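The table can be generated mechanically. A sketch that returns the (recall, precision) pairs per rank:

```python
def pr_points(ranking, gold):
    """Uninterpolated precision/recall after each rank position."""
    gold = set(gold)
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in gold:
            hits += 1
        points.append((round(hits / len(gold), 3),   # recall
                       round(hits / k, 3)))          # precision
    return points

print(pr_points([1, 3, 5, 9, 2], {1, 2, 4, 8, 9}))
# [(0.2, 1.0), (0.2, 0.5), (0.2, 0.333), (0.4, 0.5), (0.6, 0.6)]
```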

1B Precision-Recall Curves ★★ Medium (2/4)
WS21/22
Given: Ranking = {1, 3, 4, 7, 9, 11, 8} Gold = {1, 2, 3, 4, 8}
Draw the complete precision-recall curve.

Solution

Rank  Doc  Relevant?  P@k          R@k
1     1    ✓          1/1 = 1.000  1/5 = 0.200
2     3    ✓          2/2 = 1.000  2/5 = 0.400
3     4    ✓          3/3 = 1.000  3/5 = 0.600
4     7    ✗          3/4 = 0.750  3/5 = 0.600
5     9    ✗          3/5 = 0.600  3/5 = 0.600
6     11   ✗          3/6 = 0.500  3/5 = 0.600
7     8    ✓          4/7 = 0.571  4/5 = 0.800

P-R Coordinates (recall, precision):

(0.2, 1.0), (0.4, 1.0), (0.6, 1.0), (0.6, 0.75), (0.6, 0.6), (0.6, 0.5), (0.8, 0.571)

Doc 2 never retrieved → recall never reaches 1.0. Note the first 3 results are all relevant — precision stays at 1.0 early.

1 Naive Bayes ★★ Medium (2/4)
WS24/25 · Conceptual
Given: D documents, C classes, M terms, T tokens per document
  a) How many variables does training create?
  b) Time complexity of classifying T_test with t tokens? (Big-O)
  c) What is add-1 smoothing? What is it needed for? Provide the formula.

Solution

a) Training Variables:

C(M + 1) or equivalently C + CM

  • C priors
  • C × M conditional probabilities

b) Test Time Complexity:

O(C × t) — for each class, multiply t term probabilities

c) Add-1 Smoothing:

P̂(t|c) = (T_ct + 1) / (Σ_t′ T_ct′ + B)
where B = |V| = vocabulary size

Why needed: Add 1 to every term count to prevent zero probabilities. A single unseen word zeros out the entire class probability.

2 Naive Bayes — Full Calculation ★★ Medium (2/4)
WS21/22
Training data: German → {Sausage Cheese, Sausage Wine} French → {Cheese Baguette, Wine, Wine Wine}
Calculate ALL parameters of the Naive Bayes model with add-one smoothing.

Solution

Vocabulary: V = {Sausage, Cheese, Wine, Baguette} → B = |V| = 4

Priors:

  • P(German) = 2/5 = 0.4
  • P(French) = 3/5 = 0.6

Token counts:

German: Sausage(2) Cheese(1) Wine(1) Baguette(0) = 4 total

French: Sausage(0) Cheese(1) Wine(3) Baguette(1) = 5 total

Add-one smoothed conditionals:

denom_German = 4 + 4 = 8 | denom_French = 5 + 4 = 9

Term count(Ger) P(t|Ger) count(Fr) P(t|Fr)
Sausage 2 3/8 0 1/9
Cheese 1 2/8 1 2/9
Wine 1 2/8 3 4/9
Baguette 0 1/8 1 2/9

This exam version STOPS here — no test document to classify.

Total parameters: 2 priors + (2 × 4 = 8) conditionals = 10 parameters
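All parameters above can be reproduced by a short script (add-one smoothing as in the formula from the previous task):

```python
from collections import Counter

docs = {"German": ["Sausage Cheese", "Sausage Wine"],
        "French": ["Cheese Baguette", "Wine", "Wine Wine"]}

vocab = {t for ds in docs.values() for d in ds for t in d.split()}
B = len(vocab)                                   # |V| = 4
total = sum(len(ds) for ds in docs.values())     # 5 documents

prior = {c: len(ds) / total for c, ds in docs.items()}
cond = {}
for c, ds in docs.items():
    counts = Counter(t for d in ds for t in d.split())
    denom = sum(counts.values()) + B             # tokens in class + B
    cond[c] = {t: (counts[t] + 1) / denom for t in vocab}

print(prior)                      # {'German': 0.4, 'French': 0.6}
print(cond["German"]["Sausage"])  # 0.375  (= 3/8)
```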

1 HAC Dendrograms ★★★★ Very High (4/4) Not in coursework
WS24/25
Given: A(1,1), B(1,3), C(5,3)
Draw the dendrograms for both single-link and complete-link clustering. Explain your solution and the difference.

Solution

Distance Matrix:

  • d(A,B) = √(0+4) = 2
  • d(B,C) = √(16+0) = 4
  • d(A,C) = √(16+4) = √20 ≈ 4.47

Step 1 (both methods):

Merge A,B (smallest distance = 2)

Step 2 — Single-link:

d({A,B}, C) = min(d(A,C), d(B,C)) = min(4.47, 4) = 4

Merge {A,B} with C at distance 4

Step 2 — Complete-link:

d({A,B}, C) = max(d(A,C), d(B,C)) = max(4.47, 4) = 4.47

Merge {A,B} with C at distance 4.47

Dendrogram Structure:

Single-Link:           Complete-Link:
4.47|                  4.47|___________
  4 |___________         4 |
  3 |           |        3 |
  2 |___        |        2 |___        |
  1 |   |       |        1 |   |       |
    A   B       C          A   B       C
                

Key Difference:

Single-link can create "chaining" (long, spread-out clusters). Complete-link creates more compact, spherical clusters.
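A naive HAC sketch in Python, where the only difference between the two methods is passing min (single-link) or max (complete-link) as the linkage function:

```python
import math

def hac(points, linkage):
    """Naive agglomerative clustering; linkage is min (single-link)
    or max (complete-link). Returns the merge steps with distances."""
    clusters = {name: [p] for name, p in points.items()}
    merges = []
    while len(clusters) > 1:
        # inter-cluster distance = linkage over all cross-pair distances
        pairs = [(linkage(math.dist(p, q)
                          for p in clusters[a] for q in clusters[b]), a, b)
                 for a in clusters for b in clusters if a < b]
        dist, a, b = min(pairs)
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a + b, round(dist, 2)))
    return merges

pts = {"A": (1, 1), "B": (1, 3), "C": (5, 3)}
print(hac(pts, min))  # [('AB', 2.0), ('ABC', 4.0)]
print(hac(pts, max))  # [('AB', 2.0), ('ABC', 4.47)]
```

The merge distances are exactly the heights at which the dendrogram branches join.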

1B HAC Dendrograms ★★★★ Very High (4/4) Not in coursework
WS24/25 · from vector plot
Given (from vector space plotting): d₁(2, 1), d₂(4, 1), d₃(2, 5)
Draw the dendrograms for both single-link and complete-link clustering. Explain the difference.

Solution

Distance Matrix:

  • d(d₁,d₂) = √(4+0) = 2
  • d(d₁,d₃) = √(0+16) = 4
  • d(d₂,d₃) = √(4+16) = √20 ≈ 4.47

Step 1 (both methods):

Merge d₁,d₂ (smallest distance = 2)

Step 2 — Single-link:

d({d₁,d₂}, d₃) = min(4, 4.47) = 4

Step 2 — Complete-link:

d({d₁,d₂}, d₃) = max(4, 4.47) = 4.47

Same structure as Version A — practice with different point labels to build fluency.

2 PageRank ★★★ High (3/4)
WS24/25
Given: 3-node graph (1↔2↔3, all with self-loops and cross-links) Teleportation rate = 0.15
  a) What is the probability-transition matrix?
  b) PageRank values? Hint: you may only need one step with uniform initial vector. Name the process.

Solution

The Steps:

  1. Draw/read the link graph → note outgoing links per node
  2. Build transition matrix M (column-stochastic): M[i][j] = 1/outdegree(j) if j→i, else 0
  3. Apply teleportation: P = (1-α)M + (α/n)·1 where α = teleportation rate
  4. Power iteration: start with x⁰ = (1/n, ..., 1/n), compute x^(k+1) = P·x^k until convergence

Transition Matrix M:

M = | 0    0.5  0.5 |
    | 0.5  0.5  0   |
    | 0.5  0    0.5 |
                

Google Matrix P (α = 0.15, n = 3):

P = 0.85M + (0.15/3)·1 = 0.85M + 0.05·1

P = | 0.05   0.475  0.475 |
    | 0.475  0.475  0.05  |
    | 0.475  0.05   0.475 |
                

Power Iteration from x⁰ = (1/3, 1/3, 1/3):

x¹ = P·x⁰ = (1/3, 1/3, 1/3) — already converged! (the rows of P also sum to 1, i.e. P is doubly stochastic, so the uniform vector is stationary)

Stationary Distribution:

π = (1/3, 1/3, 1/3)

The process is called the Power Method or Power Iteration.
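The power method on this example, scripted (a sketch; x stays at the uniform vector because P here is doubly stochastic):

```python
def power_iteration(M, alpha=0.15, steps=50):
    """x_{k+1} = P x_k with P = (1-alpha)M + alpha/n, starting uniform."""
    n = len(M)
    P = [[(1 - alpha) * M[i][j] + alpha / n for j in range(n)]
         for i in range(n)]
    x = [1 / n] * n                      # uniform start vector
    for _ in range(steps):
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [round(v, 3) for v in x]

M = [[0.0, 0.5, 0.5],    # column-stochastic transition matrix
     [0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5]]
print(power_iteration(M))  # [0.333, 0.333, 0.333]
```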

3 PageRank — Reverse Design ★★★ High (3/4)
WS20/21
Constraint: PR(d₁) = PR(d₂), PR(d₁) > PR(d₃)
  a) Add links to d₁, d₂, d₃ that satisfy the constraints.
  b) Create the transition matrix. Show it works via 2 iterations of the power method.

Solution

Design Strategy:

  1. Equal PR for d₁ and d₂ → make them symmetric (mutual links)
  2. Higher than d₃ → d₃ gives its vote to d₁ and d₂, but receives no incoming links

Links:

d₁→d₂, d₂→d₁, d₃→d₁, d₃→d₂

Transition Matrix M:

        d₁  d₂  d₃
  d₁  [ 0   1   0.5 ]
  d₂  [ 1   0   0.5 ]
  d₃  [ 0   0   0   ]
                

Google Matrix P (α = 0.15, n = 3):

  P = [ 0.05   0.90   0.475 ]
      [ 0.90   0.05   0.475 ]
      [ 0.05   0.05   0.05  ]
                

Power Iteration from x⁰ = (1/3, 1/3, 1/3):

x¹ = (0.475, 0.475, 0.05)

x² = (0.475, 0.475, 0.05) — converged after 1 step

Verification:

PR(d₁) = PR(d₂) = 0.475 > PR(d₃) = 0.05 ✓

Key insight: symmetry between d₁↔d₂ guarantees equal PR. d₃ has no incoming links → it gets only teleportation (0.15/3 = 0.05).

1 Lossy vs Lossless Compression ★★ Medium (2/4) Not in coursework
WS21/22
  a) Explain the difference between lossy and lossless compression.
  b) Provide two examples for lossy compression of the dictionary.
  c) Can you think of a method for lossy compression of the postings lists?

Solution

a) Definitions:

Lossless: Original data can be perfectly reconstructed — VBC, Gamma, Gap encoding, dictionary-as-a-string + blocking.

Lossy: Some information is permanently lost, but retrieval quality is acceptable.

b) Lossy Dictionary Examples:

  • Case folding: "THE" → "the" (loses case distinction)
  • Stemming: "running" → "run" (loses morphological information)
  • Stop word removal: "the", "a", "is" dropped entirely

c) Lossy Postings Compression:

Static index pruning: Drop postings for low-tf documents — a term that appears once in a 10,000-word document is unlikely to be relevant.

2 Classifier Decision Boundaries ★★ Medium (2/4) Not in coursework
WS19/20
Given data points from two classes, draw a separation that could have been created by: (a) Perceptron (b) SVM (c) kNN with k=1.

Solution — How to Identify Boundaries

Classifier    Boundary Shape                 Key Feature
SVM           Straight line                  Maximum margin between classes (widest gap)
Perceptron    Straight line                  Any separating line (not necessarily max margin)
Naive Bayes   Straight line (axis-aligned)   Based on feature independence assumption
kNN k=1       Irregular, wrapping            Voronoi tessellation — follows nearest points
kNN large k   Smoother, less complex         More neighbors → smoother boundary

Rule of Thumb:

  • Straight line with max margin → SVM
  • Straight line (vertical/horizontal) → Naive Bayes
  • Irregular/jagged boundary wrapping around points → kNN k=1
  • Smooth but non-linear → kNN with larger k
3 XOR Problem & Neural Networks ★★ Medium (2/4) Not in coursework
WS20/21
Documents: B: He walks street B: She walks beach A: He walks beach A: She walks street Features: f₁ = "He" in doc, f₂ = "street" in doc
  a) Which classifiers can perfectly discriminate? (SVM / MaxEnt / NB / kNN)
  b) Find different features so ALL classifiers work.
  c) Draw a neural network with weights that classifies using f₁, f₂.

Solution

a) Why This Is XOR:

Doc               f₁ (He)  f₂ (street)  Class
She walks street  0        1            A
She walks beach   0        0            B
He walks street   1        1            B
He walks beach    1        0            A

Class A when features differ, Class B when features match — this is XOR. No linear classifier can solve XOR → NB, SVM, MaxEnt all fail.

Only kNN with k=1 works — each point is closest to itself.

b) Linearly Separable Features:

Need features under which the classes cluster together, e.g. a single combined feature f = f₁ XOR f₂ (1 iff exactly one of "He" / "street" occurs): class A is then f = 1 and class B is f = 0, which any linear classifier separates.

c) Neural Network (2→2→1):

  • Hidden H₁: (f₁ AND NOT f₂) — weights w₁=1, w₂=−1, bias=−0.5
  • Hidden H₂: (NOT f₁ AND f₂) — weights w₁=−1, w₂=1, bias=−0.5
  • Output: H₁ OR H₂ — weights both=1, bias=−0.5
  • Step activation: fire if sum > 0
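The network above as a truth-table check, with the stated weights and biases and a step activation (fire iff weighted sum plus bias > 0):

```python
def step(x):
    """Step activation: fire iff input > 0."""
    return 1 if x > 0 else 0

def xor_net(f1, f2):
    """2-2-1 network: H1 = f1 AND NOT f2, H2 = NOT f1 AND f2, out = H1 OR H2."""
    h1 = step(1 * f1 - 1 * f2 - 0.5)
    h2 = step(-1 * f1 + 1 * f2 - 0.5)
    return step(1 * h1 + 1 * h2 - 0.5)

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0, 1, 1, 0]  -- exactly XOR, so the hidden layer solves what no linear model can
```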