Home > programming > Reservoir sampling

Reservoir sampling


Reservoir sampling is a family of randomized algorithms for randomly choosing n samples from a list S containing N items, where n is either a very large or unknown number. Typically N is large enough that the list doesn’t fit into main memory. The most common example was labelled Algorithm R by Jeffrey Vitter in his paper on the subject.

This simple O(N) algorithm as described in the Dictionary of Algorithms and Data Structures consists of the following steps (assuming that the arrays are one-based, and that the number of items to select, n,  is smaller than the size of the source array, S):

array R[k];    // result

integer i, j;

// fill the reservoir array
for each i in 1 to k do
    R[i] := S[i]
done;

// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done
Advertisements
Categories: programming
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: