Recently, I’ve been working through Project Euler in order to improve my core programming skills.

A recurring requirement in those problems is efficiently generating and testing prime numbers. The first algorithm that comes to mind is the Sieve of Eratosthenes. The Sieve is one of many prime sieves, and is a simple yet time-efficient algorithm for finding all the primes below a given limit.

The Algorithm

  1. Make a table with one entry for every number \(2 \leq n \leq limit\).
  2. Starting at 2, cross out all multiples of 2, not counting 2 itself.
  3. Move up to the next number that hasn’t been crossed out.
  4. Repeat steps 2-3 up to \(\sqrt{limit}\).
The Sieve of Eratosthenes can be shown to have a time complexity of \(\mathcal{O}(n\log{}\log{n})\).

Visually, we can depict each loop removing values from the list of natural numbers until all that is left are the primes.

[Figure: animation of the sieve crossing out composites until only primes remain]

Unfortunately, this solution starts to become less viable for larger problem sizes.

Since it requires keeping a table entry for every number up to the limit in memory, the space complexity of sieves generally grows on the order of \(\mathcal{O}(n)\). To deal with memory issues, there are sieve algorithms called segmented sieves, which split the range into smaller segments that fit in memory and can even be processed in parallel. A rough sketch of the idea is shown below.
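To make that concrete, here is a minimal sketch of a segmented sieve (my own illustrative code, not the implementation discussed in the rest of this post). It first finds the base primes up to \(\sqrt{limit}\) with an ordinary sieve, then crosses off multiples one fixed-size segment at a time; the segment_size value is arbitrary.

def segmented_sieve(limit: int, segment_size: int = 32768) -> list:
    """Sketch of a segmented sieve: only one segment lives in memory at a time."""
    # Base primes up to sqrt(limit), found with a small ordinary sieve.
    root = int(limit**0.5) + 1
    base = [True] * (root + 1)
    base[:2] = [False, False]
    for i in range(2, int(root**0.5) + 1):
        if base[i]:
            base[i*i::i] = [False] * len(base[i*i::i])
    base_primes = [i for i, is_prime in enumerate(base) if is_prime]

    primes = list(base_primes)
    # Sieve the remaining range (root, limit] one segment at a time.
    for low in range(root + 1, limit + 1, segment_size):
        high = min(low + segment_size - 1, limit)
        segment = [True] * (high - low + 1)
        for p in base_primes:
            # First multiple of p inside this segment (at least p*p).
            start = max(p * p, (low + p - 1) // p * p)
            for multiple in range(start, high + 1, p):
                segment[multiple - low] = False
        primes.extend(low + i for i, is_prime in enumerate(segment) if is_prime)
    return primes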

Here’s how a basic sieve would look in Python.

The Code

I’ve written the function definitions using type annotations, which explicitly describe the types of the arguments a function takes and the type it returns. This is a great built-in documentation tool, but it’s only available in Python 3, not Python 2.
# Type Annotation Prototype
def foo(x: type) -> type:

Sieve

def sieve(n: int) -> list:
    """Sieve away and only primes are left."""
    # One flag per number; 0 and 1 start as non-prime.
    primes = 2*[False] + (n-1)*[True]
    for i in range(2, int(n**0.5+1)):
        # Cross out every multiple of i, starting from i*i.
        for j in range(i*i, n+1, i):
            primes[j] = False
    return [prime for prime, checked in enumerate(primes) if checked]

It’s important to notice that we used a list as the main data structure. Although the algorithm requires an in-memory table, we can construct that table with any suitable tool. Here is another sieve, but with its guts swapped out for sets.


Sets: Unordered Collections

def set_sieve(n: int) -> set:
    """
    Sets are mutable, unordered collections which are useful
    for quick membership testing and math operations.
    """
    primes = set(range(2, n+1))
    for i in range(2,int(n**0.5+1)):
        if i in primes:
            primes -= set(range(i*i, n+1, i))
    return primes

In this example, the primary difference between using sets and lists is the lack of a list comprehension for composing the return value. In addition, the second for-loop is replaced by a single -= augmented assignment, which for sets is overloaded with the difference-update method, i.e. the set difference \(A \setminus B = \{\,n \mid n \in A,\ n \notin B\,\}\).
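As a quick illustration (a snippet I’ve added, not from the original code), the -= operator mutates the set in place exactly as difference_update would:

>>> primes = set(range(2, 11))
>>> primes -= set(range(4, 11, 2))   # same effect as primes.difference_update(range(4, 11, 2))
>>> primes -= set(range(9, 11, 3))
>>> sorted(primes)
[2, 3, 5, 7]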

Utilizing sets gives us a cleaner syntax for algorithms rooted in math. In this example, we first create a universal set and iteratively delete the factors (composite multiples) from it. It would also be correct to start from an empty set, iteratively insert all the factors first, and only then remove them from the universal set in one step.

Insertion Sets:

def set_insertion_sieve(n: int) -> set:
    """Performing insertion over deletion"""
    factors = set()
    for i in range(2,int(n**0.5+1)):
        if i not in factors:
            factors |= set(range(i*i, n+1, i))
    return set(range(2,n+1)) - factors

Algorithmic Analysis

The question is: is it really possible to characterize runtime without running a single benchmark? For much larger programs it might not be, but for this isolated case it’s possible to make some good predictions.

Both set_sieve and set_insertion_sieve perform similar operations until the second iterative block. Because Python sets are implemented as hash tables, individual set insertions and deletions are \(\mathcal{O}(1)\) operations on average, so the total work depends on how many elements each version touches. It’s therefore plausible that either solution could run faster: with each iteration of set_sieve the set being deleted from shrinks, while the set that set_insertion_sieve inserts into grows.

But what if we knew the proportion of primes vs non-primes in a series?

Prime Number Theorem

Early intuition doesn’t tell us much about the relative abundance of primes and non-primes. The prime number theorem, however, is a proven mathematical result describing how many primes there are among the natural numbers.

$$ \large\lim_{x\to \infty} \frac{\pi(x)}{\frac{x}{\ln(x)}} = 1 \tag{def} $$

Its basic meaning is that if \(\pi(x)\) is the function counting the number of primes up to \(x\), then as \(x\to \infty\) we can expect \(\pi(x) \to \frac{x}{\ln(x)}\), so the ratio of the two approaches 1.

$$ \large \pi(x) \thicksim \frac{x}{\ln(x)} \tag{asymptotic} $$
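As a quick numerical sanity check (a snippet I’ve added, reusing the sieve function defined above; the loop bounds are arbitrary), we can compare \(\pi(x)\) against \(x/\ln(x)\) and watch the ratio creep toward 1:

import math

for exponent in range(3, 7):
    x = 10**exponent
    pi_x = len(sieve(x))          # actual count of primes up to x
    estimate = x / math.log(x)    # prime number theorem estimate
    print('x=10^{}: pi(x)={}, x/ln(x)={:.0f}, ratio={:.3f}'.format(
        exponent, pi_x, estimate, pi_x / estimate))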

From the asymptotic expression, we can also approximate the number of factors (composites) below \(n\):

$$ \large n - \pi(n) \thicksim n\left(1 - \frac{1}{\ln(n)}\right) \tag{factors} $$

We find insertion should actually run faster, by a factor on the order of \(\frac{1}{\ln(n)}\). But this is a diminishing optimization, which converges back toward the deletion sieve’s speed at larger problem sizes.

Data Structures

Lists and sets are general-purpose data structures and are useful for solving many different problems. However, their general nature makes them less useful for specialized tasks, or when high performance is needed.

It’s also easy to fall into the trap of using general tools when better options are available.

Maslow tells us

"If the only tool you have is a hammer everything looks like a nail."

The Psychology of Science (1964)

While the choice of data structure is orthogonal to correctness, it’s important to use the best tool for the job.

So what is the optimal data structure for performing these calculations?

Enter: The Array

import numpy as np

def np_sieve(n: int) -> np.ndarray:
    """
    Sieve with its guts swapped with a numpy
    ndarray.
    """
    # One boolean flag per number (numpy bools are a single byte each).
    primes = np.ones(n+1, dtype=bool)
    for i in np.arange(2, n**0.5+1, dtype=np.uint32):
        if primes[i]:
            primes[i*i::i] = False
    # Indices 0 and 1 were never cleared, so slice them off the front.
    return np.nonzero(primes)[0][2:]

Numpy is a third-party library containing array-based data structures and fast, vectorizable methods for numerical operations. Here we operate on numpy’s n-dimensional arrays, which allocate fixed strides of memory containing statically typed elements. The downside of embedding a statically typed subset within a dynamically typed language is that we forego many of the niceties Python has to offer. The main benefit, however, is fast random access to individual elements sitting contiguously in memory, with no traversal required.

Everything inside a numpy array must be homogeneous, and to be efficient we must know the exact length of the array beforehand, because extending or contracting an array once it has been initialized requires copying every element into a new array of the new length. We also have to be strict when initializing bools, ints, and floats, choosing proper fixed widths for memory allocation.
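To see what those fixed widths buy us, here is a small snippet I’ve added comparing the memory footprint of a million flags stored as one-byte bools versus numpy’s default 64-bit floats:

>>> import numpy as np
>>> np.ones(10**6, dtype=bool).nbytes    # one byte per element
1000000
>>> np.ones(10**6).nbytes                # default dtype is float64: eight bytes per element
8000000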

In our example we fill an array with 8-bit bools all set to True. We then iterate through an array of candidate factors typed as 32-bit unsigned integers, and mark their multiples as False. Finally, we return every index from 2 onward whose flag is still True. Unlike slicing a Python list, which generates a whole new copy of the original, slicing a numpy array returns an in-memory view, which is a much cheaper operation. Therefore the vector operation primes[i*i::i] is just a view into the same primes array, requiring no extra levels of indirection or memory allocation to construct.
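We can check the view-versus-copy distinction directly (another snippet I’ve added, continuing the same session): np.shares_memory reports whether two arrays overlap in memory, while a list slice is an independent copy.

>>> arr = np.arange(10)
>>> np.shares_memory(arr, arr[4::2])     # the numpy slice is a view of the same buffer
True
>>> lst = list(range(10))
>>> chunk = lst[4::2]                    # the list slice is a new list
>>> chunk[0] = -1
>>> lst[4]
4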


Testing for Primality

So now that we’ve written the sieve in a bunch of different ways, how do we know that each one is correct?

The most obvious way to figure out whether a number is prime is trial division: try dividing it by every number from 2 upward. In fact it’s enough to stop at \(\sqrt{n}\), since any factor larger than \(\sqrt{n}\) must be paired with one smaller than it.

def all_primes(primes: iter) -> bool:
    """Return True if every number in the iterable is prime, by trial division."""
    for prime in primes:
        # Any divisor in [2, sqrt(prime)] means the number is composite.
        if any(prime % n == 0 for n in range(2, int(prime**0.5) + 1)):
            return False
    return True
Now let's test whether our sieves correctly find every prime below 1,000:
>>> all_primes(sieve(10**3))
True
>>> all_primes(np_sieve(10**3))
True
>>> all_primes(set_sieve(10**3))
True
>>> all_primes(set_insertion_sieve(10**3))
True

A Timer

Now that we know our implementations are correct, let's see how fast they run. We will construct a timer by taking advantage of what is known as a context manager. Basically, a context manager lets us use the with statement and performs an operation before and after the enclosed block via the __enter__ and __exit__ methods. We can create one simply by using a decorator from the contextlib module.

import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    start = time.time()
    try:
        yield  # run the body of the with-block
    finally:
        end = time.time()
    print('{label}: {time:03.3f} sec'.format(
        label=label, time=end-start)
    )
It’s also possible to use Python’s built-in timeit module, which can handle more complex and isolated timing scenarios.
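For instance, here is a small timeit example I’ve added (the statement and repetition counts are just illustrative, and it assumes sieve is already defined in the session):

import timeit

# Each round times ten calls; take the fastest round of three.
best = min(timeit.repeat('sieve(10**5)', globals=globals(), number=10, repeat=3))
print('sieve(10**5): best of 3 rounds of 10 calls = {:.3f} sec'.format(best))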

Some Benchmarks

Now let's see how fast our sieves can find all the primes below one million.
>>> with timer('sieve'):
        sieve(10**6)

sieve: 0.454 sec
>>> with timer('set_sieve'):
        set_sieve(10**6)

set_sieve: 0.735 sec
>>> with timer('set_insertion_sieve'):
        set_insertion_sieve(10**6)

set_insertion_sieve: 0.587 sec
>>> with timer('numpy'):
        np_sieve(10**6)

numpy: 0.008 sec

Things to Note

  • Overall, lists are better optimized than sets for this kind of in-place, element-by-element work.
  • As we expected, set_insertion_sieve performed better than set_sieve, but only marginally.
  • Numpy arrays are fast! They outperform the built-in data structures by roughly two orders of magnitude!
