Database of primes up to one hundred billion on the knee

image

The best-known algorithm for finding all primes not greater than a given one is the sieve of Eratosthenes . It works great for numbers up to billions, maybe up to tens of billions, if neatly written. However, everyone who likes to have fun with prime numbers knows that you always want to have as many on hand as possible. Once, to solve one problem on a hackerrank, I needed an in-memory database of primes up to one hundred billion. With maximum memory optimization, if you represent odd numbers in a Eratosthenes sieve with a bit array, its size will be about 6 gigabytes, which did not fit into the memory of my laptop. There is a modification of the algorithm that is much less demanding in memory (dividing the original range of numbers into several pieces and processing one piece at a time) -segmented sieve of Eratosthenes , but it is more difficult to implement, and the whole result will not fit into the memory anyway. Below, I bring to your attention an algorithm that is almost as simple as the Eratosthenes sieve, but that gives double optimization by memory (that is, a database of primes up to one hundred billion will occupy about 3 gigabytes, which should already fit into the memory of a standard laptop).

First, let us recall how the sieve of Eratosthenes works: in an array of numbers from 1 to N, numbers divisible by two are crossed out, then numbers divisible by 3, and so on. The numbers remaining after crossing out will be simple:

2 3 . 5 . 7 . 9 . 11 . 13 . 15 . 17 . 19 . 21 . 23 . 25 . 27  . 29  ...
2 3 . 5 . 7 . . . 11 . 13 .  . . 17 . 19 .  . . 23 . 25 .  .  . 29  ...
2 3 . 5 . 7 . . . 11 . 13 .  . . 17 . 19 .  . . 23 .  . .  .  . 29  ...

C # code
class EratospheneSieve
{
    private bool[] Data;

    public EratospheneSieve(int length)
    {
        Data = new bool[length];

        for (int i = 2; i < Data.Length; i++) Data[i] = true;

        for (int i = 2; i * i < Length; i++)
        {
            if (!Data[i]) continue;
            for (int d = i * i; d < Length; d += i) Data[d] = false;
        }
    }

    public int Length => Data.Length;

    public bool IsPrime(int n) => Data[n];
}


On my laptop, it counts up to 15 billion seconds and consumes a gigabyte of memory.

The algorithm we will use is called Wheel Factorization . To understand its essence, we write natural numbers with a tablet of 30 pieces in a row:

 0  1  2  3  4  5 ... 26 27 28 29
30 31 32 33 34 35 ... 56 57 58 59
60 61 62 63 64 65 ... 86 87 88 89
...

And cross out all the numbers divisible by 2, 3, and 5:

.  1 . . . . .  7 . . . 11 . 13 . . . 17 . 19 . . . 23 . . . . . 29
. 31 . . . . . 37 . . . 41 . 43 . . . 47 . 49 . . . 53 . . . . . 59
. 61 . . . . . 67 . . . 71 . 73 . . . 77 . 79 . . . 83 . . . . . 89
...

It can be seen that in each row numbers are crossed out at the same positions. Since we crossed out only composite numbers (with the exception, in fact, of the numbers 2, 3, 5), then prime numbers can only be in unmarked positions.

Since there are eight such numbers in each row, this gives us the idea to represent each thirty numbers with one byte, each of the eight bits of which will indicate the simplicity of the number in the corresponding position. This coincidence is not only aesthetically beautiful, but also greatly simplifies the implementation of the algorithm!

image

Technical implementation


Firstly, our goal is to reach one hundred billion, so four byte numbers can no longer be dispensed with. Therefore, our algorithm will look something like this:

class Wheel235
{
    private byte[] Data;

    public Wheel235(long length)
    {
        ...
    }
    public bool IsPrime(long n)
    {
        ...
    }
    ...
}

For ease of implementation, we will round length up to the nearest number divisible by 30. Then

class Wheel235
{
    private byte[] Data;

    public Wheel235(long length)
    {
        // ensure length divides by 30
        length = (length + 29) / 30 * 30; 
        Data = new byte[length/30];
        ...
    }

    public long Length => Data.Length * 30L;

    ...
}

Next, we need two arrays: one translates the index 0..29 into the bit number, the second, on the contrary, the bit number into the thirty position. Using these arrays and a bit of bit arithmetic, we build a one-to-one correspondence between natural numbers and bits in the Data array. For simplicity, we use only two methods: IsPrime (n), which allows you to find out if the number is prime, and ClearPrime (n), which sets the corresponding bit to 0, marking the number n as compound:

class Wheel235
{
    private static int[] BIT_TO_INDEX = new int[] { 1, 7, 11, 13, 17, 19, 23, 29 };

    private static int[] INDEX_TO_BIT = new int[] {
        -1, 0,
        -1, -1, -1, -1, -1, 1,
        -1, -1, -1, 2,
        -1, 3,
        -1, -1, -1, 4,
        -1, 5,
        -1, -1, -1, 6,
        -1, -1, -1, -1, -1, 7,
    };

    private byte[] Data;

    ...

    public bool IsPrime(long n)
    {
        if (n <= 5) return n == 2 || n == 3 || n == 5;
        int bit = INDEX_TO_BIT[n % 30];
        if (bit < 0) return false;
        return (Data[n / 30] & (1 << bit)) != 0;
    }

    private void ClearPrime(long n)
    {
        int bit = INDEX_TO_BIT[n % 30];
        if (bit < 0) return;
        Data[n / 30] &= (byte)~(1 << bit);
    }
}

Note that if the number is divisible by 2, 3 or 5, then the bit number will be -1, which means that the number is obviously composite. Well and, of course, we are processing a special case n <= 5.

Now, in fact, the algorithm itself, which almost literally repeats the sieve of Eratosthenes:

    public Wheel235(long length)
    {
        length = (length + 29) / 30 * 30;
        Data = new byte[length / 30];
        for (long i = 0; i < Data.Length; i++) Data[i] = byte.MaxValue;

        for (long i = 7; i * i < Length; i++)
        {
            if (!IsPrime(i)) continue;
            for (long d = i * i; d < Length; d += i) ClearPrime(d);
        }
    }

Accordingly, the whole resulting class:

looks like that
class Wheel235
{
    private static int[] BIT_TO_INDEX = new int[] { 1, 7, 11, 13, 17, 19, 23, 29 };

    private static int[] INDEX_TO_BIT = new int[] {
        -1, 0,
        -1, -1, -1, -1, -1, 1,
        -1, -1, -1, 2,
        -1, 3,
        -1, -1, -1, 4,
        -1, 5,
        -1, -1, -1, 6,
        -1, -1, -1, -1, -1, 7,
    };

    private byte[] Data;

    public Wheel235(long length)
    {
        // ensure length divides by 30
        length = (length + 29) / 30 * 30;
        Data = new byte[length / 30];
        for (long i = 0; i < Data.Length; i++) Data[i] = byte.MaxValue;

        for (long i = 7; i * i < Length; i++)
        {
            if (!IsPrime(i)) continue;
            for (long d = i * i; d < Length; d += i) ClearPrime(d);
        }
    }

    public long Length => Data.Length * 30L;

    public bool IsPrime(long n)
    {
        if (n >= Length) throw new ArgumentException("Number too big");
        if (n <= 5) return n == 2 || n == 3 || n == 5;
        int bit = INDEX_TO_BIT[n % 30];
        if (bit < 0) return false;
        return (Data[n / 30] & (1 << bit)) != 0;
    }

    private void ClearPrime(long n)
    {
        int bit = INDEX_TO_BIT[n % 30];
        if (bit < 0) return;
        Data[n / 30] &= (byte)~(1 << bit);
    }
}


There is only one small problem: this code does not work for numbers larger than about 65 billion! When you try to start it with such numbers, the program crashes with an error:

Unhandled Exception: System.OverflowException: Array dimensions exceeded supported range.

The problem is that in c # arrays cannot have more than 2 ^ 31 elements (apparently because internal implementation uses a four-byte array index). One option is to create, for example, a long [] array instead of a byte [] array, but this will complicate bit arithmetic a bit. For simplicity, we will go the other way, creating a special class that simulates the desired array, holding two short arrays inside. At the same time, we will give him the opportunity to save ourselves to disk so that you can calculate primes once, and then use the database again:

    class LongArray
    {
        const long MAX_CHUNK_LENGTH = 2_000_000_000L;
        private byte[] firstBuffer;
        private byte[] secondBuffer;

        public LongArray(long length)
        {
            firstBuffer = new byte[Math.Min(MAX_CHUNK_LENGTH, length)];
            if (length > MAX_CHUNK_LENGTH) secondBuffer = new byte[length - MAX_CHUNK_LENGTH];
        }

        public LongArray(string file)
        {
            ...
        }

        public long Length => firstBuffer.LongLength + (secondBuffer == null ? 0 : secondBuffer.LongLength);

        public byte this[long index]
        {
            get => index < MAX_CHUNK_LENGTH ? firstBuffer[index] : secondBuffer[index - MAX_CHUNK_LENGTH];
            set
            {
                if (index < MAX_CHUNK_LENGTH) firstBuffer[index] = value; 
                else secondBuffer[index - MAX_CHUNK_LENGTH] = value;
            }
        }

        public void Save(string file)
        {
            ...
        }

    }


Finally, combine all the steps in

final version of the algorithm
class Wheel235
{
    class LongArray
    {
        const long MAX_CHUNK_LENGTH = 2_000_000_000L;
        private byte[] firstBuffer;
        private byte[] secondBuffer;

        public LongArray(long length)
        {
            firstBuffer = new byte[Math.Min(MAX_CHUNK_LENGTH, length)];
            if (length > MAX_CHUNK_LENGTH) secondBuffer = new byte[length - MAX_CHUNK_LENGTH];
        }

        public LongArray(string file)
        {
            using(FileStream stream = File.OpenRead(file))
            {
                long length = stream.Length;
                firstBuffer = new byte[Math.Min(MAX_CHUNK_LENGTH, length)];
                if (length > MAX_CHUNK_LENGTH) secondBuffer = new byte[length - MAX_CHUNK_LENGTH];
                stream.Read(firstBuffer, 0, firstBuffer.Length);
                if(secondBuffer != null) stream.Read(secondBuffer, 0, secondBuffer.Length);
            }
        }

        public long Length => firstBuffer.LongLength + (secondBuffer == null ? 0 : secondBuffer.LongLength);

        public byte this[long index]
        {
            get => index < MAX_CHUNK_LENGTH ? firstBuffer[index] : secondBuffer[index - MAX_CHUNK_LENGTH];
            set
            {
                if (index < MAX_CHUNK_LENGTH) firstBuffer[index] = value; 
                else secondBuffer[index - MAX_CHUNK_LENGTH] = value;
            }
        }

        public void Save(string file)
        {
            using(FileStream stream = File.OpenWrite(file))
            {
                stream.Write(firstBuffer, 0, firstBuffer.Length);
                if (secondBuffer != null) stream.Write(secondBuffer, 0, secondBuffer.Length);
            }
        }

    }

    private static int[] BIT_TO_INDEX = new int[] { 1, 7, 11, 13, 17, 19, 23, 29 };

    private static int[] INDEX_TO_BIT = new int[] {
        -1, 0,
        -1, -1, -1, -1, -1, 1,
        -1, -1, -1, 2,
        -1, 3,
        -1, -1, -1, 4,
        -1, 5,
        -1, -1, -1, 6,
        -1, -1, -1, -1, -1, 7,
    };

    private LongArray Data;

    public Wheel235(long length)
    {
        // ensure length divides by 30
        length = (length + 29) / 30 * 30;
        Data = new LongArray(length / 30);
        for (long i = 0; i < Data.Length; i++) Data[i] = byte.MaxValue;

        for (long i = 7; i * i < Length; i++)
        {
            if (!IsPrime(i)) continue;
            for (long d = i * i; d < Length; d += i) ClearPrime(d);
        }
    }

    public Wheel235(string file)
    {
        Data = new LongArray(file);
    }

    public void Save(string file) => Data.Save(file);

    public long Length => Data.Length * 30L;

    public bool IsPrime(long n)
    {
        if (n >= Length) throw new ArgumentException("Number too big");
        if (n <= 5) return n == 2 || n == 3 || n == 5;
        int bit = INDEX_TO_BIT[n % 30];
        if (bit < 0) return false;
        return (Data[n / 30] & (1 << bit)) != 0;
    }

    private void ClearPrime(long n)
    {
        int bit = INDEX_TO_BIT[n % 30];
        if (bit < 0) return;
        Data[n / 30] &= (byte)~(1 << bit);
    }

}


As a result, we got a class that can calculate primes not exceeding one hundred billion and save the result to disk (I had this code on my laptop for 50 minutes; the size of the created file was about 3 gigabytes, as expected):

        long N = 100_000_000_000L;
        Wheel235 wheel = new Wheel235(N);
        wheel.Save("./primes.dat");

And then read from the disk and use:

        DateTime start = DateTime.UtcNow;
        Wheel235 loaded = new Wheel235("./primes.dat");
        DateTime end = DateTime.UtcNow;
        Console.WriteLine($"Database of prime numbers up to {loaded.Length:N0} loaded from file to memory  in {end - start}");
        long number = 98_000_000_093L;
        Console.WriteLine($"{number:N0} is {(loaded.IsPrime(number) ? "prime" : "not prime")}!");

All Articles