
2012-04-14

Computing π with Chudnovsky and GMP

[Disclaimer: I'm not a math guy, so there are probably all kinds of inexactitudes in here. Feel free to add corrections to the comments.]

Pi, Pixelated

A post went by on reddit recently showing an employer trying to weed out candidates. The idea was that you were to find the first 7-digit palindrome in \(\pi\), and that was the email address to send your resumestuffs. (The first 10-digit palindrome is 0136776310.)

Now, searching a string for palindromes is a pretty trivial exercise, as is finding the first million digits of \(\pi\) on the Internet.
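
Just to make that concrete, here's a quick sketch of the search in C. (The short pi_digits string and the is_palindrome() helper are mine, purely for illustration; paste in your own million digits.)

// compile with: gcc -Wall -o palindrome palindrome.c

#include <stdio.h>
#include <string.h>

// true if the first len characters of s read the same backwards
static int is_palindrome(const char *s, int len)
{
    for (int i = 0; i < len / 2; i++)
        if (s[i] != s[len - 1 - i])
            return 0;
    return 1;
}

int main(void)
{
    // stand-in: the first 50 digits of pi after the decimal point
    const char *pi_digits = "14159265358979323846264338327950288419716939937510";
    int len = (int)strlen(pi_digits);

    for (int i = 0; i + 7 <= len; i++) {
        if (is_palindrome(pi_digits + i, 7)) {
            printf("First 7-digit palindrome: %.7s (starting at digit %d)\n",
                   pi_digits + i, i + 1);
            return 0;
        }
    }

    printf("No 7-digit palindrome in the first %d digits.\n", len);
    return 0;
}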

So the only thing that's left is to write a program to compute \(\pi\) for us! How do we do that? Let us count the ways!

No, let's not. There are too many. We'll just cover a few, and then show an implementation of the fastest in C using the GNU Multiple Precision library, GMP.

First, a bit of history

People have been figuring ways to compute \(\pi\) for ages, and most of them were very very tedious and time-consuming. Archimedes figured it out to 3 decimal places. 750 years later, we managed to get 4 digits. Another 900 years passed, and we got to 16. In the late 1800s, hapless William Shanks busted his buns for 15 years to compute \(\pi\) to 707 places using John Machin's formula (seen in the app, above). In a stroke of bad luck, however, he blew it 527 digits in, so the remainder of the estimate is incorrect. [Nooooooo...!] Fortunately, no one was willing to spend the time to double-check his work, and he died in the blissful happiness of a job well-done.

Archimedes had used a geometric calculation to estimate \(\pi\), but later on we mathed-up and figured out infinite series. It turns out, this opens up some serious doors, since \(\pi\) has a habit of showing up all over the place.

The "go to" formula for many years was John Machin's. Following that in the early 20th century, super genius Srinivasa Ramanujan came up with some good speed-demons. In 1987 the Chudnovsky brothers came up with the blistering Chudnovsky algorithm, which computes over 14 digits of \(\pi\) per iteration! (These three algorithms are featured in the app at the top of this page.)

Estimation with Infinite Series

If we have a formula that we can solve for \(\pi\), and there is an infinite series that can produce a result for that solution, then we're golden.

What does that mean?

Here's an example: we know \(\sin(\pi/2)=1\), so we can solve for \(\pi\) like so: \(\pi=2\arcsin(1)\). Now... is there an infinite series for arcsin? Sure, it's:

$$\arcsin z = \sum_{k=0}^\infty{\frac{(2k)!z^{2k+1}}{2^{2k}(k!)^2(2k+1)}}$$

Notes for non-math types: sigma, Σ, is a summation, and basically means we're going to loop \(k\) from \(0\) to infinity, and keep a running sum of the results of the formula inside the summation. The exclamation point, !, is the factorial, which is the product of all the integers from 1 up to that number. And, of course, the superscripts are exponents.

So the basic plan of attack is to write a for-loop to do the summation, and then compute the final result, like so:

// Pseudocode

// compute pi = 2 * arcsin(1)

sum = 0
z = 1  // since we're computing arcsin(1)

for (k = 0; k < ∞; k++) {
    sum = sum + giant_arcsin_formula(k, z)  // from above
}

pi = 2 * sum   // and there was much rejoicing

You might have noticed a slight implementation detail in the above code. Notably, it literally has an infinite loop in it, and you have to be somewhere by sometime, so who can wait that long?

So we give up after a while. We run it for a few thousand iterations, and we decide that's good enough, and call it a day. But as you saw with the app at the top of this page, some series converge (approach \(\pi\)) very very VERY slowly. So the quest has been on for the last few hundred years to come up with equations that converge more and more quickly so we can wrap it up and have some dinner.
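
Just to see how that plays out, here's the arcsin series from above in real C with plain old doubles, cut off after 10,000 iterations. (The incremental-term trick and the variable names are mine; computing the factorials directly would overflow a double almost immediately.)

// compile with: gcc -Wall -o arcsin_pi arcsin_pi.c

#include <stdio.h>

int main(void)
{
    int iterations = 10000; // our arbitrary "good enough" cutoff
    double sum = 0;
    double c = 1;           // c holds (2k)!/(2^(2k) * (k!)^2), starting at k = 0

    for (int k = 0; k < iterations; k++) {
        sum += c / (2*k + 1);            // add term k of the series (z = 1)
        c *= (2.0*k + 1) / (2.0*k + 2);  // step c from k to k+1
    }

    // prints about 3.130: after ten thousand terms we're still
    // wrong in the second decimal place
    printf("%.15f\n", 2 * sum);

    return 0;
}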

Using arcsin to compute \(\pi\) this way is really abysmally slow, so we'll be focusing on Chudnovsky instead.

Getting Precise with GMP

You might have noticed that the app at the top of this page gives out after just a few digits. This is because JavaScript uses 64-bit IEEE-754 floating point numbers, and that's as much precision as they offer. Some intermediate calculations might also eventually overflow and fail (since there are a lot of powers and factorials involved).
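
C doubles hit the same 64-bit IEEE-754 wall, by the way. Here's a quick demonstration (M_PI comes from math.h with gcc/glibc):

// compile with: gcc -Wall -o doublepi doublepi.c

#include <stdio.h>
#include <math.h>

int main(void)
{
    // prints 3.141592653589793115997963468544, but the real pi continues
    // ...79323846..., so only the first 16 digits or so are actually right
    printf("%.30f\n", M_PI);
    return 0;
}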

How do we get more precision? How do we get precision of our choice, say, 4000 bits instead of just 64? We need a library that does Arbitrary Precision Arithmetic. One of my favorites is the GNU Multiple Precision Arithmetic Library, AKA gmplib or libgmp or GMP.

This library allows you to set the precision of your computations, and then has a variety of useful functions to help (arithmetic, exponentiation, factorial, and so on). Here's an example that computes \(\sqrt2\) to 1024 bits:

// compile with: gcc -Wall -o sqrt2 sqrt2.c -lgmp

#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>

int main(void)
{
    mpf_t result; // floating point variable
    char *output; // holds the output result string
    mp_exp_t exp; // holds the exponent for the result string

    mpf_set_default_prec(1024); // set default precision in bits

    mpf_init(result); // allocate and init variable

    mpf_sqrt_ui(result, 2); // compute sqrt(2)

    // convert to string, and print:
    output = mpf_get_str(NULL, &exp, 10, 0, result);
    printf("%.*s.%s\n", (int)exp, output, output+exp);

    // clean up
    free(output);     // mpf_get_str() calls malloc()
    mpf_clear(result);

    return 0;
}

And here is the result of a run:

$ ./sqrt2
1.4142135623730950488016887242096980785696718753769480731766797379907
324784621070388503875343276415727350138462309122970249248360558507372
126441214970999358314132226659275055927557999505011527820605714701095
599716059702745345968620147285174186408891986095523292304843087143214
50839762603627995251407989687253396

As you can see, we have way more digits than will fit in a standard long double; we have 1024-bits-worth, to be precise.

How many decimal digits are there in 1024 bits, anyway? How can we figure it out? (Fun Fact: did you know "bit" is a contraction of "binary digit"? Of course you did!)

Thinking about it intuitively, we know that 1 bit can encode two values: 1 or 0. We know that 2 bits can store 4 values: 0, 1, 2, and 3. But for decimal numbers, we need to store the values 0-through-9... so we need more bits.

3 bits gets us 0-7 (\(2^3\) different values), which is almost there, but 4 bits gets us 0-15 (\(2^4\)), which is too many. Sure we can encode decimal numbers at one decimal digit per nibble—that's called Binary Coded Decimal—but it wastes part of a bit.

So intuitively, it's between 3 and 4 bits. Like 3.4 bits per decimal digit or something. How can we do better?

In general, an \(x\)-decimal-digit number can encode \(10^x\) different values. Similarly, a \(y\)-bit (binary digit) number can encode \(2^y\) different values. And for any \(y\)-bit value, there is some \(x\)-decimal-digit number that can hold the same value.

And we know the relation between \(x\) and \(y\) is some constant factor (we speculated 3.4 bits-per-decimal-digit, earlier). So we can say that \(y=fx\) where \(f\) is probably around \(3.4\), and have fun:

$$\begin{align} 10^x &= 2^y \\ 10^x &= 2^{fx} && \small{\text{since }y=fx} \\ fx &= \log_2(10^x) && \small{\text{solve for }fx} \\ fx &= x\log_2{10} \\ f &= \log_2{10} \\ &= 3.32192809488\\ \end{align}$$

Hey, so it's 3.32192-blahblahblah bits-per-decimal-digit! Our 3.4 guess wasn't bad at all. And now we know how many bits we need to hold a certain number of digits:

double bits_per_digit = 3.3219280948873626;
int digits = 100;

int bits = digits * bits_per_digit + 1; // +1 in case of round-off
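
And that bits value is exactly what you'd hand to GMP, just like in the sqrt(2) example above:

mpf_set_default_prec(bits); // enough precision for our desired digit count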

GMP can do a lot more than just square roots. There are oodles of functions, some of which will make you especially happy if you're trying to implement RSA. For Chudnovsky, we just need square root, exponent, factorial, and arithmetic.

Chudnovsky, we meet at last

I'm not even going to begin to pretend I know how the Chudnovsky equation came about:

$$\frac{426880\sqrt{10005}}{\pi} = \sum_{k=0}^\infty \frac{(6k)!(13591409+545140134k)} {(3k)!(k!)^3(-640320)^{3k}}$$

but there it is in all its glory. Solve for \(\pi\), and you're done!

Again, it's just like the arcsine example, above: we run the series for however many iterations it takes until we feel the estimate is good enough. Now, for Chudnovsky, we know how many decimal digits it generates per iteration, so it's easy to compute the number of iterations needed for a desired result. (I don't know how they figured how many digits per iteration; maybe someone can enlighten us.)
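
For what it's worth, the figure usually quoted is about 14.18 digits per term (reportedly \(\log_{10}(640320^3/1728)\)), which makes sizing the loop just a division. A quick sketch, where digits is however many digits of \(\pi\) you want:

double digits_per_iteration = 14.1816474627; // commonly quoted for Chudnovsky
int iterations = digits / digits_per_iteration + 1; // round up to be safe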

Here's the inner part of the loop from the source file, chudnovsky.c, that runs the summation. Obviously the larger iterations is, the longer it's going to run, and the better the \(\pi\) approximation will be.

for (k = 0; k < iterations; k++) {
    threek = 3*k;

    mpz_fac_ui(a, 6*k);  // (6k)!

    mpz_set_ui(b, 545140134); // 13591409 + 545140134k
    mpz_mul_ui(b, b, k);
    mpz_add_ui(b, b, 13591409);

    mpz_fac_ui(c, threek);  // (3k)!

    mpz_fac_ui(d, k);  // (k!)^3
    mpz_pow_ui(d, d, 3);

    mpz_ui_pow_ui(e, 640320, threek); // (-640320)^(3k)
    if ((threek&1) == 1) { mpz_neg(e, e); }

    // numerator (in A)
    mpz_mul(a, a, b);
    mpf_set_z(A, a);

    // denominator (in B)
    mpz_mul(c, c, d);
    mpz_mul(c, c, e);
    mpf_set_z(B, c);

    // result
    mpf_div(F, A, B);

    // add on to sum
    mpf_add(sum, sum, F);
}

You should be able to match the comments in the source up to the equation to see the individual bits at work. (Note this code is not optimized for speed in the least; it merely matches the equation as closely as possible.)
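
One more thing: the loop only builds up the sum on the right-hand side of the equation. To get \(\pi\) itself you still have to do the \(426880\sqrt{10005}\) part and divide. A sketch of that last step, in the same style as the loop (R and pi here are mpf_t variables of my own naming):

// after the summation: pi = 426880 * sqrt(10005) / sum
mpf_sqrt_ui(R, 10005);    // R = sqrt(10005)
mpf_mul_ui(R, R, 426880); // R = 426880 * sqrt(10005)
mpf_div(pi, R, sum);      // pi = R / sum, and we're done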

$ ./chudnovsky 20000
3.14159265358979323846264338327950288419716939937510582097
4944592307816406286208998628034825342117067982148086513282
3066470938446095505822317253594081284811174502841027019385
2110555964462294895493038196442881097566593344612847564823
... etc.

Faster!

One cool thing about the summation is that you can break it up between cores. You could have one core do half the work, and another core do the other half, and then add their results together when they're both done. It's embarrassingly parallel. It should be possible to take something like OpenMP, which I blogged about earlier, and speed up the computation considerably.

I'm not going to bother implementing that, since Hanhong Xue and David Carver did it better. Links to that source are below.
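
Still, just to show the shape of the split, here's a rough, unoptimized sketch of the whole thing with OpenMP. Each thread gets its own temporaries and its own private running sum, and the partial sums get merged at the end. (This is my sketch, not the code linked below, and the precision and iteration sizing is the back-of-the-envelope stuff from earlier.)

// compile with: gcc -Wall -fopenmp -o chud_par chud_par.c -lgmp

#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
#include <omp.h>

int main(int argc, char **argv)
{
    long digits = (argc > 1) ? atol(argv[1]) : 1000;
    long iterations = digits / 14.1816474627 + 1;            // ~14.18 digits per term
    mpf_set_default_prec(digits * 3.3219280948873626 + 64);  // plus some guard bits

    mpf_t sum, pi;
    mpf_init(sum);
    mpf_init(pi);

    #pragma omp parallel
    {
        mpz_t a, b, c, d, e;
        mpf_t A, B, F, local_sum;
        mpz_inits(a, b, c, d, e, NULL);
        mpf_inits(A, B, F, local_sum, NULL);

        // split the terms of the sum across the threads
        #pragma omp for schedule(dynamic)
        for (long k = 0; k < iterations; k++) {
            unsigned long threek = 3 * k;

            mpz_fac_ui(a, 6 * k);              // (6k)!

            mpz_set_ui(b, 545140134);          // 13591409 + 545140134k
            mpz_mul_ui(b, b, k);
            mpz_add_ui(b, b, 13591409);

            mpz_fac_ui(c, threek);             // (3k)!

            mpz_fac_ui(d, k);                  // (k!)^3
            mpz_pow_ui(d, d, 3);

            mpz_ui_pow_ui(e, 640320, threek);  // (-640320)^(3k)
            if (k & 1) { mpz_neg(e, e); }

            mpz_mul(a, a, b);                  // numerator
            mpf_set_z(A, a);

            mpz_mul(c, c, d);                  // denominator
            mpz_mul(c, c, e);
            mpf_set_z(B, c);

            mpf_div(F, A, B);                  // this thread's share of terms
            mpf_add(local_sum, local_sum, F);
        }

        #pragma omp critical
        mpf_add(sum, sum, local_sum);          // merge the partial sums

        mpz_clears(a, b, c, d, e, NULL);
        mpf_clears(A, B, F, local_sum, NULL);
    }

    // pi = 426880 * sqrt(10005) / sum
    mpf_sqrt_ui(pi, 10005);
    mpf_mul_ui(pi, pi, 426880);
    mpf_div(pi, pi, sum);

    mp_exp_t exp;
    char *out = mpf_get_str(NULL, &exp, 10, digits, pi);
    printf("%.*s.%s\n", (int)exp, out, out + exp);

    free(out);
    mpf_clear(sum);
    mpf_clear(pi);

    return 0;
}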

Links
