Monday, November 15, 2010

Brief History and Motivation Behind the Sage Coercion Model

In Sage (like in Magma), most objects are either elements or parents. Think of a "parent" as a set. This Parent/Element idea is a powerful algebraic approach to implementing mathematical objects on a computer, which does not exist in Mathematica, Maple, PARI, Maxima, and many other math software platforms.

I learned about this approach from using Magma:

%magma
R<x> := PolynomialRing(Integers());
print Parent(x);
///
Univariate Polynomial Ring in x over Integer Ring

(In this blog post I'll put %magma above code that gets input to Magma; all other code gets input to Sage. The input and output is separated by ///.)

In Sage:

R.<x> = ZZ[]
parent(x)
///
Univariate Polynomial Ring in x over Integer Ring

x.parent()
///
Univariate Polynomial Ring in x over Integer Ring


isinstance(ZZ, Parent)
///
True

isinstance(2, Parent)
///
False


Automatic Coercions:

"The primary goal of coercion is to be able to transparently do arithmetic, comparisons, etc. between elements of distinct parents."

When I used to try to get people to use Magma, perhaps the number one complaint I heard about Magma was that doing arithmetic with objects having distinct parents was difficult and frustrating.

For the first year, in Sage, there was a very simple coercion system:

  • If you try to compute a + b or a * b, first somehow put b into the parent of a, then do the arithmetic.

That seriously sucked.  E.g., 

    Mod(2,7) + 6

was completely different than

    6 + Mod(2,7)!

The first was Mod(1,7), and the second was the integer 8.   This makes understanding code difficult and unpredictable.

So I rewrote coercion to be a bit better (this was a really painful rewrite that I mostly did myself over several hard months of work):

  • If you try to compute a + b (or a*b), check for a "canonical coercions" from the parent of a into b, or failing that, from the parent of b into a.  If there aren't any raise an error.  If there is one, use it.  There won't be both unless there is some canonical isomorphism. 
  • There are some axioms about what a canonical coercion is.  At least it is homomorphism. 

Then we decided that there is a canonical homomorphism Z --> Z/7Z, but there is not one Z/7Z --> Z since there is no ring homomorphism in this direction, hence the above makes sense in either order.  

One implication of this new model was that parent objects have to be immutable, i.e., you can't fundamentally change them after you make them.  This is why in Sage you must specify the name of the generator of a polynomial ring at creation time, and can't change it.  In Magma, it is typical to specify the name only later if you want.

Objects must be immutable because the canonical maps between them depend on the objects themselves, and we don't want them to just change left and right at runtime. 


%magma
R := PolynomialRing(RationalField(), 2);
f := R.1^3 + 3*R.2^3 - 4/5;
print f;
///
$.1^3 + 3*$.2^3 - 4/5
[ $.1, $.2 ]

%magma
AssignNames(~R, ["x", "y"]);
print f;
[R.1, R.2]
///
x^3 + 3*y^3 - 4/5
[x, y]

%magma
AssignNames(~R, ["z", "w"]);
print f;
///
z^3 + 3*w^3 - 4/5


Now in Sage:
R = PolynomialRing(QQ)
///
TypeError: You must specify the names of the variables.

R.<x,y> = PolynomialRing(QQ)
f = x^3 + 3*y^3 - 4/5; f
///
x^3 + 3*y^3 - 4/5

Note: In Sage, you can can use a with block to temporarily change the names if you really need to for some reason.  This is allowed since at the end of the with block the names are guaranteed to be changed back.


with localvars(R, ['z','w']):
    print f
print "back?", f    
///
z^3 + 3*w^3 - 4/5
back? x^3 + 3*y^3 - 4/5


But this new model had a major problem too, e.g., if x in Z[x] then "x + 1/2" would FAILS!   This is because 1/2 does not coerce into Z[x] (the parent of x), and x does not coerce into Q (the parent of 1/2).

 

Maybe the implementors of Magma have the answers?  Evidently not. 


%magma
R<x> := PolynomialRing(Integers());
x + 1/2;
///
Runtime error in '+': Bad argument types
Argument types given: RngUPolElt[RngInt], FldRatElt

Robert Bradshaw did though, and now it is in Sage:


R.<x> = ZZ[]
x + 1/2
///
x + 1/2

His new design is (for the most part) what Sage actually uses now.

He launched an effort in 2008 (see the Dev Days 1 Wiki) to implement a rewrite of the coercion model to his new design.  This ended up swallowing up half the development effort at the workshop, and was a massive amount of work, since every parent structure and element had to have some modifications made to it. 

This meant people changing a lot of code all over Sage that they didn't necessarily understand, and crossing their fingers that the doctest test suite would catch their mistakes.    This was SCARY.   After much work, none of this went into Sage.  It was just way too risky.  This failure temporarily (!) burned out some developers. 

Robert Bradshaw, on the other hand, persisted and came up with a new approach that involved migrating Sage code gradually.  I.e., he made it so that the old coercion model was still fully supported simultaneously with the new one, then he migrated a couple of parent structures, and got the code into Sage.   I'm sure not everything is migrated, even today.  There are two points to what he did:

  1. He extended the rules so x + 1/2 works, i.e., the result of a+b need not live in the parent of a or the parent of b.
  2. He made implementing coercion much more top down: simply implement various methods in a class that derives from Parent.  This meant that instead of coercion being rules and conventions that people have to understand and implement in their own code all over Sage, they just implement a small amount of code and the rules (and benefits) are all enforced automatically. 


The Coercion Model

The coercion model is explained here: http://sagemath.org/doc/reference/coercion.html

 

Monday, November 8, 2010

Getting Started With Cython

Getting Started With Cython



Quote about Cython:

Andrew Tipton says "I'm honestly never going back to writing C again. Cython gives me all the expressiveness of Python combined with all the performance and close-to-the-metal-godlike-powers of C. I've been using it to implement high-performance graph traversal and routing algorithms and to interface with C/C++ libraries, and it's been an absolute amazing productivity boost."  Yep.


Cython has two major use cases

  1. Extending the CPython interpreter with fast compiled modules,
  2. Interfacing Python code with external C/C++ libraries.

Cython supports type declarations

  1. For changing code from having dynamic Python semantics into having static-and-fast (but less generic) C semantics.
  2. Directly manipulating C data types defined in external libraries.

Tutorial: Building Your First Cython Code by Hand

It happens in two stages:
  1. A .pyx file is compiled by Cython to a .c or .cpp file.
  2. The .c or .cpp file is compild by a C compiler (such as GCC) to a .so file.
Let's try it now:
First, create a file sum.pyx that contains the following code (see this directory for original code files):

def sum_cython(long n):
    cdef long i, s = 0
    for i in range(n):
        s += i
    return s
Then use Cython to compile it:


Since we're using Sage, you can do

bash$ sage -cython sum.pyx
bash$ ls
sum.c  sum.pyx
Notice the new file sum.c.  Compile it with gcc as follows on OS X: 

bash$ sage -sh
bash$ gcc -I$SAGE_ROOT/local/include/python2.6 -bundle -undefined dynamic_lookup sum.c -o sum.so 

On Linux, do:

bash$ sage -sh
bash$ gcc -I$SAGE_ROOT/local/include/python2.6 -shared -fPIC sum.c -o sum.so
Finally, try it out.


You must run Sage from the same directory that contains the file sum.so. When you type import sum below, the Python interpreter sees the file sum.so, opens it, and it contains functions and data that define a compiled "Python C-extension module", so Python can load it (like it would like a module like sum.py).

bash$ sage
-------------------------------------------------
| Sage Version 4.6, Release Date: 2010-10-30     
| Type notebook() for the GUI, and license() for
-------------------------------------------------
sage: import sum
sage: sum.sum_cython(101)
5050
sage: timeit('sum.sum_cython(101)')
625 loops, best of 3: 627 ns per loop
sage: timeit('sum.sum_cython(101)', number=10^6)    # better quality timing
1000000 loops, best of 3: 539 ns per loop

Finally, take a look at the (more than 1000 line) autogenerated C file sum.c:
bash$ wc -l sum.c
    1178 sum.c
bash$ less sum.c
...

Notice code like this, which illustrates that Cython generates code that supports both Python2 and Python3:


#if PY_MAJOR_VERSION < 3
  #define __Pyx_BUILTIN_MODULE_NAME "__builtin__"
#else
  #define __Pyx_BUILTIN_MODULE_NAME "builtins"
#endif

#if PY_MAJOR_VERSION >= 3
  #define Py_TPFLAGS_CHECKTYPES 0
  #define Py_TPFLAGS_HAVE_INDEX 0
#endif
The official Python docs say: "If you are writing a new extension module, you might consider Cython. It translates a Python-like language to C. The extension modules it creates are compatible with Python 3.x and 2.x."  

If you scroll down further you'll get past the boilerplate and see the actual code:

...
  /* "/Users/wstein/edu/2010-2011/581d/notes/2010-11-08/sum.pyx":2
 * def sum_cython(long n):
 *     cdef long i, s = 0             # <<<<<<<<<<<<<<
 *     for i in range(n):
 *         s += i
 */
  __pyx_v_s = 0;

  /* "/Users/wstein/edu/2010-2011/581d/notes/2010-11-08/sum.pyx":3
 * def sum_cython(long n):
 *     cdef long i, s = 0
 *     for i in range(n):             # <<<<<<<<<<<<<<
 *         s += i
 *     return s
 */
  __pyx_t_1 = __pyx_v_n;
  for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) {
    __pyx_v_i = __pyx_t_2;

    /* "/Users/wstein/edu/2010-2011/581d/notes/2010-11-08/sum.pyx":4
 *     cdef long i, s = 0
 *     for i in range(n):
 *         s += i             # <<<<<<<<<<<<<<
 *     return s
 */
    __pyx_v_s += __pyx_v_i;
  }
...
There is a big comment that shows the original Cython code with context and a little arrow pointing at the current line (these comment blocks with context were I think the first thing I personally added to Pyrex... before, it just gave that first line with the .pyx filename and line number, but nothing else).  Below that big comment, there is the actual C code that Cython generates.  For example, the Cython code  s += i is turned into the C code __pyx_v_s += __pyx_v_i;.  


The Same Extension From Scratch, for Comparison

If you read Extending and Embedding Python you'll see how you could write a C extension module from scratch that does the same thing as sum.so above. Let's see what this is like, for comparison. Given how simple sum.pyx is, this isn't so hard. When creating more complicated Cython code---e.g., new extension classes, more complicated type conversions, and memory management---writing C code directly quickly becomes unwieldy.
First, create a file sum2.c as follows:


#include <Python.h>

static PyObject * 
sum2_sum_c(PyObject *self, PyObject *n_arg)
{
    long i, s=0, n = PyInt_AsLong(n_arg);
    
    for (i=0; i<n; i++)  {
 s += i;
    }
    PyObject* t = PyInt_FromLong(s);
    return t;
}

static PyMethodDef Sum2Methods[] = {
    {"sum_c", sum2_sum_c, METH_O, "Sum the numbers up to n."},
    {NULL, NULL, 0, NULL} /* Sentinel */
};

PyMODINIT_FUNC
initsum2(void)
{
    PyObject *m;
    m = Py_InitModule("sum2", Sum2Methods);
}
Now compile and run it as before: 
bash$ sage -sh
bash$ gcc -I$SAGE_ROOT/local/include/python2.6 -bundle -undefined dynamic_lookup sum2.c -o sum2.so 
bash$ sage
...
sage: import sum2
sage: sum2.sum_c(101)
5050
sage: import sum
sage: sum.sum_cython(101)
5050
sage: timeit('sum.sum_cython(1000000r)')
125 loops, best of 3: 2.54 ms per loop
sage: timeit('sum2.sum_c(1000000r)')
125 loops, best of 3: 2.03 ms per loop
Note that this is a little faster than the corresponding Cython code. This is because the Cython code is more careful, checking various error conditions, etc.  

Note that the C code is 5 times as long as the Cython code.

Building Extensions using Setuptools Instead

In nontrivial projects, the Cython step of transforming your code from .pyx to .c is typically done by explicitly calling cython somehow (this will change in the newest version of Cython), but the step of running the C compiler is usually done using either distutils or setuptools. To use the tools, one creates a file "setup.py" which defines the extensions in your project, and Python itself then runs a C compiler for you, with the proper options, includes paths, etc.

Let's create a new setuptools project that includes the sum and sum2 extensions that we defined above. First, create the following file and call it setup.py. This should be in the same directory as sum.c and sum2.c.
from setuptools import setup, Extension

ext_modules = [
    Extension("sum", ["sum.c"]),
    Extension("sum2", ["sum2.c"])
    ]

setup(
    name = 'sum',
    version = '0.1',
    ext_modules = ext_modules)
Then type 
bash$ rm *.so  # make sure something happens
bash$ sage setup.py develop
...
bash$ ls *.so
sum.so sum2.so


Notice that running
setup.py develop

resulted in Python generating the right gcc commmand lines for your platform. You don't have to do anything differently on Linux, OS X, etc.

If you change sum2.c, and want to rebuild it, just type sage setup.py develop again to rebuild sum2.so If you change sum.pyx, you have to manually run Cython:
sage -cython sum.pyx

then again do sage setup.py develop to rebuild sum.so. Try this now. In sum.pyx, change
for i in range(n):

to
for i in range(1,n+1):

then rebuild:
 
bash$ sage -cython sum.pyx
...
bash$ sage setup.py develop
...
bash$ sage
...
sage: import sum
sage: sum.sum_cython(100)
5050

There are ways to make setup.py automatically notice when sum.pyx changes, and run Cython. A nice implementation of this will be in the next Cython release. See the setup.py and build_system.py files of Purple sage for an example of how to write a little build system write now (before the new version of Cython).

An Automated Way to Experiment

Given any single Cython file such as sum.pyx, in Sage you can do
sage: load sum.pyx
Compiling sum.pyx...
sage: sum_cython(100)
5050
Behind the scenes, Sage created a setup.py file, ran Cython, made a new module, compiled it, and imported everything it defines into the global namespace.   If you look in the spyx subdirectory of the directory listed below, before you exit Sage (!), then you'll see all this. 
sage: SAGE_TMP
'/Users/wstein/.sage//temp/deep.local/14837/'

You can also do
sage: attach sum.pyx

Then every time sum.pyx changes, Sage will notice this and reload it. This can be useful for development of small chunks of Cython code.

You can also use the Sage notebook, and put %cython as the first line of a notebook cell. The rest of the cell will be compiled exactly as if it were written to a .pyx file and loaded as above. In fact, that is almost exactly what happens behind the scenes.

Next Time

Now that we understand at a reasonably deep level what Cython really is and does, it is time to learn about the various constructs of the language:
  1. How to create extension classes using Cython.
  2. How to call external C/C++ library code.

We will rewrite our sum.pyx file first to use a class. Then we'll rewrite it again to make use of the MPIR (or GMP) C library for arithmetic, and again to make use of the C++ NTL library.

Wednesday, November 3, 2010

Cython, Sage, and the Need for Speed

Cython seriously rocks, at least for much of what I need. It's still the killer feature of Python/Sage, IMHO. And meetings like EuroScipy last summer really confirmed that, where almost every other talk used Cython.

History


Greg Ewing wrote "Pyrex" in 2002--2004..., which I guess he named
after some cooking ware. It is amazing, but to understand this you
must take a quick tour of Extending and embedding and the Python/C API reference. Pyrex let you write basically Python-ish code that gets magically turned into C extension code.

In 2007 I forked it (discuss why briefly) and named Cython after this punk rock guy.

At that time, Robert Bradshaw and Stefen Behnel spent a lot of time improving Cython, implementing tons of _optimizations_ and new features.

Cython is now very popular in the "Scientific computing using Python" world. It is also heavily used in Sage.

Are You Serious?

If you want to use a computer for math research, and you are serious (not some lazy person who fiddles then gives up), you will likely run into situations where you need code to run fast. Writing such code only in Python (or any other interpreter) is often impossible.

If you want to write fast code on a computer, and don't want to mess with assembler, the only option right now is C, or something with equivalent speed... Cython! By "fast" I mean 100-1000 times what you'll get out of Python on certain tasks. I also mean code that is evil, scary, and dangerous... if you aren't careful with preconditions.

Compiled versus Interpreted Code

Here's how interpreter code usually runs.
    1. Check a bunch of conditions then do one single thing.
    2. Check a bunch of conditions then do one single thing.
    ...
    10^6. Check a bunch of conditions then do one single thing.
Here's how compiled (C, Cython, etc.) can can be written:

    1. Check some conditions (optional, but a good idea);
    2. Do very unsafe stuff with no checks at all (but they 
       in theory should be safe given 1).
    ...
    10^6. Do very unsafe stuff with no checks at all (but they 
       in theory should be safe given 1).
The problem is that all the checks in step 1 (in either case) can easily take over 100 times as long as "do very unsafe stuff".

TYPICAL EXAMPLE:
sage: def sum_sage(n):
...       s = 1
...       for i in range(n):
...           s += i
...       return s
sage: timeit('sum_sage(100000r)')
5 loops, best of 3: 84.6 ms per loop
sage: %python
sage: def sum_python(n):
...       s = 1
...       for i in range(n):
...           s += i
...       return s    
sage: 84.6/5.88
14.3877551020408
sage: timeit('sum_python(100000r)')
125 loops, best of 3: 5.88 ms per loop
sage: %cython
sage: def sum_cython(int n):
...       cdef int i, s = 1
...       for i in range(n):
...           s += i
...       return s
sage: timeit('sum_cython(100000r)')
625 loops, best of 3: 61.6 ┬Ás per loop
sage: 5.88 / 0.061
96.3934426229508   

Let me explain what's going in the above. How, e.g., in the first one (sum_sage), the program is doing a sort of monologue: "I have to add a Python int to a Sage int. I don't have any code to do that directly (that would get too complicated, and they are so big and complicated and different objects, and they might change, oh my). So I'll convert the Python int to Sage int, because that's the only conversion I know. OK, I do that via (it used to be base 10 string parsing!) some code Gonzalo Tornaria wrote that is scary complicated... and once that is done, I got my new MPIR-based Sage integer, which I think add to s. The addition takes some memory that points to the two MPIR integers, and since Python numbers are supposed to be immutable, I make yet another MPIR number (wrapped in a Python object), which is the result of asking MPIR to add them. MPIR numbers are also very complicated objects, involving stuff like limbs, and C structs, which hardly anybody fully understands. Despite these integers happening to be small, there is still quite some overhead in the addition, but it happens (taking a small fraction of the total runtime). Then we move on to the next step in the loop!"

With sum_python, the loop is similar, but MPIR isn't involved, and there are no conversions. This buys a 14-fold speedup. But it is still not super fast, since many new Python objects get created, the code is for "potentially huge integers", hence a potentially complicated data structure has to be checked for, etc.

With sum_cython, the integers are only C ints, which are a 32 or 64-bit location in memory. Doing "s += i" just modifies in place that position in memory. There's no conversions or type checks done at all at run time. It's really fast... 1386 times faster than the first version!!!

Key point: If you truly understand what is going on, you'll see that this isn't Sage being fundamentally broken. Instead, you'll hopefully be able to look at a block of Sage code and have a clue about how to figure out what it is really doing in order to see whether writing a new implementation of the same algorithm using Cython (which will likely mean directly working with C level data structures) is likely to give you a big speedup. If you look at the innermost statement in a loop, and there's a big monologue about what is really going on, then you might get a 1000-fold speedup by using Cython.

In mathematics, general theorems -- once we have them -- are almost always much better than proofs of special cases. In math, proving a special case can often seem more awkward and unnatural than proving the general case (e.g., how would you proof that ever integer of the form a^2 + 7*a + 5 factors uniquely as a product of primes!?). With general theorems in math, the statements are often simple and clear so applying them is easier than applying theorems that are only about some very special case, which has often more elaborate hypothesis. In mathematics, usually a general theorem is simply all around much better than a theorem about some very special cases (especially if both are available).

In contrast, when writing computer programs, algorithms to solve very general cases of problems often have significant drawbacks in terms of speed (and sometimes complexity) over algorithms for special cases. Since you are mathematicians, you should constantly guard against your instincts from math research which can point you in exactly the wrong direction for writing very fast code. Often implementations of very general algorithms _are_ easier to understand, and are much less buggy than a bunch of implementations of special cases. However, there are also usually very severe performance penalties in implementing only the general case. Watch out.

A huge part of understanding the point of Cython for writing fast math code is that you must accept that you're going to write a lot of "ugly" (from a mathematicians perspective) code that only deals with special cases. But it's beautiful from the perspective of somebody who absolutely needs fast code for a specific research application; your fast code can lead to whole new frontiers of research.

Continuing the example from above:
sage: sum_python(10^8)
4999999950000001
sage: sum_cython(10^8)
887459713

Yep, we just implemented only a special case in Cython!

---------------

Useful things to do if you want Cython enlightenment (all at once, no order):
  • Definitely read/skim Extending and try all examples. This is critical if you want to understand Cython with any depth at all. Don't think: I don't need to know any of this, because I have Cython. Yes, after you play with this a bit you may never explicitly use it. But you do need to know it.
  • Oh, definitely learn the C language if you haven't already. This is
    the book. There are courses. It's crazy not to learn C, anyways, since it (along with C++) is hugely popular, and a massive amount of code is written in C/C++. (See, e.g., http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html where C and C++ together are the most popular, by a long margin.)

  • Obviously, read http://docs.cython.org/ until it is burned into your brain.


  • Look at the code that Cython generates (use "cython -a" for a nice HTML view).