Tuesday, October 26, 2010

Sage as a Python Library?

When I started Sage I viewed it as a distribution of a bunch of math software, and Python as just the interpreter language I happen to use at the time. I didn't even know if using Python as the language would last. However, it's also possible to think of Sage as a Python library.

Anyway, it has occurred to me (a few times, and again recently) that it would be possible to make much of the Sage distribution, without Python of course, into a Python library. What I mean is the following. You would have a big Python library called "sagemath", say, and inside of that would be a huge HG repository. In that repository, one would check in the source code for many of the standard Sage spkg's... e.g., GAP, Pari, etc. When you type

python setup.py install

then GAP, Pari, etc., would all get built, controlled by some Python scripts, then installed as package_data in the sagemath directory of /site-packages/.

From a technical perspective, I don't see any reason why this couldn't be made to work. HG can handle this much data, and "python setup.py install" can do anything. It does lead to a very different way of looking at Sage though, and it could help untangle things in interesting ways.

(1) Have a Python library called "sagecore", which is just the most important standard spkg's (e.g., Singular, PARI, etc.), perhaps eventually built *only* as shared object libraries (no standalone interpreters).

(2) Have a Python library which is the current Sage library (we already have this), and which can be installed assuming sagecore is installed.

(3) Have other Python libraries (like psage: http://code.google.com/p/purplesage/source/browse/), which depend on (2). Maybe a lot of the "sage-combinat" code could also be moved to such a library, so they can escape the "combinat patch queue" madness. Maybe many other research groups in algebraic topology, differential geometry, special functions, etc., will start developing such libraries... on their own, and share them with the community (but without having to deal directly with the sage project until they want to).

To emphasize (3), when people want to write a lot of mathematics code in some area, e.g., differential geometry, they would just make a new library that depends on Sage (the library in (2)). We do the work needed to make it easy for people to write code outside of the Sage library, which depends on Sage. Especially writing Cython code like this can be difficult and confusing, and we don't explain it all in any Sage documentation. It actually took me quite a while to figure out how to do it today (with psage).

The core Sage library (2) above would continue to have a higher and higher level of code review, tough referee process etc. However, the development models for (3) would be up to the authors of those libraries.

The above is already how the ecosystem with Python (http://pypi.python.org/pypi), Perl (http://www.cpan.org/), R, etc., work. Fortunately, Python has reasonably good support already for this.

I think without a shift in this direction, Sage is going to be very frustrating for people writing research oriented code.

Fortunately, it's possible to do everything I'm describing above without disturbing the mainline Sage project itself, at least for now.