Tuesday, April 22, 2008

Sage-3.0 and Beyond

Finally, after much hard work, Sage-3.0 has been released! This release has tons of bug fixes plus several new features such as interact mode in the notebook, the R interface, full GCC-4.3 support, much better automated testing, etc.


Most importantly, Sage-3.0 finally has code for computing with modular abelian varieties. You almost certainly have no clue what those are, but suffice to say that I started the Sage project to compute with them, so having this code in Sage is a major milestone for me and the project.

We dramatically increased our automated testing and example suite so that 51.5% of functions have autotested examples. There are now nearly 60,000 lines of input examples. In February our testing was in the 30% range. This was a huge amount of work by many many Sage developers, and it has the practical impact that when you type foo? it is nearly twice as likely that you'll see a helpful example.

There is now a new interface to R that uses a pseudo-tty; this is a completely different alternative to rpy, which makes it possible for the web-based Sage notebook to work as an R GUI, and also makes it so any R command can be used from Sage 100% exactly as in R. It is still clunky and has numerous issues, but it is fairly usable, documented, and has a test suite. Here it is drawing a plot in the notebook:
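The pseudo-tty trick is the same one pexpect relies on: attach the child process to a pty so it behaves as if a human were typing at an interactive terminal. A minimal stdlib-only sketch of the mechanism, using echo as a stand-in for the R process Sage actually controls:

```python
import os
import pty
import subprocess

# Attach a child process to the slave end of a pseudo-tty, so the child
# believes it is talking to an interactive terminal (as R must, for its
# prompts and output flushing to work the way they do in a real session).
master, slave = pty.openpty()
proc = subprocess.Popen(["echo", "hello from a pty"],
                        stdout=slave, stderr=slave, close_fds=True)
proc.wait()
os.close(slave)

# Read what the child wrote to the terminal; note that a pty translates
# "\n" to "\r\n", one of the small details pexpect-style code must handle.
output = os.read(master, 1024).decode()
os.close(master)
print(output.strip())
```

This is only the raw mechanism; the actual interface layers prompt detection and command/response matching on top of it.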



So what is the next step for Sage? We have finished rapidly expanding by incorporating major new components. Now we will work on making Sage more accessible and writing new code to make Sage highly competitive with all other options. The main goals I see for this coming year are as follows:
  1. Port Sage and as many of its components as possible to Microsoft Windows (32 and 64-bit, MSVC) and to Sun's Solaris. For the Windows port the main challenge is writing a native implementation of pexpect for 32 and 64-bit MSVC windows, and porting several math software projects (e.g., R) that until now haven't had the resources to move beyond 32-bit crippled mingw/cygwin. For Solaris, the main issues are simply fixing bugs and improving the quality of the Sage codebase. Michael Abshoff is leading the charge.
  2. Modular forms and L-functions -- greatly improve all linear algebra needed for modular forms computation; implement a much wider range of algorithms; run large computations and record results in a database. Mike Rubinstein and I will lead the charge.
  3. Symbolic calculus -- a native optimized and much more general implementation of symbolic Calculus in Sage. Gary Furnish is leading the charge.
  4. Combinatorics -- migrate the mupad-combinat group to Sage; implementation of a lot more native group theory in Sage. Mike Hansen is leading the charge.
  5. Statistics -- integrate R and scipy.stats much more fully into Sage with the goal of making Sage the best available environment for statistical computation and education. Harald Schilly is leading the charge.
  6. The Sage notebook -- separate the notebook from Sage (like I did with Cython); develop the notebook's interactive and visualization functionality much more, and make the notebook easier to use as a graphical frontend for all math software. I am leading the charge until I can find somebody else to do so.
  7. Magma -- greatly improve the ability of Sage and Magma to work together (many more automatic conversions between Sage and Magma datastructures, etc.). Magma is by far the best system in the world for numerous calculations, and much important code has been written in Magma, so it's critical that Sage be fully able to leverage all that functionality. Also, this will make comparing the capabilities of the two systems much easier. I'm also leading the charge here.
All of the above goals have received new financial support from industry or the NSF. For example, Microsoft is funding the Windows port, Google is funding many of the above projects, and the NSF is strongly funding a big modular forms and L-functions database project. The sage-devel mailing list has 434 members and over 1000 posts/month, and each Sage release has about 30 contributors. The size of the serious core Sage developer community might be nearing that of Mathematica's core developer group. Now is the moment in time when we have a chance to... create a viable free open source alternative to Magma, Maple, Mathematica, and Matlab.

Tuesday, April 8, 2008

Google Funds Sage!!

Today Chris DiBona (open source programs manager at Google) fully funded a proposal [pdf] that some students and I put together to carry out four critically important Sage development projects this summer (2008):

  • Mike Hansen (UCSD): Implement combinatorial species in Sage. This will provide a backend to handle a wide range of new combinatorial objects with minimal effort.
  • Gary Furnish (Utah): Implement new optimized symbolic manipulation code in Cython. This will greatly speed up symbolic computation in Sage, and reduce Sage's dependence on Maxima.
  • Robert Miller (UW): Implement fast code for permutation groups and backtracking algorithms. This is needed for general applications to graph and code automorphism groups computations, and can be used to exhaustively generate isomorphism class representatives of a wide class of mathematical objects.
  • Yi Qiang (UW): Further document, improve the usability of, and provide numerous real world examples of distributed computing with Sage. This will greatly help in applications of Sage to research that involve parallel computation.


This funding is not part of the Google Summer of Code (the projects above do not fit with any of the mentoring organizations), but is instead, I hope, part of more long-term support by Google for the Sage project and for open source mathematical software that provides people with a free alternative to Maple, Mathematica, Matlab, and Magma.

Friday, February 29, 2008

This is what I would like in Sage 3.0


  1. DOCUMENTATION:
    cd SAGE_ROOT/devel/sage/sage/; sage -coverage .

    should output
    Overall weighted coverage score:  x%  [[where x >= 50]]

    Moreover there should be at least one hand written paragraph at the beginning of each file; "sage -coverage" can be adapted to flag whether or not this is there. This will improve the overall quality and maintainability of Sage, and make it easier for users to find examples.

  2. MANIPULATE: Usable "manipulate" functionality standard in Sage. This has very wide applicability.

  3. R: A pexpect interface to R (so, e.g., the notebook can act as a full R notebook using 100% native R syntax). This will matter to a lot of Sage users, and make using R from Sage much easier in some cases (just cut and paste anything from any R examples and it will work). It will also provide something in Sage that one doesn't get with Python + rpy + R.

  4. TIMING/BENCHMARKING: Fully integrate in Sage wjp's code that times all doctests, and start publishing the results on all Sage-supported platforms with every Sage release. This will give people a much better sense about which hardware is best for running Sage, and avoid major performance regressions. Likewise, get the Malb/wjp/my generic benchmarking code into Sage (this provides a doctest like plain text format for creating benchmarks, and is already mostly written).



I've only proposed goals for 3.0 that are wide reaching and will be noticed in some way by most users, instead of fairly technical optimizations in a specific area (such as FLINT, Libsingular, or optimized integer allocation). I think changing implementations in specific technical areas to speed things up is more appropriate in week-to-week releases, and is also something we should be very careful about until we have good speed regression testing in place (we should have done step 4 above a long time ago).

Thursday, February 21, 2008

Mathematics Research, Education, and Sage

I think there is a brewing tension between education and research amongst developers involved with the Sage project. More on that in a moment.

I cannot speak for all the core Sage developers, but I think I have some idea what some of them think and care about. My impression is that many of them are involved in Sage because they want to create software that they can use for attacking cutting edge research problems in their research area. This is true of me: I started Sage -- originally called "Software for Arithmetic
Geometry Experimentation" -- to have a very powerful open software environment for computing with modular forms, abelian varieties,
elliptic curves, and L-functions.

I am quite happy that Sage has become much more general, addressing a huge range of mathematics, since this expands the range of good developers, and increasing the range of tools math researchers can bring to bear on attacking a problem results in better research. For example, the solutions to many problems in number theory involve an incredible range of techniques in different areas of mathematics. I'm fairly certain that many of the people who have put in insane hours during the last few years making Sage what it is now (e.g., Mike Hansen, David Roe, Robert Bradshaw, David Harvey, Robert Miller, Emily Kirkman, Martin Albrecht, Michael Abshoff, etc.) have a similar perspective.

On the other hand, I teach high school students for a while every summer (in SIMUW), as do other people like David Roe, and of course I teach undergraduate classes... This is why I put so much effort into co-authoring things like the Sage notebook, which exist mainly to make the functionality of Sage more accessible to a wider range of people.

So, I think there is a brewing tension between education and research amongst developers involved with the Sage project (and in my case in my own mind). Some observations:

1. The research part of the Sage project is thriving and getting sufficient funding independent of any connection with educational applications of Sage. It is very healthy right now.

2. There is a lot of potential benefit to education in having a tool like Sage, since Mathematica is quite expensive, closed, etc. It's good for humanity
for Sage to be genuinely useful in an educational context.

3. People working on Sage for research have very limited time, and it can be frustrating being regularly asked to do things by the education community that not only have nothing to do with research, but are even sometimes at odds with it.

4. It is vitally important for the Sage project to be both well organized and have a clear sense of direction, purpose and goals.

It might be a good idea if the people who are really interested in Sage being a great tool for *education*, would consider doing the
following:

(a) setting up a mailing list called sage-edu for development discussions related to Sage in education. I realize that we just got rid of sage-newbie, but that was for a different reason -- because people were posting sage-support questions there and not getting responses.

(b) Gather together the best education-related tools in some sort
of organized package. This could start with Geogebra. I don't know. The key
thing is that there is no expectation at all that the people into Sage mainly for research do much of anything related to this project. I hope one outcome of this project would be an spkg that when installed would make available lots of cool extra stuff, and of course I would be very supportive about server space, posting of spkg's etc. And when this gets some momentum and
quality behind it this spkg would be included standard in Sage.

Basically I'm suggesting that everyone interested in making Sage the ultimate educational tool get organized, figure out who really wants to put in an insane amount of effort on this sort of thing, and put together a bunch of cool tools. Stop thinking you have to convince a bunch of us research-focused people that your ideas are good, or that we have to do the work -- your ideas are good; it's just that if we put a lot of time into them we won't have time for our research.

Make an spkg that will be trivial to install into Sage and extend its functionality. There is definitely sufficient interest in something like this
for education, there is great potential for funding, and potential for having a major positive impact on society. Thus I think people will emerge who will
want to take up this challenge. I just think it's better if it can happen for a while unconstrained by the rules or prejudices of the "Sage Research" side
of this project.

In summary, please put a huge amount of effort into getting organized and putting together something polished and great, so I can later effortlessly assimilate it :-).

Saturday, February 9, 2008

Benchmarketing: Modular Hermite Normal Form

Two days ago there was no fast, freely available open source implementation of the Hermite Normal Form of a (square nonsingular) matrix. In fact the fastest implementations I know of -- in Gap, Pari, and NTL -- are all massively slower (millions of times) than Magma. See Allan Steel's page on this algorithm.

I just spent the last few days with help from Clement Pernet and Burcin Erocal implementing a fast modular/p-adic Hermite Normal Form algorithm. As soon as the coefficients of the input matrix get at all large our implementation is now the fastest in the world available anywhere (e.g., faster than Magma, etc.). This is incredibly important for my research work on modular forms.
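For readers who haven't met it, the Hermite Normal Form is the integer analogue of reduced row echelon form: an upper-triangular matrix with positive pivots and reduced entries above each pivot, obtained from the input by invertible integer row operations. A naive illustrative sketch in plain Python, using extended-gcd row combinations -- this is emphatically not the modular/p-adic algorithm we implemented, whose whole point is to avoid the coefficient blowup this direct approach suffers:

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y and g >= 0."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, x, y = ext_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def hnf(A):
    """Row-style HNF of a square nonsingular integer matrix, by direct
    integer elimination (illustrative only; intermediate entries blow up)."""
    A = [row[:] for row in A]
    n = len(A)
    for j in range(n):
        # Zero out the column below the pivot with unimodular row combinations.
        for i in range(j + 1, n):
            if A[i][j] == 0:
                continue
            a, b = A[j][j], A[i][j]
            g, x, y = ext_gcd(a, b)
            A[j], A[i] = (
                [x * A[j][k] + y * A[i][k] for k in range(n)],
                [(-b // g) * A[j][k] + (a // g) * A[i][k] for k in range(n)],
            )
        # Make the pivot positive, then reduce the entries above it.
        if A[j][j] < 0:
            A[j] = [-v for v in A[j]]
        for i in range(j):
            q = A[i][j] // A[j][j]
            A[i] = [A[i][k] - q * A[j][k] for k in range(n)]
    return A

print(hnf([[2, 3], [4, 5]]))  # -> [[2, 0], [0, 1]]
```

The modular/p-adic approach instead recovers the HNF from computations done modulo the determinant (or via p-adic solving), which is what keeps the coefficients under control on large matrices.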

The following plot compares Sage (blue) to Magma (red) for computing the Hermite Normal Form of an n x n matrix with coefficients that have b bits. The x-axis is n, the y-axis is b; the blue dots show where Sage is much better, whereas the red dots are where Magma is much better. The timings are on a 2.6GHz Core 2 Duo laptop running OS X, and both Sage and Magma were built as 32-bit native executables:



(Note: When doing timings under Linux, Magma does slightly better than it does above, though Sage is still much better as soon as the coefficients are at all large.)

This plot shows the timings of Magma (red) versus Sage (blue) for computing the Hermite Forms of matrices with entries between -2^12 and 2^12. Sage is much faster than Magma, since the blue line is lower (=faster).


For even bigger coefficients Sage has a much bigger advantage. E.g., for a 200 x 200 256-bit matrix, Magma takes 232 seconds whereas Sage takes only 55 seconds.

In order to make this happen we read some old papers, read notes from a talk that Allan Steel (who implemented HNF in Magma) gave, and had to think quite hard to come up with three tricks (2 of which are probably similar to tricks alluded to in Steel's talk notes, but with no explanation of how they work). We coded for 3 days at Sage Days 7, discovered that we weren't right about some of our tricks, did lots of fine tuning, and finally fixed all the problems and got everything to work. We will be writing this up, and of course all the code is open source.

Acknowledgement: Many many thanks to Allan Steel for implementing an HNF in Magma that blows away everything else, which gave us a much better sense of what is possible in practice. Thanks also to the authors of IML, whose beautiful GPL'd implementation of balanced p-adic solving is critical to our fast HNF implementation.

Wednesday, January 30, 2008

Josh Kantor's Lorenz attractor example

Josh posted a nice example of plotting a Lorenz attractor in Sage:


Put this in a notebook cell (be careful about newlines):

Integer = int
RealNumber = float

def lorenz(t, y, params):
    return [params[0]*(y[1] - y[0]),
            y[0]*(params[1] - y[2]) - y[1],
            y[0]*y[1] - params[2]*y[2]]

def lorenz_jac(t, y, params):
    return [[-params[0], params[0], 0],
            [params[1] - y[2], -1, -y[0]],
            [y[1], y[0], -params[2]],
            [0, 0, 0]]

T = ode_solver()
T.algorithm = "bsimp"
T.function = lorenz
T.jacobian = lorenz_jac
T.ode_solve(y_0=[.5, .5, .5], t_span=[0, 155], params=[10, 40.5, 3], num_points=10000)
l = [T.solution[i][1] for i in range(len(T.solution))]

line3d(l, thickness=0.3, viewer='tachyon', figsize=8)


and this is what you get (click to zoom):

Friday, January 11, 2008

I'm glad I chose Python for Sage -- some cool scientific computing Python projects

When brainstorming for talk ideas for a workshop, Fernando Perez came up, off the top of his head, with a bunch of very high quality/impact scientific computing projects that involve Python. Here they are (the following was written by Fernando Perez, the author of IPython):

- I can obviously talk about ipython and related projects, but I can also give a math/technical talk off this type of work (the whole implementation is python, and uses quite a few tricks):

http://dx.doi.org/10.1016/j.acha.2007.08.001

- Brian Granger (from Tech-X http://txcorp.org) has a NASA grant to develop distributed arrays for Python, using IPython and numpy. That would make for an excellent talk, I think (matlab, interactivesupercomputing.com, and all the 'big boys' are after distributed arrays).

- Trilinos (http://trilinos.sandia.gov), a large set of parallel solvers developed at Sandia National Lab, has a great set of Python bindings (even usable interactively via ipython).

- MPI4Py is an excellent set of MPI bindings for Python, and its author Lisandro Dalcin is also the developer of Petsc4py:
* http://mpi4py.scipy.org/
* http://code.google.com/p/petsc4py/
If Lisandro can't come, I can contact one of the Petsc guys who's a great python developer and see if he's coming or can make it, he's an excellent speaker (Matt Knepley from Argonne Nat. Lab http://www-unix.mcs.anl.gov/~knepley/).

- The Hubble space telescope people are all Python based, and have done enormous amounts of work on both their internal image processing pipeline and contributing to Matplotlib (they currently have a developer working full time on matplotlib).

- The NetworkX (Sage uses this) guys from Los Alamos will probably be coming: https://networkx.lanl.gov/wiki.

- The Scripps institute has an extremely strong Python team doing molecular visualization and visual programming. Their work is very impressive, and they're already in San Diego, so it's a no brainer to attend. Their presentations are always of very high quality: http://mgltools.scripps.edu.

- JPL uses python extensively (they've contracted out work for matplotlib to be extended to suit their specific needs).

- My new job at UC Berkeley is on neuroscience, and we could present the work that's being developed there for fMRI analysis (all Python based, fully open source, NIH funded).

- Andrew Straw's work (http://www.its.caltech.edu/~astraw/) on real-time 3d tracking of fruit flies is very, very impressive. All python based, hardware control, real-time parallel computing.

- The CACR group at Caltech has the contract for DANSE (http://wiki.cacr.caltech.edu/danse/index.php/Main_Page), the Spallation Neutron Source's data analysis framework, all python. This is currently the largest experiment being funded in the USA.