Friday, February 29, 2008

This is what I would like in Sage 3.0


  1. DOCUMENTATION:
    cd SAGE_ROOT/devel/sage/sage/; sage -coverage .

    should output
    Overall weighted coverage score:  x%  [[where x >= 50]]

    Moreover there should be at least one hand written paragraph at the beginning of each file; "sage -coverage" can be adapted to flag whether or not this is there. This will improve the overall quality and maintainability of Sage, and make it easier for users to find examples.

  2. MANIPULATE: Usable "manipulate" functionality standard in Sage. This has very wide applicability.

  3. R: A pexpect interface to R (so, e..g, the notebook can act as a full R notebook using 100% native R syntax). This will matter to a lot of Sage users, and make using R from Sage much easier in some cases (just cut and paste anything from any R examples and it will work). It will also provide something in Sage that one doesn't get with Python + rpy + R.

  4. TIMING/BENCHMARKING: Fully integrate in Sage wjp's code that times all doctests, and start publishing the results on all Sage-supported platforms with every Sage release. This will give people a much better sense about which hardware is best for running Sage, and avoid major performance regressions. Likewise, get the Malb/wjp/my generic benchmarking code into Sage (this provides a doctest like plain text format for creating benchmarks, and is already mostly written).



I've only proposed goals for 3.0 that are wide reaching and will be noticed in some way by most users instead of fairly technical optimizations in a specific area (such as FLINT, Libsingular, or optimized integer allocation). I think changing implementations in specific technical areas to speed things up is more appropriate in week-to-week releases, and is also something we should be very careful about until we have good speed regression testing in place (we should have done step 4
above a long long ago).

Thursday, February 21, 2008

Mathematics Research, Education, and Sage

I think there is a brewing tension between education and research amongst developers involved with the Sage project. More on that in a moment.

I cannot speak for all the core Sage developers, but I think I have some idea what some of them think and care about. My impression is that many of them are involved in Sage because they want to create software that they can use for attacking cutting edge research problems in their research area. This is true of me: I started Sage -- original called "Software for Arithmetic
Geometry Experimentation" -- to have a very powerful open software environment for computing with modular forms, abelian varieties,
elliptic curves, and L-functions.

I am quite happy that Sage has become much more general, addressing a huge range of mathematics, since this expands the range of good developers and also increasing the range of tools math researchers can bring to bare on attacking a problem results in better research. For example, the solutions
to many problems in number theory involve an incredible range of techniques in different areas of mathematics. I'm fairly certain that many of the people who have put in insane hours during the last few years making Sage what is it now (e.g., Mike Hansen, David Roed, Robert Bradshaw, David Harvey, Robert Miller, Emily Kirkman, Martin Albrecht, Michael Abshoff, etc.) have a similar perspective.

On the other hand, I teach high school students for a while every summer (in SIMUW), as do other people like David Roe, and of course I teach undergraduate classes... This is why I put so much effort into co-authoring things like the Sage notebook, which exist mainly to make the functionality of Sage more accessible to a wider range of people.

So, I think there is a brewing tension between education and research amongst developers involved with the Sage project (and in my case in my own mind). Some observations:

1. The research part of the Sage project is thriving and getting sufficient funding independent of any connection with educational applications of Sage. It very very healthy right now.

2. There is a lot of potential benefit to education in having a tool like Sage, since Mathematica is quite expensive, closed, etc. It's good for humanity
for Sage to be genuinely useful in an educational context.

3. People working on Sage for research have very limited time, and it can be frustrating being regularly asked to do things by the education community that not only have nothing to do with research, but are even sometimes at odds with it.

4. It is vitally important for the Sage project to be both well organized and have a clear sense of direction, purpose and goals.

It might be a good idea if the people who are really interested in Sage being a great tool for *education*, would consider doing the
following:

(a) setting up a mailing list called sage-edu for development discussions related to Sage in education. I realize that we just got rid of sage-newbie, but that was for a different reason -- because people were posting sage-support questions there and not getting responses.

(b) Gather together the best education-related tools in some sort
of organized package. This could start with Geogebra. I don't know. The key
thing is that there is no expectation at all that the people into Sage mainly for research do much of anything related to this project. I hope one outcome of this project would be an spkg that when installed would make available lots of cool extra stuff, and of course I would be very supportive about server space, posting of spkg's etc. And when this gets some momentum and
quality behind it this spkg would be included standard in Sage.

Basically I'm suggesting that everyone interested in making Sage the ultimate educational tool get organized, figure out who really wants to put
in an insane amount of effort on this sort of thing, and put together a bunch of cool tools. Stop thinking you have to convince a bunch of us research-focused people to do the work or that your ideas are good -- you
don't -- your ideas are good; it's just that if we put a lot of time into them we won't have time for our research.

Make an spkg that will be trivial to install into Sage and extend its functionality. There is definitely sufficient interest in something like this
for education, there is great potential for funding, and potential for having a major positive impact on society. Thus I think people will emerge who will
want to take up this challenge. I just thing it's better if it can happen for a while unconstrained by the rules or prejudices of the "Sage Research" side
of this project.

In summary, please put a huge amount of effort into getting organized and putting together something polished and great, so I can later effortless assimilate it :-).

Saturday, February 9, 2008

Benchmarketing: Modular Hermite Normal Form

Two days ago there was no fast freely available open source implementation of computation of the Hermite Normal Form of a (square nonsingular) matrix. In fact the fastest implementations I know of -- in Gap, Pari, and NTL, are all massively slower (millions of times) than Magma. See Allan Steel's page on this algorithm.

I just spent the last few days with help from Clement Pernet and Burcin Erocal implementing a fast modular/p-adic Hermite Normal Form algorithm. As soon as the coefficients of the input matrix get at all large our implementation is now the fastest in the world available anywhere (e.g., faster than Magma, etc.). This is incredibly important for my research work on modular forms.

The following plot compares Sage (blue) to Magma (red) for computing the Hermite Normal Form of an nxn matrix with coefficients that have b bits. The x-axis is n, the y-axis is b, and the blue dots show where Sage is much better, whereas the red dots are where Magma is much better. The timings are on a 2.6Ghz core 2 duo laptop running OS X, and both Sage and Magma were built as 32-bit native executables:



(Note: When doing timings under Linux, Magma doing slightly better than it does above, though Sage is still much better as soon as the coefficients are at all large.)

This plot shows the timings of Magma (red) versus Sage (blue) for computing the Hermite Forms of matrices with entries between -212 and 212. Sage is much faster than Magma, since the blue line is lower (=faster).


For even bigger coefficients Sage has a much bigger advantage. E.g., for a 200 x 200 256-bit matrix, Magma takes 232 seconds whereas Sage takes only 55 seconds.

In order to make this happen we read some old papers, read notes from a talk that Allan Steel (who implemented HNF in Magma) gave, and had to think quite hard to come up with three tricks (2 of which are probably similar to tricks alluded to in Steel's talk notes, but with no explanation about how they work), coded for 3 days at Sage Days 7, discovered that we weren't right about some of our tricks, doing lots of fine tuning, and finally fixed all the problems and got everything to work. We will be writing this up, and of course all the code is open source.

Acknowledgement: Many many thanks to Allan Steel for implementing an HNF in Magma that blows away everything else, which gave us a much better sense of what is possible in practice. Thanks also to the authors of IML whose beautiful GPL'd implementation of balanced p-adic is critical to our fast HNF implementation.