Sage: Open Source Mathematics Software

Thursday, May 29, 2008

GMP forked; Torbjorn Granlund: "blatant falsehoods and sinister insinuations"

Torbjorn Granlund:
> The other publicly stated reasons are a mixture of
> blatant falsehoods and sinister insinuations.
> This fork exists for some very different reasons than those publicy
> stated.

I know why the fork exists, so I'll state publicly and precisely
why the fork exists. The issues below are not
blatant falsehoods or sinister insinuations.

1. FACT: (L)GPLv3 cannot be used by some companies. See below where
I discuss this issue at length.

2. FACT: The GMP project is not developer friendly. This is easy to
see by reading the GMP mailing lists.

3. FACT: The GMP project does not have a regular and
predictable release cycle. How many times has the GMP 5.0 release
been moved back -- it used to be "sometime in 2007", but now the
GMP site says "5.0 is planned to come out in a couple of years."

4. FACT: The working code repository of GMP is closed. There is
no public repository (svn, etc.) where anybody can look at the latest
version of the GMP code. See 2.

5. FACT: Some extremely capable developers do not want to contribute to
an LGPL'd project, because they don't want their voluntary contributions
to be used by Maple, Mathematica and Magma to make money.

6. FACT: The GMP project is unfriendly toward natively supporting
Microsoft Windows using MSVC. Just see any email you have sent
to Brian Gladman.

7. FACT: The GMP project has been unfriendly toward supporting OS X.
Just search the gmp list archives for OS X.

Torbjorn Granlund:
> I think the forkers over at SAGE use the purported v3 incompatibility
> issues as an excuse for forking GMP.

I have been approached by numerous people (from industry,
government, etc.) over the last three years about forking GMP.
GPLv3 was one of numerous factors that really pushed things
over the edge, though a fork would have happened anyways.

Torbjorn Granlund wrote:
> Seriously, please don't use this sort of rhetoric. At least not
> before you've explained the actual problem. I suspect the SAGE team's
> problem is mere stubbornness, in particular since they have not been
> able to produce one single reason for their problems with v3. But I
> am all ears, should somebody spell it out.

It is no secret that the Sage project receives some funding from
Microsoft Research to produce free open source
mathematical software for use by their researchers. See
http://sagemath.org/ack.html
for a list of organizations that fund Sage development.
A company-wide requirement at Microsoft is that they do not run any
GPLv3 code, not even binaries. This is not surprising in light
of numerous quotes by Stallman about how GPLv3 was designed
partly to attack Microsoft. For example:

Stallman: "The point of the GPLv3 conditions that apply to the
Novell/Microsoft deal is to give the rest of the community a defense
against Microsoft's patent threats. If these conditions do their job,
the result will be that Microsoft never goes beyond threats, and the
community is safe."
http://www.technewsworld.com/story/must-read/59780.html?welcome=1212081285

This is one straightforward reason why GPLv3+ only code
is a problem for the Sage project. It has nothing to do with
ideological stubbornness by Sage developers (instead it is
ideological stubbornness by Stallman).

If Microsoft maintains this policy, then they will
also not run new versions of Mathematica, Maple,
Magma, that depend on any LGPLv3+ code.
So the problem Sage has will also be a problem for all
those projects. And I guarantee you that people at Microsoft
know about this issue.

You probably don't like Microsoft, so I doubt we will find any sympathy
from you as a result of the above. But rest assured
that the above issues with GPLv3 are NOT motivated by "mere
stubbornness" by the *Sage team*. So you are wrong about that.

Thursday, May 1, 2008

Can There be a Viable Free Open Source Alternative to Magma, Maple, Mathematica, and Matlab?

For over a decade I have primarily done research in number theory that often involves computation, mainly using Magma. In 2004 I realized that it was stupid for me to continue building my work on Magma because Magma is proprietary, the development model is closed, Magma is expensive which is bad for students, and the language itself lacked many features (e.g., user defined classes) that I had requested repeatedly for over 5 years. Thus to continue to use only Magma unacceptably limited my potential in both research and education.

Having used Magma for many years, I simply could not switch to an existing open source system. The only serious free open source software for number theory is PARI, whose capabilities are far behind those of Magma in numerous critical areas of interest to me, including exact linear algebra, commutative algebra, and algebraic curves. And no other free system--GAP, Singular, Axiom, Maxima, Macaulay 2, etc.--even comes close in all these areas. In fact, after a decade of patiently waiting, I doubt they ever will.

Magma is the result of several decades of hard work by extremely talented full-time mathematicians and programmers such as John Cannon, Allan Steel and Claus Fieker. I've worked with them and they are simply amazing. The situation seemed hopeless. If I had only never used Magma and tasted of the forbidden fruit...

In 2004, sure that there was no possible way to solve this problem, and driven by nothing but pure blind compulsion, I started the Sage project as my little free open source alternative to Magma, and spent an insane amount of time working on it even though I was utterly convinced that there was no hope of Sage ever succeeding. The first version of Sage consisted of the Python interpreter and a few scripts for doing number theory.

After a year, some of my first feedback from the computer algebra research community came from Richard Fateman on December 6, 2005 when he posted his opinion of the Sage project to sci.math.symbolic: "By avoiding applications (say, to engineering design, finance, education, scientific visualization, etc etc) the activity is essentially doomed. Why? Government funding for people or projects will be a small percentage of the funding for pure mathematics. That's not much. And the future is pretty grim."

Honestly, I believed he was right, but I just couldn't stop myself from working on Sage. It is now nearly three years later and the Sage project currently has around 100 active developers and 10,000 users. In November 2007, Sage won first place in the scientific category of the Trophées du Libre, which is a major international free software competition. Sage is funded by the US National Science Foundation, the US Department of Defense, the University of Washington, Microsoft Research, Google and private donations. Sage has new releases every two weeks, and typically 30-40 people contribute to each release. All new code contributions to Sage are peer reviewed, and every new function must be documented with tests that illustrate its usage. The docs have over 50,000 lines of input examples.

So what is Sage and what makes it unique? Sage is:

a huge distribution of free open source mathematical software that is surprisingly easy to build from source,

a set of interfaces to most other mathematical software systems, and

a new Python library that fills in huge gaps in other open source math software included in Sage, unifies everything offering a smooth user experience, and provides a modern web-based graphical notebook interface with math typesetting and integrated 2D and 3D graphics.

Sage is the first large general purpose mathematics software system that uses a mainstream programming language (Python) as the end user language. Python--easily one of the world's top 10 programming languages--is a powerful and beautiful modern interpreted programming language with an organized and professional developer base and millions of users. Sage also makes extensive use of a Python-to-C compiler called Cython. Thus Sage has a tremendous advantage over every other general purpose computer algebra system, since Python has thousands of third party libraries, sophisticated support for object serialization, databases, distributed programming, and a major following in scientific computing.

Instead of reinventing the wheel, Sage combines many of the best existing open source systems that have been developed over the last 40 years (about million lines of code) with about 250,000 lines of new code. Every single copy of Sage includes all of the following software (and much much more):

Algebra and calculus: Maxima, SymPy
High precision arithmetic: GMP, MPFR, MPFI, quaddouble, Givaro
Commutative algebra: Singular
Number theory: PARI, NTL, mwrank, ECM, FLINTQS, GMP-ECM
Exact linear algebra: LinBox, IML
Group theory: GAP
Scientific computation: GSL, SciPy, NumPy, cvxopt
Statistical computation: R
Graphics (2d and 3d): Matplotlib, Tachyon3d, Jmol

Sage is thus the first system to combine such a wide range of libraries and programs in a meaningful way. This huge range of programs is tied together using Python's excellent extensibility via C libraries and using pseudo-ttys. Sage has a highly developed unified collection of pseudo-tty based interfaces that make it possible to make extensive use of Maple, Mathematica, Magma, Matlab, GAP, Maxima, Singular, PARI, and many other systems from anywhere within a single Sage program.
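To give a rough feel for what a pseudo-tty based interface does, here is a minimal sketch in plain Python. It is not Sage's actual interface code: it uses only the standard library `pty` module, it is POSIX-only, and the child "math system" is a hypothetical toy calculator invented for this illustration. The class name `TinyPtyInterface` and the `ANS` reply prefix are likewise assumptions.

```python
import os
import pty
import select
import subprocess
import sys

# Hypothetical stand-in for an external math system: a line-oriented
# calculator that reads expressions and prints "ANS <value>".
CHILD_PROGRAM = r'''
import sys
for line in sys.stdin:
    line = line.strip()
    if line == "quit":
        break
    # Toy evaluator for arithmetic only; builtins are disabled.
    print("ANS", eval(line, {"__builtins__": {}}), flush=True)
'''


class TinyPtyInterface:
    """Drive a child process through a pseudo-tty (POSIX only)."""

    def __init__(self):
        master_fd, slave_fd = pty.openpty()
        self._proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_PROGRAM],
            stdin=slave_fd, stdout=slave_fd, stderr=slave_fd,
        )
        os.close(slave_fd)  # the child keeps its own copy
        self._master_fd = master_fd

    def eval(self, expression, timeout=10.0):
        """Send one line to the child and return the text after 'ANS '."""
        os.write(self._master_fd, (expression + "\n").encode())
        buffer = b""
        while True:
            ready, _, _ = select.select([self._master_fd], [], [], timeout)
            if not ready:
                raise TimeoutError("no answer from child process")
            buffer += os.read(self._master_fd, 1024)
            # The tty echoes our input, so skip lines until the answer appears.
            *complete_lines, _partial = buffer.split(b"\n")
            for raw in complete_lines:
                text = raw.decode().strip()
                if text.startswith("ANS "):
                    return text[len("ANS "):]

    def close(self):
        os.write(self._master_fd, b"quit\n")
        self._proc.wait(timeout=10)
        os.close(self._master_fd)


calculator = TinyPtyInterface()
try:
    product = calculator.eval("6 * 7")  # the reply comes back as a string
finally:
    calculator.close()
```

The point of the design is that the child believes it is talking to a terminal, so a program written only for interactive use can still be scripted. A real interface also has to handle prompts, errors, and multi-line output, which this sketch ignores.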

Curious? If you want a viable open source alternative to Magma, Maple, Mathematica and Matlab, drop everything, try out Sage now and become a Sage developer.

http://www.sagemath.org

Tuesday, April 22, 2008

Sage-3.0 and Beyond

Finally, after much hard work, Sage-3.0 has been released! This release has tons of bug fixes plus several new features such as interact mode in the notebook, the R interface, full GCC-4.3 support, much better automated testing, etc.

Most importantly, Sage-3.0 finally has code for computing with modular abelian varieties. You almost certainly have no clue what those are, but suffice to say that I started the Sage project to compute with them, so having this code in Sage is a major milestone for me and the project.

We dramatically increased our automated testing and example suite so that 51.5% of functions have autotested examples. There are now nearly 60,000 lines of input examples. In February our testing was in the 30% range. This was a huge amount of work by many many Sage developers, and it has the practical impact that when you type foo? it is nearly twice as likely that you'll see a helpful example.
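As a rough illustration of what such a percentage measures, here is a small sketch that counts how many functions carry at least one docstring example. It is an assumption-laden stand-in, not the real `sage -coverage` script: it uses Python's standard `doctest` parser with `>>>` prompts (Sage docstrings use `sage:` prompts), the score is unweighted, and the helper names `has_example` and `coverage_percent` are made up for this example.

```python
import doctest
import inspect


def has_example(func):
    """Return True if the docstring of func contains a doctest example."""
    docstring = inspect.getdoc(func) or ""
    return len(doctest.DocTestParser().get_examples(docstring)) > 0


def coverage_percent(functions):
    """Percentage of the given functions that have a docstring example."""
    functions = list(functions)
    if not functions:
        return 100.0
    covered = sum(1 for f in functions if has_example(f))
    return 100.0 * covered / len(functions)


def double(x):
    """
    Return twice x.

    >>> double(21)
    42
    """
    return 2 * x


def undocumented(x):
    return x


# One of the two demo functions has an example, so the score is 50.0.
score = coverage_percent([double, undocumented])
```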

There is now a new interface to R that uses a pseudo-tty; this is a completely different alternative to rpy, which makes it possible for the web-based Sage notebook to work as an R GUI, and also makes it so any R command can be used from Sage 100% exactly as in R. It is still clunky and has numerous issues, but it is fairly usable, documented, and has a test suite. Here it is drawing a plot in the notebook:

So what is the next step for Sage? We have finished rapidly expanding by incorporating major new components. Now we will work on making Sage more accessible and writing new code to make Sage highly competitive with all other options. I see the main goals for this coming year as follows:

Port Sage and as many of its components as possible to Microsoft Windows (32 and 64-bit, MSVC) and to Sun's Solaris. For the Windows port the main challenge is writing a native implementation of pexpect for 32 and 64-bit MSVC windows, and porting several math software projects (e.g., R) that until now haven't had the resources to move beyond 32-bit crippled mingw/cygwin. For Solaris, the main issues are simply fixing bugs and improving the quality of the Sage codebase. Michael Abshoff is leading the charge.
Modular forms and L-functions -- greatly improve all linear algebra needed for modular forms computation; implement a much wider range of algorithms; run large computations and record results in a database. Mike Rubinstein and I will lead the charge.
Symbolic calculus -- a native optimized and much more general implementation of symbolic calculus in Sage. Gary Furnish is leading the charge.
Combinatorics -- migrate the mupad-combinat group to Sage; implementation of a lot more native group theory in Sage. Mike Hansen is leading the charge.
Statistics -- integrate R and scipy.stats much more fully into Sage with the goal of making Sage the best available environment for statistical computation and education. Harald Schilly is leading the charge.
The Sage notebook -- separate the notebook from Sage (like I did with Cython); develop the notebook's interactive and visualization functionality much more, and make the notebook easier to use as a graphical frontend for all math software. I am leading the charge until I can find somebody else to do so.
Magma -- greatly improve the ability of Sage and Magma to work together (many more automatic conversions between Sage and Magma data structures, etc.); Magma is by far the best in the world at numerous calculations, and much important code has been written in Magma, so it's critical that Sage be fully able to leverage all that functionality. Also, this will make comparing the capabilities of the two systems much easier. I'm also leading the charge here.

All of the above goals have received new financial support from industry or the NSF. For example, Microsoft is funding the Windows port, Google is funding many of the above projects, and the NSF is strongly funding a big modular forms and L-functions database project. The sage-devel mailing list has 434 members and over 1000 posts/month, and each Sage release has about 30 contributors. The size of the serious core Sage developer community might be nearing that of Mathematica's core developer group. Now is the moment in time when we have a chance to... create a viable free open source alternative to Magma, Maple, Mathematica, and Matlab.

Tuesday, April 8, 2008

Google Funds Sage!!

Today Chris DiBona (open source programs manager at Google) fully funded a proposal [pdf] that some students and I put together to carry out four critically important Sage development projects this summer (2008):

Mike Hansen (UCSD): Implement combinatorial species in Sage. This will provide a backend to handle a wide range of new combinatorial objects with minimal effort.
Gary Furnish (Utah): Implement new optimized symbolic manipulation code in Cython. This will greatly speed up symbolic computation in Sage, and reduce Sage's dependence on Maxima.
Robert Miller (UW): Implement fast code for permutation groups and backtracking algorithms. This is needed for general applications to graph and code automorphism group computations, and can be used to exhaustively generate isomorphism class representatives of a wide class of mathematical objects.
Yi Qiang (UW): Further document, improve the usability of, and provide numerous real world examples of distributed computing with Sage. This will greatly help in applications of Sage to research that involve parallel computation.

This funding is not part of the Google Summer of Code (the projects above do not fit with any of the mentoring organizations), but is instead, I hope, part of more long-term support by Google for the Sage project and open source mathematical software that provides people with a free alternative to Maple, Mathematica, Matlab, and Magma.

Friday, February 29, 2008

This is what I would like in Sage 3.0

DOCUMENTATION:
```
cd SAGE_ROOT/devel/sage/sage/; sage -coverage .
```
should output
```
Overall weighted coverage score:  x%  [[where x >= 50]]
```
Moreover there should be at least one hand written paragraph at the beginning of each file; "sage -coverage" can be adapted to flag whether or not this is there. This will improve the overall quality and maintainability of Sage, and make it easier for users to find examples.
MANIPULATE: Usable "manipulate" functionality standard in Sage. This has very wide applicability.
R: A pexpect interface to R (so, e.g., the notebook can act as a full R notebook using 100% native R syntax). This will matter to a lot of Sage users, and make using R from Sage much easier in some cases (just cut and paste anything from any R examples and it will work). It will also provide something in Sage that one doesn't get with Python + rpy + R.
TIMING/BENCHMARKING: Fully integrate in Sage wjp's code that times all doctests, and start publishing the results on all Sage-supported platforms with every Sage release. This will give people a much better sense about which hardware is best for running Sage, and avoid major performance regressions. Likewise, get the Malb/wjp/my generic benchmarking code into Sage (this provides a doctest like plain text format for creating benchmarks, and is already mostly written).
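The timing idea in the last item can be sketched with Python's standard `doctest` module. This is only an illustration of the approach, not wjp's code: the helper names `time_docstring_examples` and `find_regressions`, the factor-of-two threshold, and the sample timings are all assumptions made for this example.

```python
import doctest
import inspect
import time


def time_docstring_examples(func):
    """Run the docstring examples of func; return (seconds, failed_count)."""
    docstring = inspect.getdoc(func) or ""
    test = doctest.DocTestParser().get_doctest(
        docstring, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    start = time.perf_counter()
    results = runner.run(test, out=lambda text: None)  # discard failure reports
    elapsed = time.perf_counter() - start
    return elapsed, results.failed


def find_regressions(current, baseline, factor=2.0):
    """Names whose current time exceeds `factor` times the baseline time."""
    return sorted(
        name for name, seconds in current.items()
        if name in baseline and seconds > factor * baseline[name]
    )


def slow_square(n):
    """
    Square n by repeated addition.

    >>> slow_square(12)
    144
    """
    return sum(n for _ in range(n))


elapsed, failed = time_docstring_examples(slow_square)

# Hypothetical timings (seconds) from two releases, for illustration only.
previous_release = {"slow_square": 0.010, "fast_path": 0.200}
this_release = {"slow_square": 0.050, "fast_path": 0.210}
regressed = find_regressions(this_release, previous_release)
```

Publishing a table like `this_release` for every platform and release is what would make a slowdown visible before users notice it.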

I've only proposed goals for 3.0 that are wide reaching and will be noticed in some way by most users instead of fairly technical optimizations in a specific area (such as FLINT, Libsingular, or optimized integer allocation). I think changing implementations in specific technical areas to speed things up is more appropriate in week-to-week releases, and is also something we should be very careful about until we have good speed regression testing in place (we should have done step 4 above long, long ago).

Thursday, February 21, 2008

Mathematics Research, Education, and Sage

I think there is a brewing tension between education and research amongst developers involved with the Sage project. More on that in a moment.

I cannot speak for all the core Sage developers, but I think I have some idea what some of them think and care about. My impression is that many of them are involved in Sage because they want to create software that they can use for attacking cutting edge research problems in their research area. This is true of me: I started Sage -- originally called "Software for Arithmetic Geometry Experimentation" -- to have a very powerful open software environment for computing with modular forms, abelian varieties, elliptic curves, and L-functions.

I am quite happy that Sage has become much more general, addressing a huge range of mathematics, since this expands the range of good developers, and since increasing the range of tools math researchers can bring to bear on a problem results in better research. For example, the solutions to many problems in number theory involve an incredible range of techniques in different areas of mathematics. I'm fairly certain that many of the people who have put in insane hours during the last few years making Sage what it is now (e.g., Mike Hansen, David Roe, Robert Bradshaw, David Harvey, Robert Miller, Emily Kirkman, Martin Albrecht, Michael Abshoff, etc.) have a similar perspective.

On the other hand, I teach high school students for a while every summer (in SIMUW), as do other people like David Roe, and of course I teach undergraduate classes... This is why I put so much effort into co-authoring things like the Sage notebook, which exist mainly to make the functionality of Sage more accessible to a wider range of people.

So, I think there is a brewing tension between education and research amongst developers involved with the Sage project (and in my case in my own mind). Some observations:

1. The research part of the Sage project is thriving and getting sufficient funding independent of any connection with educational applications of Sage. It is very, very healthy right now.

2. There is a lot of potential benefit to education in having a tool like Sage, since Mathematica is quite expensive, closed, etc. It's good for humanity
for Sage to be genuinely useful in an educational context.

3. People working on Sage for research have very limited time, and it can be frustrating being regularly asked to do things by the education community that not only have nothing to do with research, but are even sometimes at odds with it.

4. It is vitally important for the Sage project to be both well organized and have a clear sense of direction, purpose and goals.

It might be a good idea if the people who are really interested in Sage being a great tool for *education*, would consider doing the
following:

(a) setting up a mailing list called sage-edu for development discussions related to Sage in education. I realize that we just got rid of sage-newbie, but that was for a different reason -- because people were posting sage-support questions there and not getting responses.

(b) Gather together the best education-related tools in some sort
of organized package. This could start with Geogebra. I don't know. The key
thing is that there is no expectation at all that the people into Sage mainly for research do much of anything related to this project. I hope one outcome of this project would be an spkg that when installed would make available lots of cool extra stuff, and of course I would be very supportive about server space, posting of spkg's etc. And when this gets some momentum and
quality behind it this spkg would be included standard in Sage.

Basically I'm suggesting that everyone interested in making Sage the ultimate educational tool get organized, figure out who really wants to put
in an insane amount of effort on this sort of thing, and put together a bunch of cool tools. Stop thinking you have to convince a bunch of us research-focused people to do the work or that your ideas are good -- you
don't -- your ideas are good; it's just that if we put a lot of time into them we won't have time for our research.

Make an spkg that will be trivial to install into Sage and extend its functionality. There is definitely sufficient interest in something like this
for education, there is great potential for funding, and potential for having a major positive impact on society. Thus I think people will emerge who will
want to take up this challenge. I just think it's better if it can happen for a while unconstrained by the rules or prejudices of the "Sage Research" side
of this project.

In summary, please put a huge amount of effort into getting organized and putting together something polished and great, so I can later effortlessly assimilate it :-).

Saturday, February 9, 2008

Benchmarketing: Modular Hermite Normal Form

Two days ago there was no fast freely available open source implementation of computation of the Hermite Normal Form of a (square nonsingular) matrix. In fact the fastest implementations I know of -- in Gap, Pari, and NTL, are all massively slower (millions of times) than Magma. See Allan Steel's page on this algorithm.

I just spent the last few days with help from Clement Pernet and Burcin Erocal implementing a fast modular/p-adic Hermite Normal Form algorithm. As soon as the coefficients of the input matrix get at all large our implementation is now the fastest in the world available anywhere (e.g., faster than Magma, etc.). This is incredibly important for my research work on modular forms.
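For readers who have not met the Hermite Normal Form, the following sketch computes it for a small integer matrix using nothing but integer row operations. It is deliberately the naive textbook approach and not the modular/p-adic algorithm described in this post; on large inputs its intermediate entries can grow badly, which is the kind of cost that modular and p-adic methods are designed to avoid. The convention chosen here (row operations, upper triangular result, positive pivots, entries above each pivot reduced into the range from 0 up to the pivot) is an assumption, since conventions differ between systems.

```python
def hermite_normal_form(matrix):
    """Row-style HNF of an integer matrix using only integer row operations."""
    rows = [list(row) for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Euclidean elimination: repeat until at most one nonzero entry
        # remains in this column at or below pivot_row.
        while True:
            nonzero = [r for r in range(pivot_row, n_rows) if rows[r][col] != 0]
            if len(nonzero) <= 1:
                break
            smallest = min(nonzero, key=lambda r: abs(rows[r][col]))
            for r in nonzero:
                if r != smallest:
                    q = rows[r][col] // rows[smallest][col]
                    rows[r] = [a - q * b for a, b in zip(rows[r], rows[smallest])]
        if not nonzero:
            continue  # no pivot in this column
        # Move the surviving row into pivot position and make the pivot positive.
        rows[pivot_row], rows[nonzero[0]] = rows[nonzero[0]], rows[pivot_row]
        if rows[pivot_row][col] < 0:
            rows[pivot_row] = [-a for a in rows[pivot_row]]
        # Reduce entries above the pivot into the range [0, pivot).
        for r in range(pivot_row):
            q = rows[r][col] // rows[pivot_row][col]
            rows[r] = [a - q * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows


example = hermite_normal_form([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
# example == [[1, 2, 0], [0, 3, 0], [0, 0, 1]]
```

The product of the diagonal entries (here 3) equals the absolute value of the determinant of the input, which is a quick sanity check for square nonsingular matrices.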

The following plot compares Sage (blue) to Magma (red) for computing the Hermite Normal Form of an nxn matrix with coefficients that have b bits. The x-axis is n, the y-axis is b, and the blue dots show where Sage is much better, whereas the red dots are where Magma is much better. The timings are on a 2.6GHz Core 2 Duo laptop running OS X, and both Sage and Magma were built as 32-bit native executables:

(Note: When doing timings under Linux, Magma does slightly better than it does above, though Sage is still much better as soon as the coefficients are at all large.)

This plot shows the timings of Magma (red) versus Sage (blue) for computing the Hermite Forms of matrices with entries between -2¹² and 2¹². Sage is much faster than Magma, since the blue line is lower (=faster).

For even bigger coefficients Sage has a much bigger advantage. E.g., for a 200 x 200 256-bit matrix, Magma takes 232 seconds whereas Sage takes only 55 seconds.

In order to make this happen we read some old papers, read notes from a talk that Allan Steel (who implemented HNF in Magma) gave, and had to think quite hard to come up with three tricks (two of which are probably similar to tricks alluded to in Steel's talk notes, but with no explanation about how they work). We coded for 3 days at Sage Days 7, discovered that we weren't right about some of our tricks, did lots of fine tuning, and finally fixed all the problems and got everything to work. We will be writing this up, and of course all the code is open source.

Acknowledgement: Many many thanks to Allan Steel for implementing an HNF in Magma that blows away everything else, which gave us a much better sense of what is possible in practice. Thanks also to the authors of IML whose beautiful GPL'd implementation of balanced p-adic is critical to our fast HNF implementation.