Monday, December 22, 2008

new hardware!

I recently received four 24-core Sun X4450s with 128GB RAM, and a Sun X4500 storage box with 8 cores, 32GB RAM, and 24 terabytes of disk space. It took literally less than 2 hours to install Ubuntu 8.04 LTS server Linux on all the X4450s (about 15 minutes each), but several more days to make some sort of transition over to them.

The machines are fast. Most tests I try run twice as fast on the new machines as on the old sage.math (which was a 1.8GHz Opteron).

The first frustrating transition was that I can no longer run Magma on the new boxes, because they have a different MAC address. I've received one user complaint so far, and expect others, though of course I hope people consider using a viable open source alternative.

I installed about 20 operating systems, including most major Linux distros, FreeBSD, and Solaris under VMware Server on one of the boxes, and configured them to all use NFS. Unfortunately, we don't have an NIS/NIS+ or OpenLDAP server set up, so only I can log in to those boxes :-(. Setting up NIS or OpenLDAP clients and servers is not for the faint of heart. Anyway, just installing that many Linuxes and configuring them with build tools and NFS took about two days, including time to download installation media. Of all the OS's the only one I gave up on was Gentoo, which is a complete disaster (hours of tedious instructions that don't quite work; the live installer crashing; getting the live installer to work, but having vi crash on compilation once it boots up, etc.); that is the worst major operating system I have tried. According to distrowatch, "Gentoo Linux has lost much of its original glory in recent years. Some Gentoo users have come to a realisation that the time-consuming compiling of software packages brings only marginal speed and optimisation benefits. Ever since the resignation of Gentoo's founder and benevolent dictator from the project in 2004, the newly established Gentoo Foundation has been battling with lack of clear directions and frequent developer conflicts..." The failure of Gentoo hopefully helps emphasize the importance of clear direction.

Anyway, it is very nice having about two dozen cleanly installed OS's with a shared filesystem, all running on very fast hardware.

We still haven't installed an operating system on the 24 terabyte disk box, and have a lot of configuration work to do. Fortunately, Tom Boothby is working for me as a sysadmin, so he will do a lot of the hard work.

Friday, November 28, 2008

Sage Patch Review

I just spent a lot of time during the last few days refereeing Sage patches in the Sage trac server. Basically, we got behind and had nearly 100 patches in there that were marked as "[with patch; needs review]", which means that the patch is done, and is just waiting on review. Some of the patches in this state hadn't been commented on since August! In many cases they had "bit rotted", which means that they are broken, since related parts of Sage had changed after the patches had been posted.

In June 2008 at Sage Days 8.5 we had a long meeting and set up a patch referee editor system, but it turned out to be a total failure. Our system was set up to be like a journal editor system, except with the addition of a weekly meeting. Even in person, our one and only meeting was an inefficient experience, and it never worked over IRC or email.

Review is a Generic Problem


Robert Bradshaw recently went to a Google Summer of Code Mentors summit, and told me that the number one problem for open source projects discussed at the summit was patch review, in particular patches not getting reviewed in a timely fashion.

Review is also a big problem with mathematics journals, though with journals the turnaround time is expected to be months. Or even years -- I have one paper that's been in the review process at Mathematics of Computation for 3 years now, and I've almost never gotten a first referee report back from that journal in less than a year from submission time! But for some reason math journals tend to work. That said, I had some discussions with a prominent publisher about a journal that had fallen apart because the chief editor had seriously dropped the ball, and said publisher had to spend a massive amount of his own time fixing the situation.

Patch Review and Paper Review



Reviewing a math paper is a much different experience from reviewing a software patch. When you review a math paper, you basically get a potentially very interesting paper that is at the cutting edge of research, which you would be crazy to not want to read anyway (or you're just going to easily not recommend it for publication). When you review the paper, you excitedly read it, and if things don't make sense -- or the style is bad, or whatever -- you get to ask the author to explain clearly any point that troubles you, and the author definitely will. It's a lot of work, but it is mathematical research.

Reviewing a patch is different than reviewing a math paper. First, very often a patch fixes a subtle bug in Sage or introduces some new functionality. If the patch fixes a subtle bug, then of course the fix itself could easily introduce a new subtle bug, or maybe not really fix the bug correctly, and the referee has to figure this out. If the patch introduces new functionality, it could easily contain bugs, often bugs that the author of the patch hasn't thought about testing for. Often when refereeing a patch, I bounce it back with a use case that exhibits a bug. Some of the most common and dangerous patches are the ones that speed up existing code. These are often some slight changes in a few places, which seem safe enough, and make some code a bit faster. Last time I looked at one of these, I decided to loop over the first 100 obvious test cases -- about the 90th input crashed the new code (the old code worked fine on that example). I would have never noticed that problem had I just read the patch and tried the tests included with it; it was thus critical for me to creatively think of new tests.
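The "loop over the obvious inputs and compare old against new" check described above can be sketched in plain Python. This is only an illustration, not Sage's test framework: old_num_divisors, new_num_divisors, and find_regressions are names made up here, and the "optimized" function contains a deliberately planted bug.

```python
def old_num_divisors(n):
    """Reference implementation: slow, but obviously correct."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def new_num_divisors(n):
    """Hypothetical 'speed-up' patch: only loops up to sqrt(n).

    Planted bug: when d * d == n the divisor d is counted twice.
    """
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            count += 2
        d += 1
    return count


def find_regressions(old, new, inputs):
    """Run both implementations on every input and collect disagreements.

    Returns a list of (input, expected, actual) tuples, where actual is the
    repr of the exception if the new code crashed on that input.
    """
    failures = []
    for x in inputs:
        expected = old(x)
        try:
            actual = new(x)
        except Exception as exc:  # a crash in the new code is also a regression
            failures.append((x, expected, repr(exc)))
            continue
        if actual != expected:
            failures.append((x, expected, actual))
    return failures
```

Calling find_regressions(old_num_divisors, new_num_divisors, range(1, 101)) reports exactly the perfect squares, a class of inputs that a handful of hand-picked tests could easily miss.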

Of course, when refereeing a math paper one also looks for subtle flaws and tests theorems for consistency, but still it has a different feel than refereeing a patch.
And with refereeing patches there is also a time window, which is also a huge problem. Patches stop working because the rest of the Sage system evolves past them -- math papers don't just stop working because somebody out there proves more theorems.


What Should We Do?


As I said above, I believe that patch review is a major problem for the Sage project. I think the best solution is that there be one person at any given time who acts as the "patch review manager". This person could vary from month to month. This position would be the analogue of release manager or more directly of the chief editor at a traditional journal. Every few days, this person looks at the list of all patches that need review and goes through every single one, either pinging people about them, or refereeing them if possible. Said person must be bold enough to be able to understand code anywhere in Sage, and have a good sense of who can look at what. If a person did that, I believe our patch review problem would be solved.

If we don't do this, the Sage project will operate at reduced efficiency and new and experienced developers alike will get needlessly frustrated. (NOTE: We currently do have Michael Abshoff who does pretty much what I describe above, but he has other things he should be doing, since he is release manager.) The Sage project will have a harder time reaching its goal to be a viable alternative to Magma, Maple, Mathematica, and Matlab. We have to work smarter and do a better job, for the sake of all the people out there currently forced to use Magma, Maple, Mathematica, or Matlab, instead of an open source peer reviewed scientifically viable system like Sage aims to be.

Sunday, November 23, 2008

Magma and Sage

I spent the weekend working on making Sage and Magma talk to each other more robustly. Getting different math software systems to talk to each other is a problem that the OpenMath project has tried to tackle since the 1990s, without success. Sage has out of necessity made real (rather than theoretical) progress toward this problem over the years, and what I did this weekend was a little step in the right direction for Sage.

First, I designed with Michael Abshoff a new feature for our testing framework, so we can test only optional doctests that depend on a certain component or program being present. Without a usable, efficient, and flexible testing system it is impossible to develop good code, so we had to do this. Next, I worked on fixing the numerous issues with the current Sage/Magma interface, as evidenced by many existing doctests failing. It was amusing because some of the doctests had clearly never ever succeeded, e.g., things like

sage: magma.eval('2') # optional
sage: other stuff

was in the tree, where the output was simply missing.
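To make the idea of "test only the optional doctests that depend on a certain component" concrete, here is a minimal sketch in plain Python. It is not the actual Sage testing code: the tag format "# optional - NAME" and the helper names parse_examples and select_optional are assumptions made for illustration.

```python
import re

# Assumed tag format for this sketch: "# optional - magma"
OPTIONAL_TAG = re.compile(r"#\s*optional\s*-+\s*(\w+)")


def parse_examples(text):
    """Split doctest-style text into examples.

    Each example is a dict with the command, the optional component it
    needs (or None), and the lines of expected output that follow it.
    """
    examples = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("sage: "):
            command = stripped[len("sage: "):]
            match = OPTIONAL_TAG.search(command)
            tag = match.group(1) if match else None
            examples.append({"command": command, "tag": tag, "expected": []})
        elif stripped and examples:
            # Any other non-blank line is expected output of the last command.
            examples[-1]["expected"].append(stripped)
    return examples


def select_optional(examples, component):
    """Keep only the examples tagged as needing the given component."""
    return [ex for ex in examples if ex["tag"] == component]


SAMPLE = """
sage: magma.eval('2')  # optional - magma
2
sage: gap.eval('1+1')  # optional - gap
2
sage: 1 + 1
2
"""
```

With this, select_optional(parse_examples(SAMPLE), "magma") returns just the first example, so a machine that has Magma installed can run the Magma-dependent tests without touching the rest.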

Anyway, in fixing some of the much more interesting issues, for example, things like this that involve nested polynomial rings, I guess I came to understand better some of the subtleties of getting math software to talk with other math software.

sage: R.<x,y> = QQ[]; S.<z,w> = R[]; magma(x+z)
boom

The first important point is that one often thinks that the problem with interfacing between systems is, given an object X in one system (say Sage), finding a string s such that s evaluates in another system (say Magma) to something that "means the same thing" as X. This is the problem that OpenMath attempts to solve (via XML and content dictionaries), but it is not quite the right problem. Instead, given a particular mathematical software system (e.g., Magma) in a particular state, and a view of that state by another system (e.g., Sage), the problem is to then come up with a string that evaluates to the "twin image" of X in the running Magma system.

To do this right involves careful use of caching. Unless X is an atomic element (e.g., a simple thing like an integer) it's important to cache the version of X in Magma as an attribute of X itself. Let's take an example where this caching is very important and subtle. Consider our example above, which has the following Sage code as setup.

sage: R.<x,y> = QQ[]
sage: S.<z,w> = R[]

This creates the nested polynomial ring (QQ[x,y])[z,w]. The new code in sage-3.2.1 (see #4601) does the following to convert x + z to a particular Magma session. Note that the steps to convert x+z to Magma depend on the state of a particular Magma session! Anyway, Sage first gets the Magma version of S, then asks for the generator names of that object in the given Magma session. These are definitely not z,w:

sage: m = magma(S)
sage: m.gen_names()
('_sage_[4]', '_sage_[5]')

The key point is the strings returned by the gen_names command are strings that are valid in Magma and evaluate to each of the generators we're after. They depend on time -- if you did stuff in the interface earlier you would get back different numbers (not 4 and 5). Note that it's very important that the Python objects in Sage that _sage_[4] and _sage_[5] point to do not get garbage collected, since if they do then _sage_[4] and _sage_[5] also become invalid, which is not good. So it's important that the magma version (m above) of S is cached.
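The lifetime issue can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not the Sage/Magma interface code: ToySession, Handle, and Ring are hypothetical classes, and the "session" is just a dictionary of numbered slots.

```python
import weakref


class ToySession:
    """Stand-in for a running external system with numbered storage slots."""

    def __init__(self):
        self.slots = {}      # slot number -> stored expression string
        self.next_slot = 1

    def store(self, expression):
        """Store an expression and return a handle that owns its slot."""
        number = self.next_slot
        self.next_slot += 1
        self.slots[number] = expression
        return Handle(self, number)


class Handle:
    """Python-side owner of one slot; the slot is freed when this dies."""

    def __init__(self, session, number):
        self.name = "_sage_[%d]" % number
        # When this handle is garbage collected, remove the slot, which
        # makes the name invalid in the session.
        weakref.finalize(self, session.slots.pop, number, None)


class Ring:
    """Toy parent object that caches its handle for each session."""

    def __init__(self, description):
        self.description = description
        self._handles = {}   # session -> Handle; the cache keeps slots alive

    def handle_in(self, session):
        if session not in self._handles:
            self._handles[session] = session.store(self.description)
        return self._handles[session]
```

A Ring that has been converted once keeps returning the same name for that session, while a handle created with session.store(...) and then dropped loses its slot. The slot number handed out depends on what was stored earlier, which mirrors the "they depend on time" remark above.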

Next Sage gets the Magma string version of each of the coefficients of the polynomial x+z (over the base ring R) using a similar process. It all works very well without memory leaks, but only because of careful tracking of state and caching. And the resulting string expression involves the _sage_[...]'s.

sage: (x+z)._magma_init_(magma)
'((1/1)*1)*_sage_[4]+((1/1)*_sage_[7])*1'
sage: magma(x+z)
z + x


Notice that _magma_init_ -- the function that produces the string that evaluates to something equal to x+z in Magma -- now takes as input a particular Magma session (there can be dozens of these in a given Sage session, with different Magmas running on different computers all over the world). This is a change to _magma_init_ that makes the implementation of what's described above easy. It's an API change that might also be carried over to many of the other interfaces (?).
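Why the conversion method needs the session as an argument can be seen in a small plain-Python sketch. Everything here is hypothetical (ToySession, Generator, Sum, and init_string are invented names, not Sage's classes): the point is only that the same object yields different strings in sessions with different histories.

```python
class ToySession:
    """Toy external session that hands out names in order of first use."""

    def __init__(self, used_slots=0):
        # used_slots models work done earlier in this session.
        self._next = used_slots + 1
        self._names = {}     # object -> name already assigned in this session

    def name_for(self, obj):
        if obj not in self._names:
            self._names[obj] = "_sage_[%d]" % self._next
            self._next += 1
        return self._names[obj]


class Generator:
    """Toy ring generator such as x or z."""

    def __init__(self, label):
        self.label = label

    def init_string(self, session):
        # The string depends on the session, not only on the object.
        return session.name_for(self)


class Sum:
    """Toy sum of terms, converted recursively."""

    def __init__(self, *terms):
        self.terms = terms

    def init_string(self, session):
        return "+".join(term.init_string(session) for term in self.terms)
```

For x = Generator("x"), z = Generator("z") and expr = Sum(x, z), a fresh ToySession() gives '_sage_[1]+_sage_[2]', while ToySession(used_slots=3) gives '_sage_[4]+_sage_[5]'. A method that did not receive the session could not produce either string.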

Thursday, November 20, 2008

Sage-3.2 and Mathematica 7.0

We just released sage-3.2! W00t! See for the tickets closed in this release.

There's been a lot of hyperbole due to Mathematica 7.0's recent release. A colleague of mine got a personal email from Stephen Wolfram himself, asking him to try out Mathematica 7.0, and instead my colleague forwarded the message to me and remarked that it was too late, since he had switched to Sage.

I looked over the Mathematica 7.0 release notes... and noticed that they added support for computing with Dirichlet characters. I implemented the code in Magma and Sage, and wrote a chapter in my modular forms book about computing with Dirichlet characters. So I followed the "what's new" to this Mathematica page about their new functionality for Dirichlet characters. It's sad. They give no way of specifying a character, except to give the "ith character", which is meaningless and random (and they say so) -- that's like giving a matrix over a finite field at random. All they give is a function to evaluate characters at numbers -- they don't give functions for arithmetic with them, or computing invariants such as the conductor, which is where all the real fun comes in. Boggle. Sage is light years ahead of Mathematica here.

The Mathematica release notes also brag about finally having something for finite groups, but again it is very minimal compared to what Sage provides (via GAP). Basically all they have is a bunch of tables of groups, but no real algorithms or functionality. The whole approach seems all backwards -- first one should implement algorithms for computing with groups, then use them to make tables in order to really robustify the algorithms, then compare those tables to existing tables, etc. I wonder whether the group theory data in Mathematica was computed using GAP or Magma?

Wednesday, November 12, 2008

I'm back from Sage Days 11 (UT Austin), which was very intense as are all Sage Days. It was a great workshop, and kickstarted many things, I hope. One very obvious big plus that came out of it was Craig Citro and Gonzalo Tornaria's work on massively optimizing the Cython-related dependency checking code in setup.py. The current implementation was some crappy thing I literally did in an hour, which has really bogged down as the Sage core library has gotten massive. Also, there's now a lot of momentum behind implementing Dokchitser's L-functions algorithm natively in Sage, which I'm greatly looking forward to -- Sourav San Gupta did much work on this at Sage Days 11, and has done a lot since too (in addition to his heavy load of grad courses). Mike Rubinstein and Rishi are wrapping Rubinstein's Lcalc using Cython as well, so Sage will soon go from the best system for computing with L-functions to even better!!

Yesterday I thought a bunch about the Sage/Magma interface and wrote several demos and test code. I'm still not 100% sure about how I want to do this -- there are numerous interesting subtleties with Magma. For example, if you create QQ[x,y] in Magma, then create QQ[x,y][z,q], the variables x,y from the first ring will *not* play nicely with the variables x,y in the second ring, which is surprising, since it is different than what happens with Sage. Anyway, this and many other problems are solvable and I'll be working on this again a lot tomorrow.

Thursday, November 6, 2008

Sage Days 11 in Austin Texas is tomorrow

I'm in Seattle, it's 6pm, and Sage Days 11 is in Austin, Texas tomorrow. I'm very excited, and I'm flying there overnight. On the one hand, a red eye might make me "totally exhausted" for the conference tomorrow. On the other hand, I slept an average of 2-3 hours per night for a week during Sage Days 8 last time I was in Austin, and I'm reasonably caught up on sleep.

My main goal for the workshop is to continue the work of Tim Dokchitser and Jennifer Balakrishnan to create a native implementation of Dokchitser's algorithm for computing L(f,s). Having this is suddenly incredibly important to my number theory research, so I'm finally motivated to want to get it into Sage.

I'll post about the workshop in my blog here.

Wednesday, October 1, 2008

Why I like Sage

Today's blog post is really from Jason Grout. It's why he likes Sage. This is from an email he sent out today, which I liked reading:

"It depends on the area, so you'll have to give me an area to get a more specific answer. In general, Sage has somewhat weaker general symbolic capabilities (i.e., integrals, etc.) than mathematica or maple (though usually this does not seem to be a problem in undergraduate-level problems). It has *much* stronger number theory functionality. Things are object-oriented in Sage and Sage understands mathematical structures and how they relate (using category theory). For example, Sage knows what a vector space is, what a finite field is, etc. You can actually create a finite field or an extension of the rationals and ask questions about it. You can create a polynomial ring over a field and then just work with it.

Sage is also generally faster than either Mathematica or Maple, in my experience.

The web interface to Sage is a huge plus to Sage over mathematica and maple. Of course, being free and open-source is something that is unmatched in either Mathematica or Maple; that is a very important point that is sometimes overlooked. You can literally see what is going on inside of Sage, where you have to guess what is happening in Mathematica or Maple.

One reason that Sage was chosen for an AIM workshop on helping undergraduate research was that the participants didn't have a common computational system (i.e., some had access to Mathematica, some had access to Maple, some had access to neither). They could use Sage because it was free, whereas it would have been problematic to insist that every person somehow acquire access to a specific piece of commercial software. Related to this, I had a student complain on my course evaluations about me using Mathematica in class because it is hard for our students here to have access to Mathematica, and they would have to pay in order to use it at home, etc.

If you are teaching future secondary ed teachers, then they most likely will not have access to Maple or Mathematica when they are teaching high school because of the cost. However, they *will* have access to Sage, so using Sage directly benefits their future students because whatever they learn can be used in their high school classes.

Another huge plus to Sage, in my eyes, is that it is based on one of the most prevalent and easiest-to-use computer languages around, Python. Students that learn to use Mathematica and Maple learn a language that they, in all likelihood, will never use once they graduate. However, Python is used in many, many industries, so their Python knowledge from using Sage is directly applicable later on.

Those are a few things that came to my mind right away. After some time thinking about it, I probably will have other things that make Sage more effective for me than other commercial software.

Thanks,

Jason"

Thursday, July 24, 2008

Austria, ISSAC, and Hidden Markov Models

Yesterday, I gave a controversial plenary lecture on Sage at the 2008 ISSAC symbolic computer algebra conference. It was well received by some proportion of the large audience of about 170 people, and will hopefully influence that research community to be more supportive of open source. In particular, I hope professors doing computer algebra research will allow their Ph.D. students to use open source software on research projects instead of forcing them to use Maple or Mathematica like most of them currently do at RISC.

Many people asked me what I thought of the ISSAC conference -- it was very similar to the yearly ANTS (Algorithmic Number Theory Symposium) meetings we number theorists have, but without number theorists. The meeting has a generally positive "vibe" and participants are enthusiastic about doing computation. My only criticism compared to ANTS is that the publication process for the proceedings isn't nearly as professional as what ANTS does -- the ISSAC publisher's website was in my opinion hell to use, working with the publisher to get my abstract in shape was no fun, and the final paper proceedings look like they were done at Kinko's, whereas ANTS proceedings are part of Springer-Verlag's lecture notes in computer science series, hence look very professional *and* are available online.

I also started looking at getting Hidden Markov Model functionality into Sage, since HMM's are very relevant to certain areas of machine learning, language processing, statistics, financial time series, etc., and Sage doesn't do much in that direction yet. I was prepared to have to write something from scratch myself in Cython, but quickly found GHMM.org, which is GPLv2+, actively used and developed, written in C with a Python interface, and with some work could possibly work very well for Sage. I would certainly rather spend a solid week writing high-quality documentation and tests (and reporting bugs) than months learning, implementing, and optimizing algorithms followed by a solid week writing high-quality documentation and tests, followed by months building a community of developers to maintain said code. The GHMM program linked to above only has an svn distribution and depends on xml, and it depends on swig. I've created an spkg that one can build into sage and which doesn't depend on libxml; it does assume you have swig installed, and takes about 30 seconds to install from source. It's installed into the system-wide sage on sage.math.washington.edu:

was@sage:~/patches$ sage
----------------------------------------------------------------------
| SAGE Version 3.0.5, Release Date: 2008-07-11 |
| Type notebook() for the GUI, and license() for information. |
----------------------------------------------------------------------

sage: import ghmm
sage: ghmm.[tab key]
ghmm.Alphabet ghmm.AminoAcids



In a few hours Michael Abshoff and I are heading to Vienna to meet with Harald Schilly (who I've never met), who is the new sagemath.org webmaster.

Sunday, July 20, 2008

ISSAC

I arrived in Austria last night totally exhausted after spending some time in Amsterdam with Michael Abshoff, Hendrik Lenstra (who was my Ph.D. thesis adviser), Waldek Hebisch (the FriCAS guy), and others.

All the students working on funded projects with me this summer have set up blogs, and I'm encouraging them to blog very frequently about their work on Sage. I will start doing the same. Here are links to their blogs:


The last thing I did on Sage was spend nearly a week working on making a release that has fixes for numerous subtle build (and other) bugs on a very wide range of Linux distributions and hardware, e.g., Itanium2's and 64-bit Pentium 4's.

Friday, June 20, 2008

Ondrej, Bernard, and Tim have been sort of arguing in response to Rob Dodier's nice post... The discussion is missing some shades of gray. Here's what we actually do in Sage:

1. Identify needed functionality (e.g., compute characteristic polynomials of matrices, or "symbolic calculus").

2. Find the existing best open source free option that implements 1, if there is any (e.g., say the pari C library in the above example, or "maxima" + a very sophisticated Python interface).

3. Fully integrate 2 into Sage. (Bobby Moretti and I did this for Calculus, in the above example.)

4. Somebody comes along and points out that 2 is not optimal and that they can do better. E.g., they have a new algorithm for computing characteristic polynomials, or think they can implement a drop in replacement for symbolic calculus that is in fact much better.

5. Somebody tries very hard to implement 4. Sometimes they succeed, and we *switch* over to it, and sometimes they fail, and the code never goes into Sage. Both happen and that is fine by me.

This is what is happening with symbolics. In 2005 we identified Maxima as the best open source system given our constraints. Bobby Moretti and I spent about 8 months fully integrating Maxima in as the core of our symbolic calculus functionality in Sage. Bill Furnish "popped up" and claimed he could do something much better, and has been working on this for the last 5 months. If (or when) his code turns out to be clearly better than what is currently built on top of Maxima in Sage, then it will go into Sage. If not, it won't.

The *only* reason I can see for not going from 4 to 5 in the above example is that I, or Bobby Moretti, or the Maxima authors, or whoever, have big egos and can't stand to see their hard work and code get thrown away. Well I swallow my frickin' pride and just deal with it, and have thrown away massive amounts of code I wrote. I think that's really the core issue in this whole thread -- some people are really disturbed by code getting thrown away... Well deal with it. It's much more important to make decisions that are best for the overall quality of a project and "the community goals" than to stroke your ego by keeping your own code alive forever.

Thursday, May 29, 2008

GMP forked; Torbjorn Granlund: "blatant falsehoods and sinister insinuations"

Torbjorn Granlund:
> The other publicly stated reasons are a mixture of
> blatant falsehoods and sinister insinuations.
> This fork exists for some very different reasons than those publicy
> stated.

I know why the fork exists, so I'll state publicly precisely
why the fork exists. The issues below are not
blatant falsehood or sinister insinuations.

1. FACT: (L)GPLv3 cannot be used by some companies. See below where
I discuss this issue at length.

2. FACT: The GMP project is not developer friendly. This is easy to
see by reading the GMP mailing lists.

3. FACT: The GMP project does not have a regular and
predictable release cycle. How many times has the GMP 5.0 release
been moved back -- it used to be "sometime in 2007", but now the
GMP site says "5.0 is planned to come out in a couple of years."

4. FACT: The working code repository of GMP is closed. There is
no public svn, etc. repository so that anybody can look at the latest
version of the GMP code. See 2.

5. FACT: Some extremely capable developers do not want to contribute to
an LGPL'd project, because they don't want their voluntary contributions
to be used by Maple, Mathematica and Magma to make money.

6. FACT: The GMP project is unfriendly toward natively supporting
Microsoft Windows using MSVC. Just see any email you have sent
to Brian Gladman.

7. FACT: The GMP project has been unfriendly toward supporting OS X.
Just search the gmp list archives for OS X.


Torbjorn Granlund:
> I think the forkers over at SAGE use the purported v3 incompatibility
> issues as an excuse for forking GMP.

I have been approached by numerous people (from industry,
government, etc.) over the last three years about forking GMP.
GPLv3 was one of numerous factors that really pushed things
over the edge, though a fork would have happened anyways.


Torbjorn Granlund wrote:
> Seriously, please don't use this sort of rhetoric. At least not
> before you've explained the actual problem. I suspect the SAGE team's
> problem is mere stubbornness, in particular since they have not been
> able to produce one single reason for their problems with v3. But I
> am all ears, should somebody spell it out.

It is no secret that the Sage project receives some funding from
Microsoft Research to produce free open source
mathematical software for use by their researchers. See
http://sagemath.org/ack.html
for a list of organizations that fund Sage development.
A company-wide requirement at Microsoft is that they do not run any
GPLv3 code, not even binaries. This is not surprising in light
of numerous quotes by Stallman about how GPLv3 was designed
partly to attack Microsoft. For example:

Stallman: "The point of the GPLv3 conditions that apply to the
Novell/Microsoft deal is to give the rest of the community a defense
against Microsoft's patent threats. If these conditions do their job,
the result will be that Microsoft never goes beyond threats, and the
community is safe."
http://www.technewsworld.com/story/must-read/59780.html?welcome=1212081285

This is one straightforward reason why GPLv3+ only code
is a problem for the Sage project. It has nothing to do with
ideological stubbornness by Sage developers (instead it is
ideological stubbornness by Stallman).

If Microsoft maintains this policy, then they will
also not run new versions of Mathematica, Maple,
Magma, that depend on any LGPLv3+ code.
So the problem Sage has will also be a problem for all
those projects. And I guarantee you that people at Microsoft
know about this issue.

You probably don't like Microsoft, so I doubt we will find any sympathy
from you as a result of the above. But rest assured
that the above issues with GPLv3 are NOT motivated by "mere
stubbornness" by the *Sage team*. So you are wrong about that.

Thursday, May 1, 2008

Can There be a Viable Free Open Source Alternative to Magma, Maple, Mathematica, and Matlab?



For over a decade I have primarily done research in number theory that often involves computation, mainly using Magma. In 2004 I realized that it was stupid for me to continue building my work on Magma because Magma is proprietary, the development model is closed, Magma is expensive which is bad for students, and the language itself lacked many features (e.g., user defined classes) that I had requested repeatedly for over 5 years. Thus to continue to use only Magma unacceptably limited my potential in both research and education.

Having used Magma for many years, I simply could not switch to an existing open source system. The only serious free open source software for number theory is PARI, whose capabilities are far behind that of Magma in numerous critical areas of interest to me, including exact linear algebra, commutative algebra, and algebraic curves. And no other free system--GAP, Singular, Axiom, Maxima, Macaulay 2, etc.--even comes close in all these areas. In fact, after a decade of patiently waiting, I doubt they ever will.

Magma is the result of several decades of hard work by extremely talented full-time mathematicians and programmers such as John Cannon, Allan Steel and Claus Fieker. I've worked with them and they are simply amazing. The situation seemed hopeless. If I had only never used Magma and tasted of the forbidden fruit...

In 2004, sure that there was no possible way to solve this problem, and driven by nothing but pure blind compulsion, I started the Sage project as my little free open source alternative to Magma, and spent an insane amount of time working on it even though I was utterly convinced that there was no hope of Sage ever succeeding. The first version of Sage consisted of the Python interpreter and a few scripts for doing number theory.

After a year, some of my first feedback from the computer algebra research community came from Richard Fateman on December 6, 2005 when he posted his opinion of the Sage project to sci.math.symbolic: "By avoiding applications (say, to engineering design, finance, education, scientific visualization, etc etc) the activity is essentially doomed. Why? Government funding for people or projects will be a small percentage of the funding for pure mathematics. That's not much. And the future is pretty grim."

Honestly, I believed he was right, but I just couldn't stop myself from working on Sage. It is now nearly three years later and the Sage project currently has around 100 active developers and 10,000 users. In November 2007, Sage won first place in the scientific category of the Trophées du Libre, which is a major international free software competition. Sage is funded by the US National Science Foundation, the US Department of Defense, the University of Washington, Microsoft Research, Google and private donations. Sage has new releases every two weeks, and typically 30--40 people contribute to each release. All new code contributions to Sage are peer reviewed, and every new function must be documented with tests that illustrate its usage. The docs have over 50,000 lines of input examples.

So what is Sage and what makes it unique? Sage is:

  1. a huge distribution of free open source mathematical software that is surprisingly easy to build from source,

  2. a set of interfaces to most other mathematical software systems, and

  3. a new Python library that fills in huge gaps in other open source math software included in Sage, unifies everything offering a smooth user experience, and provides a modern web-based graphical notebook interface with math typesetting and integrated 2D and 3D graphics.



Sage is the first large general purpose mathematics software system that uses a mainstream programming language (Python) as the end user language. Python--easily one of the world's top 10 programming languages--is a powerful and beautiful modern interpreted programming language with an organized and professional developer base and millions of users. Sage also makes extensive use of a Python-to-C compiler called Cython. Thus Sage has a tremendous advantage over every other general purpose computer algebra system, since Python has thousands of third party libraries, sophisticated support for object serialization, databases, distributed programming, and a major following in scientific computing.

Instead of reinventing the wheel, Sage combines many of the best existing open source systems that have been developed over the last 40 years (about million lines of code) with about 250,000 lines of new code. Every single copy of Sage includes all of the following software (and much much more):
  • Algebra and calculus: Maxima, SymPy
  • High precision arithmetic: GMP, MPFR, MPFI, quaddouble, Givaro
  • Commutative algebra: Singular
  • Number theory: PARI, NTL, mwrank, ECM, FLINTQS, GMP-ECM
  • Exact linear algebra: LinBox, IML
  • Group theory: GAP
  • Scientific computation: GSL, SciPy, NumPy, cvxopt
  • Statistical computation: R
  • Graphics (2d and 3d): Matplotlib, Tachyon3d, Jmol
Sage is thus the first system to combine such a wide range of libraries and programs in a meaningful way. This huge range of programs is tied together using Python's excellent extensibility via C libraries and using pseudo-ttys. Sage has a highly developed unified collection of pseudo-tty based interfaces that make it possible to make extensive use of Maple, Mathematica, Magma, Matlab, GAP, Maxima, Singular, PARI, and many other systems from anywhere within a single Sage program.
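
To give a feel for how an interface of this kind talks to another program, here is a minimal sketch in plain Python. It is not Sage's implementation: it assumes ordinary pipes rather than a real pseudo-tty, a child Python process stands in for the external math system, and the names `LineInterface`, `CHILD_SCRIPT` and `start_calculator` are made up for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for an external math system: it reads one expression
# per line on stdin and prints the value of that expression on stdout.
CHILD_SCRIPT = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    print(eval(line))\n"
    "    sys.stdout.flush()\n"
)


class LineInterface:
    """Send one command per line to a child process and read one reply line."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )

    def eval(self, command):
        # Write the command, then block until the child answers with one line.
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def quit(self):
        # Closing stdin signals end-of-input, so the child loop exits.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


def start_calculator():
    """Start the stand-in child process (an assumption of this sketch)."""
    return LineInterface([sys.executable, "-c", CHILD_SCRIPT])


if __name__ == "__main__":
    calc = start_calculator()
    print(calc.eval("6*7"))
    print(calc.eval("sum(range(10))"))
    calc.quit()
```

A real system prints prompts, multi-line output and error messages, so a one-line-in, one-line-out protocol like this is too naive in practice. Handling those cases is the job a pexpect-style, pseudo-tty based interface takes on.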

Curious? If you want a viable open source alternative to Magma, Maple, Mathematica and Matlab, drop everything, try out Sage now and become a Sage developer.

http://www.sagemath.org

Tuesday, April 22, 2008

Sage-3.0 and Beyond

Finally, after much hard work, Sage-3.0 has been released! This release has tons of bug fixes plus several new features such as interact mode in the notebook, the R interface, full GCC-4.3 support, much better automated testing, etc.


Most importantly, Sage-3.0 finally has code for computing with modular abelian varieties. You almost certainly have no clue what those are, but suffice to say that I started the Sage project to compute with them, so having this code in Sage is a major milestone for me and the project.

We dramatically increased our automated testing and example suite so that 51.5% of functions have autotested examples. There are now nearly 60,000 lines of input examples. In February our testing was in the 30% range. This was a huge amount of work by many, many Sage developers, and it has the practical impact that when you type foo? it is nearly twice as likely that you'll see a helpful example.
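
As a rough illustration of what such a coverage figure measures, here is a small sketch that counts how many functions in a piece of Python source have a docstring containing an example prompt. It is only an approximation under stated assumptions: the function name `doctest_coverage` and the `EXAMPLE` source are hypothetical, the check simply looks for `sage: ` or `>>> ` in the docstring, and it does not reproduce the weighting used by the real `sage -coverage` script.

```python
import ast


def doctest_coverage(source):
    """Return (covered, total) for the functions defined in `source`.

    A function counts as covered when its docstring contains an example
    prompt ('sage: ' or '>>> '). This is an assumption of the sketch, not
    the rule used by `sage -coverage`.
    """
    tree = ast.parse(source)
    covered = 0
    total = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            total += 1
            doc = ast.get_docstring(node) or ""
            if "sage: " in doc or ">>> " in doc:
                covered += 1
    return covered, total


# Hypothetical module source: one function with an example, one without.
EXAMPLE = '''
def double(x):
    """
    Return twice x.

    EXAMPLES::

        sage: double(2)
        4
    """
    return 2 * x


def undocumented(x):
    return x
'''

if __name__ == "__main__":
    covered, total = doctest_coverage(EXAMPLE)
    print("%d of %d functions have examples" % (covered, total))
```

Run over a whole source tree, a count like this makes it easy to see which files drag the overall percentage down.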

There is now a new interface to R that uses a pseudotty; this is a completely different alternative to rpy, which makes it possible for the web-based Sage notebook to work as an R GUI, and also makes it so any R command can be used from Sage 100% exactly as in R. It is still clunky and has numerous issues, but it is fairly usable, documented, and has a test suite. Here it is drawing a plot in the notebook:



So what is the next step for Sage? We have finished rapidly expanding by incorporating major new components. Now we will work on making Sage more accessible and writing new code to make Sage highly competitive with all other options. The main goals for this coming year, as I see them, are as follows:
  1. Port Sage and as many of its components as possible to Microsoft Windows (32 and 64-bit, MSVC) and to Sun's Solaris. For the Windows port the main challenge is writing a native implementation of pexpect for 32 and 64-bit MSVC windows, and porting several math software projects (e.g., R) that until now haven't had the resources to move beyond 32-bit crippled mingw/cygwin. For Solaris, the main issues are simply fixing bugs and improving the quality of the Sage codebase. Michael Abshoff is leading the charge.
  2. Modular forms and L-functions -- greatly improve all linear algebra needed for modular forms computation; implement a much wider range of algorithms; run large computations and record results in a database. Mike Rubinstein and I will lead the charge.
  3. Symbolic calculus -- a native optimized and much more general implementation of symbolic Calculus in Sage. Gary Furnish is leading the charge.
  4. Combinatorics -- migrate the mupad-combinat group to Sage; implementation of a lot more native group theory in Sage. Mike Hansen is leading the charge.
  5. Statistics -- integrate R and scipy.stats much more fully into Sage with the goal of making Sage the best available environment for statistical computation and education. Harald Schilly is leading the charge.
  6. The Sage notebook -- separate the notebook from Sage (like I did with Cython); develop the notebook's interactive and visualization functionality much more, and make the notebook easier to use as a graphical frontend for all math software. I am leading the charge until I can find somebody else to do so.
  7. Magma -- greatly improve the ability of Sage and Magma to work together (many more automatic conversions between Sage and Magma datastructures, etc.); Magma is by far the best in the world at numerous calculations, and much important code has been written in Magma, so it's critical that Sage be fully able to leverage all that functionality. Also, this will make comparing the capabilities of the two systems much easier. I'm also leading the charge here.
All of the above goals have received new financial support from industry or the NSF. For example, Microsoft is funding the Windows port, Google is funding many of the above projects, and the NSF is strongly funding a big modular forms and L-functions database project. The sage-devel mailing list has 434 members and over 1000 posts/month, and each Sage release has about 30 contributors. The size of the serious core Sage developer community might be nearing that of Mathematica's core developer group. Now is the moment in time when we have a chance to... create a viable free open source alternative to Magma, Maple, Mathematica, and Matlab.

Tuesday, April 8, 2008

Google Funds Sage!!

Today Chris DiBona (open source programs manager at Google) fully funded a proposal [pdf] that some students and I put together to carry out four critically important Sage development projects this summer (2008):

  • Mike Hansen (UCSD): Implement combinatorial species in Sage. This will provide a backend to handle a wide range of new combinatorial objects with minimal effort.
  • Gary Furnish (Utah): Implement new optimized symbolic manipulation code in Cython. This will greatly speed up symbolic computation in Sage, and reduce Sage's dependence on Maxima.
  • Robert Miller (UW): Implement fast code for permutation groups and backtracking algorithms. This is needed for general applications to graph and code automorphism groups computations, and can be used to exhaustively generate isomorphism class representatives of a wide class of mathematical objects.
  • Yi Qiang (UW): Further document, improve the usability of, and provide numerous real world examples of distributed computing with Sage. This will greatly help in applications of Sage to research that involve parallel computation.


This funding is not part of the Google Summer of Code (the projects above do not fit with any of the mentoring organizations), but is instead, I hope, part of more long-term support by Google for the Sage project and open source mathematical software that provides people with a free alternative to Maple, Mathematica, Matlab, and Magma.

Friday, February 29, 2008

This is what I would like in Sage 3.0


  1. DOCUMENTATION:
    cd SAGE_ROOT/devel/sage/sage/; sage -coverage .

    should output
    Overall weighted coverage score:  x%  [[where x >= 50]]

    Moreover there should be at least one hand written paragraph at the beginning of each file; "sage -coverage" can be adapted to flag whether or not this is there. This will improve the overall quality and maintainability of Sage, and make it easier for users to find examples.

  2. MANIPULATE: Usable "manipulate" functionality standard in Sage. This has very wide applicability.

  3. R: A pexpect interface to R (so, e.g., the notebook can act as a full R notebook using 100% native R syntax). This will matter to a lot of Sage users, and make using R from Sage much easier in some cases (just cut and paste anything from any R examples and it will work). It will also provide something in Sage that one doesn't get with Python + rpy + R.

  4. TIMING/BENCHMARKING: Fully integrate in Sage wjp's code that times all doctests, and start publishing the results on all Sage-supported platforms with every Sage release. This will give people a much better sense about which hardware is best for running Sage, and avoid major performance regressions. Likewise, get the Malb/wjp/my generic benchmarking code into Sage (this provides a doctest like plain text format for creating benchmarks, and is already mostly written).



I've only proposed goals for 3.0 that are wide reaching and will be noticed in some way by most users instead of fairly technical optimizations in a specific area (such as FLINT, Libsingular, or optimized integer allocation). I think changing implementations in specific technical areas to speed things up is more appropriate in week-to-week releases, and is also something we should be very careful about until we have good speed regression testing in place (we should have done step 4 above long, long ago).

Thursday, February 21, 2008

Mathematics Research, Education, and Sage

I think there is a brewing tension between education and research amongst developers involved with the Sage project. More on that in a moment.

I cannot speak for all the core Sage developers, but I think I have some idea what some of them think and care about. My impression is that many of them are involved in Sage because they want to create software that they can use for attacking cutting edge research problems in their research area. This is true of me: I started Sage -- originally called "Software for Arithmetic Geometry Experimentation" -- to have a very powerful open software environment for computing with modular forms, abelian varieties, elliptic curves, and L-functions.

I am quite happy that Sage has become much more general, addressing a huge range of mathematics, since this expands the range of good developers, and since increasing the range of tools math researchers can bring to bear on a problem results in better research. For example, the solutions to many problems in number theory involve an incredible range of techniques in different areas of mathematics. I'm fairly certain that many of the people who have put in insane hours during the last few years making Sage what it is now (e.g., Mike Hansen, David Roe, Robert Bradshaw, David Harvey, Robert Miller, Emily Kirkman, Martin Albrecht, Michael Abshoff, etc.) have a similar perspective.

On the other hand, I teach high school students for a while every summer (in SIMUW), as do other people like David Roe, and of course I teach undergraduate classes... This is why I put so much effort into co-authoring things like the Sage notebook, which exist mainly to make the functionality of Sage more accessible to a wider range of people.

So, I think there is a brewing tension between education and research amongst developers involved with the Sage project (and in my case in my own mind). Some observations:

1. The research part of the Sage project is thriving and getting sufficient funding independent of any connection with educational applications of Sage. It is very, very healthy right now.

2. There is a lot of potential benefit to education in having a tool like Sage, since Mathematica is quite expensive, closed, etc. It's good for humanity for Sage to be genuinely useful in an educational context.

3. People working on Sage for research have very limited time, and it can be frustrating being regularly asked to do things by the education community that not only have nothing to do with research, but are even sometimes at odds with it.

4. It is vitally important for the Sage project to be both well organized and have a clear sense of direction, purpose and goals.

It might be a good idea if the people who are really interested in Sage being a great tool for *education* would consider doing the following:

(a) setting up a mailing list called sage-edu for development discussions related to Sage in education. I realize that we just got rid of sage-newbie, but that was for a different reason -- because people were posting sage-support questions there and not getting responses.

(b) gathering together the best education-related tools in some sort of organized package. This could start with Geogebra. I don't know. The key thing is that there is no expectation at all that the people into Sage mainly for research do much of anything related to this project. I hope one outcome of this project would be an spkg that when installed would make available lots of cool extra stuff, and of course I would be very supportive about server space, posting of spkg's etc. And when this gets some momentum and quality behind it this spkg would be included standard in Sage.

Basically I'm suggesting that everyone interested in making Sage the ultimate educational tool get organized, figure out who really wants to put in an insane amount of effort on this sort of thing, and put together a bunch of cool tools. Stop thinking you have to convince a bunch of us research-focused people to do the work or that your ideas are good -- you don't -- your ideas are good; it's just that if we put a lot of time into them we won't have time for our research.

Make an spkg that will be trivial to install into Sage and extend its functionality. There is definitely sufficient interest in something like this for education, there is great potential for funding, and potential for having a major positive impact on society. Thus I think people will emerge who will want to take up this challenge. I just think it's better if it can happen for a while unconstrained by the rules or prejudices of the "Sage Research" side of this project.

In summary, please put a huge amount of effort into getting organized and putting together something polished and great, so I can later effortlessly assimilate it :-).

Saturday, February 9, 2008

Benchmarketing: Modular Hermite Normal Form

Two days ago there was no fast freely available open source implementation of computation of the Hermite Normal Form of a (square nonsingular) matrix. In fact the fastest implementations I know of -- in GAP, PARI, and NTL -- are all massively slower (millions of times) than Magma. See Allan Steel's page on this algorithm.

I just spent the last few days with help from Clement Pernet and Burcin Erocal implementing a fast modular/p-adic Hermite Normal Form algorithm. As soon as the coefficients of the input matrix get at all large our implementation is now the fastest in the world available anywhere (e.g., faster than Magma, etc.). This is incredibly important for my research work on modular forms.
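
For readers who have not met the Hermite Normal Form, here is a small sketch of what is being computed. It uses naive integer row reduction, not the modular/p-adic algorithm described in this post, and it assumes a row-style convention (upper triangular, positive pivots, entries above each pivot reduced into the range from 0 up to the pivot). The function name `hermite_normal_form` is made up for illustration.

```python
def hermite_normal_form(rows):
    """Row-style Hermite Normal Form of a square nonsingular integer matrix.

    `rows` is a list of lists of Python ints. This is a naive sketch that
    only uses integer row operations; it assumes the matrix is nonsingular.
    """
    a = [list(r) for r in rows]
    n = len(a)
    for col in range(n):
        # Euclidean elimination: clear the entries below the pivot position.
        for r in range(col + 1, n):
            while a[r][col] != 0:
                q = a[col][col] // a[r][col]
                a[col] = [x - q * y for x, y in zip(a[col], a[r])]
                a[col], a[r] = a[r], a[col]
        # Make the pivot positive.
        if a[col][col] < 0:
            a[col] = [-x for x in a[col]]
        # Reduce the entries above the pivot into the range [0, pivot).
        for r in range(col):
            q = a[r][col] // a[col][col]
            a[r] = [x - q * y for x, y in zip(a[r], a[col])]
    return a


if __name__ == "__main__":
    print(hermite_normal_form([[1, 2], [3, 4]]))
    print(hermite_normal_form([[3, 1], [6, 4]]))
```

With this kind of direct elimination the intermediate entries can grow much larger than the input as the matrix size increases, which is one reason modular and p-adic techniques matter for large inputs.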

The following plot compares Sage (blue) to Magma (red) for computing the Hermite Normal Form of an n x n matrix with coefficients that have b bits. The x-axis is n, the y-axis is b, and the blue dots show where Sage is much better, whereas the red dots are where Magma is much better. The timings are on a 2.6GHz Core 2 Duo laptop running OS X, and both Sage and Magma were built as 32-bit native executables:



(Note: When doing timings under Linux, Magma does slightly better than it does above, though Sage is still much better as soon as the coefficients are at all large.)

This plot shows the timings of Magma (red) versus Sage (blue) for computing the Hermite Forms of matrices with entries between -2^12 and 2^12. Sage is much faster than Magma, since the blue line is lower (=faster).


For even bigger coefficients Sage has a much bigger advantage. E.g., for a 200 x 200 256-bit matrix, Magma takes 232 seconds whereas Sage takes only 55 seconds.

In order to make this happen we read some old papers, read notes from a talk that Allan Steel (who implemented HNF in Magma) gave, and had to think quite hard to come up with three tricks (2 of which are probably similar to tricks alluded to in Steel's talk notes, but with no explanation about how they work). We then coded for 3 days at Sage Days 7, discovered that we weren't right about some of our tricks, did lots of fine tuning, and finally fixed all the problems and got everything to work. We will be writing this up, and of course all the code is open source.

Acknowledgement: Many many thanks to Allan Steel for implementing an HNF in Magma that blows away everything else, which gave us a much better sense of what is possible in practice. Thanks also to the authors of IML whose beautiful GPL'd implementation of balanced p-adic is critical to our fast HNF implementation.

Wednesday, January 30, 2008

Josh Kantor's Lorenz attractor example

Josh posted a nice example of plotting a Lorenz attractor in Sage:


Put this in a notebook cell (be careful about newlines):

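# Make numeric literals plain Python ints and floats instead of Sage types.
Integer = int
RealNumber = float

# Right-hand side of the Lorenz system, with y = [x, y, z] and
# params = [sigma, rho, beta].
def lorenz(t,y,params):
    return [params[0]*(y[1]-y[0]),y[0]*(params[1]-y[2])- y[1],y[0]*y[1]-params[2]*y[2]]

# Jacobian of the right-hand side; the final row holds the partial
# derivatives with respect to t, which are all zero here.
def lorenz_jac(t,y,params):
    return [ [-params[0],params[0],0],[(params[1]-y[2]),-1,-y[0]],[y[1],y[0],-params[2]],[0,0,0]]

T=ode_solver()
T.algorithm="bsimp"
T.function=lorenz
T.jacobian=lorenz_jac
T.ode_solve(y_0=[.5,.5,.5],t_span=[0,155],params=[10,40.5,3],num_points=10000)
# Keep only the solution points, dropping the time values.
l=[T.solution[i][1] for i in range(len(T.solution))]

line3d(l,thickness=0.3, viewer='tachyon', figsize=8)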


and this is what you get (click to zoom):

Friday, January 11, 2008

I'm glad I chose Python for Sage -- some cool scientific computing Python projects

When brainstorming for talk ideas for a workshop, Fernando Perez came up with a bunch of random off-the-top of his head very high quality/impact scientific computing projects that involve Python. Here they are (the following was written by Fernando Perez, the author of IPython):

- I can obviously talk about ipython and related projects, but I can also give a math/technical talk off this type of work (the whole implementation is python, and uses quite a few tricks):

http://dx.doi.org/10.1016/j.acha.2007.08.001

- Brian Granger (from Tech-X http://txcorp.org) has a NASA grant to develop distributed arrays for Python, using IPython and numpy. That would make for an excellent talk, I think (matlab, interactivesupercomputing.com, and all the 'big boys' are after distributed arrays).

- Trilinos (http://trilinos.sandia.gov), a large set of parallel solvers developed at Sandia National Lab, has a great set of Python bindings (even usable interactively via ipython).

- MPI4Py is an excellent set of MPI bindings for Python, and its author Lisandro Dalcin is also the developer of Petsc4py:
* http://mpi4py.scipy.org/
* http://code.google.com/p/petsc4py/
If Lisandro can't come, I can contact one of the Petsc guys who's a great python developer and see if he's coming or can make it, he's an excellent speaker (Matt Knepley from Argonne Nat. Lab http://www-unix.mcs.anl.gov/~knepley/).

- The Hubble space telescope people are all Python based, and have done enormous amounts of work on both their internal image processing pipeline and contributing to Matplotlib (they have currently a developer working full time on matplotlib).

- The NetworkX (Sage uses this) guys from Los Alamos will probably be coming: https://networkx.lanl.gov/wiki.

- The Scripps institute has an extremely strong Python team doing molecular visualization and visual programming. Their work is very impressive, and they're already in San Diego, so it's a no brainer to attend. Their presentations are always of very high quality: http://mgltools.scripps.edu.

- JPL uses python extensively (they've contracted out work for matplotlib to be extended to suit their specific needs).

- My new job at UC Berkeley is on neuroscience, and we could present the work that's being developed there for fMRI analysis (all Python based, fully open source, NIH funded).

- Andrew Straw's work (http://www.its.caltech.edu/~astraw/) on real-time 3d tracking of fruit flies is very, very impressive. All python based, hardware control, real-time parallel computing.

- The CACR group at Caltech has the contract for DANSE (http://wiki.cacr.caltech.edu/danse/index.php/Main_Page), the Spallation Neutron Source's data analysis framework, all python. This is currently the largest experiment being funded in the USA.

Monday, January 7, 2008

AMS Meeting Day 2: The Competition

Today I violated the Mathematica license agreement in front of Eric Weisstein (a famous Mathematica developer), I talked with the developers of Wiris (rhymes with virus) which is a commercial competitor to the Sage notebook, discussed Mupad with a longtime Mupad developer, and gave away $1000 of tutorials, DVD's, and other goodies at the Sage AMS exhibit booth.

There is a project called "Wiris", which I had never heard of until today. So Tom Boothby and I had a very interesting talk with the people at the Wiris Booth (http://www.wiris.com/). Wiris is a closed-source commercial math software company in Barcelona that makes a web-based interface to their own custom mathematical software (interestingly, one of their main developers took Calculus from Jordi Quer, who wrote a lot of modular-forms related code for Sage recently). Their primary audience and market right now is European Government Agencies who use their software for high school and beginning college education. Their software is much different than the Sage notebook, since it is written entirely in Java instead of being an AJAX javascript application. They knew about Sage and asked if it used OpenMath or MathML, and I explained that it didn't use either, that it shouldn't and that for our purposes (i.e., interfacing math software) those technologies do not solve the problems we have -- in fact, they are worse than useless. They said that us not taking the OpenMath route was disappointing. They ended the discussion by telling us that their web-based interface is much better than ours :-). I guess they were trying to intimidate us.

Eric Weisstein -- who told us that he is now an official Mathematica developer (doing graph theory among other things) in addition to his "Math World" -- came over to the Sage booth and asked a lot of pointed questions about Sage, mostly "how does this get funded?" We ended up talking for about an hour. There are numerous people at Mathematica who are well aware of Sage, and he claims he doesn't see Sage as a threat to Mathematica as a company. He said Mathematica has had a recent explosion in hiring as a result of greatly increased sales because of the new "Demonstrations" feature in Mathematica 6. We also talked a lot about graph theory and how Sage has a complete implementation of graph isomorphism testing, etc., which greatly impressed him (thanks Robert Miller!). Jason Grout asked about the Mathematica end user license agreement, and Eric pleaded IANAL. I demonstrated using Mathematica via the Sage notebook -- all locally over localhost on a machine with a valid Mathematica license -- which violates the Mathematica license. He got annoyed when I did a Mathematica graph through the Sage notebook... Later Eric talked about how he "hoped" Sage would continue to have momentum and not just die like other free projects. He gave Maxima as an example of a dead project, and seemed quite shocked when I mentioned that they are very much alive and have regular releases, etc. He then said that there is no such thing as free.

I met somebody from MuPad who has worked on that closed source project for a decade. They used to be a German national government funded project, then had an academic research branch until one year ago, and now are 100% commercial and private. I asked about their vision for the next 5 years, and he said "MuPad will survive", and said they were mainly happy to be stable. They also mentioned wanting to do more numerical and applied functionality.

I talked with a Unix guy who works on some Scientific Workplace. There is only a Windows version, and he is working on doing an OS X/Linux port, which they will finish "in the future"!

And that's just a little of what happened today...

Sunday, January 6, 2008

AMS Meeting Part 1

I am completely exhausted right now, having put in a huge amount of time with little sleep during the last few days writing 3d graphics code for Sage (with Robert Bradshaw), making the Sage-2.9.2 release (with Michael Abshoff), printing fliers, tutorials, and preparing DVDs (with my brother) for the massive joint AMS meeting where there will be a Sage exhibit booth.

Our booth will have a huge banner that says "Creating a viable free open source alternative to Magma, Maple, Mathematica, and Matlab."
An Austrian named Harald Schilly did most of the work creating this poster.

I have no clue what to expect during the next few days. I've never run an exhibit booth before; I'm a mathematician not a "vendor", and my product is free. This should be very interesting. I've been to numerous AMS meetings before, and every booth I remember is commercial... TI, book publishers, math games, Maple, Mathematica, etc.