Thursday, July 31, 2014

SageMathCloud -- history and status

2005: I made first release the SageMath software project, with the goal to create a viable open source free alternative to Mathematica, Magma, Maple, Matlab.

2006: First web-based notebook interface for using Sage, called "sagenb". It was a cutting edge "AJAX" application at the time, though aimed at a small number of users.

2007-2009: Much work on sagenb. But it's still not scalable. Doesn't matter since we don't have that many users.

2011-: Sage becomes "self sustaining" from my point of view -- I have more time to work on other things, since the community has really stepped up.

2012: I'm inspired by the Simons Foundation's (and especially Jim Simon's) "cluelessness" about open source software to create a new online scalable web application to (1) make it easier for people to get access to Sage, especially on Windows, and (2) generate a more longterm sustainable revenue stream to support Sage development. (I was invited to a day-long meeting in NYC at Simon's headquarters.)

2012-2013: Spent much of 2012 and early 2013 researching options, building prototypes, some time talking with Craig Citro and Robert Bradshaw (both at Google), and launched SageMathCloud in April 2013. SMC got some high-profile use, e.g., by UCLA's 400+ student calculus course.

2014: Much development over the last 1.5 years. Usage has also grown. There is some growth information here. I also have useful google analytics data from the whole time, which shows around 4000 unique users per week, with an average session duration of 97 minutes (see attached). Number of users has actually dropped off during the summer, since there is much less teaching going on.

SMC itself is written mostly in CoffeeScript using Node.js on the backend. There's a small amount of Python as well.

It's a highly distributed multi-data center application. The database is Cassandra. The backend server processes are mostly Node.js processes, and also nginx+haproxy+stunnel.

A copy of user data is stored in every data center, and is snapshotted every few minutes, both via :
  • ZFS -- for rolling snapshots that vanish after a month -- and via
  • bup -- for snapshots that remain forever, and are consistent across data centers.
These snapshots are critical for making it possible to trust collaborators on projects to not (accidentally) destroy your work. It is not possible for users to delete the bup snapshots, by design.
Here's what it does: realtime collaborative editing of Latex docs, IPython notebooks, Sage worksheets; use the command line terminal; have several people collaborate on a project (=a Linux account).
The main applications seem to be:
  • teaching courses with a programming or math software components -- where you want all your students to be able to use something, e.g., IPython, Julia, etc, and don't want to have to deal with trying to get them to install said software themselves. Also, you want to easily be able to share files with students, see their work in realtime, etc. It's a much, much easier for people to get going that with naked VM's they have to configure -- and also I provide cross-data center replication.
  • collaborative research mathematics -- all co-authors of a paper work together in an SMC project, both writing the paper there and doing computations.
Active development work right now:
  • course management for homework (etc)
  • administration functionality (mainly motivated by self-hosting and better moderation)
  • easy history slider to see all pasts states of a document
  • switching from bootstrap2 to bootstrap3.