Thursday, November 19, 2015

"Prime Numbers and the Riemann Hypothesis", Cambridge University Press, and SageMathCloud

Overview

Barry Mazur and I spent over a decade writing a popular math book "Prime Numbers and the Riemann Hypothesis", which will be published by Cambridge Univeristy Press in 2016.  The book involves a large number of illustrations created using SageMath, and was mostly written using the LaTeX editor in SageMathCloud.

This post is meant to provide a glimpse into the writing process and also content of the book.

This is about making research math a little more accessible, about math education, and about technology.

Intended Audience: Research mathematicians! Though there is no mathematics at all in this post.

The book is here: http://wstein.org/rh/
Download a copy before we have to remove it from the web!

Goal: The goal of our book is simply to explain what the Riemann Hypothesis is really about. It is a book about mathematics by two mathematicians. The mathematics is front and center; we barely touch on people, history, or culture, since there are already numerous books that address the non-mathematical aspects of RH.  Our target audience is math-loving high school students, retired electrical engineers, and you.

Clay Mathematics Institute Lectures: 2005

The book started in May 2005 when the Clay Math Institute asked Barry Mazur to give a large lecture to a popular audience at MIT and he chose to talk about RH, with me helping with preparations. His talk was entitled "Are there still unsolved problems about the numbers 1, 2, 3, 4, ... ?"

See http://www.claymath.org/library/public_lectures/mazur_riemann_hypothesis.pdf

Barry Mazur receiving a prize:


Barry's talk went well, and we decided to try to expand on it in the form of a book. We had a long summer working session in a vacation house near an Atlantic beach, in which we greatly refined our presentation. (I remember that I also finally switched from Linux to OS X on my laptop when Ubuntu made a huge mistake pushing out a standard update that hosed X11 for everybody in the world.)

Classical Fourier Transform

Going beyond the original Clay Lecture, I kept pushing Barry to see if he could describe RH as much as possible in terms of the classical Fourier transform applied to a function that could be derived via a very simple process from the prime counting function pi(x). Of course, he could. This led to more questions than it answered, and interesting numerical observations that are more precise than analytic number theorists typically consider.

Our approach to writing the book was to try to reverse engineer how Riemann might have been inspired to come up with RH in the first place, given how Fourier analysis of periodic functions was in the air. This led us to some surprisingly subtle mathematical questions, some of which we plan to investigate in research papers. They also indirectly play a role in Simon Spicer's recent UW Ph.D. thesis. (The expert analytic number theorist Andrew Granville helped us out of many confusing thickets.)

In order to use Fourier series we naturally have to rely heavily on Dirac/Schwartz distributions.

SIMUW

University of Washington has a great program called SIMUW: "Summer Institute for Mathematics at Univ of Washington.'' It's for high school; admission is free and based on student merit, not rich parents, thanks to an anonymous wealthy donor!  I taught a SIMUW course one summer from the RH book.  I spent one very intense week on the RH book, and another on the Birch and Swinnerton-Dyer conjecture.

The first part of our book worked well for high school students. For example, we interactively worked with prime races, multiplicative parity, prime counting, etc., using Sage interacts. The students could also prove facts in number theory. They also looked at misleading data and tried to come up with conjectures. In algebraic number theory, usually the first few examples are a pretty good indication of what is true. In analytic number theory, in contrast, looking at the first few million examples is usually deeply misleading.

Reader feedback: "I dare you to find a typo!"

In early 2015, we posted drafts on Google+ daring anybody to find typos. We got massive feedback. I couldn't believe the typos people found. One person would find a subtle issue with half of a bibliography reference in German, and somebody else would find a different subtle mistake in the same reference. Best of all, highly critical and careful non-mathematicians read straight through the book and found a large number of typos and minor issues that were just plain confusing to them, but could be easily clarified.

Now the book is hopefully not riddled with errors. Thanks entirely to the amazingly generous feedback of these readers, when you flip to a random page of our book (go ahead and try), you are now unlikely to see a typo or, what's worse, some corrupted mathematics, e.g., a formula with an undefined symbol.

Designing the cover

Barry and Gretchen Mazur, Will Hearst, and I designed a cover that combined the main elements of the book: title, Riemann, zeta:



Then designers at CUP made our rough design more attractive according their tastes. As non-mathematician designers, they made it look prettier by messing with the Riemann Zeta function...


Publishing with Cambridge University Press

Over years, we talked with people from AMS, Springer-Verlag and Princeton Univ Press about publishing our book. I met CUP editor Kaitlin Leach at the Joint Mathematics Meetings in Baltimore, since the Cambridge University Press (CUP) booth was directly opposite the SageMath booth, which I was running. We decided, due to their enthusiasm, which lasted more than for the few minutes while talking to them (!), past good experience, and general frustration with other publishers, to publish with CUP.


What is was like for us working with CUP

The actual process with CUP has had its ups and downs, and the production process has been frustrating at times, being in some ways not quite professional enough and in other ways extremely professional. Traditional book publication is currently in a state of rapid change. Working with CUP has been unlike my experiences with other publishers.

For example, CUP was extremely diligent putting huge effort into tracking down permissions for every one of the images in our book. And they weren't satisfy with a statement on Wikipedia that "this image is public domain", if the link didn't work. They tracked down alternatives for all images for which they could get permissions (or in some cases have us partly pay for them). This is in sharp contrast to my experience with Springer-Verlag, which spent about one second on images, just making sure I signed a statement that all possible copyright infringement was my fault (not their's).

The CUP copyediting and typesetting appeared to all be outsourced to India, organized by people who seemed far more comfortable with Word than LaTeX. Communication with people that were being contracted out about our book's copyediting was surprisingly difficult, a problem that I haven't experienced before with Springer and AMS. That said, everything seems to have worked out fine so far.

On the other hand, our marketing contact at CUP mysteriously vanished for a long time; evidently, they had left to another job, and CUP was recruiting somebody else to take over. However, now there are new people and they seem extremely passionate!

The Future

I'm particularly excited to see if we can produce an electronic (Kindle) version of the book later in 2016, and eventually a fully interactive complete for-pay SageMathCloud version of the book, which could be a foundation for something much broader with publishers, which addresses the shortcoming of the Kindle format for interactive computational books. Things like electronic versions of books are the sort of things that AMS is frustratingly slow to get their heads around...

Conclusions

  1. Publishing a high quality book is a long and involved process.
  2. Working with CUP has been frustrating at times; however, they have recruited a very strong team this year that addresses most issues.
  3. I hope mathematicians will put more effort into making mathematics accessible to non-mathematicians.
  4. Hopefully, this talk will give provide a more glimpse into the book writing process and encourage others (and also suggest things to think about when choosing a publisher and before signing a book contract!)

Wednesday, September 30, 2015

What is SageMath's strategy?

Here is SageMath's strategy, or at least what my strategy toward SageMath has been for the last 5 years.

Diagnose the problem

Statement of problem: SageMath is not growing.

Justification

Facts: Growth in the number of active users [1] of SageMath has stalled since about 2011 (as defined by Google analytics on sagemath.org). From 2008 to 2011, year-on-year growth was about 50%, which isn't great. However, from 2011 to now, year-on-year growth is slightly less than 0%. It was maybe -10% from 2013 to 2014. Incidentally, number of monthly active users of sagemath.org is about 68,652 right now, but the raw number isn't as import as the year-to-year rate of change.

I set an overall mission statement for the Sage project at the outset, which was is to be a viable alternative to Magma, Maple, Mathematica and Matlab. Being a "viable alternative" is something that holds or doesn't for specific people. A useful measure of this mission then is whether or not people use Sage. This is a different metric than trying to argue from "first principles" by making a list of features of each system, comparing benchmarks, etc.















Guiding policies

Statement of policy: focus on undergraduate students in STEM courses (science, tech, engineering, math)

Justification

In order for Sage to start growing again, identify groups of people that are not using Sage. Then decide, for each of these groups, who might find value in using Sage, especially if we are able to put work into making it easier for them to benefit from Sage. This is something to re-evaluate periodically. In itself, this is very generic -- it's what any software project that wishes to grow should do. The interesting part is the details.
Some big groups of potential future users of Sage, who use Sage very little now, include
  • employees/engineers in various industries (from defense contractors, to finance, to health care to "data science").
  • researchers in area of mathematics where Sage is currently not popular
  • undergraduate students in STEM courses (science, tech, engineering, math)
I think by far the most promising group is "undergraduate students in STEM courses". In many cases they use no software at all or are unhappy with what they do use. They are extremely cost sensitive. Open source provides a unique advantage in education because it is less expensive than closed source software, and having access to source code is something that instructors consider valuable as part of the learning experience. Also, state of the art performance, which often requires enormous dedicated for-pay work, is frequently not a requirement.

Actions

  • (a) Make access to Sage as easy as possible.
  • (b) Encourage the creation of educational resources (books, tutorials, etc.) that make using Sage for particular courses as easy as possible.
  • (c) Implement missing functionality in Sage that is needed in support of undergraduate teaching.

Justification

Why don't more undergraduates use Sage? For the most part, students use what they are told to use by their instructors. So why don't instructors chose to use Sage? (a) Sage is not trivial to install (in fact it is incredibly hard to install), (b) There are limited resources (books, tutorials, course materials, etc.) for making using Sage really easy, (c) Sage is missing key functionality needed in support undergraduate teaching.

Regarding (c), in 2008 Sage was utterly useless for most STEM courses. However, over the years things changed for the better, due to the hard work of Rob Beezer, Karl Dieter, Burcin Erocal, and many others. Also, for quite a bit of STEM work, the numerical Python ecosystem (and/or R) provides much of what is needed, and both have evolved enormously in recent years. They are all usable from Sage, and making such use easier should be an extremely high priority. Related -- Bill Hart wrote "I recently sat down with some serious developers and we discussed symbolics in Sage (which I know nothing about). They argued that Sage is not a viable contender in that area, and we discussed some of the possible reasons for that. " The reason is that the symbolic functionality in Sage is motivated by making Sage useful for undergraduate teaching; it has nothing to do with what serious developers in symbolics would care about.

Regarding (b), an NSF (called "UTMOST") helped in this direction... Also, Gregory Bard wrote "Sage for undergraduates", which is exactly the sort of thing we should be very strongly encouraging. This is a book that is published by the AMS and is also freely available. And it squarely addresses exactly this audience. Similarly, the French book that Paul Zimmerman edited is fantastic for France. Let's make an order of magnitude similar resources along these lines! Let's make vastly more tutorials and reference manuals that are "for undergraduates".

Regarding (a), in my opinion the most viable option that fits with current trends in software is a full web application that provides access to Sage. SageMathCloud is what I've been doing in this direction, and it's been growing since 2013 at over 100% year on year, and much is in place so that it could scale up to more users. It still has a huge way to go regarding user friendliness, and it is still losing money every month. But it is a concrete action toward which nontrivial effort has been invested, and it has the potential to solve problem (a) for a large number of potential STEM users. College students very often have extremely good bandwidth coupled with cheap weak laptops, so a web application is the natural solution for them.

Though much has been done to make Sage easier to install on individual computers, it's exactly the sort of problem that money could help solve, but for which we have little money. I'm optimistic that OpenDreamKit will do something in this direction.

[I've made this post motivated by the discussion in this thread.  Also, I used the framework from this book.]

Tuesday, September 22, 2015

SageMathCloud's poor user retention rate

Poor retention rate

Many people try SageMathCloud, but only a small percentage stick around.  I definitely don't know why. Recent SageMathCloud rates are below 4%:





Is it performance?

Question: Are the people who try SMC discouraged by performance issues?

I think it's unlikely many users are leaving due to hitting noticeable performance issues.  I think I would know, since there's a huge bold messages all over the site that say "Email help@sagemath.com: in case of problems, do not hesitate to immediately email us. We want to know if anything is broken!"    In the past when there have been performance or availability issues -- which of course do happen sometimes due to bugs or whatever -- I quickly get a lot of emails.  I haven't got anything that mentioned performance recently.  And usage of SMC is at an all time high: in the last day there were 676 projects created and 3500 projects modified -- which is significantly higher than ever before since the site started.  It's also about 2.2x what we had exactly a year ago.    


Is it the user interface?

Question: Is the SMC user interface highly discouraging and difficult to use?

My current best guess is that the main reason for attrition of our users is that 
they do not understand how to actually use SageMathCloud (SMC), and the interface doesn't help at all.   I think a large number of users get massively confused and lost when trying to use SMC.    It's pretty obvious this happens if you just watch what they do...    In order to have a foundation on which to fix that, the plan I came up with in May was to at least fix the frontend implementation so that it would be  much easier to do development with -- by switching from a confusing  mess of jQuery soup, e.g., 2012-style single page app development -- to Facebook's new React.js approach.  This is basically half done and deployed, and I'm going to work very hard for a while to finish it.   Once it's done, it's going to be much easier to improve the UI to  make it more user friendly.

Is it the open source software?

Question: Is open source mathematical software not sufficiently user friendly?

Fixing the UI probably won't help so much with improving the underlying open 
source mathematical software to be friendly though.    This is a massive, deep, and very difficult problem, and might be why growth of Sage stopped in 2011:





SageMath (and maybe Numpy/Scipy/IPython/etc.) are not as user friendly as Mathematica/Matlab.   I think they could be even more user friendly, but it's highly unlikely as long as the developers are mostly working on SageMath in their spare time as part of advanced research projects (which have little to do with user friendliness).  

Analyzing data about mistakes, frustation, and issues people actually have with  real worksheets and notebooks could also help a lot with directing our  effort in improving Sage/Python/Numpy/etc to be more user friendly.


Is it support?

Question: Are users frustrated by lack of interactive support?

Having integrated high-quality support for users inside SMC, in which we help 
them write code, answer questions, etc., could help with retention.  


Why don't you use SageMathCloud?

I've been watching this  stuff closely for over a decade most waking moments, and everybody likes to complain to me.    Why don't you use SageMathCloud?   Tell me:  wstein@sagemath.com.

Thursday, September 10, 2015

Funding Open Source Mathematical Software in the United States

I do not know how to get funding for open source mathematical software in the United States. However, I'm trying.

Why: Because Sage is Hobbling Along

Despite what we might think in our Sage-developer bubble, Sage is hobbling along, and without an infusion of financial support very soon, I think the project is going to fail in the next few years. I have access to Google analytics data for sagemath.org since 2007, and there has been no growth  in active users of the website since 2011:

Something that is Missing

The worse part of all for me, after ten years, is seeing things like this email today from John Palmieri, where he talks about writing slow but interesting algebraic topology code, and needing help from somebody who knows Cython to actually make his code fast.

I know from my three visits to the Magma group in Sydney that such assistance is precisely what having real financial support can provide. Such money makes it possible to have fulltime people who know the tools and how to optimize them well, and they work on this sort of speedup and integration -- this "devil is in the details" work -- for each major contribution (they are sort of like a highly skilled version of a journal copy editor and referee all in one). Doing this makes a massive difference, but also costs on the order of $1 million / year to have any real impact. 1 million is probably the Magma budget to support around 10 people and periodic visitors, and of course like 1% of the budget of Matlab/Mathematica. Magma has this support partly because Magma is closed source, and maintains tight control on who may use it.

Searching for a Funding Model

Sage is open source and freely available to all, so it is of potential huge value to the community by being owned by everybody and changeable. However, those who fund Magma (either directly or indirectly) haven't funded Sage at the same level for some reason. I can't make Sage closed source and copy that very successful funding model. I've tried everything I can think of given the time and resources I have, and the only model left that seems able to support open source is having a company that does something else well and makes money, then using some of the profit to fund open source (Intel is the biggest contributor to Linux).

SageMath, Inc.

Since I failed to find any companies that passionately care about Sage like Intel/Google/RedHat/etc. care about Linux, I started one. I've been working on SageMathCloud extremely hard for over 3 years now, with the hopes that at least it could be a way to fund Sage development.

Saturday, September 5, 2015

The Simons Foundation and Open Source Software

Jim Simons

Jim Simons is a mathematician who left academia to start a hedge fund that beat the stock market. He contributes back to the mathematical community through the Simons Foundation, which provides an enormous amount of support to mathematicians and physicists, and has many outreach programs.

SageMath is a large software package for mathematics that I started in 2005 with the goal of creating a free open source viable alternative to Magma, Mathematica, Maple, and Matlab. People frequently tell me I should approach the Simons Foundation for funding to support Sage. For example:
Jim Simons, after retiring from Renaissance Technologies with a cool 15 billion, has spent the last 10 years giving grants to people in the pure sciences. He's a true academic at heart [...] Anyways, he's very fond of academics and gives MacArthur-esque grants, especially to people who want to change the way mathematics is taught. Approach his fund. I'm 100% sure he'll give you a grant on the spot.

The National Science Foundation

Last month the http://sagemath.org website had 45,114 monthly active users. However, as far as I know, there is no NSF funding for Sage in the United States right now, and development is mostly done on a shoestring in spare time. We have recently failed to get several NSF grants for Sage, despite there being Sage-related grants in the past from NSF. I know that funding is random, and I will keep trying. I have two proposals for Sage funding submitted to NSF right now.

Several million dollars per year

I was incredibly excited in 2012 when David Eisenbud invited me to a meeting at the Simons Foundation headquarters in New York City with the following official description of their goals:
The purpose of this round table is to investigate what sorts of support would facilitate the development, deployment and maintenance of open-source software used for fundamental research in mathematics, statistics and theoretical physics. We hope that this group will consider what support is currently available, and whether there are projects that the Simons Foundation could undertake that would add significantly to the usefulness of computational tools for basic research. Modes of support that duplicate or marginally improve on support that is already available through the universities or the federal government will not be of interest to the foundation. Questions of software that is primarily educational in nature may be useful as a comparison, but are not of primary interest.  The scale of foundation support will depend upon what is needed and on the potential scientific benefit, but could be substantial, perhaps up to several million dollars per year.
Current modes of funding for research software in mathematics, statistics and physics differ very significantly. There may be correspondingly great differences in what the foundation might accomplish in these areas. We hope that the round table members will be able to help the foundation understand the current landscape  (what are the needs, what is available, whether it is useful, how it is supported) both in general and across the different disciplines, and will help us think creatively about new possibilities.
I flew across country to this the meeting, where we spent the day discussing ways in which "several million dollars per year" could revolutionize "the development, deployment and maintenance of open-source software used for fundamental research in mathematics...".

In the afternoon Jim Simons arrived, and shook our hands. He then lectured us with some anecdotes, didn't listen to what we had to say, and didn't seem to understand open source software. I was frustrated watching how he treated the other participants, so I didn't say a word to him. I feel bad for failing to express myself.

The Decision

In the backroom during a coffee break, David Eisenbud told me that it had already been decided that they were going to just fund Magma by making it freely available to all academics in North America. WTF? I explained to David that Magma is closed source and that not only does funding Magma not help open source software like Sage, it actively hurts it. A huge motivation for people to contribute to Sage is that they do not have access to Magma (which was very expensive).

I wandered out of that meeting in a daze; things had gone so differently than I had expected. How could a goal to "facilitate the development, deployment and maintenance of open-source software... perhaps up to several million dollars per year" result in a decision that would make things possibly much worse for open source software?

That day I started thinking about creating what would become SageMathCloud. The engineering work needed to make Sage accessible to a wider audience wasn't going to happen without substantial funding (I had put years of my life into this problem but it's really hard, and I couldn't do it by myself). At least I could try to make it so people don't have to install Sage (which is very difficult). I also hoped a commercial entity could provide a more sustainable source of funding for open source mathematics software. Three years later, the net result of me starting SageMathCloud and spending almost every waking moment on it is that I've gone from having many grants to not, and SageMathCloud itself is losing money. But I remain cautiously optimistic and forge on...

We will not fund Sage

Prompted by numerous messages recently from people, I wrote to David Eisenbud this week. He suggested I write to Yuri Schinkel, who is the current director of the Simons Foundation:
Dear William,
Before I joined the foundation, there was a meeting conducted by David Eisenbud to discuss possible projects in this area, including Sage.
After that meeting it was decided that the foundation would support Magma.
Please keep me in the loop regarding developments at Sage, but I regret that we will not fund Sage at this time.
Best regards, Yuri
The Simons Foundation, the NSF, or any other foundation does not owe the Sage project anything. Sage is used by a lot of people for free, who together have their research and teaching supported by hundreds of millions of dollars in NSF grants. Meanwhile the Sage project barely hobbles along. I meet people who have fantastic development or documentations projects for Sage that they can't do because they are far too busy with their fulltime teaching jobs. More funding would have a massive impact. It's only fair that the US mathematical community is at least aware of a missed opportunity.
Funding in Europe for open source math software is much better.

Hacker News discussion

Monday, August 31, 2015

React, Flux, RethinkDB and SageMathCloud -- Summer 2015 update

I've been using databases and doing web development for over 20 years, and I've never really loved any database before and definitely didn't love any web development frameworks either. That all changed for me this summer...

SageMathCloud

SageMathCloud is a web application in which you collaboratively use Python, LaTeX, Markdown, Sage worksheets (sophisticated mathematics), task lists, R, Jupyter Notebooks, manage courses, write C programs, make chatrooms, and more. It is hosted on Google Compute Engine, but is also entirely open source and there is a pre-made Virtual Machine that you can download. A project in SMC is a Linux account, with resources constrained using cgroups and quotas. Many SMC users can collaborate on the same project, and have equal privileges in that project. Interaction with all file types (including Jupyter notebooks, task lists and course managements) is synchronized in realtime, like Google docs. There is also a global notifications feed that shows all editing activity on all files in all projects on which the user collaborates, which is a sort of highly technical version of Facebook's feed.

Rewrite motivation

I originally wrote the SageMathCloud frontend using progressive-refinement jQuery (no third-party framework beyond that) and the Cassandra database. These were reasonable choices when I started. There are much better approaches now, which are critical to dramatically improving the user experience with SMC, and also growing the developer base. So far SMC has had no nontrivial outside contributions, probably due to the difficulty of understanding the code. In fact, I think nobody besides me has ever even installed SMC, despite these install notes.

We (me, Jon Lee, Nicholas Ruhland) are currently completely rewriting the entire frontend of SMC using React.js, Flux, and RethinkDB. We started this rewrite in June 2015, with Jon being supported by Google Summer of Code (2015), Nich being supported some by NSF grants from Randy Leveque and Rekha Thomas, and with me being unemployed.

Terrible funding situation

I'm living on credit cards -- I have no NSF grant support anymore, and SageMathCloud is still losing a lot of money every month, and I'm unhappy about this situation. It was either completely quit working on SMC and instead teach or consult a lot, or lose tens of thousands of dollars. I am doing the latter right now. I was very caught off guard, since this is my first summer ever to not have NSF support since I got my Ph.D. in 2000, and I didn't expect to have my grant proposals all denied (which happened in June). There is some modest Angel investment in SageMath, Inc., but I can't bring myself to burn through that money on salary, since it would run out quickly, and I don't want to have to shut down the site due to not being able to pay the hosting bill. I've failed to get any significant free hosting, due to already getting free hosting in the past, and SageMath, Inc. not being in any incubators. For example, we tried very hard to get hosting from Google, but they flatly refused for these two reasons (they gave $60K in hosting to UW/Sage project in 2012). I'm clearly having trouble transitioning from an academic to an industry funding model. But if there are enough paying customers by January 2016, things will turn around.

Jon, Nich, and I have been working on this rewrite for three months, and hope to finish it by the end of September, when Jon and Nich will become busy with classes again. However, it seems unlikely we'll be able to finish at the current rate. Fortunately, I don't start teaching fulltime again until January, and we put a lot of work into doing a release in mid-August that fully uses RethinkDB and partly uses React.js, so that we can finish the second stage of the rewrite iteratively, without any major technical surprises.

RethinkDB

Cassandra is an excellent database for many applications, but it is not the right database for SMC and I'm making no further use of Cassandra. SMC is a realtime application that does a lot more reading than writing to the database, and SMC greatly benefits from realtime push updates from the database. I've tried quite hard in the past to build an appropriate architecture for SMC on top of Cassandra, but it is the wrong tool for the job. RethinkDB scales up linearly (with sharding and replication), and has high availability and automatic failover as of version 2.1.2. See https://github.com/rethinkdb/rethinkdb/issues/4678 for my painful path to ensuring RethinkDB actually works for me (the RethinkDB developers are incredibly helpful!).

React.js

I learned about React.js first from some "random podcast", then got more interested in it when Chris Swenson gave a demo at a Sage Days workshop in San Diego in May 2015. React (+Flux) is a web development framework that actually has solid ideas behind it, backed by an implementation that has been optimized and tested by a highly nontrivial real world application: namely the Facebook website. Even if I were to have the idea of React, implementing in a way that is actually usable would be difficult. The key idea of React.js is that -- surprisingly -- it is possible to write efficient client-side code that describes how to render the application purely as a function of its state.

React is different than jQuery. With jQuery, you write lots of code explaining how to transform the user interface of your application from one complicated state (that you might never have anticipated happening) to another complicated state. When using React.js you don't write code about how your application's visible state changes -- instead you write code to answer the question: "given this state, what should the application look like". For me, it's a game changer. This is like what one does when writing video games; the innovation is that some people at Facebook figured out how to practically program this way in a client side web browser application, then tuned their implementation based on huge amounts of real world data (Facebook has users). Oh, and they open sourced the result and ran several conferences explaining React.

React.js reminds me of when Andrew Wiles proved Fermat's Last Theorem in the mid 1990s. Wiles (and Ken Ribet) had genuine new ideas, which dramatically reshaped the landscape of number theory. The best number theorists quickly realized this and adopted to the new world, pushing the envelope of Wiles work far beyond what I expected could happen. Other people pretended like Wiles didn't exist and continued studying Fibonnaci numbers. I browsed the web development section of Barnes and Noble last night and there were dozens of books on jQuery and zero on React.js. I feel for anybody who tries to learn client-side web development by reading books at Barnes and Noble.

IPython/Jupyter and PhosphorJS

I recently met with Fernando Perez, who founded IPython/Jupyter. He seemed to tell me that currently 9 people are working fulltime on rewriting the Jupyter web notebook using the PhosphorJS framework. I tried to understand PhosphorJS based on the github page, but couldn't, except to deduce that it is mostly the work of one person from Bloomberg/Continuum Analytics. Fernando told me that they chose PhosphorJS since it very fast, and that their main motivation is to (1) make Jupyter better use their huge high-resolution monitors on their new institute at Berkeley, and (2) make it easier for developers like me to integrate/extend Jupyter into their applications. I don't understand (2), because PhosphorJS is perhaps the least popular web framework I've ever heard of (is it a web framework -- I can't tell?). I pushed Fernando to explain why they made that design choice, but didn't really understand the answer, except that they had spent a lot of time investigating alternatives (like React first). I'm intimidated by their resources and concerned that I'm making the wrong choice; however, I just can't understand why they have made what seems to me to be the wrong choice. I hope to understand more at the joint Sage/Jupyter Days 70 that we are organizing together in Berkeley, CA in November. (Edit: see https://github.com/ipython/ipython/issues/8239 for a discussion of why IPython/Jupyter uses PhosphorJS.)

Tables and RethinkDB

Our rewrite of SMC is built on Tables, Flux and React. Tables are client-side technology I wrote inspired by Facebook's GraphQL/Relay technology (and Meteor, Firebase, etc.); they synchronize data between clients and the backend database in realtime. Tables are defined by a JSON schema file, which specifies the fields in the table, and explains what get and set queries are allowed. A table is a subset of a much larger table in the database, with the subset defined by conditions that are relative to the user making the query. For example, the projects table has one entry for each project that the user is a collaborator on.

Tables are automatically synchronized between the user and the database whenever the database changes, using RethinkDB changefeeds. RethinkDB's innovation is to build realtime updates -- triggered when the result of a query to the database changes -- directly into the database at the lowest level. Of course it is possible to build something that looks the same from the outside using either a message queue (say using RabbitMQ or ZeroMQ), or by watching the replication stream from the database and triggering actions based on that (like Meteor does using MongoDB). RethinkDB's approach seems better to me, putting the abstraction at the right level. That said, based on mailing list traffic, searches, etc., it seems that very, very few people get RethinkDB yet. Also, despite years of development, RethinkDB only became "production ready" a few months ago, and only got automatic failover a few weeks ago. That said, after ironing out some kinks, I'm now using it with heavy traffic in production and it works very well.

Flux

Once data is automatically synchronized between the database and web browsers in realtime, we can build everything else on top of this. Facebook also introduced an architecture pattern that they call Flux, which works well with React. It's very different than MVC-style two-way binding frameworks, where objects are directly linked to UI elements, with an object changing causing the UI element to change and vice versa. In SMC each major part of the system has two objects associated to it: Actions and Stores. We think of them in terms of the classical CQRS pattern -- command query responsibility segregation. Actions are commands -- they are Javascript "functions" that get stuff done, but they do not return values; instead, they impact the state of the store. The store has functions that allow one to query for the state of the store, but they do not change the state of the store. The store functions must only be functions of the internal state of the store and nothing else. They might cache their results and format their output to be very convenient for rendering. But that's it.

Actions usually cause the corresponding store (or stores) to change. When a store changes, it emit a change event, which causes any React components that depend on the store to be updated, which in many cases means they are re-rendered. There are optimizations one can introduce to reduce the amount of re-rendering, which if one isn't careful leads to subtle bugs; pretty much the only subtle React UI bugs one hits are caused by such optimizations. When the UI re-renders, the user sees their view of the world change. The user then clicks buttons, types, etc., which triggers actions, which in turn update stores (and tables, hence propogating changes to the ultimate source of truth, which is the RethinkDB database). As stores update, the UI again updates, etc.

Status

So far, we have completely (re-)written the project listing, file manager, help/status page, new file page, project log, file finder, project settings, course management system, account settings, billing, project upgrade system, and file use notifications using React, Flux, and Tables, and the result works well. Bugs are much easier to fix, and it is easy (possible?) to understand the state of the system, since it is defined by the state of the database and the corresponding client-side stores. We've completely rethought everything about the UI in doing the rewrite of the above components, and it has taken several months. Also, as mentioned above, I completely rewrote most of the backend to use RethinkDB instead of Cassandra. There were also the weeks of misery for me after we made the switch over. Even after weeks of thinking/testing/wondering "what could go wrong?", we found out all kinds of surprising little things within hours of pushing everything into production, which took more than a week of sleep deprived days to sort out.

What's left? We have to rewrite the file editor tabs system, the project tabs system, and all the applications (except course management): editing text files using Codemirror, task lists (which are suprisingly complicated!), color xterm terminals, Jupyter notebooks (which will still use an iframe for the notebook itself), Sage worksheets (with complicated html output embedded in codemirror), compressed file de-archiver, the LaTeX editor, the wiki and markdown editors, and file chat. We hope to find a clean way to abstract away the various SMC applications as plugins, so that other people can easily write their own applications/plugins that will run inside of SMC. There will be a rich collection of example plugins to build on, namely the ones listed above, which are all driven by critical-to-us real world applications.

Discussion about this blog post on Hacker News.

Wednesday, May 27, 2015

Guiding principles for SageMath, Inc.

In February of this year (2015), I founded a Delaware C Corporation called "SageMath, Inc.".  This is a first stab at the guiding principles for the company.    It should help clarify the relationship between the company, the Sage project, and other projects like OpenDreamKit and Jupyter/IPython.

Company mission statement:

Make open source mathematical software ubiquitous.
This involves both creating the SageMathCloud website and supporting the development and distribution of the SageMath and other software, including Jupyter, Octave, Scilab, etc. Anything open source.

Company principles:

  • Absolutely all company funded software must be open source, under a GPLv3 compatible license. We are a 100% open source company.
  • Company independence and self-determination is far more important than money. A core principle is that SMI is not for sale at any price, and will not participate in any partnership (for cost) that would restrict our freedom. This means:
    • reject any offers from corp development from big companies to purchase or partner,
    • do not take any investment money unless absolutely necessary, and then only from the highest quality investors
    • do not take venture capital ever
  • Be as open as possible about everything involving the company. What should not be open (since it is dangerous):
    • security issues, passwords
    • finances (which could attract trolls)
    • private user data
What should be open:
  • aggregate usage data, e.g., number of users.
  • aggregate data that could help other open source projects improve their development, e.g., common problems we observe with Jupyter notebooks should be provided to their team.
  • guiding principles

Business model

  • SageMathCloud is freemium with the expectation that 2-5% of users pay.
  • Target audience: all potential users of cloud-based math-related software.

SageMathCloud mission

Make it as easy as possible to use open source mathematical software in the cloud.
This means:
  • Minimize onboard friction, so in less than 1 minute, you can create an account and be using Sage or Jupyter or LaTeX. Morever, the UI should be simple and streamlined specifically for the tasks, while still having deep functionality to support expert users. Also, everything persists and can be sorted, searched, used later, etc.
  • Minimize support friction, so one click from within SMC leads to a support forum, an easy way for admins to directly help, etc. This is not at all implemented yet. Also, a support marketplace where experts get paid to help non-experts (tutoring, etc.).
  • Minimize teaching friction, so everything involving software related to teaching a course is as easy as possible, including managing a list of students, distributing and collecting homework, and automated grading and feedback.
  • Minimize pay friction, sign up for a $7 monthly membership, then simple clear pay-as-you-go functionality if you need more power.