Sage: Open Source Mathematics Software

Major Price Cuts: Deepnote Versus Cocalc --- Compute Server Pricing

noreply@blogger.com (William Stein) — Sun, 03 Dec 2023 17:29:00 +0000

Major Price Cuts: Deepnote Versus Cocalc

Deepnote is one of CoCalc's direct competitors. Today (November 30, 2023) they announced a major price cut on their pay-as-you-go rates:

"As you may have already heard, starting December 1, we're slashing the pay-as-you-go rates across all our machines – making them more budget-friendly without any hidden terms."

At CoCalc, we recently finally launched pay as you go machines, which was one of our main development priorities for 2023. These are fully integrated with CoCalc, and were a huge amount of work to bring to market. I was terrified that Deepnote's major price cuts would make Deepnote a much better deal than CoCalc.

Here is how the Deepnote and CoCalc pricing compares:

	Deepnote's New Price	CoCalc Standard	CoCalc Spot
64GB RAM, 16vCPU	$1.54	$0.59	$0.12
128GB RAM, 16vCPU (32 CPU on cocalc)	$2.02	$1.17	$0.23
K80 GPU (newer L4 GPU on cocalc)	$2.02	$0.93	$0.30

Conclusion: CoCalc's prices are still highly competitive, even in light of Deepnote's major price cuts.

Also, spot instances do work very well for many applications. For more details and how to get these prices at https://cocalc.com, read the rest of this post.

CAVEAT: comparing RAM and vCPU is not necessarily easy. Maybe I'm completely wrong.

More Details

I don't know exactly what Deepnote means by the above machine specs. However, according to my benchmarks, one of the very best machines we offer via Google Cloud is the AMD EPYC Milan family. Their single core performance is excellent, and a vCPU is equivalent to an entire core, which makes them up to twice as fast as lot of "vCPU" options out there. We offer both spot instances and standard instances.

Performance: 16 vCPU and 64GB RAM

Our best pricing on an AMD EPYC with 64GB RAM and 16 cores is $0.59/hour for standard instances.

By selecting a region in Europe, the cost is only $0.12/hour for a spot instance. Spot instances may stop or not be available, but our stats so far show they often work well for days to weeks, perhaps because Google has built out such massive CPU capacity:

In CoCalc the region where the machine is located is transparent, so you can take advantage of the best prices in the world.

High Memory: 16 vCPU and 32GB RAM

Our analogue of "High memory" above is a t2d-standard-32 with 32 cores, 128B of RAM, and it costs $1.17/hour for a standard instance, or $0.23/hour for a spot instance.

Again, the best price on spot instances is in a different region than for standard:

GPU

Deep note offers a K80 GPU for $1.80/hour. We do not offer K80's on CoCalc since they are so old, but we have L4's that have the same 24GB of RAM and are a much newer architecture. Our GPU price is $0.93/hour for standard instances, and $0.30/hour for spot instances:

Conclusion: CoCalc's new prices are still competitive. Yeah.

Happy Holidays! 🎄

Should I Resign from My Full Professor Job to Work Fulltime on Cocalc?

noreply@blogger.com (William Stein) — Thu, 09 May 2019 21:21:00 +0000

Nearly 3 years ago, I gave a talk at a Harvard mathematics conference announcing that “I am leaving academia to build a company”. What I really did is go on unpaid leave for three years from my tenured Full Professor position. No further extensions of that leave is possible, so I finally have to decide whether or not to go back to academia or resign.

How did I get here?

Nearly two decades ago, as a recently minted Berkeley math Ph.D., I was hired as a non-tenure-track faculty member in the mathematics department at Harvard. I spent five years at Harvard, then I applied for jobs, and accepted a tenured Associate Professor position in the mathematics department at UC San Diego. The mathematics community was very supportive of my number theory research; I skipped tenure track, and landed a tier-1 tenured position by the time I was 30 years old. In 2006, I moved from UCSD to a tenured Associate Professor position at the University of Washington (UW) mathematics department, primarily because my wife was a graduate student there, UW has strong research in number theory and algebraic geometry, and they have a good culture supporting undergraduate research.

Before I left Harvard, I started the SageMath open source software project, initially with the longterm goal of creating a free open source viable alternative to Mathematica, Maple, Matlab and Magma. As a result, in addition to publishing dozens of research mathematics papers and some books, I also started spending a lot of my time writing software, and organizing Sage Days workshops.

Recruiting at UW Mathematics

At UW, I recruited an amazing team of undergraduates and grad students who had a major impact on the development of Sage. I was blown away by the quality of the students (both undergrad and grad) that I was able to get involved in Sage development. I fully expected that in the next few years I would have the resources to hire some of these students to work fulltime on Sage. They had written the first versions of much of the core functionality of Sage (e.g., graph theory, symbolic calculus, matrices, and much more).

I was surprised when my application for Full Professor at UW was delayed for one year because – I was told – I wasn’t publishing enough research papers. This was because I was working very hard on building Sage, which was going extremely well at the time. I took the feedback seriously, and put more time into traditional research and publishing; this was the first time in my life that I did research mathematics for reasons other than just because I loved doing it.

I tried very hard to hire Bill Hart as a tenure-track faculty member at UW. However, I was told that his publication count was “a bit light”, and I did not succeed at hiring him. If you printed out the source code of software he has written, it would be a tall stack of paper. In any case, I totally failed at the politics needed to make his case and was left dispirited, realizing my personal shortcomings at department politics meant I probably could not hire the sort of colleagues I desperately needed.

UW was also very supportive of me teaching an undergrad course on open source math software (it evolved into this). I taught a similar course at the graduate level once, and it went extremely well, and was in my mind the best course I ever taught at UW. I was extremely surprised when my application to teach that grad course again was denied, and I was told that grad students should just go to my undergraduate course. I thought, “this is really strange”, instead of lobbying to teach the course and better presenting my case.

To be clear, I do not mean to criticize the mathematics department. The UW math department has thought very hard and systematically about their priorities and how they fit into UW. They are a traditional pure mathematics departments that is generally ranked around 25 in the country, with a particular set of strengths. There is a separate applied math department on campus, several stats departments, and a massive School of Computer Science. Maybe I was in the wrong place to try to hire somebody whose main qualification is being world class at writing mathematical software. This blog post is about the question of whether the UW math department is the right place for me or not.

Outside Grant Support?

My number theory research received incredible support from the NSF, with me being the PI on six NSF grants. Also, Magma (which is similar to Sage, but closed source) had managed to find sufficient government funding, so I remained optimistic. Maybe I could fund people to build Sage via grants, and even start an institute! I applied for grants to support work on SageMath at a larger scale, and had some initial success (half of a postdoc, and some workshops, etc.).

Why is grant funding so important for Sage? The goal of the SageMath project is to create free open source software that is a viable alternative to Mathematica, Maple, Matlab, and Magma – software produced by companies with a combined thousands of fulltime employees. Though initial progress was encouraging, it was clear that I desperately needed significant money to genuinely compete. For example, one Sage developer had a fantastic Sage development project he wanted about 20K to work fulltime on during a summer, and I could not find the money; as a result he quit working on Sage. This project involved implementing some deep algorithms that are needed to more directly compete with Mathematica for solving symbolic inequalities. This sort of thing happened over and over again, and it began to really frustrate me. I could get plenty of funding for 1-week workshops (just travel expenses – everybody works for free), but there’s only so much you can do at such sprints.

I kept hearing that there would be a big one-in-10-years NSF institutes competition sometime in the “next year or two”. People hinted to me that this would be a good thing to watch out for, and I dreamed that I could found such an institute, with the mission to make it so the mathematics community finally owned the deep software on which teaching and research are based. This institute would bring the same openness and robustness to computational mathematics that rigorous proof had brought to mathematics itself a century earlier.

Alas, this did not happen. I remember the moment I found out about the actual NSF institutes competition. Joe Silverman was standing behind me at a coffee break at The Arizona Winter School 2010 telling people about how his proposal for ICERM had just won the NSF institutes competition. I spun around and congratulated him as I listened to how much work it was to put together the application during the last year; internally, my heart sunk. Not only did I not win, I didn’t even know the competition had happened! I guess I was too busy working on Sage. In any case, my fantasy of creating an NSF-funded institute died at that moment. Of course, ICERM has turned out to be a fantastic institute, and it has hosted several workshops that support the development of open source math software.

Around this time, I also started having my grant proposals denied for reasons I do not understand. This was confusing to me, after having received so many NSF grants before. In 2012, the Simons Foundation put out a call for something that potentially addressed what I had hoped to accomplish via an NSF-funded institute. I was very excited again, but that did not turn out as I had hoped. So next I tried something I never thought I would ever do in a million years…

Commercialization at UW

For various reasons, I failed to get the NSF or other foundations to fund Sage at the level I needed, so in 2013, I decided to try to sell a commercial product, and use the profits to fund Sage development. I first tried to do this at University of Washington, by working with the commercialization office (C4C) to sell access to Sage online. As long as the business and product were merely abstract ideas (e.g., let’s make up a name and trademark it! let’s write some terms of service!) things went fine. However, when things became much more concrete, working with C4C got strange and frustrating for me. I was clearly missing something.

For example, the first thing C4C told me on the very first day we sat down together was they would not work with me if I made the software I wrote for this open source, and that the university would own the software. Given there was no software at all yet, and I imagined I would just whip out a quick modern web-based frontend to Sage and make boatloads of money that would go straight into a UW account to be used to fund Sage, this seemed fine to me. However, I had a nagging feeling that a pure closed-source approach to this problem was impossible, and not having that flexibility would come back to haunt me.

Naively optimistic, I found myself working fulltime at UW and at the same time trying to get a sophisticated web application off the ground by myself, with many important early users depending on it for their classes. This was stressful and took an enormous amount of time and attention. I felt like I was just part of the software, often getting warnings that things were broken or breaking, and manually fixing them. The toil was high, and only got worse as more people used the software. I would get woken up all night. I couldn’t travel since things were constantly breaking.

Every time I fought through some really difficult problem with the web application instead of just giving up, I came out far more determined not to quit.

The web application described above evolved over 6 years into what is now https://CoCalc.com; the functionality was pretty similar from day 1, but quality and scalability have come a long ways. CoCalc lets you collaboratively use LaTeX, Sage, Terminals, Jupyter Notebooks, etc., for teaching and research.

In 2014, I went on sabbatical and worked fulltime developing this web application and the feedback loop I described above only grew more intense: fix things, fight through difficult problems, be even more determined not to give up. Fortunately, I had some leftover NSF grant funds, and was able to use them to hire several students to help with development. I failed to find students who I could hire to do the backend work (and be available any time day or night), which meant that much of the stress of keeping the site running continued to fall squarely on my shoulders. And as the site grew in popularity (and functionality), the stress from it got worse.

My Sabbatical ended, and I was required to return to UW fulltime for one year, or return all the money I was paid during my sabbatical. So far, CoCalc had grown in popularity, but I had not been allowed by the “commercialization office” to actually commercialize it, so it was still a free site.

I taught at UW at the same time as being the main person trying to run this very complicated and painful production web application. Based on user feedback, I was also highly motivated to improve CoCalc. I would typically sleep a few hours, get up at 3am and write code until 8am, then prepare to teach, hope not to have any site issues right before class, and so on. One day CoCalc got hit by a massive DDoS attack minutes before a class I was teaching, while I was talking with a prospective donor to the math department.

I am the sort of person who does well focusing on exactly one thing at a time. Given the chance to fully focus on one thing for extended periods of time, I sometimes even do things that really matter and have an impact. I am not great at doing many different things at once.

In the meantime, Sage itself was growing and receiving funding, though this had nothing to do with me. For example, Gregg Musiker was putting together a big program at IMA, in the form of a ton of Sage Days workshops. Also, the huge ODK project, which was a European Union grant proposal to support open source math software would be fully funded. And closer to home, Moore and Sloane funded a major new initiative that could potentially have also supported work on Sage. I was invited to go to workshops and events involving these and other grants, but often I either said no or canceled at the last minute due to the toil needed just to keep CoCalc running. Also, I believed if I could start charging customers, then I would have a lot more money, and could hire more help.

I met with more senior people at UW’s C4C to finally actually charge people to use CoCalc. They wanted me to do some integration with their license management system, and sell “express” software licenses. It didn’t make any sense to me, and we went around in circles. I then asked about actually starting a separate company (a legal entity) that the university would have some ownership in, so that the company could take payments, etc. This is when things got really weird. They would not talk with me about creating the company due to “conflict of interest”.

I searched for other UW faculty that had commercialized remotely similar products, and found one. He told me how it went, and said it was the worst experience of his life. UW owned 50% of the company, and all of the software of the company, which they licensed under onerous terms. They refused to negotiate anything with him, instead requiring his spinoff company to hire an outside negotiator. As a result of all this, I educated myself as much as possible about relevant rules and laws, and consulted with a lawyer.

It turns out that the NSF grants I used to fund work on CoCalc explicitly stipulated that code funded by those grants had to be GPL licensed. This meant all the code for CoCalc had to be open sourced. Later the university even agreed in writing to release a snapshot of all the CoCalc code under the BSD license, and I haven’t been paid a penny by UW since the date of that release, so there is no possible claim that the company can’t use the code.

Building a company

A colleague of mine from when I was at Harvard was in town for a day, and we met for coffee. He expected we would talk about Sage and number theory, but instead I told him about CoCalc and my attempts at commercialization and starting a company. He immediately suggested a solution to my problems, which was to talk with a friend of his who had both extensive experience working with companies and deep connections with mathematics. I was confident that in the worst case I could quit my job at UW and rewrite all the software from scratch, so I took him up on the offer.

In 2015 I formed a corporation, and received some outside investment, and used that (and dramatically cutting my already-small academic income) to “leave academia”. More precisely, in 2016 (after working fulltime for a year at UW), I finally went on 100% unpaid leave from UW in order to completely focus on CoCalc development and getting a business off the ground. Also, there was no good reason to quit a tenured Full Professor job when you can go on leave; also CoCalc supports teaching in math departments, so it is closely related to my academic job. The only academic responsibilities I had were to my two Ph.D. students, who I meet with one-on-one at least once a week. At the end of two years, I requested a third year of unpaid leave, which UW granted (this is not routine). Throughout all this, the UW mathematics department was very supportive.

During these three years on unpaid leave, I’ve hired three other people who work fulltime on CoCalc. Together we have massively improved the software, and built a business with thousands of paying customers. The company is still not profitable, though the future is clearly very bright if we continue what we are currently doing. CoCalc has become a platform that an increasing number of scalable products (such as this) are being built with, and there is enormous growth potential in the next year.

At this point, it rightfully appears to the community that I have left SageMath development to focus fulltime on building CoCalc as an independent business. Indeed, I do not spend any significant time contributing to Sage, and I even switched to getting daily digests of the sage-devel mailing list.

On the other hand, as mentioned above, CoCalc is going well by many metrics (in terms of quality, feature development, customer love, market position, etc.). Most importantly, me and the other three people who work fulltime on CoCalc really, really love this job, and the potential to have a significant impact. I still don’t know if CoCalc will ever be wildly profitable and massively fund Sage development. If I were to obsess over only that goal, I would have to quit working on CoCalc (since it is taking way too long) and pursue other opportunities for funding Sage.

In retrospect, my idea from 7 years ago to start a web-based software company from scratch and build it into a successful profitable business has so far completely failed to fund Sage.

It would be far easier to work fulltime writing grants to foundations, and leveraging the acknowledged success of Sage so far. I made the wrong move, given my original goal. The surprise is that I really enjoy what I’m doing right now!

My unpaid leave is up – what am I going to do?

My third year of unpaid leave from UW is up. I have to decide whether to return to UW or resign. If I return, it turns out that I would have to have at least a 50% appointment. I currently have 50% of one year of teaching in “credits”, which means I wouldn’t be required to teach for the first year I go back as a 50% appointment. Moreover, the current department chair (John Palmieri) understands and appreciates Sage – he is among the top 10 all time contributors to the source code of Sage!

I have decided to resign. I’m worried about issues of intellectual property; it would be extremely unfair to my employees, investors and customers if I took a 50% UW position, and then later got sued by UW as a result. Having a 50% paid appointment at UW subjects one to a lot of legal jeopardy, which is precisely why I have been on 100% unpaid leave for the last three years. But more importantly, I feel very good about continuing to focus 100% on the development of CoCalc, which is going to have an incredible year going forward. I genuinely love building this (non-VC funded) company, and feel very good about it.

Low latency local CoCalc and SageMath on the Google Pixelbook: playing with Crouton, Gallium OS, Rkt, Docker

noreply@blogger.com (William Stein) — Tue, 02 Jan 2018 06:29:00 +0000

I just got CoCalc fully working locally on my Google Pixelbook Chromebook! I want this, since (1) I was inspired by a recent blog post about computer latency, and (2) I'm going to be traveling a lot next week (the JMM in San Diego -- come see me at the Sage booth), and may have times without Internet during which I want to work on CoCalc's development.

I first tried Termux, which is a "Linux in Android" userland that runs on the Pixelbook (via Android), but there were way, way too many problems for CoCalc, which is a very complicated application, so this was out. The only option was to enable ChromeOS dev mode.

I next considered partitioning the hard drive, installing Linux natively (in addition to ChromeOS), and dual booting. However, it seems the canonical option is Gallium OS and it nobody has got that to work with Pixelbook yet (?). In fact, it appears that Gallium OS development made have stopped a year ago (?). Bummer. So I gave up on that approach...

The next option was to try Crouton + Docker, since we have a CoCalc Docker image. Unfortunately, it seems currently impossible to use Docker with the standard ChromeOS kernel. The next thing I considered was to use Crouton + Rkt, since there are blog posts claiming Rkt can run vanilla Docker containers on Crouton.

I setup Crouton, installed the cli-extra chroot, and easily installed Rkt. I learned how Rkt is different than Docker, and tried a bunch of simple standard Docker containers, which worked. However, when I tried running the (huge) CoCalc Docker container, I hit major performance issues, and things broke down. If I had the 16GB Chromebook and more patience, maybe this would have worked. But with only 8GB RAM, it really wasn't feasible.

The next idea was to just use Crouton Linux directly (so no containers), and fix whatever issues arose. I did this, and it worked well, with CoCalc giving me a very nice local browser-based interface to my Crouton environment. Also, since we've spent so much time optimizing CoCalc to be fast over the web, it feels REALLY fast when used locally. I made some changes to the CoCalc sources and added a directory, to hopefully make this easier if anybody else tries. This is definitely not a 1-click solution.

Finally, for SageMath I first tried the Ubuntu PPA, but realized it is hopelessly out of date. I then downloaded and extracted the Ubuntu 16.04 binary and it worked fine. Of course, I'm also building Sage from source (I'm the founder of SageMath after all), but that takes a long time...

Anyway, Crouton works really, really well on the Pixelbook, especially if you do not need to run Docker containers.

RethinkDB must relicense NOW

noreply@blogger.com (William Stein) — Mon, 10 Oct 2016 22:59:00 +0000

What is RethinkDB?

UPDATE: Several months after I wrote this post, RethinkDB was relicensed. For the CoCalc project, it was too late, and by then we had already switched to PostgreSQL.

RethinkDB is a INCREDIBLE high quality polished open source realtime database that is easy to deploy, shard, replicate, and supports a reactive client programming model, which is useful for collaborative web-based applications. Shockingly, the 7-year old company that created RethinkDB has just shutdown. I am the CEO of a company, SageMath, Inc., that uses RethinkDB very heavily, so I have a strong interest in RethinkDB surviving as an independent open source project.

Three Types of Open Source Projects

There are many types of open source projects. RethinkDB was the type of open source project where most work on RethinkDB has been fulltime focused work, done by employees of the RethinkDB company. RethinkDB is licensed under the AGPL, but the company promised to make the software available to customers under other licenses.

Academia: I started the SageMath open source math software project in 2005, which has over 500 contributors, and a relatively healthy volunteer ecosystem, with about hundred contributors to each release, and many releases each year. These are mostly volunteer contributions by academics: usually grad students, postdocs, and math professors. They contribute because SageMath is directly relevant to their research, and they often contribute state of the art code that implements algorithms they have created or refined as part of their research. Sage is licensed under the GPL, and that license has worked extremely well for us. Academics sometimes even get significant grants from the NSF or the EU to support Sage development.

Companies: I also started the Cython compiler project in 2007, which has had dozens of contributors and is now the defacto standard for writing or wrapping fast code for use by Python. The developers of Cython mostly work at companies (e.g., Google) as a side project in their spare time. (Here's a message today about a new release from a Cython developer, who works at Google.) Cython is licensed under the Apache License.

What RethinkDB Will Become

RethinkDB will no longer be an open source project whose development is sponsored by a single company dedicated to the project. Will it be an academic project, a company-supported project, or dead?

A friend of mine at Oxford University surveyed his academic CS colleagues about RethinkDB, and they said they had zero interest in it. Indeed, from an academic research point of view, I agree that there is nothing interesting about RethinkDB. I myself am a college professor, and understand these people! Academic volunteer open source contributors are definitely not going to come to RethinkDB's rescue. The value in RethinkDB is not in the innovative new algorithms or ideas, but in the high quality carefully debugged implementations of standard algorithms (largely the work of bad ass German programmer Daniel Mewes). The RethinkDB devs had to carefully tune each parameter in those algorithms based on extensive automated testing, user feedback, the Jepsen tests, etc.

That leaves companies. Whether or not you like or agree with this, many companies will not touch AGPL licensed code:

"Google open source guru Chris DiBona says that the web giant continues to ban the lightning-rod AGPL open source license within the company because doing so "saves engineering time" and because most AGPL projects are of no use to the company."

This is just the way it is -- it's psychology and culture, so deal with it. In contrast, companies very frequently embrace open source code that is licensed under the Apache or BSD licenses, and they keep such projects alive. The extremely popular PostgreSQL database is licensed under an almost-BSD license. MySQL is freely licensed under the GPL, but there are good reasons why people buy a commercial MySQL license (from Oracle) for MySQL. Like RethinkDB, MongoDB is AGPL licensed, but they are happy to sell a different license to companies.

With RethinkDB today, the only option is AGPL. This very strongly discourage use by the only possible group of users and developers that have any chance to keep RethinkDB from death. If this situation is not resolved as soon as possible, I am extremely afraid that it never will be resolved. Ever. If you care about RethinkDB, you should be afraid too. Ignoring the landscape and culture of volunteer open source projects is dangerous.

A Proposal

I don't know who can make the decision to relicense RethinkDB. I don't kow what is going on with investors or who is in control. I am an outsider. Here is a proposal that might provide a way out today:

PROPOSAL: Dear RethinkDB, sell me an Apache (or BSD) license to the RethinkDB source code. Make this the last thing your company sells before it shuts down. Just do it.

Hacker News Discussion

RethinkDB, SageMath, Andreessen-Horowitz, Basecamp and Open Source Software

noreply@blogger.com (William Stein) — Fri, 07 Oct 2016 20:12:00 +0000

RethinkDB and sustainable business models

Three weeks ago, I spent the evening of Sept 12, 2016 with Daniel Mewes, who is the lead engineer of RethinkDB (an open source database). I was also supposed to meet with the co-founders, Slava and Michael, but they were too busy fundraising and couldn't join us. I pestered Daniel the whole evening about what RethinkDB's business model actually was. Yesterday, on October 6, 2016, RethinkDB shut down.

I met with some RethinkDB devs because an investor who runs a fund at the VC firm Andreessen-Horowitz (A16Z) had kindly invited me there to explain my commercialization plans for SageMath, Inc., and RethinkDB is one of the companies that A16Z has invested in. At first, I wasn't going to take the meeting with A16Z, since I have never met with Venture Capitalists before, and do not intend to raise VC. However, some of my advisors convinced me that VC's can be very helpful even if you never intend to take their investment, so I accepted the meeting.

In the first draft of my slides for my presentation to A16Z, I had a slide with the question: "Why do you fund open source companies like RethinkDB and CoreOS, which have no clear (to me) business model? Is it out of some sense of charity to support the open source software ecosystem?" After talking with people at Google and the RethinkDB devs, I removed that slide, since charity is clearly not the answer (I don't know if there is a better answer than "by accident").

I have used RethinkDB intensely for nearly two years, and I might be their biggest user in some sense. My product SageMathCloud, which provides web-based course management, Python, R, Latex, etc., uses RethinkDB for everything. For example, every single time you enter some text in a realtime synchronized document, a RethinkDB table gets an entry inserted in it. I have RethinkDB tables with nearly 100 million records. I gave a talk at a RethinkDB meetup, filed numerous bug reports, and have been described by them as "their most unlucky user". In short, in 2015 I bet big on RethinkDB, just like I bet big on Python back in 2004 when starting SageMath. And when visiting the RethinkDB devs in San Francisco (this year and also last year), I have said to them many times "I have a very strong vested interest in you guys not failing." My company SageMath, Inc. also pays RethinkDB for a support contract.

Sustainable business models were very much on my mind, because of my upcoming meeting at A16Z and the upcoming board meeting for my company. SageMath, Inc.'s business model involves making money from subscriptions to SageMathCloud (which is hosted on Google Cloud Platform); of course, there are tons of details about exactly how our business works, which we've been refining based on customer feedback. Though absolutely all of our software is open source, what we sell is convenience, easy of access and use, and we provide value by hosting hundreds of courses on shared infrastructure, so it is much cheaper and easier for universities to pay us rather than hosting our software themselves (which is also fairly easy). So that's our business model, and I would argue that it is working; at least our MRR is steadily increasing and is more than twice our hosting costs (we are not cash flow positive yet due to developer costs).

So far as I can determine, the business model of RethinkDB was to make money in the following ways: 1. Sell support contracts to companies (I bought one). 2. Sell a closed-source proprietary version of RethinkDB with extra features that were of interest to enterprise (they had a handful of such features, e.g., audit logs for queries). 3. Horizon would become a cloud-hosted competitor to Firebase, with unique advantages that users have the option to migrate from the cloud to their own private data center, and more customizability. This strategy depends on a trend for users to migrate away from the cloud, rather than to it, which some people at RethinkDB thought was a real trend (I disagree).

I don't know of anything else they were seriously trying right now. The closed-source proprietary version of RethinkDB also seemed like a very recent last ditch effort that had only just begun; perhaps it directly contradicted a desire to be a 100% open source company?

With enough users, it's easier to make certain business models work. I suspect RethinkDB does not have a lot of real users. Number of users tends to be roughly linearly related to mailing list traffic, and the RethinkDB mailing list has an order of magnitude less traffic compared to the SageMath mailing lists, and SageMath has around 50,000 users. RethinkDB wasn't even advertised to be production ready until just over a year ago, so even they were telling people not to use it seriously until relatively recently. The adoption cycle for database technology is slow -- people wisely wait for Aphyr's tests, benchmarks comparing with similar technology, etc. I was unusual in that I chose RethinkDB much earlier than most people would, since I love the design of RethinkDB so much. It's the first database I loved, having seen a lot over many decades.

Conclusion: RethinkDB wasn't a real business, and wouldn't become one without year(s) more runway.

I'm also very worried about the future of RethinkDB as an open source project. I don't know if the developers have experience growing an open source community of volunteers; it's incredibly hard and its unclear they are even going to be involved. At a bare minimum, I think they must switch to a very liberal license (Apache instead of AGPL), and make everything (e.g., automated testing code, documentation, etc) open source. It's insanely hard getting any support for open source infrastructure work -- support mostly comes from small government grants (for research software) or contributions from employees at companies (that use the software). Relicensing in a company friendly way is thus critical.

Company Incentives

Companies can be incentived in various ways, including:

to get to the next round of VC funding
to be a sustainable profitable business by making more money from customers than they spend, or
to grow to have a very large number of users and somehow pivot to making money later.

When founding a company, you have a chance to choose how your company will be incentived based on how much risk you are willing to take, the resources you have, the sort of business you are building, the current state of the market, and your model of what will happen in the future.

For me, SageMath is an open source project I started in 2004, and I'm in it for the long haul. I will make the business I'm building around SageMathCloud succeed, or I will die trying -- therefore I have very, very little tolerance for risk. Failure is not an option, and I am not looking for an exit. For me, the strategy that best matches my values is to incentive my company to build a profitable business, since that is most likely to survive, and also to give us the freedom to maintain our longterm support for open source and pure mathematics software.

Thus for my company, neither optimizing for raising the next round of VC or growing at all costs makes sense. You would be surprised how many people think I'm completely wrong for concluding this.

Andreessen-Horowitz

I spent the evening with RethinkDB developers, which scared the hell out of me regarding their business prospects. They are probably the most open source friendly VC-funded company I know of, and they had given me hope that it is possible to build a successful VC-funded tech startup around open source. I prepared for my meeting at A16Z, and deleted my slide about RethinkDB.

I arrived at A16Z, and was greeted by incredibly friendly people. I was a little shocked when I saw their nuclear bomb art in the entry room, then went to a nice little office to wait. The meeting time arrived, and we went over my slides, and I explained my business model, goals, etc. They said there was no place for A16Z to invest directly in what I was planning to do, since I was very explicit that I'm not looking for an exit, and my plan about how big I wanted the company to grow in the next 5 years wasn't sufficiently ambitious. They were also worried about how small the total market cap of Mathematica and Matlab is (only a few hundred million?!). However, they generously and repeatedly offered to introduce me to more potential angel investors.

We argued about the value of outside investment to the company I am trying to build. I had hoped to get some insight or introductions related to their portfolio companies that are of interest to my company (e.g., Udacity, GitHub), but they deflected all such questions. There was also some confusion, since I showed them slides about what I'm doing, but was quite clear that I was not asking for money, which is not what they are used to. In any case, I greatly appreciated the meeting, and it really made me think. They were crystal clear that they believed I was completely wrong to not be trying to do everything possible to raise investor money.

Basecamp

During the first year of SageMath, Inc., I was planning to raise a round of VC, and was doing everything to prepare for that. I then read some of DHH's books about Basecamp, and realized many of those arguments applied to my situation, given my values, and -- after a lot of reflection -- I changed my mind. I think Basecamp itself is mostly closed source, so they may have an advantage in building a business. SageMathCloud (and SageMath) really are 100% open source, and building a completely open source business might be harder. Our open source IP is considered worthless by investors. Witness: RethinkDB just shut down and Stripe hired just the engineers -- all the IP, customers, etc., of RethinkDB was evidently considered worthless by investors.

The day after the A16Z meeting, I met with my board, which went well (we discussed a huge range of topics over several hours). Some of the board members also tried hard to convince me that I should raise a lot more investor money.

Will Poole: you're doomed

Two weeks ago I met with Will Poole, who is a friend of a friend, and we talked about my company and plans. I described what I was doing, that everything was open source, that I was incentivizing the company around building a business rather than raising investor money. He listened and asked a lot of follow up questions, making it very clear he understands building a company very, very well.

His feedback was discouraging -- I said "So, you're saying that I'm basically doomed." He responded that I wasn't doomed, but might be able to run a small "lifestyle business" at best via my approach, but there was absolutely no way that what I was doing would have any impact or pay for my kids college tuition. If this was feedback from some random person, it might not have been so disturbing, but Will Poole joined Microsoft in 1996, where he went on to run Microsoft's multibillion dollar Windows business. Will Poole is like a retired four-star general that executed a successful campaign to conquer the world; he been around the block a few times. He tried pretty hard to convince me to make as much of SageMathCloud closed source as possible, and to try to convince my users to make content they create in SMC something that I can reuse however I want. I felt pretty shaken and convinced that I needed to close parts of SMC, e.g., the new Kubernetes-based backend that we spent all summer implementing. (Will: if you read this, though our discussion was really disturbing to me, I really appreciate it and respect you.)

My friend, who introduced me to Will Poole, introduced me to some other people and described me as that really frustrating sort of entrepreneur who doesn't want investor money. He then remarked that one of the things he learned in business school, which really surprised him, was that it is good for a company to have a lot of debt. I gave him a funny look, and he added "of course, I've never run a company".

I left that meeting with Will convinced that I would close source parts of SageMathCloud, to make things much more defensible. However, after thinking things through for several days, and talking this over with other people involved in the company, I have chosen not to close anything. This just makes our job harder. Way harder. But I'm not going to make any decisions based purely on fear. I don't care what anybody says, I do not think it is impossible to build an open source business (I think Wordpress is an example), and I do not need to raise VC.

Hacker News Discussion: https://news.ycombinator.com/item?id=12663599

Chinese version: http://www.infoq.com/cn/news/2016/10/Reflection-sustainable-profit-co

SageMath: "it's not research"

noreply@blogger.com (William Stein) — Wed, 05 Oct 2016 20:13:00 +0000

The University of Washington (UW) mathematics department has funding for grad students to "travel to conferences". What sort of travel funding?

The department has some money available.
The UW Graduate school has some money available: They only provide funding for students giving a talk or presenting a poster.
The UW GPSS has some money available: contact them directly to apply (they only provide funds for "active conference participation", which I think means giving a talk, presenting a poster, or similar)

One of my two Ph.D. students at UW asked our Grad program director: "I'll be going to Joint Mathematics Meetings (JMM) to help out at the SageMath booth. Is this a thing I can get funding for?"

ANSWER: Travel funds are primarily meant to support research, so although I appreciate people helping out at the SageMath booth, I think that's not the best use of the department's money.

I think this "it's not research" perspective on the value of mathematical software is unfortunate and shortsighted. Moreover, it's especially surprising as the person who wrote the above answer has contributed substantially to the algebraic topology functionality of Sage itself, so he knows exactly what Sage is.

Sigh. Can some blessed person with an NSF grant out there pay for this grad student's travel expenses to help with the Sage booth? Or do I have to use the handful of $10, $50, etc., donations I've got the last few months for this purpose?

Jupyter: "take the domain name down immediately"

noreply@blogger.com (William Stein) — Wed, 27 Jul 2016 01:42:00 +0000

The Jupyter notebook is an open source BSD-licensed browser-based code execution environment, inspired by my early work on the Sage Notebook (which we launched in 2007), which was in turn inspired heavily by Mathematica notebooks and Google docs. Jupyter used to be called IPython.

SageMathCloud is an open source web-based environment for using Sage worksheets, terminals, LaTeX documents, course management, and Jupyter notebooks. I've put much hard work into making it so that multiple people can simultaneously edit Jupyter notebooks in SageMathCloud, and the history of all changes are recorded and browsable via a slider.

Many people have written to me asking for there to be a modified version of SageMathCloud, which is oriented around Jupyter notebooks instead of Sage worksheets. So the default file type is Jupyter notebooks, the default kernel doesn't involve the extra heft of Sage, etc., and the domain name involves Jupyter instead of "sagemath". Some people are disuased from using SageMathCloud for Jupyter notebooks because of the "SageMath" name.

Dozens of web applications (including SageMathCloud) use the word "Jupyter" in various places. However, I was unsure about using "jupyter" in a domain name. I found this github issue and requested clarification 6 weeks ago. We've had some back and forth, but they recently made it clear that it would be at least a month until any decision would be considered, since they are too busy with other things. In the meantime, I rented jupytercloud.com, which has a nice ring to it, as the planet Jupiter has clouds. Yesterday, I made jupytercloud.com point to cloud.sagemath.com to see what it would "feel like" and Tim Clemans started experimenting with customizing the page based on the domain name that the client sees. I did not mention jupytercloud.com publicly anywhere, and there were no links to it.

Today I received this message:

    William,

    I'm writing this representing the Jupyter project leadership
    and steering council. It has recently come to the Jupyter
    Steering Council's attention that the domain jupytercloud.com
    points to SageMathCloud. Do you own that domain? If so,
    we ask that you take the domain name down immediately, as
    it uses the Jupyter name.

I of course immediately complied. It is well within their rights to dictate how their name is used, and I am obsessive about scrupulously doing everything I can to respect people's intellectual property; with Sage we have put huge amounts of effort into honoring both the letter and spirit of copyright statements on open source software.

I'm writing this because it's unclear to me what people really want, and I have no idea what to do here.

1. Do you want something built on the same technology as SageMathCloud, but much more focused on Jupyter notebooks?

2. Does the name of the site matter to you?

3. What model should the Jupyter project use for their trademark? Something like Python? like Git?Like Linux? Like Firefox? Like the email program PINE? Something else entirely?

4. Should I be worried about using Jupyter at all anywhere? E.g., in this blog post? As the default notebook for the SageMath project?

I appreciate any feedback.

Hacker News Discussion

UPDATE (Aug 12, 2016): The official decision is that I cannot use the domain jupytercloud.com. They did say I can use jupyter.sagemath.com or sagemath.com/jupyter. Needless to say, I'm disappointed, but I fully respect their (very foolish, IMHO) decision.

Elliptic curves: Magma versus Sage

noreply@blogger.com (William Stein) — Wed, 24 Feb 2016 20:07:00 +0000

Elliptic Curves

Elliptic curves are certain types of nonsingular plane cubic curves, e.g., y^2 = x^3 + ax +b, which are central to both number theory and cryptography (e.g., they are used to compute the hash in bitcoin).

Magma and Sage

If you want to do a wide range of explicit computations with elliptic curves, for research purposes, you will very likely use SageMath or Magma. If you're really serious, you'll use both.

Both Sage and Magma are far ahead of all other software (e.g., Mathematica, Maple and Matlab) for elliptic curves.

Magma reference manual about elliptic curves.
Sage reference manual about elliptic curves.
Pari reference manual about elliptic curves. -- pari is part of Sage and has some unique powerful functionality, e.g., ellheegner...

A Little History

When I started contributing to Magma in 1999, I remember that Magma was way, way behind Pari. I remember having lunch with John Cannon (founder of Magma), and telling him I would no longer have to use Pari if only Magma would have dramatically faster code for computing point counts on elliptic curves.

A few years later, John wisely hired Mark Watkins to work fulltime on Magma, and Mark has been working there for over a decade. Mark is definitely one of the top people in the world at implementing (and using) computational number theory algorithms, and he's ensured that Magma can do a lot. Some of that "do a lot" means catching up with (and surpassing!) what was in Pari and Sage for a long time (e.g., point counting, p-adic L-functions, etc.)

However, in addition, many people have visited Sydney and added extremely deep functionality for doing higher descents to Magma, which is not available in any open source software. Search for Magma in this paper to see how, even today, there seems to be no open source way to compute the rank of the curve y² = x³ + 169304x + 25788938. (The rank is 0.)

Two Codebases

There are several elliptic curves algorithms available only in Magma (e.g., higher descents) ... and some available only in Sage (L-function rank bounds, some overconvergent modular symbols, zeros of L-functions, images of Galois representations). I could be wrong about functionality not being in Magma, since almost anything can get implemented in a year...

The code bases are almost completely separate, which is a very good thing. Any time something gets implemented in one, it gets (or should get) tested via a big run on elliptic curves up to some bound in the other. This sometimes results in bugs being found. I remember refereeing the "integral points" code in Sage by running it against all curves up to some bound and comparing to what Magma output, and getting many discrepancies, which showed that there were bugs in both Sage and Magma.
Thus we would be way better off if Sage could do everything Magma does (and vice versa).

"If you were new faculty, would you start something like SageMathCloud sooner?"

noreply@blogger.com (William Stein) — Wed, 24 Feb 2016 04:40:00 +0000

I was recently asked by a young academic: "If you were a new faculty member again, would you start something like SageMathCloud sooner or simply leave for industry?" The academic goes on to say "I am increasingly frustrated by continual evidence that it is more valuable to publish a litany of computational papers with no source code than to do the thankless task of developing a niche open source library; deep mathematical software is not appreciated by either mathematicians or the public."

I wanted to answer that "things have gotten better" since back in 2000 when I started as an academic who does computation. Unfortunately, I think they have gotten worse. I do not understand why. In fact, this evening I just received the most recent in a long string of rejections by the NSF.

Regarding a company versus taking a job in industry, for me personally there is no point in starting a company unless you have a goal that can only be accomplished via a company, since building a business from scratch is extremely hard and has little to do with math or research. I do have such a goal: "create a viable open source alternative to Mathematica, etc...". I was very clearly told by Michael Monagan (co-founder of Maplesoft) in 2006 that this goal could not be accomplished in academia, and I spent the last 10 years trying to prove him wrong.

On the other hand, leaving for a job in industry means that your focus will switch from "pure" research to solving concrete problems that make products better for customers. That said, many of the mathematicians who work on open source math software do so because they care so much about making the experience of using math software much better for the math community. What often drives Sage developers is exactly the sort of passionate care for "consumer focus" and products that also makes one successful in industry. I'm sure you know exactly what I mean, since it probably partly motivates your work. It is sad that the math community turns its back on such people. If the community were to systematically embrace them, instead of losing all these $300K+/year engineers to mathematics entirely -- which is exactly what we do constantly -- the experience of doing mathematics could be massively improved into the future. But that is not what the community has chosen to do. We are shooting ourselves in the foot.

Now that I have seen how academia works from the inside over 15 years I'm starting to understand a little why these things change very slowly, if ever. In the mathematics department I'm at, there are a small handful of research areas in pure math, and due to how hiring works (voting system, culture, etc.) we have spent the last 10 years hiring in those areas little by little (to replace people who die/retire/leave). I imagine most mathematics departments are very similar. "Open source software" is not one of those traditional areas. Nobody will win a Fields Medal in it.

Overall, the mathematical community does not value open source mathematical software in proportion to its value, and doesn't understand its importance to mathematical research and education. I would like to say that things have got a lot better over the last decade, but I don't think they have. My personal experience is that much of the "next generation" of mathematicians who would have changed how the math community approaches open source software are now in industry, or soon will be, and hence they have no impact on academic mathematical culture. Every one of my Ph.D. students are now at Google/Facebook/etc.

We as a community overall would be better off if, when considering how we build departments, we put "mathematical software writers" on an equal footing with "algebraic geometers". We should systematically consider quality open source software contributions on a potentially equal footing with publications in journals.

To answer the original question, YES, knowing what I know now, I really wish I had started something like SageMathCloud sooner. In fact, here's the previously private discussion from eight years ago when I almost did.

--

- There is a community generated followup ...

- Relevant blog post: About Software Development in Companies, Communities and the Academia

Open source is now ready to compete with Mathematica for use in the classroom

noreply@blogger.com (William Stein) — Thu, 11 Feb 2016 04:19:00 +0000

When I think about what makes SageMath different, one of the most fundamental things is that it was created by people who use it every day. It was created by people doing research math, by people teaching math at universities, and by computer programmers and engineers using it for research. It was created by people who really understand computational problems because we live them. We understand the needs of math research, teaching courses, and managing an open source project that users can contribute to and customize to work for their own unique needs.

The tools we were using, like Mathematica, are clunky, very expensive, and just don't do everything we need. And worst of all, they are closed source software, meaning that you can't even see how they work, and can't modify them to do what you really need. For teaching math, professors get bogged down scheduling computer labs and arranging for their students to buy and install expensive software.

So I started SageMath as an open source project at Harvard in 2004, to solve the problem that other math software is expensive, closed source, and limited in functionality, and to create a powerful tool for the students in my classes. It wasn't a project that was intended initially as something to be used by hundred of thousands of people. But as I got into the project and as more professors and students started contributing to the project, I could clearly see that these weren't just problems that pissed me off, they were problems that made everyone angry.

The scope of SageMath rapidly expanded. Our mission evolved to create a free open source serious competitor to Mathematica and similar closed software that the mathematics community was collective spending hundreds of millions of dollars on every year. After a decade of work by over 500 contributors, we made huge progress.

But installing SageMath was more difficult than ever. It was at that point that I decided I needed to do something so that this groundbreaking software that people desperately needed could be shared with the world.

So I created SageMathCloud, which is an extremely powerful web-based collaborative way for people to easily use SageMath and other open source software such as LaTeX, R, and Jupyter notebooks easily in their teaching and research. I created SageMathCloud based on nearly two decades of experience using math software in the classroom and online, at Harvard, UC San Diego, and University of Washington.

SageMathCloud is commercial grade, hosted in Google's cloud, and very large classes are using it heavily right now. It solves the installation problem by avoiding it altogether. It is entirely open source.

Open source is now ready to directly compete with Mathematica for use in the classroom. They told us we could never make something good enough for mass adoption, but we have made something even better. For the first time, we're making it possible for you to easily use Python and R in your teaching instead of Mathematica; these are industry standard mainstream open source programming languages with strong support from Google, Microsoft and other industry leaders. For the first time, we're making it possible for you to collaborate in real time and manage your course online using the same cutting edge software used by elite mathematicians at the best universities in the world.

A huge community in academia and in industry are all working together to make open source math software better at a breathtaking pace, and the traditional closed development model just can't keep up.

Thinking of using SageMathCloud in a college course?

noreply@blogger.com (William Stein) — Fri, 15 Jan 2016 17:12:00 +0000

SageMathCloud course subscriptions

"We are college instructors of the calculus sequence and ODE’s. If the college were to purchase one of the upgrades for us as we use Sage with our students, who gets the benefits of the upgrade? Is is the individual students that are in an instructor’s Sage classroom or is it the collaborators on an instructor’s project?"

If you were to purchase just the $7/month plan and apply the upgrades to *one* single project, then all collaborators on that one project would benefit from those upgrades while using that project.

If you were to purchase a course plan for say $399/semester, then you could apply the upgrades (network access and members only hosting) to 70 projects that you might create for a course. When you create a course by clicking +New, then "Manage a Course", then add students, each student has their own project created automatically. All instructors (anybody who is a collaborator on the project where you clicked "Manage a course") is also added to the student's project. In course settings you can easily apply the upgrades you purchase to all projects in the course.

Also I'm currently working on a new feature where instructors may choose to require all students in their course to pay for the upgrade themselves. There's a one time $9/course fee paid by the student and that's it. At some colleges (in some places) this is ideal, and at other places it's not an option at all. I anticipate releasing this very soon.

Getting started with SageMathCloud courses

You can fully use the SMC course functionality without paying anything in order to get familiar with it and test it out. The main benefit of paying is that you get network access and all projects get moved to members only servers, which are much more robust; also, we greatly prioritize support for paying customers.

This blog post is an overview of using SMC courses:

  http://www.beezers.org/blog/bb/2015/09/grading-in-sagemathcloud/

This has some screenshots and the second half is about courses:

  http://blog.ouseful.info/2015/11/24/course-management-and-collaborative-jupyter-notebooks-via-sagemathcloud/

Here are some video tutorials made by an instructor that used SMC with a large class in Iceland recently:

  https://www.youtube.com/watch?v=dgTi11ZS3fQ
  https://www.youtube.com/watch?v=nkSdOVE2W0A
  https://www.youtube.com/watch?v=0qrhZQ4rjjg

Note that the above videos show the basics of courses, then talk specifically about automated grading of Jupyter notebooks. That might not be at all what you want to do -- many math courses use Sage worksheets, and probably don't automate the grading yet.

Regarding using Sage itself for teaching your courses, check out the free pdf book to "Sage for Undergraduates" here, which the American Mathematical Society just published (there is also a very nice print version for about $23):

http://www.gregorybard.com/SAGE.html

Mathematics Graduate School: preparation for non-academic employment

noreply@blogger.com (William Stein) — Fri, 08 Jan 2016 16:31:00 +0000

This is about my personal experience as a mathematics professor whose students all have non-academic jobs that they love. This is in preparation for a panel at the Joint Mathematics Meetings in Seattle.

My students and industry

My graduated Ph.D. students:

3 at Google
1 at Facebook
1 at CCR

My graduating student (Hao Chen):

Applying for many postdocs
But just did summer internship at Microsoft Research with Kristin. (I’ve had four students do summer internships with Kristin)

All my students:

Have done a lot of Software development, maybe having little to do with math, e.g., “developing the Cython compiler”, “transition the entire Sage project to git”, etc.
Did a thesis squarely in number theory, with significant theoretical content.
Guilt (or guilty pleasure?) spending time on some programming tasks instead of doing what they are “supposed” to do as math grad students.

Me: academia and industry

Math Ph.D. from Berkeley in 2000; many students of my advisor (Lenstra) went to work at CCR after graduating…
Academia: I’m a tenured math professor (since 2005) – number theory.
Industry: I founded a Delaware C Corp (SageMath, Inc.) one year ago to “commercialize Sage” due to VERY intense frustration trying to get grant funding for Sage development. Things have got so bad, with so many painful stupid missed opportunities over so many years, that I’ve given up on academia as a place to build Sage.

Reality check: Academia values basic research, not products. Industry builds concrete valuable products. Not understanding this is a recipe for pain (at least it has been for me).

Advice for students from students

Robert Miller (Google)

My student Robert Miller’s post on Facebook yesterday: “I LOVE MY JOB”. Why: “Today I gave the first talk in a seminar I organized to discuss this result: ‘Graph Isomorphism in Quasipolynomial Time’. Dozens of people showed up, it was awesome!”

Background: When he was my number theory student, working on elliptic curves, he gave a talk about graph theory in Sage at a Sage Days (at IPAM). His interest there was mainly in helping an undergrad (Emily Kirkman) with a Sage dev project I hired her to work on. David Harvey asked: “what’s so hard about implementing graph isomorphism”, and Robert wanted to find out, so he spent months doing a full implementation of Brendan McKay’s algorithm (the only other one). This had absolutely nothing to do with his Ph.D. thesis work on the Birch and Swinnerton-Dyer conjecture, but I was very supportive.

Craig Citro (Google)

Craig Citro did a Ph.D. in number theory (with Hida), but also worked on Sage aLOT as a grad student and postdoc. He’s done a lot of hiring at Google. He says: “My main piece of advice to potential google applicants is ‘start writing as much code as you can, right now.’ Find out whether you’d actually enjoyworking for a company like Google, where a large chunk of your job may be coding in front of a screen. I’ve had several friends from math discover (the hard way) that they don’t really enjoy full-time programming (any more than they enjoy full-time teaching?).”

“Start throwing things on github now. Potential interviewers are going to check out your github profile; having some cool stuff at the top is great, but seeing a regular stream of commits is also a useful signal.”

Robert Bradshaw (Google)

“A lot of mathematicians are good at (and enjoy) programming. Many of them aren’t (and don’t). Find out. Being involved in Sage is significantly more than just having taken a suite of programming courses or hacking personal scripts on your own: code reviews, managing bugs, testing, large-scale design, working with others’ code, seeing projects through to completion, and collaborating with others, local and remote, on large, technical projects are all important. It demonstrates your passion.”

Rado Kirov (Google)

“Robert Bradshaw said it before me, but I have to repeat. Large scale software development requires exposure to a lot of tooling and process beyond just writing code - version control, code reviews, bug tracking, code maintenance, release process, coordinating with collaborators. Contributing to an active open-source project with a large number of contributors like Sage, is a great way to experience all that and see if you would like to make it your profession. A lot of mathematicians write clever code for their research, but if less than 10 people see it and use it, it is not a realistic representation of what working as a software engineer feels like.

The software industry is in large demand of developers and hiring straight from academia is very common. Before I got hired by Google, the only software development experience on my resume was the Sage graph editor. Along with solid understanding of algorithms and data structures that was enough to get in."

David Moulton (Google)

“Google hires mathematicians now as quantitative analysts = data engineers. Google is very flexible for a tech company about the backgrounds of its employees. We have a long-standing reading group on category theory, and we’re about to start one on Babai’s recent quasi- polynomial-time algorithm for graph isomorphism. And we have a math discussion group with lots of interesting math on it.”

My advice for math professors

Obviously, encourage your students to get involved in open source projects like Sage, even if it appears to be a waste of time or distraction from their thesis work (this will likely feel very counterintuitive you’ll hate it).

At Univ of Washington, a few years ago I taught a graduate-level course on Sage development. The department then refused to run it again as a grad course, which was frankly very frustrating to me. This is exactly the wrong thing to do if you want to increase the options of your Ph.D. students for industry jobs. Maybe quit trying to train our students to be only math professors, and instead give them a much wider range of options.

"Prime Numbers and the Riemann Hypothesis", Cambridge University Press, and SageMathCloud

noreply@blogger.com (William Stein) — Thu, 19 Nov 2015 19:34:00 +0000

Overview

Barry Mazur and I spent over a decade writing a popular math book "Prime Numbers and the Riemann Hypothesis", which will be published by Cambridge Univeristy Press in 2016. The book involves a large number of illustrations created using SageMath, and was mostly written using the LaTeX editor in SageMathCloud.

This post is meant to provide a glimpse into the writing process and also content of the book.

This is about making research math a little more accessible, about math education, and about technology.

Intended Audience: Research mathematicians! Though there is no mathematics at all in this post.

The book is here: http://wstein.org/rh/
Download a copy before we have to remove it from the web!

Goal: The goal of our book is simply to explain what the Riemann Hypothesis is really about. It is a book about mathematics by two mathematicians. The mathematics is front and center; we barely touch on people, history, or culture, since there are already numerous books that address the non-mathematical aspects of RH. Our target audience is math-loving high school students, retired electrical engineers, and you.

Clay Mathematics Institute Lectures: 2005

The book started in May 2005 when the Clay Math Institute asked Barry Mazur to give a large lecture to a popular audience at MIT and he chose to talk about RH, with me helping with preparations. His talk was entitled "Are there still unsolved problems about the numbers 1, 2, 3, 4, ... ?"

See http://www.claymath.org/library/public_lectures/mazur_riemann_hypothesis.pdf

Barry Mazur receiving a prize:

Barry's talk went well, and we decided to try to expand on it in the form of a book. We had a long summer working session in a vacation house near an Atlantic beach, in which we greatly refined our presentation. (I remember that I also finally switched from Linux to OS X on my laptop when Ubuntu made a huge mistake pushing out a standard update that hosed X11 for everybody in the world.)

Classical Fourier Transform

Going beyond the original Clay Lecture, I kept pushing Barry to see if he could describe RH as much as possible in terms of the classical Fourier transform applied to a function that could be derived via a very simple process from the prime counting function pi(x). Of course, he could. This led to more questions than it answered, and interesting numerical observations that are more precise than analytic number theorists typically consider.

Our approach to writing the book was to try to reverse engineer how Riemann might have been inspired to come up with RH in the first place, given how Fourier analysis of periodic functions was in the air. This led us to some surprisingly subtle mathematical questions, some of which we plan to investigate in research papers. They also indirectly play a role in Simon Spicer's recent UW Ph.D. thesis. (The expert analytic number theorist Andrew Granville helped us out of many confusing thickets.)

In order to use Fourier series we naturally have to rely heavily on Dirac/Schwartz distributions.

SIMUW

University of Washington has a great program called SIMUW: "Summer Institute for Mathematics at Univ of Washington.'' It's for high school; admission is free and based on student merit, not rich parents, thanks to an anonymous wealthy donor! I taught a SIMUW course one summer from the RH book. I spent one very intense week on the RH book, and another on the Birch and Swinnerton-Dyer conjecture.

The first part of our book worked well for high school students. For example, we interactively worked with prime races, multiplicative parity, prime counting, etc., using Sage interacts. The students could also prove facts in number theory. They also looked at misleading data and tried to come up with conjectures. In algebraic number theory, usually the first few examples are a pretty good indication of what is true. In analytic number theory, in contrast, looking at the first few million examples is usually deeply misleading.

Reader feedback: "I dare you to find a typo!"

In early 2015, we posted drafts on Google+ daring anybody to find typos. We got massive feedback. I couldn't believe the typos people found. One person would find a subtle issue with half of a bibliography reference in German, and somebody else would find a different subtle mistake in the same reference. Best of all, highly critical and careful non-mathematicians read straight through the book and found a large number of typos and minor issues that were just plain confusing to them, but could be easily clarified.

Now the book is hopefully not riddled with errors. Thanks entirely to the amazingly generous feedback of these readers, when you flip to a random page of our book (go ahead and try), you are now unlikely to see a typo or, what's worse, some corrupted mathematics, e.g., a formula with an undefined symbol.

Designing the cover

Barry and Gretchen Mazur, Will Hearst, and I designed a cover that combined the main elements of the book: title, Riemann, zeta:

Then designers at CUP made our rough design more attractive according their tastes. As non-mathematician designers, they made it look prettier by messing with the Riemann Zeta function...

Publishing with Cambridge University Press

Over years, we talked with people from AMS, Springer-Verlag and Princeton Univ Press about publishing our book. I met CUP editor Kaitlin Leach at the Joint Mathematics Meetings in Baltimore, since the Cambridge University Press (CUP) booth was directly opposite the SageMath booth, which I was running. We decided, due to their enthusiasm, which lasted more than for the few minutes while talking to them (!), past good experience, and general frustration with other publishers, to publish with CUP.

What is was like for us working with CUP

The actual process with CUP has had its ups and downs, and the production process has been frustrating at times, being in some ways not quite professional enough and in other ways extremely professional. Traditional book publication is currently in a state of rapid change. Working with CUP has been unlike my experiences with other publishers.

For example, CUP was extremely diligent putting huge effort into tracking down permissions for every one of the images in our book. And they weren't satisfy with a statement on Wikipedia that "this image is public domain", if the link didn't work. They tracked down alternatives for all images for which they could get permissions (or in some cases have us partly pay for them). This is in sharp contrast to my experience with Springer-Verlag, which spent about one second on images, just making sure I signed a statement that all possible copyright infringement was my fault (not their's).

The CUP copyediting and typesetting appeared to all be outsourced to India, organized by people who seemed far more comfortable with Word than LaTeX. Communication with people that were being contracted out about our book's copyediting was surprisingly difficult, a problem that I haven't experienced before with Springer and AMS. That said, everything seems to have worked out fine so far.

On the other hand, our marketing contact at CUP mysteriously vanished for a long time; evidently, they had left to another job, and CUP was recruiting somebody else to take over. However, now there are new people and they seem extremely passionate!

The Future

I'm particularly excited to see if we can produce an electronic (Kindle) version of the book later in 2016, and eventually a fully interactive complete for-pay SageMathCloud version of the book, which could be a foundation for something much broader with publishers, which addresses the shortcoming of the Kindle format for interactive computational books. Things like electronic versions of books are the sort of things that AMS is frustratingly slow to get their heads around...

Conclusions

Publishing a high quality book is a long and involved process.
Working with CUP has been frustrating at times; however, they have recruited a very strong team this year that addresses most issues.
I hope mathematicians will put more effort into making mathematics accessible to non-mathematicians.
Hopefully, this talk will give provide a more glimpse into the book writing process and encourage others (and also suggest things to think about when choosing a publisher and before signing a book contract!)

What is SageMath's strategy?

noreply@blogger.com (William Stein) — Thu, 01 Oct 2015 04:54:00 +0000

Here is SageMath's strategy, or at least what my strategy toward SageMath has been for the last 5 years.

Diagnose the problem

Statement of problem: SageMath is not growing.

Justification

Facts: Growth in the number of active users [1] of SageMath has stalled since about 2011 (as defined by Google analytics on sagemath.org). From 2008 to 2011, year-on-year growth was about 50%, which isn't great. However, from 2011 to now, year-on-year growth is slightly less than 0%. It was maybe -10% from 2013 to 2014. Incidentally, number of monthly active users of sagemath.org is about 68,652 right now, but the raw number isn't as import as the year-to-year rate of change.

I set an overall mission statement for the Sage project at the outset, which was is to be a viable alternative to Magma, Maple, Mathematica and Matlab. Being a "viable alternative" is something that holds or doesn't for specific people. A useful measure of this mission then is whether or not people use Sage. This is a different metric than trying to argue from "first principles" by making a list of features of each system, comparing benchmarks, etc.

Guiding policies

Statement of policy: focus on undergraduate students in STEM courses (science, tech, engineering, math)

Justification

In order for Sage to start growing again, identify groups of people that are not using Sage. Then decide, for each of these groups, who might find value in using Sage, especially if we are able to put work into making it easier for them to benefit from Sage. This is something to re-evaluate periodically. In itself, this is very generic -- it's what any software project that wishes to grow should do. The interesting part is the details.
Some big groups of potential future users of Sage, who use Sage very little now, include

employees/engineers in various industries (from defense contractors, to finance, to health care to "data science").
researchers in area of mathematics where Sage is currently not popular
undergraduate students in STEM courses (science, tech, engineering, math)

I think by far the most promising group is "undergraduate students in STEM courses". In many cases they use no software at all or are unhappy with what they do use. They are extremely cost sensitive. Open source provides a unique advantage in education because it is less expensive than closed source software, and having access to source code is something that instructors consider valuable as part of the learning experience. Also, state of the art performance, which often requires enormous dedicated for-pay work, is frequently not a requirement.

Actions

(a) Make access to Sage as easy as possible.
(b) Encourage the creation of educational resources (books, tutorials, etc.) that make using Sage for particular courses as easy as possible.
(c) Implement missing functionality in Sage that is needed in support of undergraduate teaching.

Justification

Why don't more undergraduates use Sage? For the most part, students use what they are told to use by their instructors. So why don't instructors chose to use Sage? (a) Sage is not trivial to install (in fact it is incredibly hard to install), (b) There are limited resources (books, tutorials, course materials, etc.) for making using Sage really easy, (c) Sage is missing key functionality needed in support undergraduate teaching.

Regarding (c), in 2008 Sage was utterly useless for most STEM courses. However, over the years things changed for the better, due to the hard work of Rob Beezer, Karl Dieter, Burcin Erocal, and many others. Also, for quite a bit of STEM work, the numerical Python ecosystem (and/or R) provides much of what is needed, and both have evolved enormously in recent years. They are all usable from Sage, and making such use easier should be an extremely high priority. Related -- Bill Hart wrote "I recently sat down with some serious developers and we discussed symbolics in Sage (which I know nothing about). They argued that Sage is not a viable contender in that area, and we discussed some of the possible reasons for that. " The reason is that the symbolic functionality in Sage is motivated by making Sage useful for undergraduate teaching; it has nothing to do with what serious developers in symbolics would care about.

Regarding (b), an NSF (called "UTMOST") helped in this direction... Also, Gregory Bard wrote "Sage for undergraduates", which is exactly the sort of thing we should be very strongly encouraging. This is a book that is published by the AMS and is also freely available. And it squarely addresses exactly this audience. Similarly, the French book that Paul Zimmerman edited is fantastic for France. Let's make an order of magnitude similar resources along these lines! Let's make vastly more tutorials and reference manuals that are "for undergraduates".

Regarding (a), in my opinion the most viable option that fits with current trends in software is a full web application that provides access to Sage. SageMathCloud is what I've been doing in this direction, and it's been growing since 2013 at over 100% year on year, and much is in place so that it could scale up to more users. It still has a huge way to go regarding user friendliness, and it is still losing money every month. But it is a concrete action toward which nontrivial effort has been invested, and it has the potential to solve problem (a) for a large number of potential STEM users. College students very often have extremely good bandwidth coupled with cheap weak laptops, so a web application is the natural solution for them.

Though much has been done to make Sage easier to install on individual computers, it's exactly the sort of problem that money could help solve, but for which we have little money. I'm optimistic that OpenDreamKit will do something in this direction.

[I've made this post motivated by the discussion in this thread. Also, I used the framework from this book.]

SageMathCloud's poor user retention rate

noreply@blogger.com (William Stein) — Wed, 23 Sep 2015 02:52:00 +0000

Poor retention rate

Many people try SageMathCloud, but only a small percentage stick around. I definitely don't know why. Recent SageMathCloud rates are below 4%:

Is it performance?

Question: Are the people who try SMC discouraged by performance issues?

I think it's unlikely many users are leaving due to hitting noticeable performance issues. I think I would know, since there's a huge bold messages all over the site that say "Email help@sagemath.com: in case of problems, do not hesitate to immediately email us. We want to know if anything is broken!" In the past when there have been performance or availability issues -- which of course do happen sometimes due to bugs or whatever -- I quickly get a lot of emails. I haven't got anything that mentioned performance recently. And usage of SMC is at an all time high: in the last day there were 676 projects created and 3500 projects modified -- which is significantly higher than ever before since the site started. It's also about 2.2x what we had exactly a year ago.

Is it the user interface?

Question: Is the SMC user interface highly discouraging and difficult to use?

My current best guess is that the main reason for attrition of our users is that they do not understand how to actually use SageMathCloud (SMC), and the interface doesn't help at all. I think a large number of users get massively confused and lost when trying to use SMC. It's pretty obvious this happens if you just watch what they do... In order to have a foundation on which to fix that, the plan I came up with in May was to at least fix the frontend implementation so that it would be much easier to do development with -- by switching from a confusing mess of jQuery soup, e.g., 2012-style single page app development -- to Facebook's new React.js approach. This is basically half done and deployed, and I'm going to work very hard for a while to finish it. Once it's done, it's going to be much easier to improve the UI to make it more user friendly.

Is it the open source software?

Question: Is open source mathematical software not sufficiently user friendly?

Fixing the UI probably won't help so much with improving the underlying open source mathematical software to be friendly though. This is a massive, deep, and very difficult problem, and might be why growth of Sage stopped in 2011:

SageMath (and maybe Numpy/Scipy/IPython/etc.) are not as user friendly as Mathematica/Matlab. I think they could be even more user friendly, but it's highly unlikely as long as the developers are mostly working on SageMath in their spare time as part of advanced research projects (which have little to do with user friendliness).

Analyzing data about mistakes, frustation, and issues people actually have with real worksheets and notebooks could also help a lot with directing our effort in improving Sage/Python/Numpy/etc to be more user friendly.

Is it support?

Question: Are users frustrated by lack of interactive support?

Having integrated high-quality support for users inside SMC, in which we help them write code, answer questions, etc., could help with retention.

Why don't you use SageMathCloud?

I've been watching this stuff closely for over a decade most waking moments, and everybody likes to complain to me. Why don't you use SageMathCloud? Tell me: wstein@sagemath.com.

noreply@blogger.com (William Stein) — Fri, 11 Sep 2015 02:38:00 +0000

Funding Open Source Mathematical Software in the United States

I do not know how to get funding for open source mathematical software in the United States. However, I'm trying.

Why: Because Sage is Hobbling Along

Despite what we might think in our Sage-developer bubble, Sage is hobbling along, and without an infusion of financial support very soon, I think the project is going to fail in the next few years. I have access to Google analytics data for sagemath.org since 2007, and there has been no growth in active users of the website since 2011:

Something that is Missing

The worse part of all for me, after ten years, is seeing things like this email today from John Palmieri, where he talks about writing slow but interesting algebraic topology code, and needing help from somebody who knows Cython to actually make his code fast.

I know from my three visits to the Magma group in Sydney that such assistance is precisely what having real financial support can provide. Such money makes it possible to have fulltime people who know the tools and how to optimize them well, and they work on this sort of speedup and integration -- this "devil is in the details" work -- for each major contribution (they are sort of like a highly skilled version of a journal copy editor and referee all in one). Doing this makes a massive difference, but also costs on the order of $1 million / year to have any real impact. 1 million is probably the Magma budget to support around 10 people and periodic visitors, and of course like 1% of the budget of Matlab/Mathematica. Magma has this support partly because Magma is closed source, and maintains tight control on who may use it.

Searching for a Funding Model

Sage is open source and freely available to all, so it is of potential huge value to the community by being owned by everybody and changeable. However, those who fund Magma (either directly or indirectly) haven't funded Sage at the same level for some reason. I can't make Sage closed source and copy that very successful funding model. I've tried everything I can think of given the time and resources I have, and the only model left that seems able to support open source is having a company that does something else well and makes money, then using some of the profit to fund open source (Intel is the biggest contributor to Linux).

SageMath, Inc.

Since I failed to find any companies that passionately care about Sage like Intel/Google/RedHat/etc. care about Linux, I started one. I've been working on SageMathCloud extremely hard for over 3 years now, with the hopes that at least it could be a way to fund Sage development.

The Simons Foundation and Open Source Software

noreply@blogger.com (William Stein) — Sat, 05 Sep 2015 18:22:00 +0000

Jim Simons

Jim Simons is a mathematician who left academia to start a hedge fund that beat the stock market. He contributes back to the mathematical community through the Simons Foundation, which provides an enormous amount of support to mathematicians and physicists, and has many outreach programs.

SageMath is a large software package for mathematics that I started in 2005 with the goal of creating a free open source viable alternative to Magma, Mathematica, Maple, and Matlab. People frequently tell me I should approach the Simons Foundation for funding to support Sage. For example:

Jim Simons, after retiring from Renaissance Technologies with a cool 15 billion, has spent the last 10 years giving grants to people in the pure sciences. He's a true academic at heart [...] Anyways, he's very fond of academics and gives MacArthur-esque grants, especially to people who want to change the way mathematics is taught. Approach his fund. I'm 100% sure he'll give you a grant on the spot.

The National Science Foundation

Last month the http://sagemath.org website had 45,114 monthly active users. However, as far as I know, there is no NSF funding for Sage in the United States right now, and development is mostly done on a shoestring in spare time. We have recently failed to get several NSF grants for Sage, despite there being Sage-related grants in the past from NSF. I know that funding is random, and I will keep trying. I have two proposals for Sage funding submitted to NSF right now.

Several million dollars per year

I was incredibly excited in 2012 when David Eisenbud invited me to a meeting at the Simons Foundation headquarters in New York City with the following official description of their goals:

The purpose of this round table is to investigate what sorts of support would facilitate the development, deployment and maintenance of open-source software used for fundamental research in mathematics, statistics and theoretical physics. We hope that this group will consider what support is currently available, and whether there are projects that the Simons Foundation could undertake that would add significantly to the usefulness of computational tools for basic research. Modes of support that duplicate or marginally improve on support that is already available through the universities or the federal government will not be of interest to the foundation. Questions of software that is primarily educational in nature may be useful as a comparison, but are not of primary interest. The scale of foundation support will depend upon what is needed and on the potential scientific benefit, but could be substantial, perhaps up to several million dollars per year.

Current modes of funding for research software in mathematics, statistics and physics differ very significantly. There may be correspondingly great differences in what the foundation might accomplish in these areas. We hope that the round table members will be able to help the foundation understand the current landscape (what are the needs, what is available, whether it is useful, how it is supported) both in general and across the different disciplines, and will help us think creatively about new possibilities.

I flew across country to this the meeting, where we spent the day discussing ways in which "several million dollars per year" could revolutionize "the development, deployment and maintenance of open-source software used for fundamental research in mathematics...".

In the afternoon Jim Simons arrived, and shook our hands. He then lectured us with some anecdotes, didn't listen to what we had to say, and didn't seem to understand open source software. I was frustrated watching how he treated the other participants, so I didn't say a word to him. I feel bad for failing to express myself.

The Decision

In the backroom during a coffee break, David Eisenbud told me that it had already been decided that they were going to just fund Magma by making it freely available to all academics in North America. WTF? I explained to David that Magma is closed source and that not only does funding Magma not help open source software like Sage, it actively hurts it. A huge motivation for people to contribute to Sage is that they do not have access to Magma (which was very expensive).

I wandered out of that meeting in a daze; things had gone so differently than I had expected. How could a goal to "facilitate the development, deployment and maintenance of open-source software... perhaps up to several million dollars per year" result in a decision that would make things possibly much worse for open source software?

That day I started thinking about creating what would become SageMathCloud. The engineering work needed to make Sage accessible to a wider audience wasn't going to happen without substantial funding (I had put years of my life into this problem but it's really hard, and I couldn't do it by myself). At least I could try to make it so people don't have to install Sage (which is very difficult). I also hoped a commercial entity could provide a more sustainable source of funding for open source mathematics software. Three years later, the net result of me starting SageMathCloud and spending almost every waking moment on it is that I've gone from having many grants to not, and SageMathCloud itself is losing money. But I remain cautiously optimistic and forge on...

We will not fund Sage

Prompted by numerous messages recently from people, I wrote to David Eisenbud this week. He suggested I write to Yuri Schinkel, who is the current director of the Simons Foundation:

Dear William,

Before I joined the foundation, there was a meeting conducted by David Eisenbud to discuss possible projects in this area, including Sage.

After that meeting it was decided that the foundation would support Magma.

Please keep me in the loop regarding developments at Sage, but I regret that we will not fund Sage at this time.

Best regards, Yuri

The Simons Foundation, the NSF, or any other foundation does not owe the Sage project anything. Sage is used by a lot of people for free, who together have their research and teaching supported by hundreds of millions of dollars in NSF grants. Meanwhile the Sage project barely hobbles along. I meet people who have fantastic development or documentations projects for Sage that they can't do because they are far too busy with their fulltime teaching jobs. More funding would have a massive impact. It's only fair that the US mathematical community is at least aware of a missed opportunity.
Funding in Europe for open source math software is much better.

Hacker News discussion

React, Flux, RethinkDB and SageMathCloud -- Summer 2015 update

noreply@blogger.com (William Stein) — Mon, 31 Aug 2015 15:56:00 +0000

I've been using databases and doing web development for over 20 years, and I've never really loved any database before and definitely didn't love any web development frameworks either. That all changed for me this summer...

SageMathCloud

SageMathCloud is a web application in which you collaboratively use Python, LaTeX, Markdown, Sage worksheets (sophisticated mathematics), task lists, R, Jupyter Notebooks, manage courses, write C programs, make chatrooms, and more. It is hosted on Google Compute Engine, but is also entirely open source and there is a pre-made Virtual Machine that you can download. A project in SMC is a Linux account, with resources constrained using cgroups and quotas. Many SMC users can collaborate on the same project, and have equal privileges in that project. Interaction with all file types (including Jupyter notebooks, task lists and course managements) is synchronized in realtime, like Google docs. There is also a global notifications feed that shows all editing activity on all files in all projects on which the user collaborates, which is a sort of highly technical version of Facebook's feed.

Rewrite motivation

I originally wrote the SageMathCloud frontend using progressive-refinement jQuery (no third-party framework beyond that) and the Cassandra database. These were reasonable choices when I started. There are much better approaches now, which are critical to dramatically improving the user experience with SMC, and also growing the developer base. So far SMC has had no nontrivial outside contributions, probably due to the difficulty of understanding the code. In fact, I think nobody besides me has ever even installed SMC, despite these install notes.

We (me, Jon Lee, Nicholas Ruhland) are currently completely rewriting the entire frontend of SMC using React.js, Flux, and RethinkDB. We started this rewrite in June 2015, with Jon being supported by Google Summer of Code (2015), Nich being supported some by NSF grants from Randy Leveque and Rekha Thomas, and with me being unemployed.

Terrible funding situation

I'm living on credit cards -- I have no NSF grant support anymore, and SageMathCloud is still losing a lot of money every month, and I'm unhappy about this situation. It was either completely quit working on SMC and instead teach or consult a lot, or lose tens of thousands of dollars. I am doing the latter right now. I was very caught off guard, since this is my first summer ever to not have NSF support since I got my Ph.D. in 2000, and I didn't expect to have my grant proposals all denied (which happened in June). There is some modest Angel investment in SageMath, Inc., but I can't bring myself to burn through that money on salary, since it would run out quickly, and I don't want to have to shut down the site due to not being able to pay the hosting bill. I've failed to get any significant free hosting, due to already getting free hosting in the past, and SageMath, Inc. not being in any incubators. For example, we tried very hard to get hosting from Google, but they flatly refused for these two reasons (they gave $60K in hosting to UW/Sage project in 2012). I'm clearly having trouble transitioning from an academic to an industry funding model. But if there are enough paying customers by January 2016, things will turn around.

Jon, Nich, and I have been working on this rewrite for three months, and hope to finish it by the end of September, when Jon and Nich will become busy with classes again. However, it seems unlikely we'll be able to finish at the current rate. Fortunately, I don't start teaching fulltime again until January, and we put a lot of work into doing a release in mid-August that fully uses RethinkDB and partly uses React.js, so that we can finish the second stage of the rewrite iteratively, without any major technical surprises.

RethinkDB

Cassandra is an excellent database for many applications, but it is not the right database for SMC and I'm making no further use of Cassandra. SMC is a realtime application that does a lot more reading than writing to the database, and SMC greatly benefits from realtime push updates from the database. I've tried quite hard in the past to build an appropriate architecture for SMC on top of Cassandra, but it is the wrong tool for the job. RethinkDB scales up linearly (with sharding and replication), and has high availability and automatic failover as of version 2.1.2. See https://github.com/rethinkdb/rethinkdb/issues/4678 for my painful path to ensuring RethinkDB actually works for me (the RethinkDB developers are incredibly helpful!).

React.js

I learned about React.js first from some "random podcast", then got more interested in it when Chris Swenson gave a demo at a Sage Days workshop in San Diego in May 2015. React (+Flux) is a web development framework that actually has solid ideas behind it, backed by an implementation that has been optimized and tested by a highly nontrivial real world application: namely the Facebook website. Even if I were to have the idea of React, implementing in a way that is actually usable would be difficult. The key idea of React.js is that -- surprisingly -- it is possible to write efficient client-side code that describes how to render the application purely as a function of its state.

React is different than jQuery. With jQuery, you write lots of code explaining how to transform the user interface of your application from one complicated state (that you might never have anticipated happening) to another complicated state. When using React.js you don't write code about how your application's visible state changes -- instead you write code to answer the question: "given this state, what should the application look like". For me, it's a game changer. This is like what one does when writing video games; the innovation is that some people at Facebook figured out how to practically program this way in a client side web browser application, then tuned their implementation based on huge amounts of real world data (Facebook has users). Oh, and they open sourced the result and ran several conferences explaining React.

React.js reminds me of when Andrew Wiles proved Fermat's Last Theorem in the mid 1990s. Wiles (and Ken Ribet) had genuine new ideas, which dramatically reshaped the landscape of number theory. The best number theorists quickly realized this and adopted to the new world, pushing the envelope of Wiles work far beyond what I expected could happen. Other people pretended like Wiles didn't exist and continued studying Fibonnaci numbers. I browsed the web development section of Barnes and Noble last night and there were dozens of books on jQuery and zero on React.js. I feel for anybody who tries to learn client-side web development by reading books at Barnes and Noble.

IPython/Jupyter and PhosphorJS

I recently met with Fernando Perez, who founded IPython/Jupyter. He seemed to tell me that currently 9 people are working fulltime on rewriting the Jupyter web notebook using the PhosphorJS framework. I tried to understand PhosphorJS based on the github page, but couldn't, except to deduce that it is mostly the work of one person from Bloomberg/Continuum Analytics. Fernando told me that they chose PhosphorJS since it very fast, and that their main motivation is to (1) make Jupyter better use their huge high-resolution monitors on their new institute at Berkeley, and (2) make it easier for developers like me to integrate/extend Jupyter into their applications. I don't understand (2), because PhosphorJS is perhaps the least popular web framework I've ever heard of (is it a web framework -- I can't tell?). I pushed Fernando to explain why they made that design choice, but didn't really understand the answer, except that they had spent a lot of time investigating alternatives (like React first). I'm intimidated by their resources and concerned that I'm making the wrong choice; however, I just can't understand why they have made what seems to me to be the wrong choice. I hope to understand more at the joint Sage/Jupyter Days 70 that we are organizing together in Berkeley, CA in November. (Edit: see https://github.com/ipython/ipython/issues/8239 for a discussion of why IPython/Jupyter uses PhosphorJS.)

Tables and RethinkDB

Our rewrite of SMC is built on Tables, Flux and React. Tables are client-side technology I wrote inspired by Facebook's GraphQL/Relay technology (and Meteor, Firebase, etc.); they synchronize data between clients and the backend database in realtime. Tables are defined by a JSON schema file, which specifies the fields in the table, and explains what get and set queries are allowed. A table is a subset of a much larger table in the database, with the subset defined by conditions that are relative to the user making the query. For example, the projects table has one entry for each project that the user is a collaborator on.

Tables are automatically synchronized between the user and the database whenever the database changes, using RethinkDB changefeeds. RethinkDB's innovation is to build realtime updates -- triggered when the result of a query to the database changes -- directly into the database at the lowest level. Of course it is possible to build something that looks the same from the outside using either a message queue (say using RabbitMQ or ZeroMQ), or by watching the replication stream from the database and triggering actions based on that (like Meteor does using MongoDB). RethinkDB's approach seems better to me, putting the abstraction at the right level. That said, based on mailing list traffic, searches, etc., it seems that very, very few people get RethinkDB yet. Also, despite years of development, RethinkDB only became "production ready" a few months ago, and only got automatic failover a few weeks ago. That said, after ironing out some kinks, I'm now using it with heavy traffic in production and it works very well.

Flux

Once data is automatically synchronized between the database and web browsers in realtime, we can build everything else on top of this. Facebook also introduced an architecture pattern that they call Flux, which works well with React. It's very different than MVC-style two-way binding frameworks, where objects are directly linked to UI elements, with an object changing causing the UI element to change and vice versa. In SMC each major part of the system has two objects associated to it: Actions and Stores. We think of them in terms of the classical CQRS pattern -- command query responsibility segregation. Actions are commands -- they are Javascript "functions" that get stuff done, but they do not return values; instead, they impact the state of the store. The store has functions that allow one to query for the state of the store, but they do not change the state of the store. The store functions must only be functions of the internal state of the store and nothing else. They might cache their results and format their output to be very convenient for rendering. But that's it.

Actions usually cause the corresponding store (or stores) to change. When a store changes, it emit a change event, which causes any React components that depend on the store to be updated, which in many cases means they are re-rendered. There are optimizations one can introduce to reduce the amount of re-rendering, which if one isn't careful leads to subtle bugs; pretty much the only subtle React UI bugs one hits are caused by such optimizations. When the UI re-renders, the user sees their view of the world change. The user then clicks buttons, types, etc., which triggers actions, which in turn update stores (and tables, hence propogating changes to the ultimate source of truth, which is the RethinkDB database). As stores update, the UI again updates, etc.

Status

So far, we have completely (re-)written the project listing, file manager, help/status page, new file page, project log, file finder, project settings, course management system, account settings, billing, project upgrade system, and file use notifications using React, Flux, and Tables, and the result works well. Bugs are much easier to fix, and it is easy (possible?) to understand the state of the system, since it is defined by the state of the database and the corresponding client-side stores. We've completely rethought everything about the UI in doing the rewrite of the above components, and it has taken several months. Also, as mentioned above, I completely rewrote most of the backend to use RethinkDB instead of Cassandra. There were also the weeks of misery for me after we made the switch over. Even after weeks of thinking/testing/wondering "what could go wrong?", we found out all kinds of surprising little things within hours of pushing everything into production, which took more than a week of sleep deprived days to sort out.

What's left? We have to rewrite the file editor tabs system, the project tabs system, and all the applications (except course management): editing text files using Codemirror, task lists (which are suprisingly complicated!), color xterm terminals, Jupyter notebooks (which will still use an iframe for the notebook itself), Sage worksheets (with complicated html output embedded in codemirror), compressed file de-archiver, the LaTeX editor, the wiki and markdown editors, and file chat. We hope to find a clean way to abstract away the various SMC applications as plugins, so that other people can easily write their own applications/plugins that will run inside of SMC. There will be a rich collection of example plugins to build on, namely the ones listed above, which are all driven by critical-to-us real world applications.

Discussion about this blog post on Hacker News.

Guiding principles for SageMath, Inc.

noreply@blogger.com (William Stein) — Wed, 27 May 2015 19:57:00 +0000

In February of this year (2015), I founded a Delaware C Corporation called "SageMath, Inc.". This is a first stab at the guiding principles for the company. It should help clarify the relationship between the company, the Sage project, and other projects like OpenDreamKit and Jupyter/IPython.

Company mission statement:

Make open source mathematical software ubiquitous.

This involves both creating the SageMathCloud website and supporting the development and distribution of the SageMath and other software, including Jupyter, Octave, Scilab, etc. Anything open source.

Company principles:

Absolutely all company funded software must be open source, under a GPLv3 compatible license. We are a 100% open source company.
Company independence and self-determination is far more important than money. A core principle is that SMI is not for sale at any price, and will not participate in any partnership (for cost) that would restrict our freedom. This means:
- reject any offers from corp development from big companies to purchase or partner,
- do not take any investment money unless absolutely necessary, and then only from the highest quality investors
- do not take venture capital ever
Be as open as possible about everything involving the company. What should not be open (since it is dangerous):
- security issues, passwords
- finances (which could attract trolls)
- private user data

What should be open:

aggregate usage data, e.g., number of users.
aggregate data that could help other open source projects improve their development, e.g., common problems we observe with Jupyter notebooks should be provided to their team.
guiding principles

Business model

SageMathCloud is freemium with the expectation that 2-5% of users pay.
Target audience: all potential users of cloud-based math-related software.

SageMathCloud mission

Make it as easy as possible to use open source mathematical software in the cloud.

This means:

Minimize onboard friction, so in less than 1 minute, you can create an account and be using Sage or Jupyter or LaTeX. Morever, the UI should be simple and streamlined specifically for the tasks, while still having deep functionality to support expert users. Also, everything persists and can be sorted, searched, used later, etc.
Minimize support friction, so one click from within SMC leads to a support forum, an easy way for admins to directly help, etc. This is not at all implemented yet. Also, a support marketplace where experts get paid to help non-experts (tutoring, etc.).
Minimize teaching friction, so everything involving software related to teaching a course is as easy as possible, including managing a list of students, distributing and collecting homework, and automated grading and feedback.
Minimize pay friction, sign up for a $7 monthly membership, then simple clear pay-as-you-go functionality if you need more power.

SageMathCloud Notifications are Now Better

noreply@blogger.com (William Stein) — Fri, 14 Nov 2014 22:31:00 +0000

I just made live a new notifications systems for SageMathCloud, which I spent all week writing.

These notifications are what you see when you click the bell in the upper right. This new system replaces the one I made live two weeks ago. Whenever somebody actively *edits* (using the web interface) any file in any project you collaborate on, a notification will get created or updated. If a person *comments* on any file in any project you collaborate on (using the chat interface to the right), then not only does the notification get updated, there is also a little red counter on top of the bell and also in the title of the SageMathCloud tab. In particular, people will now be much more likely to see the chats you make on files.

NOTE: I have not yet enabled any sort of daily email notification summary, but that is planned.

Some technical details: Why did this take all week? It's because the technology that makes it work behind the scenes is something that was fairly difficult for me to figure out how to implement. I implemented a way to create an object that can be used simultaneously by many clients and supports realtime synchronization.... but is stored by the distributed Cassandra database instead of a file in a project. Any changes to that object get synchronized around very quickly. It's similar to how synchronized text editing (with several people at once) works, but I rethought differential synchronization carefully, and also figured out how to synchronize using an eventually consistent database. This will be useful for implementing a lot other things in SageMathCloud that operate at a different level than "one single project". For example, I plan to add functions so you can access these same "synchronized databases" from Python processes -- then you'll be able to have sage worksheets (say) running on several different projects, but all saving their data to some common synchronized place (backed by the database). Another application will be a listing of the last 100 (say) files you've opened, with easy ways to store extra info about them. It will also be easy to make account and project settings more realtime, so when you change something, it automatically takes effect and is also synchronized across other browser tabs you may have open. If you're into modern Single Page App web development, this might remind you of Angular or React or Hoodie or Firebase -- what I did this week is probably kind of like some of the sync functionality of those frameworks, but I use Cassandra (instead of MongoDB, say) and differential synchronization.

I BSD-licensed the differential synchronization code that I wrote as part of the above.

A Non-technical Overview of the SageMathCloud Project

noreply@blogger.com (William Stein) — Fri, 17 Oct 2014 19:00:00 +0000

What problems is the SageMathCloud project trying to solve? What pain points does it address? Who are the competitors and what is the state of the technology right now?

What problems you’re trying to solve and why are these a problem?

Computational Education: How can I teach a course that involves mathematical computation and programming?
Computational Research: How can I carry out collaborative computational research projects?
Cloud computing: How can I get easy user-friendly collaborative access to a remote Linux server?

What are the pain points of the status quo and who feels the pain?

Student/Teacher pain:
- Getting students to install software needed for a course on their computers is a major pain; sometimes it is just impossible, due to no major math software (not even Sage) supporting all recent versions of Windows/Linux/OS X/iOS/Android.
- Getting university computer labs to install the software you need for a course is frustrating and expensive (time and money).
- Even if computer labs worked, they are often being used by another course, stuffy, and students can't possibly do all their homework there, so computation gets short shrift. Lab keyboards, hardware, etc., all hard to get used to. Crappy monitors.
- Painful confusing problems copying files around between teachers and students.
- Helping a student or collaborator with their specific problem is very hard without physical access to their computer.
Researcher pain:
- Making backups every few minutes of the complete state of everything when doing research often hard and distracting, but important for reproducibility.
- Copying around documents, emailing or pushing/pulling them to revision control is frustrating and confusing.
- Installing obscuring software is frustarting and distracting from the research they really want to do.
Everybody:
- It is frustrating not having LIVE working access to your files wherever you are. (Dropbox/Github doesn't solve this, since files are static.)
- It is difficult to leave computations running remotely.

Why is your technology poised to succeed?

When it works, SageMathCloud solves every pain point listed above.
The timing is right, due to massive improvements in web browsers during the last 3 years.
I am on full sabbatical this year, so at least success isn't totally impossible due to not working on the project.
I have been solving the above problems in less scalable ways for myself, colleagues and students since the 1990s.
SageMathCloud has many users that provide valuable feedback.
We have already solved difficult problems since I started this project in Summer 2012 (and launched first version in April 2013).

Who are your competitors?

There are no competitors with a similar range of functionality. However, there are many webapps that have some overlap in capabilities:

Mathematical overlap: Online Mathematica: "Bring Mathematica to life in the cloud"
Python overlap: Wakari: "Web-based Python Data Analysis"; also PythonAnywhere
Latex overlap: ShareLaTeX, WriteLaTeX
Web-based IDE's/terminals: target writing webapps (not research or math education): c9.io, nitrous.io, codio.com, terminal.com
Homework: WebAssign and WebWork

Right now, SageMathCloud gives away for free far more than any other similar site, due to very substantial temporary financial support from Google, the NSF and others.

What’s the total addressable market?

Though our primary focus is the college mathematics classroom, there is a larger market:

Students: all undergrad/high school students in the world taking a course involving programming or mathematics
Teachers: all teachers of such courses
Researchers: anybody working in areas that involve programming or data analysis

Moreover, the web-based platform for computing that we're building lends itself to many other collaborative applications.

What stage is your technology at?

The site is up and running and has 28,413 monthly active users
There are still many bugs
I have a precise todo list that would take me at least 2 months fulltime to finish.

Is your solution technically feasible at this point?

Yes. It is only a matter of time until the software is very polished.
Morever, we have compute resources to support significantly more users.
But without money (from paying customers or investment), if growth continues at the current rate then we will have to clamp down on free quotas for users.

What technical milestones remain?

Infrastructure for creating automatically-graded homework problems.
Fill in lots of details and polish.

Do you have external credibility with technical/business experts and customers?

Business experts: I don't even know any business experts.
Technical experts: I founded the Sage math software, which is 10 years old and relatively well known by mathematicians.
Customers: We have no customers; we haven't offered anything for sale.

Market research?

I know about math software and its users as a result of founding the Sage open source math software project, NSF-funded projects I've been involved in, etc.

Is the intellectual property around your technology protected?

The IP is software.
The website software is mostly new Javascript code we wrote that is copyright Univ. of Washington and mostly not open source; it depends on various open source libraries and components.
The Sage math software is entirely open source.

Who are the team members to move this technology forward?

I am the only person working on this project fulltime right now.

Everything: William Stein -- UW professor
Browser client code: Jon Lee, Andy Huchala, Nicholas Ruhland -- UW undergrads
Web design, analytics: Harald Schilly -- Austrian grad student
Hardware: Keith Clawson

Why are you the ideal team?

We are not the ideal team.
If I had money maybe I could build the ideal team, leveraging my experience and connections from the Sage project...

Public Sharing in SageMathCloud, Finally

noreply@blogger.com (William Stein) — Thu, 16 Oct 2014 20:29:00 +0000

SageMathCloud (SMC) is a free (NSF, Google and UW supported) website that lets you collaboratively work with Sage worksheets, IPython notebooks, LaTeX documents and much, much more. All work is snapshotted every few minutes, and copied out to several data centers, so if something goes wrong with a project running on one machine (right before your lecture begins or homework assignment is due), it will pop up on another machine. We designed the backend architecture from the ground up to be very horizontally scalable and have no single points of failure.

This post is about an important new feature: You can now mark a folder or file so that all other users can view it, and very easily copy it to their own project.

This solves problems:

Problem: You create a "template" project, e.g., with pre-installed software, worksheets, IPython notebooks, etc., and want other users to easily be able to clone it as a new project. Solution: Mark the home directory of the project public, and share the link widely.

Problem: You create a syllabus for a course, an assignment, a worksheet full of 3d images, etc., that you want to share with a group of students. Solution: Make the syllabus or worksheet public, and share the link with your students via an email and on the course website. (Note: You can also use a course document to share files with all students privately.) For example...

Problem: You run into a problem using SMC and want help. Solution: Make the worksheet or code that isn't working public, and post a link in a forum asking for help.

Problem: You write a blog post explaining how to solve a problem and write related code in an SMC worksheet, which you want your readers to see. Solution: Make that code public and post a link in your blog post.

Here's a screencast.

Each SMC project has its own local "project server", which takes some time to start up, and serves files, coordinates Sage, terminal, and IPython sessions, etc. Public sharing completely avoids having anything to do with the project server -- it works fine even if the project server is not running -- it's always fast and there is no startup time if the project server isn't running. Moreover, public sharing reads the live files from your project, so you can update the files in a public shared directory, add new files, etc., and users will see these changes (when they refresh, since it's not automatic).
As an example, here is the cloud-examples github repo as a share. If you click on it (and have a SageMathCloud account), you'll see this:

What Next?

There is an enormous amount of natural additional functionality to build on top of public sharing.

For example, not all document types can be previewed in read-only mode right now; in particular, IPython notebooks, task lists, LaTeX documents, images, and PDF files must be copied from the public share to another project before people can view them. It is better to release a first usable version of public sharing before systematically going through and implementing the additional features needed to support all of the above. You can make complicated Sage worksheets with embedded images and 3d graphics, and those can be previewed before copying them to a project.
Right now, the only way to visit a public share is to paste the URL into a browser tab and load it. Soon the projects page will be re-organized so you can search for publicly shared paths, see all public shares that you have previously visited, who shared them, how many +1's they've received, comments, etc.

Also, I plan to eventually make it so public shares will be visible to people who have not logged in, and when viewing a publicly shared file or directory, there will be an option to start it running in a limited project, which will vanish from existence after a period of inactivity (say).

There are also dozens of details that are not yet implemented. For example, it would be nice to be able to directly download files (and directories!) to your computer from a public share. And it's also natural to share a folder or file with a specific list of people, rather than sharing it publicly. If somebody is viewing a public file and you change it, they should likely see the update automatically. Right now when viewing a share, you don't even know who shared it, and if you open a worksheet it can automatically execute Javascript, which is potentially unsafe. Once public content is easily found, if somebody posts "evil" content publicly, there needs to be an easy way for users to report it.
Sharing has been thought about a great deal during the last few years in the context of sites such as Github, Facebook, Google+ and Twitter. With SMC, we've developed a foundation for interactive collaborative computing in a browser, and will introduce sharing on top of that in a way that is motivated by your problems. For example, as with Github or Google+, when somebody makes a copy of your publicly shared folder, this copy should be listed (under "copies") and it could start out public by default. There is much to do.

One reason it took so long to release the first version of public sharing is that I kept imagining that sharing would happen at the level of complete projects, just like sharing in Github. However, when thinking through your problems, it makes way more sense in SMC to share individual directories and files. Technically, sharing at this level works works well for read-only access, not for read-write access, since projects are mapped to Linux accounts. Another reason I have been very hesitant to support sharing is that I've had enormous trouble over the years with spammers posting content that gets me in trouble (with my University -- it is illegal for UW to host advertisements). However, by not letting search engines index content, the motivation for spammers to post nasty content is greatly reduced.

Imagine publicly sharing recipes for automatically gradable homework problems, which use the full power of everything installed in SMC, get forked, improved, used, etc.

SageMathCloud Course Management

noreply@blogger.com (William Stein) — Wed, 01 Oct 2014 19:05:00 +0000

SageMathCloud now has some very rudimentary course management functionality. Though still very basic, it makes it much, much easier to make files available to students, collect homework, etc., entirely using SageMathCloud (without having to use email or any other submissions systems or github to share files). To get started, create a new course by clicking on +New, then typing the name of your course and click "Course":

Course documents allow you to manage a list of students, create projects for each of them, share homework and folders with them, collect homework, and grade and return it to students.

Add Students

To add a student to your course, click on the Students tab, then type a student's name or email address in the "Add student" box to the right and press enter or click the button. Searching for an email address is best, since you can be certain that the person you're adding is really a student in your course (instead of an unknown SageMathCloud user with the same name); moreover, if your student doesn't already have an account, they will receive an invitation via email. (NOTE: There is currently no way to add dozens of students at once.)

Once you add a student, click on Create Project next to your student's name to create their project. You own the project, and they will be added as a collaborator, and invited by email if they do not yet have an account.

Don't worry, student projects are hidden by default from your main project listing.

(To delete a student, click to the right of the student. You can toggle whether deleted students are shown in settings.)

Add Assignments

To create an assignment, first click New in the upper left of your project to create a new folder, and create or add files to it, as usual. Click on the Assignments tab of the course, then search for the folder by typing some part of its name in the box on the far right. Click to select the folder and it will be added to your list of assignments. To make copies of this folder available to all of your students whose projects you have created, click the Assign button. NOTE: You can share arbitrary folders with any contents with your students -- folders don't have to contain "assignments", and may contain anything, Sage worksheets, IPython notebooks, LaTeX documents, etc.

Collecting and Grading Assignments

To collect an assignment from your students, click Collect to the right of an assignment to collect it from all students. (NOTE: There is currently no way to schedule collection to happen at a specific time -- it happens when you click the button. Click it again to update the collected files.)
Once the assignments are collected, click and select a student to jump to the folder that contains the collected version of a student's assignment. Edit the files there, indicating grades on each problem, etc. NOTE: There is no special support yet for recording grades, knowing which homework you have graded already, etc.
When you are done grading, click Return Graded to return the graded homework to the students. If the homework folder is called homework1, then the graded version will appear in the student's project as homework1-graded.

Course Settings

Set the title and description of the course in the Settings tab. When you change these, the new title and description propagates automatically to all student projects for this course.

Other

A Video Tutorial

Technical Remarks

The underlying file format of a .course file is a plain text file with one line in JSON format for each student, shared assignment, and for settings.

What is SageMathCloud: let's clear some things up

noreply@blogger.com (William Stein) — Wed, 27 Aug 2014 13:55:00 +0000

[PDF version of this blog post]

"You will have to close source and commercialize Sage. It's inevitable." -- Michael Monagan, cofounder of Maple, told me this in 2006.

SageMathCloud (SMC) is a website that I first launched in April 2013, through which you can use Sage and all other open source math software online, edit Latex documents, IPython notebooks, Sage worksheets, track your todo items, and many other types of documents. You can write, compile, and run code in most programming languages, and use a color command line terminal. There is realtime collaboration on everything through shared projects, terminals, etc. Each project comes with a default quota of 5GB disk space and 8GB of RAM.

SMC is fun to use, pretty to look at, frequently backs up your work in many ways, is fault tolerant, encourages collaboration, and provides a web-based way to use standard command-line tools.

The Relationship with the SageMath Software

The goal of the SageMath software project, which I founded in 2005, is to create a viable free open source alternative to Magma, Mathematica, Maple, and Matlab. SMC is not mathematics software -- instead, SMC is best viewed by analogy as a browser-based version of a Linux desktop environment like KDE or Gnome. The vast majority of the code we write for SMC involves text editor issues (problems similar to those confronted by Emacs or Vim), personal information management, support for editing LaTeX documents, terminals, file management, etc. There is almost no mathematics involved at all.

That said, the main software I use is Sage, so of course support for Sage is a primary focus. SMC is a software environment that is being optimized for its users, who are mostly college students and teachers who use Sage (or Python) in conjunction with their courses. A big motivation for the existence of SMC is to make Sage much more accessible, since growth of Sage has stagnated since 2011, with the number one show-stopper obstruction being the difficulty of students installing Sage.

Sage is Failing

Measured by the mission statement, Sage has overall failed. The core goal is to provide similar functionality to Magma (and the other Ma's) across the board, and the Sage development model and community has failed to do this across the board, since after 9 years, based on our current progress, we will never get there. There are numerous core areas of research mathematics that I'm personally familiar with (in arithmetic geometry), where Sage has barely moved in years and Sage does only a few percent of what Magma does. Unless there is a viable plan for the areas to all be systematically addressed in a reasonable timeframe, not just with arithmetic geometry in Magma, but with everything in Mathematica, Maple., etc, we are definitely failing at the main goal I have for the Sage math software project.

I have absolutely no doubt that money combined with good planning and management would make it possible to achieve our mission statement. I've seen this hundreds of times over at a small scale at Sage Days workshops during the last decade. And let's not forget that with very substantial funding, Linux now provides a viable free open source alternative to Microsoft Windows. Just providing Sage developers with travel expenses (and 0 salary) is enough to get a huge amount done, when possible. But all my attempts with foundations and other clients to get any significant funding, at even the level of 1% of the funding that Mathematica gets each year, has failed. For the life of the Sage project, we've never got more than maybe 0.1% of what Mathematica gets in revenue. It's just a fact that the mathematics community provides Mathematica $50+ million a year, enough to fund over 600 fulltime positions, and they won't provide enough to fund one single Sage developer fulltime.

But the Sage mission statement remains, and even if everybody else in the world gives up on it, I HAVE NOT. SMC is my last ditch strategy to provide resources and visibility so we can succeed at this goal and give the world a viable free open source alternative to the Ma's. I wish I were writing interesting mathematical software, but I'm not, because I'm sucking it up and playing the long game.

The Users of SMC

During the last academic year (e.g., April 2014) there were about 20K "monthly active users" (as defined by Google Analytics), 6K weekly active users, and usually around 300 simultaneous connected users. The summer months have been slower, due to less teaching.

Numerically most users are undergraduate students in courses, who are asked to use SMC in conjunction with a course. There's also quite a bit of usage of SMC by people doing research in mathematics, statistics, economics, etc. -- pretty much all computational sciences. Very roughly, people create Sage worksheets, IPython notebooks, and Latex documents in somewhat equal proportions.

What SMC runs on

Technically, SMC is a multi-datacenter web application without specific dependencies on particular cloud provider functionality. In particular, we use the Cassandra database, and custom backend services written in Node.js (about 15,000 lines of backend code). We also use Amazon's Route 53 service for geographically aware DNS. There are two racks containing dedicated computers on opposites sides of campus at University of Washington with 19 total machines, each with about 1TB SSD, 4TB+ HDD, and 96GB RAM. We also have dozens of VM's running at 2 Google data centers to the east.

A substantial fraction of the work in implementing SMC has been in designing and implementing (and reimplementing many times, in response to real usage) a robust replicated backend infrastructure for projects, with regular snapshots and automatic failover across data centers. As I write this, users have created 66677 projects; each project is a self-contained Linux account whose files are replicated across several data centers.

The Source Code of SMC

The underlying source of SMC, both the backend server and frontend client, is mostly written in CoffeeScript. The frontend (which is nearly 20,000 lines of code) is implemented using the "progressive refinement" approach to HTML5/CSS/Javascript web development. We do not use any Javascript single page app frameworks, though we make heavy use of Bootstrap3 and jQuery. All of the library dependencies of SMC, e.g., CodeMirror, Bootstrap, jQuery, etc. for SMC are licensed under very permissive BSD/MIT, etc. libraries. In particular, absolutely nothing in the Javascript software stack is GPL or AGPL licensed. The plan is that any SMC source code that will be open sourced will be released under the BSD license. Some of the SMC source code is not publicly available, and is owned by University of Washington. But other code, e.g., the realtime sync code, is already available.
Some of the functionality of SMC, for example Sage worksheets, communicate with a separate process via a TCP connection. That separate process is in some cases a GPL'd program such as Sage, R, or Octave, so the viral nature of the GPL does not apply to SMC. Also, of course the virtual machines are running the Linux operating system, which is mostly GPL licensed. (There is absolutely no AGPL-licensed code anywhere in the picture.)

Note that since none of the SMC server and client code links (even at an interpreter level) with any GPL'd software, that code can be legally distributed under any license (e.g., from BSD to commercial).
Also we plan to create a fully open source version of the Sage worksheet server part of SMC for inclusion with Sage. This is not our top priority, since there are several absolutely critical tasks that still must be finished first on SMC, e.g., basic course management.

The SMC Business Model

The University of Washington Center for Commercialization (C4C) has been very involved and supportive since the start of the projects. There are no financial investors or separate company; instead, funding comes from UW, some unspent grant funds that were about to expire, and a substantial Google "Academic Education Grant" ($60K). Our first customer is the "US Army Engineer Research and Development Center", which just started a support/license agreement to run their own SMC internally. We don't currently offer a SaaS product for sale yet -- the options for what can be sold by UW are constrained, since UW is a not-for-profit state university. Currently users receive enhancements to their projects (e.g., increased RAM or disk space) in exchange for explaining to me the interesting research or teaching they are doing with SMC.

The longterm plan is to start a separate for-profit company if we build a sufficient customer base. If this company is successful, it would also support fulltime development of Sage (e.g., via teaching buyouts for faculty, support of students, etc.), similar to how Magma (and Mathematica, etc.) development is funded.

In conclusion, in response to Michael Monagan, you are wrong. And you are right.

You don't really think that Sage has failed, do you?

noreply@blogger.com (William Stein) — Tue, 26 Aug 2014 11:46:00 +0000

I just received an email from a postdoc in Europe, and very longtime contributor to the Sage project. He's asking for a letter of recommendation, since he has to leave the world of mathematical software development (after a decade of training and experience), so that he can take a job at hedge fund. He ends his request with the question:

> P.S. You don't _really_ think that Sage has failed, do you?

After almost exactly 10 years of working on the Sage project, I absolutely do think it has failed to accomplish the stated goal of the mission statement: "Create a viable free open source alternative to Magma, Maple, Mathematica and Matlab.". When it was only a few years into the project, it was really hard to evaluate progress against such a lofty mission statement. However, after 10 years, it's clear to me that not only have we not got there, we are not going to ever get there before I retire. And that's definitely a failure.

Here's a very rough quote I overheard at lunch today at Sage Days 61, from John Voight, who wrote much quaternion algebra code in Magma: "I'm making a list of what is missing from Sage that Magma has for working with quaternion algebras. However, it's so incredibly daunting, that I don't want to put in everything. I've been working on Magma's quaternion algebras for over 10 years, as have several other people. It's truly daunting how much functionality Magma has compared to Sage."

The only possible way Sage will not fail at the stated mission is if I can get several million dollars a year in money to support developers to work fulltime on implementing interesting core mathematical algorithms. This is something that Magma has had for over 20 years, and that Maple, Matlab, and Mathematica also have. That I don't have such funding is probably why you are about to take a job at a hedge fund. If I had the money, I would try to hire a few of the absolute best people (rather than a bunch of amateurs), people like you, Robert Bradshaw, etc. -- we know who is good. (And clearly I mean serious salaries, not grad student wages!)

So yes, I think the current approach to Sage has failed. I am going to try another approach, namely SageMathCloud. If it works, maybe the world will get a free open source alternative to Magma, Mathematica, etc. Otherwise, maybe the world never ever will. If you care like I do about having such a thing, and you're teaching course, or whatever, maybe try using SageMathCloud. If enough people use SageMathCloud for college teaching, then maybe a business model will emerge, and Sage will get proper funding.