Thursday, October 16, 2014

Public Sharing in SageMathCloud, Finally

SageMathCloud (SMC) is a free (NSF, Google and UW supported) website that lets you collaboratively work with Sage worksheets, IPython notebooks, LaTeX documents and much, much more. All work is snapshotted every few minutes, and copied out to several data centers, so if something goes wrong with a project running on one machine (right before your lecture begins or homework assignment is due), it will pop up on another machine. We designed the backend architecture from the ground up to be very horizontally scalable and have no single points of failure.

This post is about an important new feature: You can now mark a folder or file so that all other users can view it, and very easily copy it to their own project.

This solves problems:
  • Problem: You create a "template" project, e.g., with pre-installed software, worksheets, IPython notebooks, etc., and want other users to easily be able to clone it as a new project. Solution: Mark the home directory of the project public, and share the link widely.

  • Problem: You create a syllabus for a course, an assignment, a worksheet full of 3d images, etc., that you want to share with a group of students. Solution: Make the syllabus or worksheet public, and share the link with your students via an email and on the course website. (Note: You can also use a course document to share files with all students privately.) For example...

  • Problem: You run into a problem using SMC and want help. Solution: Make the worksheet or code that isn't working public, and post a link in a forum asking for help.
  • Problem: You write a blog post explaining how to solve a problem and write related code in an SMC worksheet, which you want your readers to see. Solution: Make that code public and post a link in your blog post.
Here's a screencast.

Each SMC project has its own local "project server", which takes some time to start up, and serves files, coordinates Sage, terminal, and IPython sessions, etc. Public sharing completely avoids having anything to do with the project server -- it works fine even if the project server is not running -- it's always fast and there is no startup time if the project server isn't running. Moreover, public sharing reads the live files from your project, so you can update the files in a public shared directory, add new files, etc., and users will see these changes (when they refresh, since it's not automatic).
As an example, here is the cloud-examples github repo as a share. If you click on it (and have a SageMathCloud account), you'll see this:

What Next?

There is an enormous amount of natural additional functionality to build on top of public sharing.

For example, not all document types can be previewed in read-only mode right now; in particular, IPython notebooks, task lists, LaTeX documents, images, and PDF files must be copied from the public share to another project before people can view them. It is better to release a first usable version of public sharing before systematically going through and implementing the additional features needed to support all of the above. You can make complicated Sage worksheets with embedded images and 3d graphics, and those can be previewed before copying them to a project.
Right now, the only way to visit a public share is to paste the URL into a browser tab and load it. Soon the projects page will be re-organized so you can search for publicly shared paths, see all public shares that you have previously visited, who shared them, how many +1's they've received, comments, etc.

Also, I plan to eventually make it so public shares will be visible to people who have not logged in, and when viewing a publicly shared file or directory, there will be an option to start it running in a limited project, which will vanish from existence after a period of inactivity (say).

There are also dozens of details that are not yet implemented. For example, it would be nice to be able to directly download files (and directories!) to your computer from a public share. And it's also natural to share a folder or file with a specific list of people, rather than sharing it publicly. If somebody is viewing a public file and you change it, they should likely see the update automatically. Right now when viewing a share, you don't even know who shared it, and if you open a worksheet it can automatically execute Javascript, which is potentially unsafe.  Once public content is easily found, if somebody posts "evil" content publicly, there needs to be an easy way for users to report it.

Sharing will permeate everything

Sharing has been thought about a great deal during the last few years in the context of sites such as Github, Facebook, Google+ and Twitter. With SMC, we've developed a foundation for interactive collaborative computing in a browser, and will introduce sharing on top of that in a way that is motivated by your problems. For example, as with Github or Google+, when somebody makes a copy of your publicly shared folder, this copy should be listed (under "copies") and it could start out public by default. There is much to do.

One reason it took so long to release the first version of public sharing is that I kept imagining that sharing would happen at the level of complete projects, just like sharing in Github. However, when thinking through your problems, it makes way more sense in SMC to share individual directories and files. Technically, sharing at this level works works well for read-only access, not for read-write access, since projects are mapped to Linux accounts. Another reason I have been very hesitant to support sharing is that I've had enormous trouble over the years with spammers posting content that gets me in trouble (with my University -- it is illegal for UW to host advertisements). However, by not letting search engines index content, the motivation for spammers to post nasty content is greatly reduced.

Imagine publicly sharing recipes for automatically gradable homework problems, which use the full power of everything installed in SMC, get forked, improved, used, etc.