Python is the programming language of choice for most data science and machine learning work. Or, to be more precise, Python’s ecosystem is. With so many libraries and frameworks built around the language, users are spoiled for choice, and that is what makes Python so popular. Some of those libraries have made a name for themselves, and have even earned academic acclaim.
There’s just one problem. Python is not always the fastest environment in which to run data science and machine learning tasks. By now it’s more or less common knowledge that machine learning tasks are typically greatly accelerated when run on GPUs, but what about the rest of the data science pipeline?
Enter Dask. Dask is an open source framework whose goal is to natively scale Python. Nvidia adopted Dask by hiring its creator, Matthew Rocklin. Dask has been around for several years, and companies are adopting it at a certain pace. Saturn Cloud, a startup founded by Hugo Shi and Sebastian Metti, saw a path for enterprise adoption that could be much faster.
Today, Saturn Cloud is announcing a partnership with Snowflake, opening up Dask to a broader audience and making that path even faster. ZDNet connected with Metti to discuss Saturn Cloud’s vision and execution.
One hundred times faster data science on the table, a few lines of code away
Saturn Cloud was founded in April 2019. Between then and today’s announcement of the partnership with Snowflake, which Metti called “a kingmaker,” it has accomplished a few feats: a $4 million seed round from SignalFire, a headcount of 20 people and growing, and, most importantly, paying customers, including the likes of Mount Sinai Hospital, Nestle, Trimark, WL Gore, and SM Energy, no less.
They must be doing something right. Co-founders Metti and Shi have been in the startup scene since their mid-twenties, and are deeply embedded in the data science and machine learning world too. Shi was one of the co-founders of Anaconda and is a core maintainer of Bokeh, both big in the Python world in their own way.
What Saturn Cloud promises is as simple as it is tempting: run your data science tasks 100 times faster. How is this possible? In a nutshell, by offering Dask as a service. Unsurprisingly, this seems to have hit a nerve with the data science crowd. Metti said he’d like people to think of Saturn Cloud as a Databricks for Dask, which is quite the role model, we might add. Except that Saturn Cloud claims to run tasks faster, thanks to what it describes as Dask’s superior performance.
When discussing the context for Dask’s adoption via Saturn Cloud, Metti mentioned a client whose machine learning model took over 60 days to run. This meant they could iterate on the model only about four times a year. Using Saturn Cloud’s tooling, they brought that down to just 11 hours:
“That’s a huge breakthrough. Companies see one hundred times faster data science is on the table and it’s just a few lines of code away. This tells them two things. Wow, we could do so much better and at the same time — if our competition does that, they’re going to pass us right away because it’s so much faster”.
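As a back-of-the-envelope check on the "one hundred times faster" claim, the figures Metti quoted can be plugged in directly. This sketch takes them at face value and treats 60 days as a lower bound (the model took "over 60 days"), so the real speedup would be at least this large; the variable names are ours.

```python
from datetime import timedelta

# Figures as quoted in the article; 60 days is a lower bound.
before = timedelta(days=60)
after = timedelta(hours=11)

# Dividing two timedeltas yields a plain float ratio.
speedup = before / after

# How many back-to-back runs would fit in a year at the new runtime.
runs_per_year_after = timedelta(days=365) / after

print(f"speedup: about {speedup:.0f}x")
print(f"possible runs per year at 11 hours each: about {runs_per_year_after:.0f}")
```

In other words, the quoted numbers imply a speedup somewhat above the headline figure of 100x.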
Admittedly, that’s a compelling argument, and it explains the kind of organic growth Saturn Cloud has been seeing. The partnership with Snowflake came via market pull too, Metti said: “We started seeing our users and their users talking about our respective tooling. We started seeing blog posts around the tools that we are integrating with Snowflake.
“That naturally led to a discussion with us and their strategic partnerships team saying, hey, it looks like the market’s really pulling in this direction. They want to see 100x faster data science, but they want to also do the querying through Snowflake, we should talk and make that happen”.
Many Snowflake users do their analytics with SQL. With large datasets, they want to be able to do advanced analytics without having to downsample or wait a week, and this is what Saturn Cloud brings to the table, according to Metti. He added that the integration with Snowflake is bidirectional and seamless.
Growth and partnerships
Saturn Cloud’s growth is impressive, and the partnership with Snowflake seems set to boost it even further, a point Metti emphasized too. However, since the company is largely built on operationalizing Dask, this raised a question for us: how come Rocklin, Dask’s creator, is not involved?
Rocklin has left Nvidia to pursue his own path, and has founded another company called Coiled, which also offers a Dask product, in addition to offering training. Metti acknowledged that Rocklin has made a huge contribution to building the Dask codebase and ecosystem. He went on to add that startups’ paths converge and diverge, and everyone is experimenting and exploring:
“We’ll see over the next five years. I think companies like Snowflake and Databricks are the ones that are really shaking the market. So we’ll see where they go and the market goes. We’re operating on those tectonic plates. We get shaken around by how they decide to execute”.
Another partnership that Metti said has been key to Saturn Cloud’s growth is the one with AWS. We should also mention here that Saturn Cloud’s business model is pay-per-use rather than subscription-based. Being an AWS partner means that Saturn Cloud users get integration on both functionality and billing, with Saturn Cloud offering more fine-grained analytics on the latter.
Saturn Cloud seems set to continue on its growth trajectory, and is expanding in a number of ways. First, horizontally, as Metti said they are trying to become an end-to-end platform, adding features such as version control and dashboards (via integrations). Then vertically, by customizing the offering for clients in different domains, including providing access to datasets.
Plus, there is the usual round of improvements: from documentation to new features, the list never ends. Saturn Cloud has experienced most of its growth during the COVID-19 crisis, and is continuing to grow the team entirely remotely, which is a feat of its own. Culture is the hardest part to get right, and adding remote to the mix does not make it easier, especially if you want to be the next Databricks.