A decade ago, Ion Stoica and his colleagues at UC Berkeley identified the roadblock to performing advanced analytics on what we then called Big Data. Cheap storage and compute could be tapped, courtesy of the Hadoop project, but jobs tended to take hours or days. Stoica and colleagues worked on a solution that exploited in-memory processing, and the result was the Apache Spark project. Created at UC Berkeley’s AMPLab, it has become the de facto standard for large-scale batch data processing, not to mention the technology that birthed a company currently valued at $28 billion.
Fast forward to the present, and Stoica and his colleagues have identified compute as the new bottleneck, as the embrace of machine learning has made workload processing more complex. These workloads still consume plenty of data, of course.
Ironically, the bottleneck does not stem from a lack of resources. All the ingredients for running AI models in production are now in place, and, run properly, such workloads can be quite cost-effective. Serverless services in the cloud have grown popular, for instance, but they have typically been restricted to serving simple apps built from functions, where the big requirement was autoscaling. Storage has become quite cheap, and developers can choose from a rich array of processor instances matched to the problem, from GPUs to specialized ASICs. There are plenty of frameworks, such as TensorFlow, that help developers structure their computation. And Kubernetes can automate the orchestration.
But full stop. Today, getting an ML model into production requires end-to-end services that automate deployment, working knowledge of Kubernetes, and/or a complex toolchain to handle the autoscaling. And unlike the relatively simple apps built from functions, machine learning and deep learning typically involve complex, multistep, iterative programs that, from a compute standpoint, consume resources like classic HPC (high-performance computing) jobs.
The solution, developed at AMPLab’s successor RISELab, is Ray, an open-source project hosted on GitHub. Stoica created the project along with fellow lab member Robert Nishihara and fellow Berkeley professor Michael I. Jordan, and they have cofounded a company, Anyscale, to commercialize it. Anyscale has raised $60 million in funding from some of the same venture partners who are behind Databricks. In a few words, Ray enables developers and data scientists to launch serverless compute for their own ML models and apps without requiring knowledge of the underlying plumbing. Today, the Ray community is kicking off the second Ray Summit, featuring the usual early-adopter suspects showing how data scientists and developers working from laptops have pulled this off.
Simply stated, Ray provides an API for building distributed applications. It enables any developer working on a laptop to deploy a model in a serverless environment, where deployment and autoscaling are automated under the covers. It delivers a serverless experience without requiring the developer to sign up for a specific cloud serverless service or know anything about setting up and running such infrastructure.
A Ray cluster consists of a head node and a set of worker nodes, and it can run on any infrastructure, on-premises or in a public cloud. Its capabilities include an autoscaler that inspects pending tasks, activates the minimum number of nodes needed to run them, and monitors execution to ramp up more nodes or shut them down. There is some assembly required, however, as the developer needs to register the compute instance types.
Ray can start and stop VMs in the cloud of choice; the Ray docs explain how to do this in each of the major clouds and on Kubernetes.
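That registration happens in a cluster config file consumed by Ray's `ray up` launcher. As a rough sketch for AWS (the region and instance types here are illustrative, not recommendations; consult the Ray cluster-launcher docs for the full schema), it looks something like:

```yaml
# cluster.yaml -- launched with: ray up cluster.yaml
cluster_name: demo
max_workers: 4            # the autoscaler never exceeds this many workers
provider:
  type: aws
  region: us-west-2       # illustrative region
available_node_types:
  head_node:
    node_config:
      InstanceType: m5.large    # illustrative instance type
  worker_node:
    min_workers: 0              # scale down to zero when idle
    node_config:
      InstanceType: m5.large
head_node_type: head_node
```

Once the cluster is up, the autoscaler adds and removes workers between `min_workers` and `max_workers` based on the pending workload.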
One would be forgiven for getting a sense that Ray is déjà vu all over again. Stoica, who was instrumental in fostering Spark’s emergence, is taking on a similar role with Ray. Both originated at UC Berkeley, and as open-source projects, both are going the community route. Just as Spark boasted a portfolio of dozens of open-source libraries contributed by the community, the same will be true of Ray. The major difference is one of target audience: while Spark (and Databricks) was aimed at data scientists and data engineers, Ray is primarily aimed at developers seeking shortcuts for getting complex machine learning models into production.
And about that logo. Yup, it looks an awful lot like Kafka’s, doesn’t it? But don’t be fooled. The input to or output from a model running on a Ray cluster may involve a Kafka stream, but that’s as close as the connection between the two gets.
Just as Spark was developed in Scala and initially optimized for it, Ray was designed with Python and its ecosystem of libraries as first-class citizens, but it has an API that is sufficiently open to be invoked from other languages. Initially, though, some languages and models will be more equal than others. Any library in any language can call compute through Ray’s API; still, libraries can be optimized with specialized execution routines to exploit Ray’s serverless orchestration more efficiently, with Horovod being the poster child.
Just as Databricks was formed to deliver a commercial platform-as-a-service for optimized Spark, Anyscale will follow in its footsteps. Stoica, who remains executive chairman of Databricks, is reprising his role with the new startup and, as noted before, is launching with some of the same venture backers. Anyscale’s service is currently in beta.
We would imagine that Anyscale will add some bells and whistles, such as prepopulating the characteristics of popular node types (e.g., Amazon EC2 C6g) and offering a richer management console than the basic dashboard that ships with the open-source community edition. And while Anyscale bills its API as “universal,” meaning it can be accessed from programs written in any language, don’t be surprised if the company (like Databricks before it) develops optimizations.