Happy New Year! There is no shortage of Machine Learning Operations (MLOps) products entering the market. Since June 2020, more than 84 new ML tools have appeared, including but not limited to all-in-one platforms, data-pipeline tools, and model-training applications. Of the almost 300 MLOps tools on this list, 180 come from startups. More than 60 of those 180 startups raised capital in 2020, and about two-thirds focus on data pipelines and on modeling and training. The data-pipeline and modeling-and-training categories have led the way in part because roughly 80% of the time needed to build a model goes into the data: assembling complete data sets, locating and understanding the sources where the data resides, and wrangling, imputing, compiling, organizing, and labeling it.
In this post, my objective is to provide an overview of the ML landscape before another burst of investment takes place in 2021. More importantly, I hope it helps you narrow your focus to the applications that resonate most with you, so you can find the organizations you would most like to work with or invest in. From the 280+ companies in the current tool set, I highlight four groups to help you focus: all-in-one platforms, data pipelines, modeling and training, and hardware.
First is the newcomer category, the all-in-one platform, where almost $2.6B has been invested. This type of product covers most, if not all, phases of the ML process: data ingestion, data preparation, data exploration, feature engineering, model design and training, model testing, model registration, deployment, evaluation and comparison, and maintenance. These platforms do not let you shop for models the way an AI marketplace does, but they ship with out-of-the-box algorithms that you can test and apply for training and fitting.
AI marketplaces, where you can shop for or buy models that you think would be a good fit for your data, are already prevalent. They are quickly becoming centralized hubs where you can also offer your own models to ML designers. Out-of-the-box models save an enormous amount of time, with boosting and bagging features already packaged in.
The second area of focus for these new applications is the data pipeline. Data wrangling is where the heavy lifting takes place, and it is where data management, querying, labeling, arrangement, ingestion, augmentation, warehousing, versioning, and analytics most often reside. About 80% of the data preparation work is done in these sub-categories before any training or modeling can begin. A complete data set is what most ML engineers want, and some of these companies make it effortless to train on new data sets of greater complexity. This category leads the way with investors: almost $4B has been invested since 2015.
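To make two of these sub-categories concrete, here is a minimal sketch of imputation and data versioning using only the Python standard library. It is an illustration under stated assumptions, not how any vendor on this list works: the records, the column names, and the helper functions impute_mean and dataset_version are all hypothetical.

```python
import hashlib
import json
from statistics import mean


def impute_mean(rows, column):
    """Fill missing (None) values in `column` with the mean of the observed values.

    Returns new row dicts; the input rows are left unchanged.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def dataset_version(rows):
    """Hypothetical versioning helper: a short content hash of the data.

    Identical content gives the same id; any change gives a different one.
    """
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Made-up records for illustration only.
raw = [
    {"id": 1, "age": 34, "label": "churn"},
    {"id": 2, "age": None, "label": "stay"},
    {"id": 3, "age": 46, "label": "stay"},
]

clean = impute_mean(raw, "age")
print("imputed age for id 2:", clean[1]["age"])
print("raw version:  ", dataset_version(raw))
print("clean version:", dataset_version(clean))
```

Hashing the content rather than tracking file names means the version id changes whenever the data changes, which is the property that makes a training run reproducible later.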
Modelling and Training
The sub-categories here include interoperability, frameworks, experiment tracking, distributed training, and benchmarking. Investment in modeling and training was substantial in 2020, and the emphasis remains on interoperability. Finding the best fit for your model is less about time and more about experimentation. Once you have identified the problem statement from your customers and are clear about what the firm expects to learn or extract from the data, experimenting to find an algorithm that matches the data is what helps ensure your model is accurate. During experimentation, over-fitting and under-fitting will occur. You will most likely struggle to choose between a model with low bias and high variance and one with low variance but higher bias. Faced with that decision, you will probably arrive at the well-known strategy of building an ensemble, or blended, model to fit the data better. Although the number of modeling and training organizations in this data set is high, about 90 companies, only $300M has been invested in aggregate. Keep in mind that only 15-20% of the time spent building models goes to this category.
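The bias-variance trade-off and the ensemble strategy can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions: the data is synthetic, a constant mean predictor stands in for a high-bias model, a 1-nearest-neighbor predictor stands in for a high-variance model, and the function names and blend weights are made up for the example. In practice you would more likely reach for a library such as scikit-learn.

```python
import random
from statistics import mean


def fit_mean_model(xs, ys):
    """High-bias, low-variance stand-in: always predicts the training mean."""
    avg = mean(ys)
    return lambda x: avg


def fit_nearest_neighbor(xs, ys):
    """Low-bias, high-variance stand-in: copies the label of the closest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def bagged(fit, xs, ys, n_models=25, seed=0):
    """Bagging: fit the same learner on bootstrap resamples and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


def blend(models, weights):
    """Blending: weighted average of the predictions of different models."""
    total = sum(weights)
    return lambda x: sum(w * m(x) for m, w in zip(models, weights)) / total


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def make_data(n, seed):
    """Synthetic data (an assumption for this sketch): a noisy straight line."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 3) for x in xs]
    return xs, ys


train_x, train_y = make_data(60, seed=1)
test_x, test_y = make_data(200, seed=2)

candidates = {
    "mean (high bias)": fit_mean_model(train_x, train_y),
    "1-NN (high variance)": fit_nearest_neighbor(train_x, train_y),
    "bagged 1-NN": bagged(fit_nearest_neighbor, train_x, train_y),
}
# Arbitrary illustrative weights; real weights would be tuned on held-out data.
candidates["blend"] = blend(
    [candidates["mean (high bias)"], candidates["bagged 1-NN"]], weights=[0.2, 0.8]
)

for name, model in candidates.items():
    print(f"{name:22s} test MSE = {mse(model, test_x, test_y):.2f}")
```

Comparing the printed test errors is the experimentation described above in miniature: each candidate is scored on data it has not seen, and the ensemble is judged by the same measure as the single models.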
Hardware will remain at the core of all investments. We will need more computing capacity to query our data as more sophisticated ML methods, such as deep learning and neural networks, are applied more often to larger data sets. Suppose you care less about insights and more about the fidelity and accuracy of your model. In that case, you might introduce more variables and complexity into the data to attain even greater precision, but this demands more computing power. Two accelerator companies have raised the most money in the hardware space, Graphcore (www.graphcore.ai) and SambaNova (www.sambanova.ai), alongside many other companies focused on bringing AI to edge devices by building chips optimized for low-power inference on consumer devices. At a recent Google AI conference, bringing AI to the edge had a dedicated track. While we are still at the very beginning of 2021, I believe investment in hardware startups will continue. Of the money raised by these 280+ companies, almost $3B has been invested in hardware.
Geography and Academic Interest in MLOps
From a geographic vantage point, most of the investment is concentrated in the Bay Area, which remains the epicenter of MLOps, although some startups are now being established in other US hubs. Boston is a distant second.
That said, the AI research scene appeared to cool in 2020: Google slowed hiring of AI researchers (perhaps due to the pandemic), and Uber laid off its entire AI team. The ML production scene, however, is still evolving, and academic work in AI/ML is proliferating. Amazon and other cloud providers are encouraging scholars to collaborate because of the immense shortage of data science professionals. The Amazon Scholars program, described on the Amazon Science site, is one such effort dedicated to recruiting scholars. Amazon is heavily invested in R&D, with hundreds of researchers and applied scientists committed to innovation across the company. The program has broadened opportunities for academics, not only at Amazon but at pretty much every MLOps firm. In the Slack workspaces of MLOps companies like H2O.ai, there is a channel dedicated to #academics. By applying research models in practice to solve challenging technical problems, MLOps companies are in a unique position to measure the impact of their research ideas.
Our internal team recruited an ML professor, Mr Gordon Jemwa, who also serves as a moderator for our internet discussions at UCB. He is now part of our ML Measurement Survey, helping to build out the weightings for questions and answers so that we have an accurate readiness score and analysis of an organization's maturity before it embarks on an ML implementation.
Credit: Google News