Databricks is expanding its Lakehouse architecture, which combines the advantages of data lakes and data warehouses to give companies a central data and analytics platform, with a machine learning platform that spans organizations and teams. At the Data + AI Summit, the company announced the official launch of Databricks Machine Learning, which is intended to enable data engineers, data scientists and product owners to work together on ML projects.
Consistent MLOps platform
With MLflow, Databricks had already launched an open source project for the life cycle management of machine learning projects. It is now managed under the umbrella of the Linux Foundation and sits alongside Apache Spark, Delta Lake, Koalas and the recently introduced Delta Sharing in the company's open source portfolio. Databricks Machine Learning is meant to go one step further and bring together the entire process, from data architecture and pipelines (data engineering) through model training (data science) to the deployment of the applications built on top (data products).
The aim is to create a central, collaborative platform for data teams in companies that bundles all the tools they need, from preparing the data and experimenting through to production operation. The platform also supports the teams with two new functions: Databricks AutoML and Databricks Feature Store. With AutoML, many of the steps that data scientists previously had to perform manually during ML model development and training can be largely automated, without the models becoming a black box, Databricks promises. Data scientists are supposed to retain control over exactly how a model works, be able to adapt it, and be able to validate it against unseen data sets. Thanks to the integration with MLflow, all important parameters, metrics and ML models are supposed to be trackable at any time.
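The idea of an automated search that still leaves every trial inspectable can be illustrated with a small sketch. This is not the Databricks AutoML or MLflow API; the names `Run` and `auto_search` and the toy model y = slope * x are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One recorded trial: the parameters tried and the metrics measured."""
    params: dict
    metrics: dict = field(default_factory=dict)


def auto_search(xs, ys, candidate_slopes):
    """Try each candidate slope for the toy model y = slope * x.

    Every trial is recorded, so the search result can be audited
    afterwards instead of being a black box.
    """
    runs = []
    for slope in candidate_slopes:
        # Mean squared error of this candidate on the given data.
        error = sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        runs.append(Run(params={"slope": slope}, metrics={"mse": error}))
    # Select the trial with the lowest error, but return all of them.
    best = min(runs, key=lambda run: run.metrics["mse"])
    return best, runs


# Hypothetical toy data that follows y = 2 * x exactly.
best, runs = auto_search([1, 2, 3], [2, 4, 6], [1.0, 2.0, 3.0])
for run in runs:
    print(run.params, run.metrics)
```

Because the full list of runs is returned alongside the winner, a data scientist could inspect why one candidate was chosen and rerun or adjust it by hand.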
Tracking and managing features
The Feature Store takes on the role of a single source of truth for all features already available in the organization or company. Data teams can use the store to see how the features are structured and where they are already being used, including the data sources used to compute them. The Feature Store not only supports data teams with data lineage, but also helps specifically to avoid phenomena such as online-offline skew, which can show up as differing model performance between real-time and batch applications.
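How a shared feature definition can guard against online-offline skew is easiest to see in a minimal sketch. The class below is not the Databricks Feature Store API; `SimpleFeatureStore`, its methods, and the `avg_item_price` feature are hypothetical names chosen for illustration only.

```python
class SimpleFeatureStore:
    """Toy registry: one definition per feature, shared by batch and online paths."""

    def __init__(self):
        # Maps feature name -> (compute function, description of the data source).
        self._features = {}

    def register(self, name, func, source):
        self._features[name] = (func, source)

    def describe(self, name):
        """Return basic lineage information: which source the feature is computed from."""
        _, source = self._features[name]
        return {"name": name, "source": source}

    def compute_batch(self, name, rows):
        """Offline path: compute the feature for many rows, e.g. for training."""
        func, _ = self._features[name]
        return [func(row) for row in rows]

    def compute_online(self, name, row):
        """Online path: compute the feature for a single request at serving time."""
        func, _ = self._features[name]
        return func(row)


store = SimpleFeatureStore()
store.register(
    "avg_item_price",
    lambda row: row["total"] / row["items"],
    source="orders table (hypothetical)",
)

orders = [{"total": 30.0, "items": 3}, {"total": 45.0, "items": 5}]
batch_values = store.compute_batch("avg_item_price", orders)
online_values = [store.compute_online("avg_item_price", row) for row in orders]
```

Since both paths call the same registered function, the values seen in training and in serving cannot drift apart because of two diverging implementations, which is one common cause of skew.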
Further details on Databricks Machine Learning, Databricks AutoML and the Databricks Feature Store are summarized in the blog post accompanying the official announcement at the Data + AI Summit. The ML platform is initially available as a public preview for the provider's customers.