NikolaNews

Kaskada Accelerates ML Workflow with Its Feature Store

February 6, 2020
in Machine Learning

There’s a lot of surface area in the typical data science workflow for the purveyors of automation to attack. What moves the needle for the folks at the startup Kaskada is the feature engineering and deployment stage, which it’s seeking to streamline with a new automated feature store.

The typical data science workflow is fraught with inefficiency, according to Kaskada CEO and co-founder Davor Bonaci, previously a senior engineer at Google, where he worked on Apache Beam.

For example, data scientists often will do much of the work in Jupyter, the popular data science notebook, where they will explore the data and identify the key data features that they will use as inputs for their predictive models. They will typically develop these features in Python within the Jupyter environment.

“That’s a really good way to do science,” Bonaci says. “You can visualize things. You can do things with a few lines of code.”

The problem with this approach, however, is that the output of Jupyter notebooks isn’t production-ready code. That leads many organizations to employ teams of engineers whose job is to rewrite the resulting Python into something more scalable and production-ready, such as Scala, that can be deployed within an Apache Spark framework.

This approach is proven and it works, but it’s slower and more expensive than it needs to be, according to Bonaci, who hopes to accelerate the workflow with the software he’s developing at Kaskada, which yesterday announced an $8 million Series A round of funding.

Kaskada is aiming to simplify the engineering workflow for data scientists (Image source: Kaskada)

Kaskada accelerates data science work in several ways. First, it provides a studio where data scientists can explore the data and define the data features they plan to use in their production machine learning models. It also creates a feature store that houses the pre-defined features until they’re called into use as feature vectors.

The feature store is a critical component of the Kaskada offering, as it simplifies the roll-out of feature vectors into production machine learning models. Instead of fumbling around with code, developers can call feature vectors from the feature store via an API.
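To make the idea of calling feature vectors through an API concrete, here is a minimal sketch of what such a lookup could look like. It is an in-memory toy, not Kaskada's API: the class `InMemoryFeatureStore`, the method `get_feature_vector`, and the feature names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryFeatureStore:
    """Toy stand-in for a feature store (hypothetical, not Kaskada's API)."""

    # entity id -> {feature name -> latest computed value}
    _values: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def put(self, entity_id: str, feature: str, value: float) -> None:
        """Record the latest value of one feature for one entity."""
        self._values.setdefault(entity_id, {})[feature] = value

    def get_feature_vector(self, entity_id: str, features: List[str]) -> List[float]:
        """Return the requested features in a fixed order."""
        row = self._values.get(entity_id, {})
        # Missing features default to 0.0 so the vector length stays fixed.
        return [row.get(name, 0.0) for name in features]


store = InMemoryFeatureStore()
store.put("user-42", "purchases_7d", 3.0)
store.put("user-42", "avg_basket_usd", 27.5)

vector = store.get_feature_vector(
    "user-42", ["purchases_7d", "avg_basket_usd", "days_since_signup"]
)
print(vector)  # [3.0, 27.5, 0.0]
```

The model-serving code only needs the entity id and the list of feature names; how the values were computed stays behind the store.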

Lastly, the company automatically compiles the layer of code needed to instantiate the vectors from the feature store. It does this using Scala, which it automatically deploys in containers that run in Kubernetes cloud environments.

This approach not only reduces the odds of something going wrong with a machine learning deployment, but it makes the process more reliable and repeatable as well.

“Instead of having the output of a data scientist being the notebook, we make the data scientist responsible for populating the feature store,” Bonaci says. “So the output is no longer the notebook. It’s actually the computed value, and we give data scientists the experience to work together, to collaborate to populate the feature store. And once they populate the feature store, data scientists can simply query it without rewriting any pipelines.”

This process allows data scientists to deploy features with the “click of the button,” which cuts weeks off the typical model deployment scenario, Bonaci says.

“The need to rewrite notebooks in Spark has gone away,” he says. “They just come and query a simple API from the feature store to get those values out to drive the model and get to prediction…That’s the unique innovation that we are bringing to market.”

This approach does not come without costs (not to mention the actual money that users must pay Kaskada to use its service). Instead of using familiar tools like Jupyter and frameworks like Pandas or scikit-learn, the data scientist, for the most part, must work within the confines of the Kaskada environment. And you’re not going to use the Kaskada environment for arbitrary data science work; Bonaci says the system, which uses Apache Cassandra and Akka under the covers, is geared primarily to building recommendation engines and real-time predictions for websites and mobile apps.

“Technically, we’re kind of a compiler between the studio and the feature store,” he says. “We are compiling code from whatever the data scientist defines [and] automatically generating a real-time distributed system. That’s where the rewriting goes away. We generate automatically a distributed system from what you define in our software.”
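The compiler analogy can be illustrated at a much smaller scale. In the sketch below, a declarative feature spec is turned into a single executable function, so nobody rewrites the feature logic by hand. This is only an assumption-laden toy: `FEATURE_SPEC`, `AGGREGATIONS`, and `compile_features` are hypothetical names, and the result is a local Python function rather than the distributed system Bonaci describes.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical declarative spec: feature name -> (event field, aggregation).
FEATURE_SPEC: Dict[str, Tuple[str, str]] = {
    "purchase_count": ("amount", "count"),
    "total_spend": ("amount", "sum"),
    "max_purchase": ("amount", "max"),
}

# The small set of aggregations this toy "compiler" understands.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "count": lambda xs: float(len(xs)),
    "sum": lambda xs: float(sum(xs)),
    "max": lambda xs: float(max(xs)) if xs else 0.0,
}


def compile_features(
    spec: Dict[str, Tuple[str, str]],
) -> Callable[[Iterable[dict]], Dict[str, float]]:
    """Turn a declarative spec into one function that computes every feature."""
    # Resolving aggregations here means an unknown one fails at "compile" time.
    plan = [
        (name, source_field, AGGREGATIONS[agg])
        for name, (source_field, agg) in spec.items()
    ]

    def compute(events: Iterable[dict]) -> Dict[str, float]:
        events = list(events)
        return {
            name: agg([e[source_field] for e in events if source_field in e])
            for name, source_field, agg in plan
        }

    return compute


compute = compile_features(FEATURE_SPEC)
features = compute([{"amount": 20.0}, {"amount": 5.5}, {"page": "/home"}])
print(features)
# {'purchase_count': 2.0, 'total_spend': 25.5, 'max_purchase': 20.0}
```

The data scientist edits only the spec; the generated `compute` function is what runs.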

But data scientists get other benefits once they select Kaskada. For starters, once the data scientist has used her data science skill to select the features to use in the model, Kaskada will automatically keep the resulting feature vectors (the series of numeric values that go into the ML inference model) up to date based on incoming data. Hooks to pub-sub systems like Apache Kafka and Amazon Kinesis keep machine learning models fresh with the latest streaming data.
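Keeping feature values current as events arrive usually comes down to incremental updates rather than recomputing from history. The following sketch shows that pattern under simplifying assumptions: a plain Python list stands in for a Kafka topic or Kinesis stream, and the event fields `user_id` and `amount` and the two features are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def new_state() -> Dict[str, float]:
    """Initial feature values for a user with no events yet."""
    return {"event_count": 0.0, "mean_amount": 0.0}


def apply_event(state: Dict[str, float], amount: float) -> None:
    """Update running features in O(1) without re-reading history."""
    state["event_count"] += 1
    # Incremental mean: new_mean = old_mean + (x - old_mean) / n
    state["mean_amount"] += (amount - state["mean_amount"]) / state["event_count"]


def consume(stream: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Fold a stream of events into per-user feature values."""
    features: Dict[str, Dict[str, float]] = defaultdict(new_state)
    for event in stream:
        apply_event(features[event["user_id"]], event["amount"])
    return dict(features)


# A list stands in for a Kafka topic or Kinesis stream (assumption).
simulated_stream = [
    {"user_id": "a", "amount": 10.0},
    {"user_id": "b", "amount": 4.0},
    {"user_id": "a", "amount": 20.0},
]
latest = consume(simulated_stream)
print(latest["a"])  # {'event_count': 2.0, 'mean_amount': 15.0}
```

Because each event touches only a constant amount of state, the same update logic works whether the stream delivers ten events or ten million.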

Davor Bonaci is the co-founder and CEO of Kaskada

The typical Kaskada customer will have hundreds or thousands of features for each user or business object that it wants to create predictions or recommendations for, Bonaci says. Visualize “a matrix with as many rows as you have users, and it’s computed in real-time based on the stream coming in,” he says.
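The matrix Bonaci describes, with one row per user and one column per feature, can be pictured with a few lines of Python. This is a sketch only: the feature names in `FEATURE_COLUMNS` and the helper `to_matrix` are hypothetical, and a production system would hold far more rows and columns than this.

```python
from typing import Dict, List, Tuple

FEATURE_COLUMNS = ["sessions_24h", "clicks_24h"]  # hypothetical feature names


def to_matrix(
    per_user: Dict[str, Dict[str, float]], columns: List[str]
) -> Tuple[List[str], List[List[float]]]:
    """Return (row_ids, matrix) with one row per user and one column per feature."""
    row_ids = sorted(per_user)
    # Features a user does not have yet are filled with 0.0.
    matrix = [[per_user[uid].get(col, 0.0) for col in columns] for uid in row_ids]
    return row_ids, matrix


per_user = {
    "user-2": {"sessions_24h": 1.0},
    "user-1": {"sessions_24h": 3.0, "clicks_24h": 12.0},
}
row_ids, matrix = to_matrix(per_user, FEATURE_COLUMNS)
print(row_ids)  # ['user-1', 'user-2']
print(matrix)   # [[3.0, 12.0], [1.0, 0.0]]
```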

Another benefit is the reduced need for data scientists to be experts in deploying distributed systems. Because Kaskada handles the packaging, deployment, and management of the production features into the cloud ML environment (or possibly on-prem environments, if the customer is large enough, Bonaci says), that’s one less hard-to-find machine learning engineer that the company needs to hire.

Kaskada is ideal for organizations that just want to let their data scientists be data scientists and not engineers, Bonaci says. “It’s a fully managed service where data scientists come and apply their domain experience, and the things just work,” he says.

Once data scientists define the features in Kaskada, the resulting model development becomes much easier, he says. It’s all about letting data scientists focus on what they’re best at, which is pushing the art of machine learning, Bonaci says.

“We’ve raised the level of abstraction and enabled these expert data scientists to be able to do it themselves, without having to depend on data engineers to rewrite things,” he says. “The data engineer will still have a role, but the data scientist [no longer] has to wait for the data engineer for something to happen to see the result in production. That is kind of the conceptual, next-generation system that we are trying to bring to market.”

Kaskada is based in Seattle, Washington, and its service is currently still in beta. For more info, see www.kaskada.com.

Credit: Google News

© 2019 NikolaNews.com - Global Tech Updates
