
How Data Virtualization Can Remove the Bottlenecks to Data Science

January 26, 2021

Many of the stages in a typical data science lifecycle have more to do with data than they do with science. Before the data scientist can actually engage in science, there are several steps they must first complete:


  1. Find where the right data is located.
  2. Access that data, which requires an understanding of the bureaucracy of the organization in terms of ownership, credentials, access methods, and access technologies.
  3. Transform the data into a format that is easy to use.
  4. Combine that data with other data in other sources, which may be formatted differently.
  5. Profile and cleanse the data to eliminate incomplete or inconsistent data points.
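
As a rough illustration of steps 3 through 5, the sketch below transforms two differently formatted inputs into a common shape, combines them, and drops incomplete rows. The sample data, field names, and helper functions are all hypothetical and use only the Python standard library; a real project would read from actual systems rather than inline strings.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two differently formatted sources.
CRM_CSV = "customer_id,name,region\n1,Acme,EMEA\n2,Globex,\n3,Initech,APAC\n"
ORDERS_JSON = (
    '[{"customerId": 1, "total": "120.50"},'
    ' {"customerId": 3, "total": "80"},'
    ' {"customerId": 3, "total": null}]'
)


def load_customers(text):
    """Step 3: transform CSV rows into dicts with typed fields."""
    return [
        {
            "customer_id": int(row["customer_id"]),
            "name": row["name"],
            "region": row["region"] or None,  # empty string means missing
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def load_orders(text):
    """Step 3: transform JSON records into the same naming and typing."""
    return [
        {
            "customer_id": rec["customerId"],
            "total": float(rec["total"]) if rec["total"] is not None else None,
        }
        for rec in json.loads(text)
    ]


def combine(customers, orders):
    """Step 4: join each order to its customer on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**by_id[o["customer_id"]], **o}
        for o in orders
        if o["customer_id"] in by_id
    ]


def cleanse(rows):
    """Step 5: drop rows that contain any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


prepared = cleanse(combine(load_customers(CRM_CSV), load_orders(ORDERS_JSON)))
print(prepared)
```

Even in this toy form, most of the code deals with formats and gaps in the data rather than with analysis.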

 

Complicating matters, 87 percent of data science projects never make it into production. A major reason for this high failure rate is that the variety of data sources, the diversity of data types, and the volume of data make it difficult for data scientists to access the right data at the right time.

 

Eliminating the Bottlenecks

While a number of technologies are competing to bridge the elusive gap between data and the data scientist, one modern data integration and data management technology is solving this issue using a novel approach. Rather than physically moving data so that it can be discovered, accessed, and leveraged by the data scientist, data virtualization (DV) provides data scientists with a real-time view of the data in its existing locations.

 

Architecturally, data virtualization occupies a layer between the different data sources and the consuming applications. The DV layer itself contains no data; it only contains the metadata necessary to access the different data sources. And while the technology does not eliminate data preparation activities, it greatly accelerates them, effectively eliminating the key bottlenecks in the data science lifecycle.
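
To make the architecture concrete, here is a minimal sketch of a layer that stores only metadata and reads from the source at query time. It is an illustration under stated assumptions, not how any particular DV product is implemented: the `SourceMetadata` and `VirtualLayer` classes, the `sales` source, and the in-memory list standing in for a source system are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class SourceMetadata:
    """What the layer knows about a source: how to reach it, but no rows."""
    name: str
    description: str
    fetch: Callable[[], Iterable[dict]]  # invoked at query time, never cached


class VirtualLayer:
    """Hypothetical virtual layer that holds a metadata catalog only."""

    def __init__(self) -> None:
        self._catalog: Dict[str, SourceMetadata] = {}

    def register(self, meta: SourceMetadata) -> None:
        self._catalog[meta.name] = meta

    def search(self, keyword: str) -> List[str]:
        """Discover sources by keyword, using metadata alone."""
        kw = keyword.lower()
        return sorted(
            name
            for name, meta in self._catalog.items()
            if kw in meta.description.lower()
        )

    def query(self, name: str) -> List[dict]:
        """Read rows from the underlying source at call time."""
        return list(self._catalog[name].fetch())


# A plain list stands in for a remote source system (assumption).
sales_system = [{"order": 1, "amount": 100}]

layer = VirtualLayer()
layer.register(
    SourceMetadata("sales", "Orders from the sales system", lambda: sales_system)
)

before = layer.query("sales")
sales_system.append({"order": 2, "amount": 250})  # the source changes
after = layer.query("sales")                      # the view reflects it

print(len(before), len(after))
print(layer.search("orders"))
```

Because the layer never copies rows into its own storage, the second query sees the new order without any reload or synchronization step.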

 

It’s important to recognize some of the ways that data virtualization can remove the logjam in a typical data science workflow and how it can be used to overcome the five challenges of the typical data science lifecycle:

 

  • Identifying Useful Data: DV provides data scientists with a single, unified SQL interface for accessing all data, including physical data lakes, Spark or Presto implementations, APIs delivering Salesforce or social media data, and flat or JSON files. Some DV solutions also provide data catalog capabilities, which let data scientists discover data with search-engine-like features and recommend or rate different data sets.

 

  • Modifying Data into a Useful Format: Data virtualization facilitates the combination of data from different sources using SQL joins, aggregations, and transformations. Some data virtualization solutions also provide administrative tools that offer drag-and-drop simplicity. Data scientists can use their own notebooks, such as Jupyter, for such operations, or use the notebooks included in some DV offerings. In either case, these notebooks provide highly flexible, visual interfaces and intuitive features like automatically generated charts.

 

  • Analyzing Data: With data virtualization, analysis can begin almost immediately at the point of access, and when identifying useful data or modifying it into different formats, the data scientist is already executing queries.

 

  • Preparing and Executing Data Science Algorithms: Advanced DV solutions provide query optimizers, which streamline query performance using techniques such as maximizing the push-down of processing to the sources. The optimizer might push down only part of an operation, depending on which plan is expected to perform best. DV can also accelerate model scoring and work with languages like Python, for example, to automatically publish models as REST APIs.

 

  • Sharing Results with Business Users: Leveraging a data catalog as part of a data virtualization implementation, data scientists can share their queries with other team members, for a more collaborative, iterative workflow. Data scientists can execute filters or aggregations and share them with others to see if they are on the right track. At any time in the workflow, data scientists can ask for feedback regarding queries-in-process. Once the models are in place, and the results are ready, data virtualization provides different ways of sharing that information with business users. The DV solution might use its native driver to deliver the data directly to a specific application like Tableau, MicroStrategy, or Power BI. Users of those tools would connect to the data virtualization server and see the results directly in their chosen tool.
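
The SQL-based combination of sources and the partial push-down mentioned in the list above can be sketched in a few lines. This is a simplified illustration, not a real query optimizer: two in-memory SQLite databases stand in for independent sources, and the table names, sample rows, and the `revenue_by_customer` function are hypothetical. The filter and the aggregation run inside the sources, while the cross-source join runs in the layer.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for a remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two hypothetical, independent sources.
crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)",
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "EMEA"), (2, "Globex", "APAC"), (3, "Initech", "EMEA")],
)
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0), (3, 20.0)],
)


def revenue_by_customer(region):
    # Pushed down: the region filter runs inside the CRM source,
    # so only matching customers are transferred.
    customers = crm.execute(
        "SELECT id, name FROM customers WHERE region = ?", (region,)
    ).fetchall()

    # Pushed down: the aggregation runs inside the billing source,
    # so one row per customer is transferred instead of every invoice.
    totals = dict(
        billing.execute(
            "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
        ).fetchall()
    )

    # Not pushed down: neither source can see the other,
    # so the join happens in the virtual layer.
    return sorted((name, totals.get(cid, 0.0)) for cid, name in customers)


result = revenue_by_customer("EMEA")
print(result)
```

A real optimizer would choose between plans like this one based on cost estimates. The sketch hard-codes a single plan to show which parts of the work land where.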

 

Data Virtualization and the Data Science Lifecycle

Data science can be streamlined by eliminating a number of key bottlenecks, all of which have to do with data. Fortunately, data virtualization has proven that it can remove all of them. The technology can be strategically deployed at all of the critical phases of the data science lifecycle, accelerating data science initiatives with real-time access to disparate sources of data and enabling the business to rest easier knowing that decisions are being made using complete and proven data.

 

About the Author: Paul Moxon is the VP Data Architectures and Chief Evangelist at Denodo, a leading provider of data virtualization software. Paul has over 30 years of experience with enterprise middleware technologies with leading software companies, such as BEA Systems and Progress Software.  For more information visit https://www.denodo.com or https://twitter.com/denodo.

 

 

 


Credit: Data Science Central By: Paul Moxon

© 2019 NikolaNews.com - Global Tech Updates
