How data becomes knowledge, Part 1: From data to knowledge

January 12, 2019

Credit: IBM


Over the past few years, information science has made significant leaps
forward. As local servers gave ground to cloud services, SQL databases and
data tables began to migrate toward NoSQL and key-value data stores. Then
came the advent of big data and the associated scaling technologies to
handle the large volumes, varieties, and velocities of data.

The DIKW model

  • Data: Data is a collection of facts,
    signals, or symbols. In this form, it might be raw, inconsistent,
    or unorganized. As such, it is not useful.
  • Information: Information is a collection
    of data that is arranged and ordered in a consistent way. Data in
    the form of information becomes more useful because storage and
    retrieval are easy.
  • Knowledge: Knowledge is a collection of
    information with its associated context. The context is in the
    form of relationships between information sets collected over
    time. Knowledge is the outcome of experience working with a pool
    of information.
  • Wisdom: Wisdom is the ability to select
    the best way to reach the desired outcome based on knowledge.
    Wisdom is the outcome of experience from or knowledge of earlier
    attempts to reach a successful outcome.

Major advances in hardware and software made all this possible. Data
storage has become inexpensive, so it is now practical to retain vast
amounts of data.

Data analytics makes sense of all this data and produces information from
it; based on that information, you can make decisions and take actions. The
result has been a corresponding evolution in the field of data analytics:
cognitive processing techniques such as machine learning and deep learning
now augment traditional analytics.

Analysts need to clean and check the validity of input data before using it
for analysis. Structured data allows for easy retrieval, and so raw data
must be prepared and formatted before data analysis can begin. The
data-information-knowledge-wisdom (DIKW) model is useful for understanding
how raw data turns into useful information, and then into knowledge, and
finally wisdom.
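
To make the first step concrete, Listing 1 sketches data preparation in
Python with pandas (an assumption; the article prescribes no tools). The
records, field names, and units are hypothetical.

Listing 1. Cleaning raw data into consistent information

    import pandas as pd

    # Hypothetical raw records: inconsistent casing, mixed units, and a
    # missing value, much as raw data arrives from diverse sources.
    raw = pd.DataFrame({
        "city": ["austin", "Austin ", "BOSTON", None],
        "temp": ["31C", "88F", "2C", "35C"],
    })

    def to_celsius(value: str) -> float:
        """Normalize a temperature string such as '88F' or '31C' to Celsius."""
        number, unit = float(value[:-1]), value[-1].upper()
        return number if unit == "C" else (number - 32) * 5 / 9

    info = (
        raw.dropna(subset=["city"])  # validity check: drop incomplete rows
           .assign(
               city=lambda d: d["city"].str.strip().str.title(),  # consistent names
               temp_c=lambda d: d["temp"].map(to_celsius),        # consistent units
           )
           .drop(columns=["temp"])
    )
    print(info)  # organized and consistent: the data has become information

Once the records share one format, storage and retrieval become easy, which
is exactly what distinguishes information from raw data in the DIKW model.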

About this series

There are three articles in the How data becomes knowledge series:

  • From data to knowledge: This article traces the
    path from raw data to stored knowledge. It identifies various data
    sources and the differences between structured and unstructured data.
    Then, it identifies what makes data valuable before applying the DIKW
    model to data science.
  • Data lakes and data swamps: This article
    introduces the terminology surrounding data lakes and data warehouses,
    explores the evolution and benefits of data lakes, and explains how the
    advent of machine learning is a compelling reason to move to data lake
    architectures. Managing data by structuring and cataloging it makes it
    more useful and valuable. Knowing and being able to trust the data
    source and the data itself are crucial to ensuring high-quality data.
    Data governance provides help in this regard.
  • Extracting dark data: The final article in the
    series discusses the factors that lead to the creation of dark data,
    the steps you can take to curate and manage data more effectively, and
    the methods you can use to extract and use dark data after the fact.
    Of all data, 90% is unstructured, which makes efficient querying a
    problem. Machine learning helps bring structure, and therefore
    efficiency, to this dark data. Relational data is more valuable
    because it can produce better insights.

Data sources

Raw data comes from diverse sources. A significant source of data continues
to be the traditional relational database. Another major source of data is
machine-generated and real-time data, such as from Internet of Things
(IoT) devices. Data mining tools scrape websites or social media and
generate data. Machines also generate data in the form of transactions and
log files.

Human interactions over digital media produce data in the form of text and
email messages, images, and videos. The human brain is adept at extracting
information from these diverse media formats. In contrast, this kind of
data is a challenge for computers to understand. Machines tend to produce
structured data, while humans tend to produce unstructured data.

Structured and unstructured data

Structured data has a high degree of organization, which makes storing it
in a relational database easy. Simple queries and search algorithms can
efficiently retrieve this data, which makes processing structured data
with computers easy and efficient.
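
Listing 2 sketches structured storage and retrieval in Python with the
standard-library sqlite3 module; the table and rows are hypothetical.
Because every row follows the same schema, one short declarative query
retrieves and summarizes exactly the rows needed.

Listing 2. Storing and querying structured data

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE readings (device TEXT, ts TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("sensor-1", "2019-01-12T08:00", 21.5),
         ("sensor-2", "2019-01-12T08:00", 19.0),
         ("sensor-1", "2019-01-12T09:00", 22.1)],
    )
    # A simple query efficiently retrieves an aggregate per device.
    for device, avg_temp in conn.execute(
            "SELECT device, AVG(temp_c) FROM readings GROUP BY device"):
        print(device, round(avg_temp, 2))
    conn.close()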

In contrast, unstructured data lacks a machine-readable structure. Humans
are currently better and more efficient than machines at reading and
extracting such data, but the effort is both time-consuming and energy
intensive. Human-centric processes are also prone to errors.
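
Machine extraction with hand-written rules is brittle in a way Listing 3
illustrates; the free-text messages and the pattern are invented for this
sketch.

Listing 3. Why rule-based extraction from unstructured text breaks down

    import re

    # Hypothetical free-text messages that a human parses effortlessly.
    messages = [
        "Server db-3 went down at 02:14, restarted by Maria.",
        "Maria rebooted db-3 around quarter past two in the morning.",
    ]

    # One rule captures the first phrasing but misses the second: without
    # a fixed schema, every new phrasing of the same fact needs a new rule.
    pattern = re.compile(r"Server (?P<host>\S+) went down at (?P<time>\d\d:\d\d)")
    for msg in messages:
        match = pattern.search(msg)
        print(match.groupdict() if match else f"no structure extracted: {msg!r}")

So, what makes data valuable, and how can you apply the DIKW model?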

What makes data valuable?

Data is typically a jumble of raw facts, and users need to sift through it
to properly interpret and organize the data. Only then does the data
become useful. Data also comes in multiple formats. For example, images
and videos can hold a lot of data that requires interpretation to extract
information from them. The process of reviewing and filtering data for
relevant facts is costly in terms of time and resources. This process is
also subjective, inconsistent, and error-prone.

Information, in contrast, is a collection of consistently organized and
structured facts. Users invest less time and energy finding relevant
facts. They can easily find a category of relevance or interest within the
information. This makes information more valuable than raw data.

Knowledge is the application of information to answer a question or
solve a problem. In other words, information with context or meaning is
knowledge. An earlier successful outcome serves as the basis for assigning
this context to information. Thus, knowledge depends on the memory of, and
learning from, successful outcomes, and so the process of converting
information to knowledge is deterministic. Again, this process is costly
in terms of time and resources; therefore, knowledge is more valuable than
simple information.

When data undergoes data analysis, it becomes more relevant, useful, and
valuable. Real-world problems don’t have simple solutions: To solve
such problems, you must apply information from multiple contexts. Combining
data sources helps provide diverse contexts that are useful in real-world
problem solving and decision making. In short, data becomes valuable when it
meets the following criteria:

  • It is available promptly.
  • It is concise, well organized, and relevant.
  • It has meaning and context based on experience.
  • It is an aggregate of multiple data sources.

Data is a valuable commodity when it can reduce the time, effort, and
resources required to solve problems and help make sound decisions.
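
As a small illustration of that aggregation, Listing 4 merges two
hypothetical sources with pandas; neither table alone explains the pattern,
but together they supply the context a decision needs.

Listing 4. Combining data sources to add context

    import pandas as pd

    # Hypothetical sources: weekly sales and local weather.
    sales = pd.DataFrame({"week": [1, 2, 3, 4], "units": [120, 118, 80, 75]})
    weather = pd.DataFrame({"week": [1, 2, 3, 4], "storm_days": [0, 0, 3, 4]})

    combined = sales.merge(weather, on="week")  # aggregate the two sources
    print(combined["units"].corr(combined["storm_days"]))
    # A strong negative correlation suggests that weather, not falling
    # demand, drove the dip: context neither source provides on its own.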

DIKW model variants

Many variants of the DIKW model exist. One variant, proposed by Milan
Zeleny in 1987, is DIKWE, which adds an apex layer for
enlightenment. Another variant, proposed by Russell Ackoff in
1989, is DIKUW, which adds an intermediate layer for
understanding. Some experts model this as DIKIW,
where the second I stands for insight or
intelligence.

The DIKW model helps us describe methods for problem-solving or decision
making. Although developed before the advent of machine learning, it still
models many concepts used in data science and machine learning.

Knowledge is the most valuable distillation of data, and although knowledge
gives you the means to solve a problem, it doesn’t necessarily show you
the best way to do so. The ability to pick the best way to reach the
desired outcome comes from experience gained in earlier attempts to reach
a successful solution.

Wisdom is the ability to pick the best choice leading to a successful outcome.
People gain wisdom through experience and knowledge, some of which comes
from:

  • Developing an understanding of problem-solving methods
  • Developing insights by analyzing data and information for a given
    context
  • Gathering intelligence from other people solving the same
    problems

The many variations of the DIKW model now begin to make sense.

Application to data science and machine learning

You’ve already seen that when people perform repetitive tasks, the results
are error-prone, inconsistent, and subjective. You have also seen that
machines do not perform well when dealing with unstructured data. Humans
are adept at interpreting unstructured data, evaluating options and risks,
and deciding on a course of action in a split second.

A machine running traditional algorithms struggles to do the same in real
time primarily because programming becomes increasingly complex. It is
time-consuming to evaluate many options and navigate decision trees in a
serial manner. Parallel algorithms are an alternative, but they require a
lot of processing power. However, even with this added power, these
algorithms cannot easily adapt and deal with the uncertainty of real-world
problems, especially when data is unstructured.

Neural networks modeled on human brain cells have been around for decades
but have suffered from a lack of suitable computer processor architecture
to exploit their strengths. The evolution of the graphics processing unit (GPU)
architecture for general-purpose computing has allowed neural networks to
come into their own. This evolution has led to a surge in the use of
machine learning to deal with unstructured data, with considerable
success.

Figure 1 shows how you can adapt the DIKW model to
data science. The darker layers show the traditional DIKW model; the lighter layers
show the processes that lead to the distillation of data to the next-higher layer.

Figure 1. The DIKW model applied to data science

Triangle of the model, with data at the bottom and model at the top

Traditional data science methods can handle the first process layer:
converting raw data into information. Machine learning can now help
extract knowledge from information. Machine learning algorithms find
context in information by recognizing patterns, grouping, or classifying
information. Data scientists create machine learning models by using
manual optimization and tweaking to achieve the best outcomes, selecting
the model best suited to the specific task. However, the advent of deep
learning means that machines can perform these tasks autonomously as well.
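
Listing 5 sketches one such technique, k-means clustering, with
scikit-learn (an assumption; the article names no library). The records and
the interpretation of the groups are invented for illustration.

Listing 5. Finding context in information by clustering

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical customer records: (visits per month, average basket size).
    records = np.array(
        [[2, 15], [3, 18], [1, 20],     # rare visits, large baskets
         [25, 5], [30, 4], [28, 6]])    # frequent visits, small baskets

    # Clustering groups similar records; the learned segments add context
    # ("which kind of customer is this?") to otherwise flat information.
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(records)
    print(model.labels_)           # e.g. [0 0 0 1 1 1]
    print(model.cluster_centers_)  # one prototype per segment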

Deep learning

Deep learning is a specialized subset of machine learning inspired by
neuroscience and the working of the human brain. Deep learning algorithms
differ from other machine learning algorithms in that they use many layers
of several types of neural networks. These layers form a structured
hierarchy and, just like a human brain, pass the output of an earlier
layer to the next layer.
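
Listing 6 renders that layered structure as a minimal PyTorch sketch; the
framework and the layer sizes are assumptions, since the article prescribes
neither. Each layer's output feeds the next, and the depth of the stack is
what lets the network learn increasingly abstract features.

Listing 6. A small hierarchy of neural-network layers

    import torch
    from torch import nn

    # Three layers in a hierarchy; each passes its output to the next,
    # like the brain-inspired cascade described above.
    net = nn.Sequential(
        nn.Linear(28 * 28, 128), nn.ReLU(),  # early layer: low-level features
        nn.Linear(128, 64), nn.ReLU(),       # middle layer: feature combinations
        nn.Linear(64, 10),                   # final layer: task-level concepts
    )

    x = torch.randn(1, 28 * 28)  # stand-in for one flattened 28x28 image
    print(net(x).shape)          # torch.Size([1, 10]): one score per class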

This cascade of layers gives deep learning networks the ability to learn
abstract concepts and perform more complex tasks than simple, single-task
pattern recognition and classification. Deep learning algorithms can use
both supervised and unsupervised learning and often use a hybrid of these
learning methods, an approach that makes them adaptive when used in
real-world applications.

When used for real-time speech, image, and video processing applications,
deep learning algorithms can deal with the uncertain or incomplete inputs
often caused by noisy environmental factors. As a result, they have much
better efficiency than simple machine learning algorithms.

Going forward

Data is a valuable commodity when it can reduce the time, effort, and
resources required to solve problems and help us make sound decisions.
Machines can efficiently deal with structured data, but 90 percent of all
data is unstructured, including texts, emails, images, and video.

Humans are better suited than machines to dealing with unstructured data,
but humans are error-prone, inconsistent, and subjective when they perform
repetitive tasks, such as extracting information from unstructured data
and storing it as structured data (data entry). The process is also
expensive in terms of time, resources, and energy consumption.

The DIKW model helps us understand how data is converted into information
and knowledge. Machine learning techniques make the extraction of knowledge
easier, or even autonomous, by adapting to and optimizing for successful
outcomes. Deep learning therefore makes it possible to augment data
analysis and significantly reduce the time, effort, and resources required
to solve problems and make sound decisions. Part 2 of this series shows how
data lakes help speed up and reduce the costs of data ingestion by allowing
storage of large volumes of multiformat data.

