
Bridging the Digital Divide – Data Science Central

November 1, 2019

Why is data so important?

Despite being about as prevalent as electricity, it can be difficult to adequately explain how critical data is to the modern world. From business operations to tackling the environmental crisis, data is the key to unlocking insight and developing intelligent solutions across every sector. Although Big Data has been in the news for at least a couple of decades, other types of data are now getting airtime as well. Open data, external data, training data, dark data – these are new contours to an already multi-faceted conversation, and it’s understandable that the general public is getting overwhelmed.

For those of us in the industry, though, there are really only two types of data: the data you have and the data you don’t. What makes things confusing is that the line between the two isn’t always clear. Open data and public data, as their names suggest, should be easy to access. They’re not. Even the data within your organization – data that essentially belongs to you – can be locked away behind operational silos or internal restrictions. People are clamouring to develop AI products, but these can’t be built without a corpus of training data, which often has to be synthetically generated – meaning you have to build the ballpark before playing catch. The upshot is that organizations that have already taken the difficult step of establishing a data science division now face the very real problem of actually getting access to the data they need to drive insight.

Virtually every new business can leverage data to gain a competitive advantage, but the fact of the matter is that there are a lot of professionals trained in using data who are spending most of their time just finding it. Because of this, the “importance” of data is difficult to quantify. Like the Internet, the limitless uses of data will only be fully understood when everyone – business, government, and individual – can get their hands on it.

A Digital Divide

There is a canyon between those who have data and those who want it. The first five years of the open data movement involved governments of every size publishing whatever data they could, wherever they could. The result was a mess. Standards were lacking, access was difficult, and even when you got it, the data was often one or more of the three Ps of open data: partial, PDFs, or a piece of s***. Due to these limitations, using public data became a skill that belonged to the minority, which was directly contrary to its purpose and to the principles of transparency on which the open data movement was based.

The past few years have seen improvements. Governments doubled down on policy and reworked their open data portals to make access easier and provide more use-cases to consumers. This is a good thing. Governments and organizations that are focused on releasing data should iterate in order to streamline the process. But it is not their role to build a bridge across the divide. I’ve sat in the meetings and it’s too easy for well-intentioned governments to get bogged down in the highly idiosyncratic technical details of delivering data to a spectrum of users and use-cases.

At the same time, it’s not the user’s job to figure out how to access data. Average citizens should be data literate, but that doesn’t mean they all have to learn how to run Python scripts. Without the ability to test out their ideas, most consumers aren’t eager to commit the time, energy, and money to figure out ways to get at the data they’re interested in using.

The result is that data producers and data users stay on their own sides of the digital canyon, theorizing about the advantages that access to the other side could provide.

There’s a prediction that data will disrupt virtually every industry in the next decade — from business and government to food production and medical research — but this prediction is based on the hypothesis that data is, or will be, fundamentally easy to access. The reality is that this isn’t currently the case, and it won’t be until a bridge is built between those who have data and those who can use it. Put simply, data can’t be useful until it’s usable.

The Myth of Usability

I speak to a lot of people about data and although everyone agrees that it’s important, there are two distinct groups: idealists and realists. Idealists want to use data in hundreds of different ways. They want to develop apps, build visualizations, and plug new feeds into their BI tools. They want AI now. The difference between them and realists – who, incidentally, want the same things – is that realists know how difficult it’s going to be.

At this point, there’s little doubt about the value of data in general, but any confidence in this value assumes that it is ready to be consumed. The truth is that whether we’re talking about data that exists inside of an organization or data that’s generated outside of it, there are operational hurdles that need to be overcome before you start to see the kind of game-changing insight that everyone’s excited about.

Take procurement data, for example. Many countries, including Canada, release procurement data as “open” information that’s available for general consumption and use. It’s worth taking a minute to think about how valuable this is. Every day, companies win contracts with the government – the largest, most trusted buyer in the market – and these contracts are made available for public consumption. In an ideal scenario, citizens would be able to use this data to see how their tax dollars are being spent and ask important questions about their government’s purchasing process. How many contracts did multinational corporations win over Canadian companies? What’s the spend on defence? Education? The environment? The business use case is also powerful. A bank can use this data to make more efficient decisions around activities like approving small business loans, or augment their database with the information in order to fine-tune risk ratings.
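To make the use case concrete, here is a minimal sketch of the kind of questions a citizen or analyst might ask of such a feed, assuming a hypothetical, already-cleaned extract. The field names, vendors, and figures below are invented; a real procurement feed is far messier.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, simplified procurement records; a real open-data
# feed uses different field names and needs heavy cleaning first.
raw = """vendor,vendor_country,category,value
Acme Corp,CA,defence,1200000
Globex Inc,US,education,450000
Initech Ltd,US,defence,800000
Maple Systems,CA,environment,300000
"""

spend_by_category = defaultdict(int)
foreign_spend = 0
total_spend = 0
for row in csv.DictReader(io.StringIO(raw)):
    value = int(row["value"])
    spend_by_category[row["category"]] += value
    total_spend += value
    if row["vendor_country"] != "CA":
        # Contract value won by non-Canadian vendors
        foreign_spend += value

print(dict(spend_by_category))
print(f"Foreign share of spend: {foreign_spend / total_spend:.1%}")
```

Once the data is in a usable shape, questions like “what’s the spend on defence?” reduce to a few lines of code; getting it into that shape is the hard part.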

But this isn’t happening.

Why not?

Well, the data that’s released is incompatible with how people want to use it. Currently, in order to query the data or inject it into a model, you have to do some or all of the following:

  • connect to the feed directly and monitor it for updates;
  • normalize the feed into standard formats;
  • roll up subsidiary organizations to parent companies;
  • run sophisticated entity resolution scripts over the data to provide a master record of organizations and the contracts they win; and finally,
  • use some analytics tool to make the millions of records human-readable.

I don’t know about you, but that’s not the kind of tech most people have at their disposal.
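To give a sense of what the normalization, roll-up, and entity-resolution steps above involve, here is a deliberately naive sketch – invented vendor names, a toy suffix list, and a hand-written parent map. Production entity resolution is far more involved than this.

```python
import re
from collections import defaultdict

# Toy versions of three steps: normalize raw records, roll up
# subsidiaries to parents, and resolve name variants to one master
# record. The names, suffixes, and parent map are all invented.
LEGAL_SUFFIXES = re.compile(r"\b(inc|ltd|llc|corp|co)\.?$", re.IGNORECASE)
PARENT_MAP = {"globex canada": "globex"}  # subsidiary -> parent

def canonical(name: str) -> str:
    """Naive entity resolution: lowercase, strip legal suffixes,
    then map known subsidiaries to their parent company."""
    key = name.lower().strip()
    key = LEGAL_SUFFIXES.sub("", key).strip()
    return PARENT_MAP.get(key, key)

raw_records = [
    {"vendor": "Globex Inc.", "value": "450000"},
    {"vendor": "GLOBEX CANADA LTD", "value": "120000"},
    {"vendor": "Acme Corp", "value": "1200000"},
]

master = defaultdict(lambda: {"contracts": 0, "total": 0})
for rec in raw_records:
    key = canonical(rec["vendor"])
    master[key]["contracts"] += 1
    master[key]["total"] += int(rec["value"])  # normalize value to int

for vendor, stats in sorted(master.items()):
    print(vendor, stats)
```

Even this toy version hints at the problem: every rule here is a judgment call, and real feeds demand thousands of them.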

The government, despite its great strides in improving the quality of its procurement data, can’t be on the hook for layering all of the above on the feed; its job is to get the data out the door. But by the same token, neither organizations nor individuals should have to develop the infrastructure. In both cases, the time, energy, and technological expertise required is prohibitive.

The work that’s necessary to make data like this compatible with the end-user can’t be the job of the provider or the consumer – there needs to be an intermediary set of tools, products, and processes that can automate it.

The Role of Government

Although governments are increasingly eager to be transparent, they’re not really coached on how best to open up their records. While they have been told that their data is valuable, they cannot know which datasets represent the highest value or how exactly that value will flow downstream. Frequently, municipal governments are left to work out their processes and policies with small, scrappy open data divisions that must decide what to release and how to release it. This is a tall order, and it also ensures that the idiosyncratic nature of the data being released by governments isn’t getting fixed any time soon. If, for example, San Francisco releases the city’s building permits, that’s useful for San Francisco. If you can benchmark that data against similar data sets in New York and Chicago, you have market indicators you can use to model economic health across the country. When every city does it, you have something else entirely. The point is that more data provides bigger opportunities. Instead of thinking about the individual opportunities that public data provides, it’s helpful to think about open data as a global puzzle, where every municipality and county has a piece to contribute.
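As a toy illustration of the benchmarking idea, the sketch below maps two differently-shaped permit feeds onto one shared schema so they can be compared. All field names and figures are invented; this adapter work is exactly what every consumer currently has to repeat for every city they add.

```python
# Two cities publish building permits with different field names;
# a thin adapter maps each onto one shared schema. Everything here
# (field names, IDs, costs) is hypothetical.
sf_permits = [{"permit_id": "SF-1", "issued": "2019-03-01", "est_cost": 250000}]
nyc_permits = [{"job_no": "NYC-9", "issue_date": "2019-03-04", "cost_usd": 410000}]

def from_sf(rec):
    return {"city": "San Francisco", "date": rec["issued"], "cost": rec["est_cost"]}

def from_nyc(rec):
    return {"city": "New York", "date": rec["issue_date"], "cost": rec["cost_usd"]}

combined = [from_sf(r) for r in sf_permits] + [from_nyc(r) for r in nyc_permits]

# Once normalized, cross-city indicators become one-liners.
avg_cost = sum(r["cost"] for r in combined) / len(combined)
print(avg_cost)
```

One adapter per city doesn’t scale, which is the argument for standardization at the source – or for an intermediary that maintains the adapters so consumers don’t have to.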

The size, structure, and nature of government bodies make them poor candidates for the kind of overarching standardization and aggregation that’s necessary for government data to be usable at scale. It’s a predictable problem. Governments act independently of one another because it’s important for them to tailor what they do to better serve their citizens. But when it comes to data, this customization hampers the efficacy of the entire movement; it isolates one government from another, which means that only a fraction of the problems their data could address are being solved.

Our governments have access to a wealth of data that they should make available, but expecting them to also be curators and catalysts of digital best practices is asking too much.

Bridging the Divide

Despite how hard it can be to get data, the past few years have seen the development of new products and solutions that help individuals and organizations access more data with less effort. DataOps frameworks are becoming commonplace, and data science continues to grow as a discipline. Government strategies have evolved, too. Ontario recently announced the formation of a Digital and Data Task Force designed to help citizens and businesses benefit directly from the data economy. This task force will work alongside the province’s open data policy and will help educate and define best practices to enable people to use data effectively.

In 2013, McKinsey put the value of open data at $3–5 trillion per year, a number that’s trotted out whenever anyone asks about the value of public data in the private sector. While it’s impressive, this number means nothing whatsoever to the average data consumer who can’t figure out how to load a shapefile on their computer. The value of data is obvious and the benefits are enormous, but at ground level the operational hurdles that need to be overcome in order to unlock this value have slowed innovation to a crawl.

In 2018, 98.6% of companies aspired to build a data-driven culture. This near-total agreement that data is the single most important thing to get right is reflected in the size of data science divisions and the massive appetite for analytics and BI tools in the market.

Behind this number is the reality that the desire to use data to achieve insight is a far cry from actually getting it done, which might explain why, in the same year, Gartner found that 87% of organizations have low business intelligence maturity.

Why the disconnect? It’s the same problem. Data providers push more data into the wild while consumers try to figure out how to use it. Data scientists are stuck in the prep and process phase while business divisions demand results. There is a divide between the people who have data and the people who can use it, and we’ll never uncover the benefit of a data-driven world unless we find a way to build a bridge between the two.

___

An earlier version of this article originally appeared in Public Sector Digest’s Summer 2016 issue.

Originally published on ThinkData Works Blog. 


Credit: Data Science Central By: Lewis Wynne-Jones
