Tuesday, March 9, 2021
  • Setup menu at Appearance » Menus and assign menu to Top Bar Navigation
Advertisement
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
No Result
View All Result
Home Big Data

Tableau makes Johns Hopkins Coronavirus data available for the rest of us

March 23, 2020
in Big Data
Tableau makes Johns Hopkins Coronavirus data available for the rest of us
585
SHARES
3.3k
VIEWS
Share on FacebookShare on Twitter

Johns Hopkins’ COVID-19 Global Cases Interactive Dashboard 

You might also like

Streamlining data science with open source: Data version control and continuous machine learning

Microsoft Ignite Data and Analytics roundup: Platform extensions are the key theme

At Microsoft Ignite, expanding reach for Azure ML & Purview, Power BI Premium


Credit: Center for Systems Science and Engineering at Johns Hopkins University

I’m a data person and, if you’re reading this, you probably are too. I like to be informed about, and maintain control over situations. Data helps me with the former and at least provides the illusion of attaining the latter. While the Coronavirus/COVID-19 crisis is impossible to control, and the figures around it can be stress-inducing, analyzing the data and understanding it a little better does provide some level of understanding that can help.

Johns Hopkins provides Coronavirus data

With that in mind, you may be aware that Johns Hopkins University’s (JHU’s) Center for Systems Science and Engineering (CSSE) maintains a continuously updated dashboard of the virus’ spread, based on officially reported case counts from around the world, and pictured in the figure above. While that dashboard itself is useful, pulling the data down from Hopkins’ GitHub repo and doing your own analysis on it can be a little tricky.

With this in mind, Tableau took a look at that data, did its own data engineering/ETL work on it and has published the output of that effort as its COVID-19 Data Resource Hub. Within the Resource Hub site, the consolidated data is available in Tableau’s own .hyper and .tde formats, and also available in Google Sheet and CSV formats, making it compatible with BI tools beyond Tableau.

Tableau makes it more accessible

On Friday, I spoke with Steve Schwartz, Director of Public Affairs at Tableau, and Sarah Goehri, Tableau’s Corporate Communications Manager. With each of us working from home, talking to each other via headsets and Web cams, Schwartz explained to me that with the company’s establishment of Tableau Foundation, it now has a track record working with data sets surrounding global health subject matter (more info on that here). Because of that, and because it found the JHU data difficult to work with in its raw form, Tableau decided to do the data engineering work and make the data available to everyone.

Tableau has published a rather simple “starter dashboard” based on the data, and made it available on Tableau Public. It has also set made the .hyper, .tde and CSV versions of the data set available through online data catalog platform data.world. While this creates an “extra hop” to get to the data, access is still relatively straightforward and enhances its availability on all platforms – not just Tableau. It also facilitates collaboration and discussion around use of the data set itself, something data.world explicitly focuses on.

Catalog shopping with data.world

When you visit the data.world repository for Tableau’s version of the JHU Coronoavirus data, you’ll need to sign in. Assuming you don’t already have a data.world account, you can connect via Google, Facebook, GitHub or Twitter credentials, then change the password on your data.world account to something unique and secure.

Once that’s done, you can enable integration with your BI platform via the “Open in App” option, then use the data.world connector for one of the supported platforms (including Microsoft Power BI, Excel, MicroStrategy, Google Data Studio, Plot.ly, Jupyter notebooks, RStudio and many others) to gain access to the data. That connection process will require you to supply owner, dataset id and query information that data.world will present to you for simple copying and pasting into the connection dialog that is invoked.

Because my own skill set in Microsoft Power BI is much stronger than it is for other tools, I chose to analyze the data there.  Power BI Desktop is a free download and includes a beta data.world connector. Power BI Desktop runs only on Windows, though it can be hosted within other operating systems via certain virtualization platforms. (For example, Mac users can run Power BI Desktop using Parallels).

Measures, dimensions and granularity

Once you’re connected to the data, you’ll see its schema is relatively straightforward. The granularity of the data is at the date level, per geography – either country_region or, for certain countries, province_state (in certain cases, the granularity may drill down even a bit lower – see details below).

The data set’s measures are “cases” and “difference,” which provide cumulative and daily cases totals, respectively, per day, for each tracked geography. Case_type is an in important dimension, as it lets you filter or break down by active cases, deaths, recoveries, and total confirmed cases. To avoid double counting, you should either look only at confirmed cases, or else look only at one or all of the other types (since rows with “Confirmed” cases essentially represent the sum of corresponding rows with case_type of “Active,”http://www.zdnet.com/”Deaths” and “Recovered”).

The country_region and province_state names will let you map the data, but if you’d prefer not to use location names, “lat” and “long” columns provide latitude and longitude data that can be used as well.  Another column, simply called “location” provides lat/long data together in POINT(long, lat) format.  The latest_date column reports data of the data set’s the most recent data and prep_flow_runtime would seem to provide the date and time that Tableau’s data prep process was run.

Surprisingly, I found that using lat/long sometimes provides multiple data points per province_ state, ostensibly including a generic data point and for the entire area as well as some more location-specific ones, the latter often having zero or a relatively small number of cases. I would advise exercising caution around this feature in the data set’s granularity.

Visualizing the data

Once you’ve connected to and loaded the data, visualizing it is relatively straightforward. While it’s not clear whether that’s a blessing or a curse, it does help certain things become clear. I filtered the data to exclude rows with case_type = Confirmed so I could perform analysis across data with case-type values of Active, Deaths and Recovered.

I share some of these analyses with you below. Please note that Tableau provides the data on an as-is basis, and I do the same with my analyses. None of the analyses presented below has been rigorously vetted, and they’re current only to Saturday, 3/21/2020, based on values in the latest_date column in the data set.  

That said, depending on your tolerance for news, you may wish to keep reading, or else stop here.

Let’s start with the obvious. Here’s a bubble map aggregating all novel Coronavirus active cases in the US (including Alaska, Hawaii, Puerto Rico and Guam):

us-active-cases-map.png

Coronavirus active cases in the US, including all 50 states, Puerto Rico and Guam

A simple combo chart of the cases and difference columns by the date column shows just how much we need to flatten the proverbial curve. Here’s the chart for New York State, with total active cases plotted as a line and new active cases as columns:

total-and-new-cases-in-ny.png

Total and new active cases, by date, for New York State. Line shows total case count; columns visualize number of new cases, each day, starting on March 8th, 2020.

We’re used to seeing maps and plots like these in news coverage. They can be intimidating, overwhelming and, honestly, they’re not very detailed. What I found most useful about the Tableau-supplied data set was that it gave me the ability to go deeper and perform more specific analyses.

For example, you can also look at the countries with the highest number of cases, breaking down by case­_type. I did this by creating a stacked bar chart, sorted by case count in descending order, showing the seven most affected countries, excluding China (since its case count so dominates the others). When I ran this on Saturday, the US was in seventh place (again, excluding China), behind France. When I refreshed the data this morning, the US had unfortunately pulled into sixth place, with France now trailing:  

major-affected-countries.png

Stacked column chart of active cases, deaths and recoveries, for highest impacted countries, excluding China.

You can drill down to the individual country level, too. Here’s a pie chart of active cases by province_state, filtered by country, that reveals the ratio of cases in Hubei province, relative to others in China:

china-cases-by-province.png

Active cases in China, by province, with Hubei dominating 

And here’s the corresponding chart for the US, with New York State, Washington State and California, dominating, in that order:

us-cases-by-state.png

US Cases, by state. As of March 21st’s data, New York State accounted for more than a third of them.  

Adding a timeline filter and setting the effective date back 10 days, to March 11th, can show how Washington State had the most number of cases at that time. Ten days makes a big difference, as we’ve learned, and as this visualization shows.

us-cases-by-state-on-march-12th.png

Time machine rewind: US Coronavirus active cases by state, as of March 11th

We can analyze the data using a more free form approach, as well. Power BI’s Q&A feature allows querying the data in relatively plain English. To test this, I asked Q&A to show me the number of newly reported deaths, per day, in the US, starting on March 1st, by entering “difference by date for Deaths and US as area chart on or after March 1 (date).”  Here’s what I got back:

us-deaths-post-march-1st.png

Coronavirus deaths in the United States each day from March 1st through 21st, 2020 (inclusive).

I should point out that the data I’m getting back from the Hopkins data set does not seem to match up with the data reported in New York Governor Andrew Cuomo’s news conference this morning (Cuomo’s numbers were likely more recent and therefore, unfortunately, higher). Though nothing seems available yet, I am keeping my eye on the open data sites for both the City and State of New York, to see if I could add their data to my analysis, in the future.

I’ll do my best to end on a positive note.  Here’s what I got back when I asked Q&A to show me cumulative, global “Recovered cases by date”:

recovered-cases-by-date.png

Recovered cases, globally, over time

We’re getting close to 100,000 recoveries. Let’s hope this curve gets steeper, not flatter, until the crisis has passed.

Credit: Zdnet

Previous Post

Simplilearn offers free learning

Next Post

Big data courses sharpen professionals’ analytical skills - Mon, March 23 2020

Related Posts

Streamlining data science with open source: Data version control and continuous machine learning
Big Data

Streamlining data science with open source: Data version control and continuous machine learning

March 4, 2021
Microsoft Ignite Data and Analytics roundup: Platform extensions are the key theme
Big Data

Microsoft Ignite Data and Analytics roundup: Platform extensions are the key theme

March 3, 2021
At Microsoft Ignite, expanding reach for Azure ML & Purview, Power BI Premium
Big Data

At Microsoft Ignite, expanding reach for Azure ML & Purview, Power BI Premium

March 3, 2021
IBM Cloud Satellite goes GA
Big Data

IBM Cloud Satellite goes GA

March 1, 2021
DataStax Astra goes serverless | ZDNet
Big Data

DataStax Astra goes serverless | ZDNet

February 25, 2021
Next Post
Big data courses sharpen professionals’ analytical skills – Mon, March 23 2020

Big data courses sharpen professionals’ analytical skills - Mon, March 23 2020

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recommended

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

January 6, 2019
Microsoft, Google Use Artificial Intelligence to Fight Hackers

Microsoft, Google Use Artificial Intelligence to Fight Hackers

January 6, 2019

Categories

  • Artificial Intelligence
  • Big Data
  • Blockchain
  • Crypto News
  • Data Science
  • Digital Marketing
  • Internet Privacy
  • Internet Security
  • Learn to Code
  • Machine Learning
  • Marketing Technology
  • Neural Networks
  • Technology Companies

Don't miss it

Microsoft Exchange zero-day attacks: 30,000 servers hit already, says report
Internet Security

Microsoft Exchange zero-day attacks: 30,000 servers hit already, says report

March 9, 2021
Is investing in AI the highest ROI opportunity?
Data Science

Is investing in AI the highest ROI opportunity?

March 9, 2021
Dalhousie researchers use machine learning to track COVID-related emotions on social media | Provincial | News
Machine Learning

Dalhousie researchers use machine learning to track COVID-related emotions on social media | Provincial | News

March 9, 2021
Ezviz C3X outdoor security camera review: Simple setup, superb features Review
Internet Security

Ezviz C3X outdoor security camera review: Simple setup, superb features Review

March 9, 2021
Operationalizing AI – Introduction to the ModelOps Pipeline
Data Science

Operationalizing AI – Introduction to the ModelOps Pipeline

March 9, 2021
SCA invests in Australian AI and machine learning company
Machine Learning

SCA invests in Australian AI and machine learning company

March 9, 2021
NikolaNews

NikolaNews.com is an online News Portal which aims to share news about blockchain, AI, Big Data, and Data Privacy and more!

What’s New Here?

  • Microsoft Exchange zero-day attacks: 30,000 servers hit already, says report March 9, 2021
  • Is investing in AI the highest ROI opportunity? March 9, 2021
  • Dalhousie researchers use machine learning to track COVID-related emotions on social media | Provincial | News March 9, 2021
  • Ezviz C3X outdoor security camera review: Simple setup, superb features Review March 9, 2021

Subscribe to get more!

© 2019 NikolaNews.com - Global Tech Updates

No Result
View All Result
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News

© 2019 NikolaNews.com - Global Tech Updates