Monday, March 1, 2021
NikolaNews

Data meets science: Open access, code, datasets, and knowledge graphs for machine learning research and beyond

February 17, 2021
in Big Data

Science and data are interwoven in many ways. The scientific method has lent a good part of its overall approach and practices to data-driven analytics, software development, and data science. Now data science and software lend some tools to scientific research.

Science, data, and data science

“To succeed at becoming a data-driven organization, your employees should always use data to start, continue, or conclude every single business decision, no matter how major or minor.”

That quote belongs to Ashish Thusoo, author of the DataOps book, founder of Qubole, and one of the people who built the data-driven culture at Facebook as early as 2007.

As we noted in our 2017 coverage of DataOps in conversation with Thusoo, to anyone with a science background, this should sound familiar. It’s the quintessence of the scientific method: developing hypotheses and putting them to the test with data.

It’s clear how data-driven culture, and even software practices like agile, which is all about iterative development, have borrowed from science. Now an emergent ecosystem of solutions centered around scientific research and publication may be about to repay the loan.

The interplay between science and data is a long-standing one. Now it’s time data repays its debt to science. (Photo by Annie Spratt on Unsplash)

Traditionally, scientific research has relied on peer review. The peer-review and publication process can take anywhere from a few months to a few years to complete. In addition, the business model of many scientific publishers does not make research accessible to everyone.

To make research available to as many people as possible, as soon as possible, many researchers choose to publish their work on pre-print repositories like arXiv or Zenodo. Pre-prints address the open access issue, as they are immediately accessible for free.

The reproducibility crisis and artificial intelligence

Most pre-prints will be revised, in minor or major ways, while others may not be published at all. But even for the ones that do go through the review and publication process successfully, an equally important issue remains: Reproducibility.

Reproducibility is a major principle of the scientific method. It means that a result obtained by an experiment or observational study should be achieved again with a high degree of agreement when the study is replicated with the same methodology by different researchers.
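To make the definition concrete, here is a minimal sketch in Python. It is not drawn from any study mentioned in this article: the "experiment" (`run_study`), its parameters, and the agreement tolerance are all hypothetical and chosen only for illustration. It separates two ideas: rerunning the same study with the same random seed, which should give an identical result, and an independent replication with a different seed, which should agree within a tolerance.

```python
import random
import statistics


def run_study(seed: int, n: int = 1000) -> float:
    """Hypothetical experiment: estimate the mean of a noisy measurement."""
    rng = random.Random(seed)  # a private, seeded generator makes the run repeatable
    samples = [rng.gauss(5.0, 1.0) for _ in range(n)]
    return statistics.fmean(samples)


def agrees(a: float, b: float, tolerance: float = 0.5) -> bool:
    """Treat two results as being in agreement if they differ by at most `tolerance`."""
    return abs(a - b) <= tolerance


original = run_study(seed=1)
rerun_same_seed = run_study(seed=1)   # same methodology, same seed
replication = run_study(seed=2)       # same methodology, independent draw

print(original == rerun_same_seed)    # exact repeatability
print(agrees(original, replication))  # agreement within tolerance
```

In practice, what counts as "a high degree of agreement" depends on the field and the statistics involved; the fixed tolerance here is only a placeholder for that judgment.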

According to a 2016 Nature survey, more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.

This so-called reproducibility or replication crisis has not left artificial intelligence untouched either. Although the writing has been on the wall for a while, 2020 may have been a watershed moment.

That was when Nature published a damning response written by 31 scientists to a study from Google Health that had appeared in the journal earlier.

Critics argued that the Google team provided so little information about its code and how it was tested that the study amounted to nothing more than a promotion of proprietary tech.

Unlike more obscure research areas, AI has the public’s attention and is backed and funded by the likes of Google. In addition, machine learning, the subdomain of AI that relies on black box models, makes the issue especially pertinent. Hence, this incident was widely reported and brought reproducibility to the fore.

Reproducible research, code, data, and graphs

Enter Papers with Code. Papers with Code is another repository for research. Its stated mission is to create a free and open resource with machine learning papers, code, and evaluation tables. It highlights trending machine learning research and the code to implement it.

Papers with Code was founded by Robert Stojnic and Ross Taylor in 2018. Stojnic and Taylor joined Facebook AI in 2019. Since then, the team has grown, partnered with arXiv, and expanded to more disciplines.

The latest addition to Papers with Code’s arsenal is data. The repository now indexes 3,000+ research datasets from machine learning. Users can now find datasets by task and modality, compare usage over time, and browse benchmarks.

Also, integration with schema.org, and therefore wider discoverability and availability of those datasets via Google’s dataset search, seems to be on the roadmap.
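For readers unfamiliar with schema.org, the sketch below shows roughly what a dataset description in that vocabulary looks like when serialized as JSON-LD, the kind of markup dataset search engines can read. This is an illustration only, not how Papers with Code implements it: the helper `dataset_jsonld`, the dataset name, and the example.org URL are hypothetical. The property names (`name`, `description`, `url`, `keywords`) come from the schema.org `Dataset` type.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, keywords: list[str]) -> dict:
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


# Hypothetical dataset, used purely for illustration.
record = dataset_jsonld(
    name="Example Image Benchmark",
    description="A hypothetical image classification dataset.",
    url="https://example.org/datasets/example-image-benchmark",
    keywords=["image classification", "computer vision"],
)

# A web page would typically embed this string in a
# <script type="application/ld+json"> element.
markup = json.dumps(record, indent=2)
print(markup)
```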

As far as reproducible research goes, we should also mention open-source technology by eLife that lets authors publish Executable Research Articles, treating live code and data as first-class citizens. And the good news doesn’t end there.

Connected Papers is the latest addition to an emerging ecosystem for research

Another significant boost to research in any domain comes from the ability to find and explore relevant work. We have seen for example how knowledge graphs have been used to do precisely that for COVID-19 related research.

Connected Papers is a free visual tool that helps researchers and applied scientists find and explore papers relevant to their field of work, in any domain. It creates a graph for each paper in its repository, by analyzing about 50,000 papers and selecting the few dozen with the strongest connections to the origin paper.

On Feb. 3, Connected Papers also announced a partnership with arXiv. Now every paper page on arXiv will link to that paper's Connected Papers graph. Interestingly, Connected Papers arranges papers according to their similarity. That means that even papers that do not directly cite each other can be strongly connected and very closely positioned.
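How can two papers be closely connected without citing each other? The article does not describe the similarity measure Connected Papers uses, so the sketch below swaps in a simple, assumed stand-in: overlap between reference lists, scored with Jaccard similarity. The paper identifiers, reference sets, and threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical reference lists: each paper maps to the set of works it cites.
references = {
    "paper_a": {"r1", "r2", "r3", "r4"},
    "paper_b": {"r2", "r3", "r4", "r5"},   # does not cite paper_a
    "paper_c": {"r6", "r7", "paper_a"},    # cites paper_a directly
}


def jaccard(x: set, y: set) -> float:
    """Share of references two papers have in common (0.0 to 1.0)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similarity_edges(refs: dict, threshold: float = 0.3) -> list:
    """Return (paper, paper, score) edges whose similarity meets the threshold."""
    edges = []
    for p, q in combinations(sorted(refs), 2):
        score = jaccard(refs[p], refs[q])
        if score >= threshold:
            edges.append((p, q, score))
    return edges


edges = similarity_edges(references)
print(edges)  # [('paper_a', 'paper_b', 0.6)]
```

Under this measure, paper_a and paper_b are linked because they share three of five distinct references, even though neither cites the other. Meanwhile paper_c, which does cite paper_a, shares no references with it and gets no edge. A production system would likely combine several signals; this only illustrates the general idea.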

The COVID GRAPH and Open Research Knowledge Graph (ORKG) teams have focused on COVID-19, and emphasized annotation and structure, respectively. Connected Papers seems to expand coverage, and emphasize algorithmic similarity.

Towards a better research ecosystem

Open access, discoverability, reproducibility, code, datasets, and knowledge graphs. This is all good news for research, and machine learning research too, obviously. It seems like steps towards a healthier, more productive research ecosystem are being taken.

This is especially true considering how many of these initiatives are either already connected, or can easily be connected. However, there’s also one major issue we see connecting all those otherwise commendable efforts: Sustainability. Let’s do a quick recap.

arXiv, which is in many ways a vital hub in this ecosystem, is a community of volunteers supported by staff at Cornell University. Papers with Code is now part of Facebook AI, and the tension between open research and commercial interests is a well-known issue.

Connected Papers started as a weekend side project between friends, and then it got traction. Today, it is self-funded and free to use, with one sponsor that we know of and a call for more sponsors. COVID GRAPH is a volunteer effort, and ORKG is a publicly funded research project.

Those are different paths that different teams have found towards what seems like a common goal: a better research ecosystem. Essentially, they are all grappling with the dilemma of how to produce public goods that belong in the Commons, in a challenging, commercially oriented environment.

In principle, that’s not very far off from the dilemma open source creators are facing. Significant differences do exist, of course — we don’t expect to see anyone from the research ecosystem getting venture capital funding anytime soon, for example. We do, however, hope to see them live long and prosper.


Credit: ZDNet
