Saturday, February 27, 2021
NikolaNews

Simulated Statistics is the New Black

May 27, 2020
in Data Science

Over the years I’ve often been asked by beginners where they should start in statistics, what they should do first, and which parts of statistics they should prioritise to get them to where they want to be (which is usually a higher paid job).

Now, as I’m almost completely self-taught I don’t really consider myself an authority in where one should get started, and I struggle to answer this question with any great conviction.

Sure, I have some thoughts about this subject, but they are coloured by my own experiences.

So I thought I’d reach out to some of our statistics friends to see what they can bring to the party.

Each of the statisticians in this post was asked the same question:

If you had to start statistics all over again, where would you start?

The answers were astounding — they turned out to be a roadmap of how to become a modern statistician from scratch.

In short, how to be a future statistician without ever needing a single lesson!

There is a schism in statistics, and that is between the frequentists and the Bayesians.

Let’s see what the statisticians have to say about this debate.

We start with Kirk Borne (Twitter: @KirkDBorne), astrophysicist and rocket scientist (well, rocket data scientist). Surprisingly, he tells me he’s never had any interest in being an astronaut!

“I am not a statistician, nor have I ever had a single course in statistics, though I did teach it at a university. How’s that possible?”

Funnily enough, that was the same for me! So where did he get all his stats from?

“I learned basic statistics in undergraduate physics and then I learned more in graduate school and beyond while doing data analysis as an astrophysicist for many years. I then learned more stats when I started exploring data mining, statistical learning, and machine learning about 22 years ago. I have not stopped learning statistics ever since then”.

This is starting to sound eerily like my stats education. All you need to do is drop the ‘astro’ from astrophysics and they’re identical! So what does he think of starting stats all over again?

“I would have started with Bayesian inference instead of devoting all of my early years to simple descriptive data analysis. That would have led me to statistical learning and machine learning much earlier. And I would have learned to explore and exploit the wonders and powers of Bayesian networks much sooner”.

This is also what Frank Harrell (Twitter: @f2harrell), author and professor of biostatistics at Vanderbilt University School of Medicine in Nashville, thinks about hitting the reset button on statistics. He told me:

“I would start with Bayesian statistics and thoroughly learn that before learning anything about sampling distributions or hypothesis tests”.

And Lillian Pierson, CEO of Data-Mania (Twitter: @Strategy_Gal) also mentioned Bayesian statistics when I asked her where she would start:

“If I had to start statistics all over again, I’d start by tackling 3 basics: t-test, Bayesian probability & Pearson correlation”.

Personally, I haven’t done very much Bayesian stats, and it’s one of my biggest regrets in statistics. I can see the potential in doing things the Bayesian way, but as I’ve never had a teacher or a mentor I’ve never really found a way in.

Maybe one day I will — but until then I will continue to pass on the messages from the statisticians in here.

Repeat after me:

Learn Bayesian stats.

Learn Bayesian stats.

LEARN BAYESIAN STATS!
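
For anyone who, like me, has never found a way in, here is a minimal sketch of what Bayesian inference can look like in a few lines of Python. It is not taken from any of the contributors. It assumes a made-up experiment (7 successes in 10 trials), a flat prior, and a simple grid approximation, and the function names `grid_posterior` and `posterior_mean` are hypothetical.

```python
def grid_posterior(successes, trials, grid_size=101):
    """Approximate the posterior of a success probability p on an even grid.

    Assumption for illustration: a flat prior, so every value of p
    starts out equally plausible.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size
    failures = trials - successes
    # Binomial likelihood up to a constant that cancels when normalising.
    likelihood = [p ** successes * (1 - p) ** failures for p in grid]
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalised)
    posterior = [u / total for u in unnormalised]
    return grid, posterior


def posterior_mean(grid, posterior):
    """Weighted average of p under the posterior."""
    return sum(p * w for p, w in zip(grid, posterior))


# Hypothetical data: 7 successes in 10 trials.
grid, posterior = grid_posterior(successes=7, trials=10)
print("posterior mean of p:", round(posterior_mean(grid, posterior), 3))
```

The posterior mean lands a little below the raw proportion of 0.7 because the flat prior pulls the estimate gently towards 0.5. Swapping in a different `prior` list is all it takes to see how much the starting assumptions matter.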

As I was reaching out and gathering quotations I got a rather cryptic response from Josh Wills (Twitter: @josh_wills), software engineer at Slack and founder of the Apache Crunch project (he also describes himself as an ‘ex-statistician’):

“Computation before calculus is the pithy answer”, he told me.

This intrigued me, so I asked him if he could elaborate a little, and here is his reply:

“So I think stats can be and is taught in three ways:

1. a set of recipes

2. from the perspective of calculus — mostly integrals and what not, and

3. computationally (like the bootstrap as a fundamental thing)”

“Most folks do the recipes approach, which doesn’t really help with understanding stuff but is what you do when you don’t know calculus”.

Ah, I understand the ‘set of recipes approach’, but I didn’t know anyone was still doing the calculus approach. He went further:

“I was a math major, so I did the calculus based approach, because that’s what you did back in the day. You mostly do some integrals with a head nod to computational techniques for distributions that are too hard to do via integrals. But the computational approach, even though it was discovered last, is actually the right and good way to teach stats”.

Whew, thank God for that — I thought he was saying that we should all learn the calculus approach!

“The computational approach can be made accessible to folks who don’t know calculus, and it’s actually most of what you use in the hard parts of real world statistics problems anyway. The calculus approach is historically interesting, but (and I feel heretical for saying this) it should be relegated to a later course on the history of statistical thought — not part of the intro sequence”.
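
To make "the bootstrap as a fundamental thing" concrete, here is a rough sketch of a percentile bootstrap confidence interval for a mean, using only the Python standard library. It is an illustration rather than Josh's own code: the sample values are made up, and the function name `bootstrap_ci` and its defaults are assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat`, with no calculus required.

    Resample the data with replacement many times, recompute the statistic
    each time, and read the interval off the sorted results.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements, purely for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = bootstrap_ci(sample)
print("sample mean:", statistics.mean(sample))
print("approximate 95% interval:", low, "to", high)
```

Passing `stat=statistics.median` gives an interval for the median with no change to the method, which is a large part of the appeal of the computational approach.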

It’s interesting to see the evolution of statistics in this light and shows just how far we’ve come — and in particular how much computers and computing power have developed over the past couple of decades.

It’s truly mind-blowing to think that when I was doing my PhD 20 years ago it was difficult getting hold of data, and when you did get some, you had to network computers together to get enough computing power. Now we’re all swimming in data and err, well, we still struggle to get enough computing power to do what we want — but it’s still way more than we used to have!

I also got a really interesting perspective from Cassie Kozyrkov, Head of Decision Intelligence at Google (Twitter: @quaesita), who told me that she’d:

“Probably enjoy making a bonfire out of printed statistical tables!”

Well, amen to that, but seriously though, where would you start again with stats?

“Simulation! If I had to start all over again, I’d want to start with a simulation-based approach to statistics”.

OK, I’m with you, but why specifically simulation?

“The ‘traditional’ approach taught in most STAT101 classes was developed in the days before computers and is unnecessarily reliant on restrictive assumptions that cram statistical questions into formats you can tackle analytically with common distributions and those nasty obsolete printed tables”.

Got you. So what exactly have you got against the printed tables?

“Well, I often wonder whether traditional courses do more harm than good, since I keep seeing their survivors making ‘Type III errors’ — correctly answering the wrong convenient questions. With simulation, you can go back to first principles and discover the real magic of statistics”.

Statistics has magic?

“Sure it does! My favorite part is that learning statistics with simulation forces you to confront the role that your assumptions play. After all, in statistics, your assumptions are at least as important as your data, if not more so”.
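
As a hedged illustration of what a simulation-based approach might look like (this is a sketch, not Cassie's own material), here is a permutation test for a difference in means that needs no printed tables. The function name `permutation_p_value` and the example numbers are made up. Its one key assumption is spelled out in the code: if the two groups do not really differ, the group labels are interchangeable.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Simulated two-sided p-value for a difference in means.

    Assumption: under the null hypothesis the group labels are
    exchangeable, so shuffling them shows what chance alone produces.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (n_shuffles + 1)


# Made-up scores for two hypothetical groups.
control = [12.1, 11.8, 13.0, 12.4, 11.5, 12.9]
treated = [13.2, 13.9, 12.8, 14.1, 13.5, 13.0]
print("simulated p-value:", permutation_p_value(control, treated))
```

Nothing here relies on normality or a lookup table. The only thing to defend is the exchangeability assumption, which is exactly the kind of assumption the simulation forces you to confront.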

And when it came to offering his advice, Gregory Piatetsky, founder of KDnuggets (Twitter: @kdnuggets), said:

“I would start with Leo Breiman’s paper on Two Cultures, plus I would study Bayesian inferencing”.

If you haven’t read that paper (which is open access), Leo Breiman lays out the case for algorithmic modelling, which treats the mechanism that generated the data as an unknown black box and judges a model by how well it predicts, rather than assuming the data follow a prescribed statistical model.

This is what Cassie was getting at — statistical models rarely fit real-world data, and we are left to either try to shoe-horn the data into the model (getting the right answer to the wrong question) or switch it up and do something completely different — simulations!

This is an excerpt of my original post, which is too long to post here in its entirety (there are more than 30 world-class contributors!).

If you’re enjoying reading, you might be interested to hear what Dez Blanchfield had to say about domain experts, or what Michael Friendly and Alberto Cairo said about the past, present and future of data visualisation.

There’s also a free book to download detailing all the comments made by the contributors, including what Jacqueline Nolis and Kristen Kehrer had to say about starting their careers over.

And don’t get me started with the epic suggestions that Natalie Dean and Jen Stirrup had about Information Flow and Detective Work.

Awesome — you really don’t want to miss them!

Read more here


Credit: Data Science Central By: Lee Baker
