NikolaNews

Neuromancer Blues: Threading vs Multiprocessing

May 4, 2020
in Data Science

“Neuromancer Blues” is a series of posts where I would like the reader to find guidance on general data science topics such as data wrangling, database connectivity, applied mathematics and programming tips to boost code efficiency, readability and speed. My examples and code snippets will be as short and sweet as possible in order to convey the key idea, instead of providing lengthy code with poor readability that defeats the purpose of the post.

Threading (AKA multithreading) and multiprocessing are topics I have wanted to write about for a long time. This first post focuses on introducing both concepts, with emphasis on threading and why it is so important for developers in finance. Future Neuromancer Blues posts will spend more time on multiprocessing and on concurrency issues such as race conditions and deadlocks.

The Basics
Let’s clarify key concepts that will be recurrent in this post:

  • Concurrency: when two tasks can start, run, and complete in overlapping time periods. 
  • Parallelism: when tasks literally run at the same time, e.g. on a multi-core processor.
  • I/O-bound task: a task that spends most of its time waiting on I/O (input/output), e.g. network reads/writes, database queries, etc.
  • CPU-bound task: a task that spends most of its time using the CPU, e.g. heavy numerical computation, machine learning model training, etc.
  • GIL (Global Interpreter Lock): a lock in CPython that prevents threads within a single Python process from executing Python code in parallel (on multiple cores) at any point in time, although they can still run concurrently. While it is not noticeable in single-threaded code, the GIL becomes an issue when running multi-threaded CPU-bound code.
  • Deadlock: a situation in which two or more threads/tasks wait on each other indefinitely, so none of them can proceed and the whole program is blocked.
  • Race condition: a situation in which two or more threads/tasks access shared data concurrently and deliver a flawed result because their operations run in an unintended order.
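To see the GIL’s effect on CPU-bound work, here is a minimal sketch using only the standard library. The function `cpu_task` and the input sizes are illustrative assumptions, not part of the original post. On a standard CPython build with the GIL, the threaded run should take roughly as long as the sequential one (sometimes slightly longer), because only one thread executes Python code at a time; on the experimental free-threaded builds available since Python 3.13 the picture may differ.

```python
import threading
import time


def cpu_task(n: int) -> int:
    """CPU-bound work (hypothetical example): sum of squares in pure Python."""
    return sum(i * i for i in range(n))


def run_sequential(sizes):
    """Run cpu_task for each size, one after another."""
    return [cpu_task(n) for n in sizes]


def run_threaded(sizes):
    """Run cpu_task for each size in its own thread and keep results in order."""
    results = [None] * len(sizes)

    def worker(idx: int, n: int) -> None:
        # Each thread writes only to its own slot, so no lock is needed here.
        results[idx] = cpu_task(n)

    threads = [threading.Thread(target=worker, args=(i, n)) for i, n in enumerate(sizes)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    sizes = [1_000_000] * 4
    for label, fn in [("sequential", run_sequential), ("threaded", run_threaded)]:
        start = time.perf_counter()
        fn(sizes)
        print(f"{label}: {time.perf_counter() - start:.2f} s")
```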

In our code we need to avoid both race conditions and deadlocks. The best recipe to avoid race conditions is to apply thread-safety policies, either by avoiding shared state altogether or by using synchronization methods. In addition, we want to avoid deadlocks by preventing processes from building mutual dependencies on each other, i.e. by reducing the need to lock anything as much as possible. These topics are more advanced and deserve another series of posts in the future.
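Both remedies for race conditions mentioned above can be sketched with the standard library. This is an illustrative example with hypothetical names (`unsafe_increment`, `safe_increment`, `count_locally`): a shared counter updated without synchronization, the same counter protected by a `threading.Lock`, and a version that avoids shared state altogether by having each thread return its own partial result. Whether the unsafe version actually loses updates on a given run depends on the interpreter version and thread scheduling, so a correct total there is luck, not proof of safety.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_THREADS = 4
N_INCREMENTS = 50_000


def unsafe_increment(state: dict) -> None:
    """Unsynchronized read-modify-write: updates from different threads can be lost."""
    for _ in range(N_INCREMENTS):
        current = state["count"]
        state["count"] = current + 1


def safe_increment(state: dict, lock: threading.Lock) -> None:
    """Synchronization approach: the lock makes each read-modify-write step exclusive."""
    for _ in range(N_INCREMENTS):
        with lock:
            state["count"] += 1


def run_shared(target, *extra_args) -> int:
    """Start N_THREADS threads that all update the same shared counter."""
    state = {"count": 0}
    threads = [
        threading.Thread(target=target, args=(state, *extra_args))
        for _ in range(N_THREADS)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["count"]


def count_locally(worker_id: int) -> int:
    """No-shared-state approach: each worker counts in a local variable.

    worker_id is unused; it only gives map() one item per worker.
    """
    local = 0
    for _ in range(N_INCREMENTS):
        local += 1
    return local


def run_without_shared_state() -> int:
    """Combine per-thread partial results in the main thread; nothing is shared."""
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        return sum(pool.map(count_locally, range(N_THREADS)))


if __name__ == "__main__":
    expected = N_THREADS * N_INCREMENTS
    print("unsafe:", run_shared(unsafe_increment), "expected:", expected)  # may come out lower
    print("locked:", run_shared(safe_increment, threading.Lock()))
    print("no shared state:", run_without_shared_state())
```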

Threading vs Multiprocessing: Brief Intro
Now that we are on the same page, let’s answer what the difference between threading and multiprocessing is and why it matters so much. The table below provides a comprehensive comparison of both methods.

In a nutshell, threading is used to run multiple threads/tasks concurrently inside the same process, yet it will not improve speed if we are already using 100% of CPU time. On the other hand, multiprocessing allows the programmer to spawn multiple processes, which can run on different CPU cores, each one with its own memory space and its own interpreter, and therefore free from the GIL limitations of a single process.

Python threads are mainly used for I/O-bound tasks, where execution involves some waiting time. In finance, a straightforward example is querying an external database, so we will simulate a similar I/O-bound task using Yahoo Finance data.

As I mentioned in the introduction, I would like to concentrate on threading in this post, so let’s get down to business.

Classic Approach: Using Threading Module
The threading module provided with Python includes a simple-to-implement locking mechanism that allows you to synchronize threads. More generally, this module lets different parts of your program run concurrently, which can also improve code readability.
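As a hedged illustration of the module’s basics, the sketch below shows the two usual ways to create a thread, passing a `target` function or subclassing `threading.Thread` and overriding `run()`, with a `threading.Lock` serializing access to a shared list. The names `greet` and `Greeter` are made up for this example.

```python
import threading


def greet(name: str, log: list, lock: threading.Lock) -> None:
    """Function-based style: this function is passed as the thread's target."""
    with lock:  # only one thread at a time may append to the shared list
        log.append(f"hello from {name}")


class Greeter(threading.Thread):
    """Subclass-based style: the work goes in an overridden run() method."""

    def __init__(self, name: str, log: list, lock: threading.Lock) -> None:
        super().__init__(name=name)
        self.log = log
        self.lock = lock

    def run(self) -> None:
        with self.lock:
            self.log.append(f"hello from {self.name}")


def main() -> list:
    log: list = []
    lock = threading.Lock()
    threads = [threading.Thread(target=greet, args=(f"func-{i}", log, lock)) for i in range(3)]
    threads += [Greeter(f"class-{i}", log, lock) for i in range(3)]
    for thread in threads:
        thread.start()  # run concurrently with the main thread
    for thread in threads:
        thread.join()  # wait until each thread has finished
    # Threads may finish in any order, so sort for a stable result.
    return sorted(log)


if __name__ == "__main__":
    for line in main():
        print(line)
```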

Let’s first understand what the point of threading is. The code snippet underneath shows a looping process in which we execute I/O-bound tasks, reading financial data from Yahoo Finance with the pandas-datareader module. Although pandas-datareader allows bulk requests (downloading data for a list of tickers at once), we are going to naively run each I/O task on a stand-alone basis per ticker, following a sequential execution approach, i.e. a new call only starts when the previous call has finished.
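The original snippet is not reproduced here. As a stand-in, the following sketch mimics the sequential loop offline: `io_task` only sleeps instead of calling pandas-datareader, and the ticker list is hypothetical. Each call starts only after the previous one has returned, so the total runtime is roughly the number of tickers times the delay.

```python
import time

# Hypothetical ticker list, for illustration only.
TICKERS = ["AAPL", "MSFT", "AMZN", "GOOG", "JPM"]


def io_task(ticker: str, delay: float = 1.0) -> str:
    """Placeholder for a per-ticker download (not a real pandas-datareader call).

    Sleeping mimics the time a network request would spend waiting.
    """
    time.sleep(delay)
    return f"data for {ticker}"


def run_sequential(tickers, delay: float = 1.0) -> dict:
    """Download one ticker at a time: each call starts only after the previous one ends."""
    results = {}
    for ticker in tickers:
        results[ticker] = io_task(ticker, delay)
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    run_sequential(TICKERS)
    # Expect roughly len(TICKERS) * delay seconds in total.
    print(f"sequential: {time.perf_counter() - start:.2f} s")
```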

Note the one-second delay introduced within our io_task() function using the time module. For the sake of simplicity, we are only downloading less than five years of price data, which is a pretty fast query by nature. The one-second delay simulates a query that takes longer, e.g. downloading data such as 100+ fundamental indicators, which makes for a more realistic simulation.
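To show what threading buys us in this setting, below is a minimal threaded sketch, again assuming a sleep-based placeholder for `io_task()` and a hypothetical ticker list rather than real Yahoo Finance calls. Because `time.sleep`, like blocking network I/O, releases the GIL, the threads can wait at the same time, so the total runtime should be close to a single delay instead of one delay per ticker.

```python
import threading
import time

# Hypothetical ticker list, for illustration only.
TICKERS = ["AAPL", "MSFT", "AMZN", "GOOG", "JPM"]


def io_task(ticker: str, delay: float = 1.0) -> str:
    """Placeholder for a per-ticker download; sleeping mimics network waiting time."""
    time.sleep(delay)  # like blocking I/O, sleep releases the GIL
    return f"data for {ticker}"


def run_threaded(tickers, delay: float = 1.0) -> dict:
    """Start one thread per ticker so that the waiting periods overlap."""
    results = {}
    lock = threading.Lock()

    def worker(ticker: str) -> None:
        data = io_task(ticker, delay)
        with lock:  # protect the shared results dict
            results[ticker] = data

    threads = [threading.Thread(target=worker, args=(ticker,)) for ticker in tickers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    run_threaded(TICKERS)
    # Expect roughly one delay in total rather than len(TICKERS) delays.
    print(f"threaded: {time.perf_counter() - start:.2f} s")
```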


Credit: Data Science Central By: Carlos Salas, CFA, CQF

© 2019 NikolaNews.com - Global Tech Updates
