NikolaNews

Break That Data – Data Science Central

January 31, 2020
in Data Science

Even though processing and storage have become cheap and enterprises are adopting high-performance analytics infrastructure, in the majority of cases an analytics study is still constrained by the local system. Think of a proof of concept in a small enterprise that can't afford to invest in heavy analytics infrastructure, an academic project, or a hobby project; the list goes on, and the common factor is system constraints.

Though we can't escape every scenario where we stare endlessly at a blinking cursor, we can certainly reduce how often it happens. A faster end result, or a quicker turnaround when testing sample cases, makes code execution sessions far less frustrating. The two primary causes of time-intensive computation are data size and algorithm complexity. In the following article, we look at some effective measures to reduce the data size, and to keep loops efficient, without compromising on the output.

  • Ingest and hold limited data: Data size is one of the crucial limiting factors for faster output
    1. Avoid ‘Select *’ – A representative calculation shows that a million rows with just 5 columns can take up as much as 80 MB. By current standards, we frequently work on data sets with tens of millions of rows, and often with many more columns. Add to this the processing complexity and the multiple variables that each hold some subset of the data. To avoid the so-called ‘endless processing’, import only the necessary columns rather than selecting every column in the table
    2. Drop the unnecessary – Drop the columns that you have already processed and are never going to use again
    3. Use ‘where’ clause – Rather than ingesting the whole blob of data and then running loops on subsets, it is sometimes better to ingest only the data for the one city or product category you need. This can greatly boost processing speed, because every step handles less data and the algorithm doesn't have to search for irrelevant patterns
    4. Treat frequent and long tail data separately – If there is a clear and wide difference in how often certain data points (e.g. sub-categories) occur, it is sometimes better to identify the long tail data and process it separately from the frequently occurring data. Recommendation engines built on Market Basket Analysis are one example, where the Support and Confidence thresholds influence the processing time. With a large Support value the outcome is fast, but long tail items can be missed; with a small one (at times even 0.001), the processing time and the number of output rules can grow exponentially. It is better to find rules among the long tail items separately
    5. Remove variables – Regularly remove unnecessary and duplicated variables with rm() to free memory
    6. Process in chunks – Process data in smaller chunks to avoid blocking large amounts of memory for long durations
  • Loops: The overlooked monster that can plague the system with inefficiencies
    1. Break/Jump loop steps – Always include logic to exit the loop with break, or to skip to the next iteration with next, so that if a necessary condition is not satisfied, processing stops without wasting precious time and processing power
    2. Loop vs apply() – Vectorised built-in functions usually run much faster than explicit loops. Hence, instead of looping and creating subsets to perform a transformation, aggregation, or calculation, it is generally advisable to use functions such as which() and the apply() family, or to use a data.table object rather than a data.frame (apply() itself loops internally, so its main gain is compactness; the big speed-ups come from truly vectorised functions)
    3. Parallel processing – By default, R uses only one core of your machine. Packages such as doParallel and foreach let you explicitly register more cores and run the iterations of a loop in parallel
    4. Track down time-consuming loop/code activity – Track the key values. Once a loop starts, it becomes very difficult to know the values at every step, so use print() statements to output the relevant in-loop values. This helps identify the value at which the loop slows down
    5. Avoid IF in loops – Wherever possible, check the condition for running the loop outside the loop, e.g.

       ConditionOfLoop = (data$col1 + data$col2) > threshold

       then take the row numbers where ConditionOfLoop is TRUE (for example with which(ConditionOfLoop)) before using those rows in the loop

  • Convert the format: At times, converting the dataset into a different format can work wonders for processing speed. For example, if Market Basket Analysis produces millions of Rules, it is better to convert the Rules into a data frame (with the items of longer Rules as comma-separated entries) and then do the processing
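
The tips above are framed in R, but the ingestion ideas (select only the needed columns, filter with a where clause, process in chunks) carry over to any SQL source. The minimal sketch below uses Python's built-in sqlite3 module with a hypothetical in-memory `sales` table; the table, its columns, and the `chunk_size` value are illustrative assumptions, not part of the original article.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory sales table (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER, city TEXT, category TEXT, amount REAL, notes TEXT)"
    )
    rows = [
        (1, "Pune", "books", 120.0, "gift wrap"),
        (2, "Delhi", "toys", 80.0, ""),
        (3, "Pune", "toys", 45.5, "express"),
        (4, "Mumbai", "books", 60.0, ""),
        (5, "Pune", "books", 30.0, ""),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def city_totals_by_category(conn, city, chunk_size=2):
    """Sum amounts per category for a single city.

    - Selects only the two columns needed instead of SELECT *.
    - Filters with WHERE so rows for other cities are never loaded.
    - Reads results with fetchmany() so at most chunk_size rows are held at once.
    """
    cur = conn.execute("SELECT category, amount FROM sales WHERE city = ?", (city,))
    totals = {}
    while True:
        chunk = cur.fetchmany(chunk_size)
        if not chunk:
            break
        for category, amount in chunk:
            totals[category] = totals.get(category, 0.0) + amount
    return totals


conn = build_demo_db()
print(city_totals_by_category(conn, "Pune"))
```

Pushing the column list and the filter into the query means the unneeded data never enters memory at all, which is usually cheaper than loading everything and dropping it afterwards.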

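The loop tips above (check the condition outside the loop, break out early) can be sketched in Python as well, as an assumption since the article targets R. Here the condition is evaluated once, before the loop, as a stand-in for R's `which((data$col1 + data$col2) > threshold)`; the loop then visits only the selected rows and exits with `break` once a hypothetical `budget` would be exceeded. The column values, `threshold`, `budget`, and the per-row product are made up for illustration.

```python
def select_rows(col1, col2, threshold):
    """Evaluate the loop condition once, before the loop: the indices where
    col1 + col2 > threshold (a Python stand-in for R's which())."""
    return [i for i, (a, b) in enumerate(zip(col1, col2)) if a + b > threshold]


def process_selected(col1, col2, threshold, budget):
    """Loop only over pre-selected rows and stop early with `break`
    once the running total would exceed a hypothetical `budget`."""
    processed, running = [], 0
    for i in select_rows(col1, col2, threshold):
        value = col1[i] * col2[i]  # placeholder for expensive per-row work
        if running + value > budget:
            break  # no point scanning the remaining rows
        running += value
        processed.append(i)
    return processed, running


col1 = [1, 5, 2, 7, 3]
col2 = [1, 4, 0, 2, 6]
print(select_rows(col1, col2, threshold=4))                  # → [1, 3, 4]
print(process_selected(col1, col2, threshold=4, budget=40))  # → ([1, 3], 34)
```

Rows 0 and 2 never enter the loop body, so the only check left inside the loop is the early-exit test.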

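One way to treat frequent and long tail data separately, sketched in Python rather than R: split items by their individual support, then count co-occurring pairs within each group using that group's own minimum support. This is a simplified pair-counting illustration, not the Apriori algorithm or an arules workflow; the transactions, `cutoff`, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_items(transactions, cutoff):
    """Partition items into frequent and long-tail sets by individual support
    (share of transactions containing the item). `cutoff` is hypothetical."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c / n >= cutoff}
    long_tail = set(counts) - frequent
    return frequent, long_tail


def pair_support(transactions, items, min_support):
    """Support of item pairs restricted to `items`, measured against all
    transactions; pairs below `min_support` are dropped."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & items)  # ignore items outside this group
        counts.update(combinations(kept, 2))
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk", "saffron", "truffle"],
    ["saffron", "truffle"],
]

frequent, long_tail = split_items(transactions, cutoff=0.4)
# A higher threshold keeps the frequent-item pass small and fast...
print(pair_support(transactions, frequent, min_support=0.4))
# ...while a lower threshold is affordable on the much smaller long tail.
print(pair_support(transactions, long_tail, min_support=0.2))
```

The trade-off of this split is that pairs mixing a frequent and a long tail item (bread with saffron, in this toy data) are never counted; if such cross rules matter, they need a separate, targeted pass.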
Credit: Data Science Central By: saurabh ajmera

© 2019 NikolaNews.com - Global Tech Updates
