
18 Handy Resources for Machine Learning Practitioners

June 12, 2020
in Data Science

Machine learning is a diverse field that covers a wide territory and has affected many verticals. It can tackle tasks in language and image processing, anomaly detection, credit scoring, sentiment analysis, and forecasting, alongside dozens of other downstream tasks. A proficient developer in this line of work has to be able to draw, borrow, and steal from many adjacent fields such as mathematics, statistics, programming, and, most importantly, common sense. I, for one, have drawn tremendous benefit from the myriad tools available for breaking complex tasks into smaller, more manageable components. It turns out that developing and training a model takes only a small fraction of a project's duration. The bulk of the time and resources is spent on data acquisition, preparation, hyperparameter tuning, optimization, and model deployment. I have built a systematic knowledge base that has helped my team tackle some common yet tough challenges. The following is an attempt to list some of the resources in it:

  1. Building an efficient and reliable end-to-end deep learning pipeline can be very challenging. Fortunately, there are many ‘workflow management’ tools that can dramatically lessen the difficulty of this task. Jenkins, Airflow, and Kubeflow are a few examples. While each has its strengths and weaknesses, my favorite is Airflow. There are many online tutorials for Airflow, and the one I like best is the YouTube video series by Tuan Vu.
  2. It does not take long for any serious ML practitioner to recognize the importance of feature engineering. Time spent analyzing and transforming the feature columns of a dataset can produce dramatic improvements in the outcome. Feature engineering is critical, but it can be complex and time consuming. I am thoroughly impressed by the capabilities of Automunge, a package for feature engineering and transformation. The tool can tackle complex numerical and categorical transformations, custom infills, ‘feature importance analysis’, oversampling, and much more. Truly remarkable.
  3. If you are a heavy user of Jupyter notebooks and want to scale your reliance on them, check out papermill. The tool lets you parameterize notebooks and execute them through a Python API as well as a CLI. papermill can store notebooks in a number of locations, including AWS S3, Azure data blobs, and Azure data lakes. Last but not least, papermill supports the features needed to write unit tests.
  4. Although I have not used it myself, I have heard rave reviews about Deequ. Deequ can be viewed as a tool for testing large datasets. It is an open source tool developed by Amazon that generates data quality metrics for large datasets destined for production, in accordance with quality constraints set by the user. Effective use of this tool can eliminate the need to write code that performs these checks by hand. It is built on Apache Spark and is designed to scale with large datasets.
  5. If you are involved in language processing, you will find the site “The Super Duper NLP Repo” invaluable. If you can’t find something there, chances are that you don’t need it.
  6. I have used Docker for a long time, and frankly I can’t imagine how things were done before it was available. I recently went through the following videos and learned quite a bit, especially when it comes to the nuances (registration required):
     • How to Get Started with Docker
     • Simplify All the Things with Docker Compose
     • Hands on Helm
     • Build and Deploy Multi Container Applications to AWS
  7. They say pandas is powerful, proven, fast, and user friendly. I agree with the first three. That should explain why I keep the following cheat sheets handy at all times: Introduction to Pandas for Data Science and 10 minute pandas.
  8. Like feature engineering, hyperparameter tuning is a critical and resource-intensive phase in the development of a machine learning pipeline. There are many approaches to hyperparameter optimization, such as manual, grid, and random search, as well as automated search (using Bayesian optimization). If you choose the automated path, I recommend evaluating Ax (Adaptive Experimentation Platform). This package was developed by Facebook and is quite mature and easy to use.
  9. If your deep learning model works the very first time, rest assured that you are doing something wrong. I have found the following list useful when looking for ways to troubleshoot a model:
     • 37 Reasons why your Neural Network is not working
  10. I was an early user of TensorFlow and did not shy away from calling it “Google’s Revenge on Humanity”. I am pleased to report that my view of the framework changed completely (for the better) once I started using version 2.0. It is very powerful and much easier to use.
  11. If you have had difficulty finding an appropriate dataset for running experiments, I recommend checking out Google’s Dataset Search engine.
  12. Developing, training, and tuning a machine learning model are just the beginning. A rigorous testing regime is required before models can be deployed in the real world. I have found this paper to be very thorough and systematic in addressing pre-production testing. It has distinct sections covering model, data, infrastructure, and production testing.
  13. If you are new to the field of language processing, you will find tremendous value in learning about open source tools such as spaCy, NLTK, Flair, and StanfordNLP. This article is a great summary of such tools.
  14. I would like to extend a big shout-out to Chris Fregly. Chris was a principal at PipelineAI and recently joined the AWS machine learning team. I have never met anyone with such a commanding knowledge of the building blocks needed to build an end-to-end AI pipeline. He is intimately familiar with a plethora of tools and frameworks and presents his technical knowledge with infectious enthusiasm. He hosts a free monthly workshop that is super informative.
  15. Call it bad karma or bad luck, but I have rarely been able to work with a nice, symmetric, and balanced dataset. I have found the following resources beneficial when I had to contend with an imbalanced dataset:
     • Step-By-Step Framework for Imbalanced Classification Projects
     • Undersampling Algorithms for Imbalanced Classification
     • The Impact of Imbalanced Training Data for Convolutional Neural Net…
  16. Weight normalization is a crucial task when it comes to training deep networks. It has many benefits, such as:
     • Enables faster training by allowing higher learning rates
     • Has a “regularization” effect
     • Eases weight initialization
     I have found this article to be an excellent summary of the various types of normalization.
  17. If you are like me and spend more than you would like on cloud-based GPUs, check out Google’s Colab Pro. For less than $10/mo., you get access to GPUs/TPUs and a decent amount of memory to help with your prototyping.
  18. If ‘model explainability’ is a paramount requirement for your project and you are considering SHAP or LIME, you will find the following summaries beneficial:
     • SHAP and LIME Python Libraries: Part 1 – Great Explainers, with Pro…
     • SHAP and LIME Python Libraries: Part 2 – Using SHAP and LIME
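
To make the workflow management item above a little more concrete, here is a minimal sketch of the core idea those tools share: a pipeline is a graph of tasks, and each task runs only after the tasks it depends on. This is not the Airflow, Jenkins, or Kubeflow API. It uses only the Python standard library (`graphlib`, Python 3.9+), and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "acquire_data": set(),
    "prepare_data": {"acquire_data"},
    "train_model": {"prepare_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}


def run_pipeline(dag, tasks):
    """Call every task after all of its dependencies; return the run order."""
    order = []
    for name in TopologicalSorter(dag).static_order():
        tasks[name]()  # a real tool would also handle retries, logging, scheduling
        order.append(name)
    return order


# Stand-in task bodies that only record that they ran.
executed = []
tasks = {name: (lambda n=name: executed.append(n)) for name in pipeline}

order = run_pipeline(pipeline, tasks)
print(" -> ".join(order))
```

A real workflow manager adds scheduling, retries, and monitoring on top of this ordering step, which is where most of its value lies.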
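
The hyperparameter tuning item above names grid and random search. The sketch below contrasts the two on a toy problem. It does not use Ax, and `validation_loss` is a hypothetical stand-in for training a model and measuring its loss; the search ranges are assumptions chosen for illustration.

```python
import itertools
import random


def validation_loss(learning_rate, dropout):
    """Hypothetical stand-in for 'train a model, return its validation loss'."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (dropout - 0.3) ** 2


def grid_search(learning_rates, dropouts):
    """Evaluate every combination on a fixed grid and keep the best one."""
    candidates = itertools.product(learning_rates, dropouts)
    return min(candidates, key=lambda c: validation_loss(*c))


def random_search(n_trials, seed=0):
    """Sample configurations at random (learning rate on a log scale)."""
    rng = random.Random(seed)
    candidates = [
        (10 ** rng.uniform(-4, -1), rng.uniform(0.0, 0.6))
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda c: validation_loss(*c))


best_grid = grid_search([0.001, 0.01, 0.1], [0.1, 0.3, 0.5])
best_random = random_search(n_trials=9)
print("grid:", best_grid, "random:", best_random)
```

Both searches spend the same budget of nine evaluations here. Grid search only ever tries three distinct values per parameter, while random search tries nine, which is one reason it is often preferred when only a few parameters matter.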
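
For the imbalanced dataset item above, the simplest undersampling approach is to randomly drop examples from the larger classes until every class is as small as the rarest one. The following is a minimal sketch of that idea using only the standard library; `random_undersample` and the toy data are hypothetical, and the linked resources cover more careful algorithms.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is cut down to the size of the rarest class.
    target = min(len(members) for members in by_class.values())

    balanced_rows, balanced_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            balanced_rows.append(row)
            balanced_labels.append(label)
    return balanced_rows, balanced_labels


# Toy dataset: 10 positive examples and 90 negative ones.
rows = list(range(100))
labels = [1 if i < 10 else 0 for i in rows]

X, y = random_undersample(rows, labels)
counts = Counter(y)
print(counts)
```

The trade-off is that undersampling throws away majority-class data, so it suits cases where the majority class is large enough to spare it.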
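
As a small illustration of the weight normalization item above, the sketch below assumes the common reparameterization in which a weight vector w is written as w = g · v / ‖v‖, so that its length (g) is learned separately from its direction (v). The function name and the numbers are hypothetical, and the article linked in that item covers other normalization schemes as well.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v||: direction taken from v, length set by g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


v = [3.0, 4.0]  # unnormalized direction parameters
w = weight_norm(v, g=2.0)
w_length = math.sqrt(sum(x * x for x in w))

# Rescaling v leaves w unchanged, so the scale of the initial v matters less.
w_rescaled = weight_norm([30.0, 40.0], g=2.0)
print(w, w_length, w_rescaled)
```

Because the length of w is always g regardless of how v is scaled, the effective weights are less sensitive to how v was initialized.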

 


Credit: Data Science Central By: Al Gharakhanian
