Machine learning (ML) is one of the hottest topics today, and it has become a core component of many industries. For instance, the automotive industry uses ML to develop autonomous driving technology. ML also powers the recommendation systems of social media and entertainment platforms, supports forecasting systems for weather and global-warming-related activity, and helps researchers understand diseases in biology and medicine.
ML can be seen as a virtual layer in software that contains the logic and the processes a machine (or computer) needs in order to learn. The machine usually has inputs (to observe variables from the surrounding environment) and outputs (to interact with that environment). Learning happens when the machine interacts with the environment in a way that minimizes a certain variable (a loss) or maximizes a certain variable (a gain). To achieve that goal, this software layer implements mathematical and statistical functions that build a model from historical observations (the training dataset), so that the model can interact with the environment and optimize its goal, either loss or gain, on future observations (the testing data).
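To make the loss idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the synthetic rule y = 3x + 2, the noise level, the mean squared error loss, and the helper names `make_data`, `mse`, and `train`. It is one simple way to learn from training data and then check the result on unseen testing data, not a recipe for real projects.

```python
import random


def make_data(n, seed):
    """Generate noisy observations of the assumed rule y = 3x + 2."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error: the loss the model tries to minimize."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.1, epochs=500):
    """Fit y = w*x + b by plain gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Historical observations (training) and future observations (testing).
train_x, train_y = make_data(200, seed=0)
test_x, test_y = make_data(50, seed=1)

w, b = train(train_x, train_y)
print("learned w, b:", w, b)
print("training loss:", mse(w, b, train_x, train_y))
print("testing loss:", mse(w, b, test_x, test_y))
```

The learned `w` and `b` should land close to 3 and 2, and the testing loss tells us how well the model holds up on observations it never saw during training.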
Traditionally, ML engineers and developers have been experienced software engineers who developed additional mathematical and statistical skills. Software engineers also bring other skills, such as databases, software technology, software debugging, and frontend-backend interactions. Those skills are required to build the end-to-end code needed for production and for delivering ready-to-use products.
Due to the high demand for ML engineers and developers, and given that ML usually requires many prototyping cycles for model development before production starts, the data science field began to emerge. Data science focuses more on prototyping the techniques and methodologies used for building and training models. Those prototypes can be developed using high-level programming languages and platforms such as Python and MATLAB. Python, for instance, has rich libraries that developers can use to build their models quickly. Those libraries are being developed at a very fast pace, and as a result certain ML prototypes can be built in relatively few lines of code. Highly specialized libraries that target a specific discipline have also started to appear. For instance, Biopython is a library specialized in bioinformatics and computational biology, and it can read and write certain formats, such as gene sequencing files, out of the box.
In my opinion, library development will continue to move towards specialization. Also, developing an optimal ML model will depend more on the data itself and less on the methodology. The effort spent on reading data, preprocessing data, and interpreting results will keep growing and will take an ever larger share of the total effort in the ML development process.
In many scientific disciplines, ML has started to gain popularity and to achieve significant results. That can encourage specialists within each discipline to learn the relevant ML libraries. I predict that by 2030 biologists will be able to build their ML models on their own. I predict that civil engineers will be able to build their own earthquake ML models. I predict that some of the medical specialists, biologists, civil engineers, chemical engineers, mechanical engineers, and others will be the DATA SCIENTISTS of 2030. I believe everyone in our field needs to decide which side we want to be on: SOFTWARE ENGINEERING or THE DISCIPLINES.