Operationalization refers to the deployment of a machine learning model that has been trained and evaluated to the point where it is ready for use outside of a development or test environment.
Typical Model Deployment:
- Get the model file (any file format)
- Create a scoring script (.py; see the sketch after this list)
- Optionally create a schema file describing the web service input (.json)
- Create a real-time scoring web service
- Call the web service from applications
- Repeat the process each time the model is re-trained
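For illustration, here is a minimal sketch of what such a scoring script might look like. The `init()`/`run()` structure, the `model.pkl` path, and the JSON input shape are assumptions for the sketch, not a prescribed format:

```python
# score.py -- minimal scoring script sketch. The init()/run() structure,
# model path, and input format below are illustrative assumptions.
import json

import joblib
import numpy as np

model = None

def init():
    """Load the trained model once, when the web service starts."""
    global model
    # Assumed file name; in practice this points at the registered model file.
    model = joblib.load("model.pkl")

def run(raw_data):
    """Score a JSON payload of the form {"data": [[...feature values...]]}."""
    try:
        data = np.array(json.loads(raw_data)["data"])
        predictions = model.predict(data)
        return json.dumps({"predictions": predictions.tolist()})
    except Exception as e:
        return json.dumps({"error": str(e)})
```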
The model training process can be very compute-intensive, with training times that can span hours, days, or even weeks. A trained model, on the other hand, is used to make decisions on new data quickly. In other words, it infers things about new data it is given based on its training. Making these decisions on new data on-demand is called Real-time Inferencing.
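To make this concrete, here is a sketch of an application calling a real-time scoring web service over HTTP. The endpoint URL and payload schema are hypothetical; substitute the values for your deployed service:

```python
# Calling a real-time scoring web service. The URL and payload shape
# here are hypothetical examples, not a fixed contract.
import json

import requests

scoring_uri = "http://my-service.example.com/score"  # hypothetical endpoint
payload = json.dumps({"data": [[5.1, 3.5, 1.4, 0.2]]})

response = requests.post(
    scoring_uri,
    data=payload,
    headers={"Content-Type": "application/json"},
)
print(response.json())  # e.g. {"predictions": [0]}
```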
Unlike real-time inferencing, which makes predictions on data as it is received, Batch Inferencing is run on large quantities (batches) of existing data. Typically, batch inferencing is run on a recurring schedule against data stored in a database or other data store. The resulting predictions are then written to a data store for later use by applications, data scientists, developers, or end-users.
Batch scoring is typically used when predictions must be made at a certain level of granularity over as long a history of time-based data as possible. Latency requirements are measured in hours or days, so there is no need to deploy trained models to RESTful web services, as is done for real-time inferencing (a batch scoring sketch follows the list below). Use Batch Inferencing when:
- No need for real-time
- Inferencing results can be persisted
- Post-processing or analysis of the predictions is needed
- Inferencing is complex
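As a rough illustration of this pattern, the following sketch reads a stored batch of records, scores them with the trained model, and writes the predictions back to a data store. The file names, feature columns, and pandas/joblib stack are assumptions; in practice a scheduler would run a job like this on a recurring basis:

```python
# Batch scoring sketch: read a batch of existing records, score them all,
# and persist the predictions for later use. File names and feature
# columns are illustrative assumptions.
import joblib
import pandas as pd

model = joblib.load("model.pkl")                  # same trained model artifact
batch = pd.read_parquet("input_batch.parquet")    # hypothetical data store extract

# Score the whole batch at once (assumed feature columns f1..f3).
batch["prediction"] = model.predict(batch[["f1", "f2", "f3"]])

# Write results back to a data store for applications and analysts.
batch.to_parquet("predictions.parquet")
```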
Due to the scheduled nature of batch inferencing, predictions are not usually available for new data. However, in many common real-world scenarios, predictions are needed on both newly arriving and existing historical data. This is where Lambda Architecture comes in.
The gist of the Lambda architecture is that ingested data is processed along two paths at different speeds: a hot path that makes predictions against the data in real time, and a cold path that makes predictions in batch fashion, which might take hours or days to complete.
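A toy sketch of that two-speed split, with all names illustrative: each arriving record is scored immediately on the hot path and also retained so the cold path can re-score the accumulated history in batch:

```python
# Lambda-style dispatch sketch. Every name here is illustrative; a real
# system would use a message stream and a durable store, not a list.
import joblib
import pandas as pd

model = joblib.load("model.pkl")
cold_store = []  # stand-in for a durable data store (e.g., a data lake)

def hot_path(record):
    """Real-time: predict on the record as soon as it arrives."""
    cold_store.append(record)  # also retain it for later batch processing
    return model.predict(pd.DataFrame([record]))[0]

def cold_path():
    """Batch: periodically re-score everything accumulated so far."""
    batch = pd.DataFrame(cold_store)
    batch["prediction"] = model.predict(batch)
    batch.to_parquet("batch_predictions.parquet")
```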