As you can see above, various models are available depending on the use case, and each one has its own advantages and disadvantages. So it always makes sense to compare the models against each other using benchmarking criteria and, based on that, choose the preferred model. Once a model is chosen, it is important to monitor it on an ongoing basis, as its performance may degrade over time. Its parameters might then need to be tweaked to make the model more effective, or the model may need to be replaced by another. The types of metrics typically used for comparing models are:
- Classification Measures: Comparative criteria such as Receiver Operating Characteristic (ROC) charts and the corresponding area under the curve (AUC), classification rates, etc.
- Data Mining Measures: Comparative criteria from the data mining literature, including lift and gain measures and profit and loss measures.
- Statistical Measures: Comparative criteria from the statistical literature, including the Bayesian Information Criterion (BIC), Akaike's Information Criterion (AIC), the Gini statistic, the Kolmogorov-Smirnov statistic, and Bin-Best Two-Way Kolmogorov-Smirnov tests.
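As a minimal sketch of the first category, the snippet below compares two candidate classifiers on a held-out set using ROC AUC. The dataset, the two model choices, and all variable names are illustrative assumptions, not part of the original text:

```python
# Illustrative sketch: comparing candidate models by ROC AUC (a classification measure).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real business dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Two hypothetical candidate models under comparison.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}

aucs = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # AUC summarizes the ROC curve into a single comparable number.
    aucs[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

The model with the higher held-out AUC would typically be preferred, though in practice several of the measures listed above are weighed together.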
While building a model, we must avoid overfitting (learning from variables that actually have no influence on the outcome). One common remedy is regularization, which reduces the magnitude of the coefficients so that the impact of individual variables is somewhat dulled.
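The shrinking effect of regularization can be seen directly by comparing coefficient magnitudes with and without a penalty. This sketch uses ridge (L2) regularization as one example; the synthetic data and the penalty strength `alpha` are assumptions chosen for illustration:

```python
# Illustrative sketch: regularization dulls coefficient magnitudes.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
# Only the first two features truly influence the target; the other
# eight are noise that an unregularized fit may latch onto.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)          # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)         # L2 penalty shrinks coefficients

ols_size = np.abs(ols.coef_).sum()
ridge_size = np.abs(ridge.coef_).sum()
print(f"OLS   total |coef|: {ols_size:.3f}")
print(f"Ridge total |coef|: {ridge_size:.3f}")
```

The regularized fit produces smaller coefficients overall, reducing the influence of the noise variables and hence the risk of overfitting.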
Thus, to build models effectively, it is important to bring the data together, enrich it, build models on it, evaluate and select the best model, implement it, and constantly monitor its performance so that the model can be tuned or replaced as appropriate.