## Tip 2: Use traditional methods to establish performance baselines {#baselines}

Deep learning requires practitioners to consider a larger number and variety of tuning parameters (that is, algorithmic settings) than more traditional machine learning methods. These settings are often called hyperparameters, and their sheer number can make it easy to fall into the trap of performing an unnecessarily convoluted analysis. Hence, before applying deep learning to a given problem, it is best to implement simpler models with fewer hyperparameters at the beginning of each study [@doi:10.1038/s41580-021-00407-0]. Such models include logistic regression, random forests, k-nearest neighbors, Naive Bayes, and support vector machines. They help to establish baseline performance expectations and to gauge the difficulty of a specific prediction problem. While performance baselines from the existing literature can also serve as helpful guides, implementing a simpler model in the same software framework planned for the deep learning work greatly helps with assessing the correctness of data processing steps, performance evaluation pipelines, resource requirement estimates, and computational performance estimates. Furthermore, in some cases it can even be useful to combine simpler baseline models with deep neural networks, as such hybrid models can improve generalization performance, model interpretability, and confidence estimation [@arxiv:1803.04765; @arxiv:1805.11783].
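As a concrete illustration, the following is a minimal sketch of such a baseline sweep using scikit-learn. It is not drawn from the original text: the synthetic dataset from `make_classification`, the particular models, and the five-fold accuracy metric are placeholder assumptions that would be swapped for the study's own data and evaluation protocol.

```python
# Hypothetical baseline sweep; scikit-learn is assumed to be installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data; replace with the real feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

baselines = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "k-nearest neighbors": make_pipeline(StandardScaler(),
                                         KNeighborsClassifier()),
    "naive Bayes": GaussianNB(),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
}

for name, model in baselines.items():
    # Five-fold cross-validated accuracy gives a first estimate of how
    # difficult the prediction problem is before any deep model is built.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

These cross-validated scores set the bar that a subsequent deep learning model should clearly exceed to justify its added complexity.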

Another potential pitfall arises from comparing baseline conventional models trained with default settings against deep learning models that have undergone rigorous tuning and optimization. Since conventional off-the-shelf machine learning algorithms (for example, support vector machines and random forests) are also likely to benefit from hyperparameter tuning, such an incongruity prevents a comparison of equally optimized models and can lead to false conclusions about model efficacy. Hu and Greene [@doi:10.1142/9789813279827_0033] discuss this under the umbrella of what they call the "Continental Breakfast Included" effect: unequal hyperparameter tuning across learning algorithms skews evaluation especially when an algorithm's performance varies substantially with modest changes to its hyperparameters. Therefore, practitioners should tune the settings of both traditional machine learning and deep learning-based methods before making claims about relative performance differences; such comparisons are only informative when the models are equally well optimized.
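To make such a comparison fair in practice, the conventional baseline can be given the same tuning budget as the deep model. The sketch below uses scikit-learn's randomized search; the search space and the 50-trial budget are illustrative assumptions, and the key point is that the trial count should match the effort spent optimizing the deep learning model.

```python
# Hypothetical equal-budget tuning of a conventional baseline;
# scikit-learn and scipy are assumed to be installed.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Placeholder data; replace with the real feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(100, 1000),
        "max_depth": randint(2, 20),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=50,  # match this trial budget to the deep model's tuning budget
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Reporting the tuned score of the baseline alongside the tuned score of the deep model avoids the unequal-optimization trap described above.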

In sum, practitioners are encouraged to create and fully tune several traditional models and standard pipelines before implementing a deep learning model.