When faced with poor model performance, most machine learning engineers turn to iterating on their model: tuning hyperparameters, experimenting with different architectures, employing multiple models, and so on. However, in most cases, these performance issues can be resolved more easily and effectively with data-centric solutions. Although data is the fuel for machine learning projects, practitioners often underinvest in ensuring data quality. This reference lays out the five simplest and most crucial steps you can take to increase your data quality and model performance using a defect book.
Figure 1. A bare-bones pill defect book.
A defect book is a list of key defects and their detailed definitions, paired with sample images (Figure 1). It serves as the source of ground truth and the guideline that labelers should turn to when faced with questions like “should this area in the image be considered defective?” or “is this model prediction correct?” By following the tips in this reference, you will be able to systematically avoid labeling ambiguities in your machine learning projects and improve your model performance by creating a quality defect book. This reference demonstrates the LandingLens process on a simple example, but you will be able to apply this methodology to any machine learning project. Follow along to create your own defect book using this steel defect dataset.
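To make the structure described above concrete, here is a minimal sketch of a defect book as a plain data structure: each entry pairs a defect name with its definition and sample images. This is an illustration only, not the LandingLens format; the class names (`DefectEntry`, `DefectBook`), the defect names, the definitions, and the image paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DefectEntry:
    """One defect: its name, a detailed definition, and sample image paths."""
    name: str
    definition: str
    sample_images: list[str] = field(default_factory=list)


@dataclass
class DefectBook:
    """A collection of defect entries, keyed by defect name."""
    entries: dict[str, DefectEntry] = field(default_factory=dict)

    def add(self, entry: DefectEntry) -> None:
        # Reject duplicate names so each defect has exactly one definition.
        if entry.name in self.entries:
            raise ValueError(f"defect already defined: {entry.name}")
        self.entries[entry.name] = entry

    def lookup(self, name: str) -> DefectEntry:
        # Labelers consult the book by defect name.
        return self.entries[name]

    def missing_examples(self) -> list[str]:
        # Defects with no sample images yet, in insertion order.
        return [n for n, e in self.entries.items() if not e.sample_images]


# Hypothetical entries for a pill inspection project.
book = DefectBook()
book.add(DefectEntry(
    name="scratch",
    definition="A visible linear mark on the pill surface.",
    sample_images=["images/scratch_01.png", "images/scratch_02.png"],
))
book.add(DefectEntry(
    name="discoloration",
    definition="An area whose color differs from the rest of the pill.",
))

print(book.lookup("scratch").definition)
print(book.missing_examples())
```

A helper like `missing_examples` is one way to spot definitions that still lack the sample images a labeler would need to resolve an ambiguous case.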
This section will go over best practices for this stage of the ML lifecycle: