
Tip 3: Give instructions on defect boundaries

Your defect book is becoming a reliable guideline for any labeler! You have added counterexample samples and instructions on how to differentiate easily confused classes.


Figure 1. Scratch defect book with distinguishing visual features definitions, counterexample samples, and instructions on how to differentiate easily confused classes.

Another common source of labeling uncertainty that your defect book can easily prevent is how to draw bounding boxes. Your labeler should know from your defect book whether to label a chip defect sample with label style 1 (loose bounding box) or label style 2 (tight bounding box):


Figure 2. Left: bounding box drawn loosely around chip defect. Right: bounding box drawn tightly around chip defect.

or how to label multiple instances of the same defect, with label style 1 (one bounding box encompassing all defects) or label style 2 (a separate bounding box for each defect):


Figure 3. Left: bounding box encompassing all chip defects. Right: labeling separate defects with separate bounding boxes.

Without explicit instructions, labelers are likely to vary both the tightness of the bounding box around the defect and how they label multiple instances of the same defect. A model trained on this inconsistent data is penalized arbitrarily whenever its prediction does not match the labeled bounding box, and it receives conflicting signals about how to predict. For instance, the model might detect the defect correctly but predict a box in label style 1 and still be penalized because the original annotation was created with label style 2.


Figure 4. Model prediction in label style 1 and real label in label style 2.
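
To make the penalty concrete, here is a minimal sketch using intersection over union (IoU), a common way to score how well a predicted box overlaps a labeled box. The box coordinates and the 0.5 match threshold are illustrative assumptions, not values taken from this guide or from any particular tool.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping region (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: the label was drawn tightly (style 2),
# but the model predicts a loose box (style 1) around the same chip.
tight_label = (40, 40, 60, 60)       # 20 x 20 pixels
loose_prediction = (30, 30, 70, 70)  # 40 x 40 pixels

MATCH_THRESHOLD = 0.5  # assumed threshold for counting a prediction as correct

iou_value = iou(loose_prediction, tight_label)
is_match = iou_value >= MATCH_THRESHOLD
print(f"IoU = {iou_value:.2f}, counted as a match: {is_match}")
```

Under these assumptions the prediction fully covers the defect, yet it would be scored as a miss purely because the two boxes follow different label styles.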

During evaluation, model performance will appear artificially low because the model's predictions cannot match inconsistent labels exactly. This issue becomes especially prominent when labeling very small defects, and it prevents fair evaluation of the model’s performance.
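
The sensitivity of small defects can be illustrated with a rough calculation. The sketch below assumes a square defect and a box that is padded by the same number of pixels on every side, and uses IoU as the overlap score; the sizes and the 3-pixel padding are made-up numbers chosen only for illustration.

```python
def iou_with_padding(side, pad):
    """IoU between a tight square box with `side`-pixel edges and the
    same box padded by `pad` pixels on every side."""
    tight_area = side * side
    loose_area = (side + 2 * pad) ** 2
    # The tight box lies entirely inside the padded box, so the
    # intersection is the tight box and the union is the padded box.
    return tight_area / loose_area


# Same 3-pixel looseness, applied to defects of different sizes.
for side in (100, 20, 10):
    print(f"{side:>3}px defect: IoU = {iou_with_padding(side, pad=3):.3f}")
```

The same few pixels of looseness barely change the score for a large defect but can cut it drastically for a small one, which is why a consistent tightness rule matters most for tiny defects.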

A defect book can prevent these issues by:

  • Explicitly addressing how to label multiple instances
  • Giving pixel-wise rules for how tight to make bounding boxes or segmentation areas
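
A pixel-wise rule such as "leave at most 2 pixels between the defect and the box edge" can also be checked automatically when auditing labels. The sketch below is one possible way to do that, assuming a trusted reference mask of the defect is available for the audited image; the function names, the mask format (a list of rows of 0/1 values), and the 2-pixel margin are all hypothetical.

```python
def tight_box(mask):
    """Smallest box (x_min, y_min, x_max, y_max), in inclusive pixel
    coordinates, that contains every nonzero pixel of `mask`."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None  # no defect pixels in the mask
    return (min(cols), min(rows), max(cols), max(rows))


def follows_margin_rule(box, mask, max_margin=2):
    """True if `box` contains the whole defect and each edge sits at most
    `max_margin` pixels away from the defect's extent."""
    ref = tight_box(mask)
    if ref is None:
        return False
    margins = (
        ref[0] - box[0],  # left
        ref[1] - box[1],  # top
        box[2] - ref[2],  # right
        box[3] - ref[3],  # bottom
    )
    # A negative margin means the box cuts into the defect.
    return all(0 <= m <= max_margin for m in margins)


# Hypothetical 8 x 8 image with a defect covering x = 3..4, y = 2..4.
mask = [[0] * 8 for _ in range(8)]
for y in range(2, 5):
    for x in range(3, 5):
        mask[y][x] = 1

print(follows_margin_rule((2, 1, 5, 5), mask))  # 1-pixel margin on every side
print(follows_margin_rule((0, 0, 7, 7), mask))  # box around the whole image
print(follows_margin_rule((4, 2, 4, 4), mask))  # box that cuts off part of the defect
```

Writing the rule as a number of pixels, rather than as "tight" or "loose", is what makes a check like this possible.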


Which improvement over the bare-bones defect book gives the labeler the best instructions on how to draw bounding boxes and segmentation areas?

A. 1a


Solution:
A. This one is close! This improvement has precise instructions on how to label multiple instances and how tight the bounding boxes are, as well as some supporting real labeled examples. However, it does not explicitly demonstrate how to label multiple instances of the same defect.

B. Without labeling instructions, there will likely be variation in how labelers choose to draw bounding boxes.

C. This defect book improves on choice A by including an example of how to label multiple chips, along with a description of the image. You can give your images descriptions in LandingLens and view them by clicking through them:

You can apply these tips to this stage of the ML lifecycle: