
Tip 5: Give examples of rare cases

One final tip for your machine learning project is to include examples of rare cases. As you and your team of labelers go through the dataset, you will encounter anomalies or rare cases that you may be unsure how to label. Skipping these cases during labeling has negative downstream consequences for model performance: just as the labelers ignored them, your model will not learn how to handle them. For instance, take this image of a chip defect that has a red center:


Figure 1. Chip defect with a red center.

Although this sample looks visually different from the rest of your chip defect dataset, your model still needs to correctly predict that the image shows a defective pill. Whenever you come across an irregularity in your dataset, it is good practice to add that image to your defect book. Irregularities can include unusual color, shape, image cropping, and so on. Documenting them prevents labelers from tagging similar irregularities in contradictory ways or treating them as “OK”.

Remember, your defect book is a living, breathing document that should be updated constantly. Don’t be afraid to add more examples: the more, the better.

In general, each class featured in a defect book should have at least:

  • 4 representative examples
  • 3 counterexamples
  • 3 borderline cases
  • Some edge cases if needed
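To make this checklist concrete, here is a minimal Python sketch of one defect-book class that groups its examples by kind and reports which minimums are not yet met. The minimum counts (4, 3, 3) come from the list above; the schema itself (`Example`, `DefectClass`, the `description` field) and the image path are hypothetical illustrations, not the format of any particular labeling tool. The rare red-center case is recorded as an edge case with a written description, mirroring the tip above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Minimum counts per class, taken from the guideline above.
# Edge cases are optional ("if needed"), so they have no minimum.
MINIMUMS = {
    "representative": 4,
    "counterexample": 3,
    "borderline": 3,
}


@dataclass
class Example:
    """One labeled image in the defect book (hypothetical schema)."""
    image_path: str
    label: str        # e.g. "chip" or "OK"
    description: str  # short note explaining the labeling decision


@dataclass
class DefectClass:
    """All examples for a single defect class, grouped by kind."""
    name: str
    examples: dict[str, list[Example]] = field(
        # One empty list per required kind, plus optional edge cases.
        default_factory=lambda: {kind: [] for kind in (*MINIMUMS, "edge_case")}
    )

    def add(self, kind: str, example: Example) -> None:
        """Append an example, rejecting kinds the book does not define."""
        if kind not in self.examples:
            raise ValueError(f"unknown example kind: {kind!r}")
        self.examples[kind].append(example)

    def shortfall(self) -> dict[str, int]:
        """Return how many more examples each kind needs to reach its minimum."""
        return {
            kind: minimum - len(self.examples[kind])
            for kind, minimum in MINIMUMS.items()
            if len(self.examples[kind]) < minimum
        }


# Usage: record the rare red-center chip as an edge case with a description.
chip = DefectClass("chip")
chip.add("edge_case", Example(
    image_path="images/chip_red_center.png",  # hypothetical path
    label="chip",
    description="Overexposed image with a red center; still a chip defect.",
))
print(chip.shortfall())
# {'representative': 4, 'counterexample': 3, 'borderline': 3}
```

Keeping edge cases in the same structure, but outside the minimum checks, lets the book grow with every new irregularity without making rare cases a hard requirement for each class.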


How can this chip defect section be improved to cover the red center chip case in Figure 1?

If your instinct was to include an example of the overexposed, red-centered chip defect along with an image description, you are correct! Adding this rare case to your defect book helps labelers make the right labeling decision when they face a case they have not encountered before.

You can apply these tips to this stage of the ML lifecycle: