
Tip 4: Provide clear explanations of borderline cases

You are probably sensing a theme here: the more ways you eliminate ambiguity in your defect book, the better. This was your defect book the last time you updated it:


Figure 1. Scratch defect book with definitions of distinguishing visual features, counterexample samples, and instructions for differentiating easily confused classes. If you spotted that this description is missing labeling instructions, give yourself a pat on the back!

As you personally go through your dataset and label more samples, you may notice that your initial understanding of the defect was incomplete or incorrect. For example, these pills are classified as “OK” (not defective):


Figure 2. Three non-defective (OK) pill samples with small scratches.

However, this is inconsistent with the current definition of the scratch defect. Some samples in the scratch defect examples are classified as defects even though their scratches are shorter than the one on the middle “OK” sample in Figure 2. In fact, if you arrange the pills on a spectrum according to scratch length, you can visualize the labeling inconsistency:


Figure 3. Scratched pills, ordered by scratch length (left: shortest, right: longest) along with their initial labels: “OK” and “NG” (not good)
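If you have a measured scratch length for each sample, the same inconsistency can be found programmatically. The following is a minimal sketch, not a prescribed workflow: the `Sample` class, the `find_label_conflicts` helper, and all sample names and measurements are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    scratch_mm: float  # measured scratch length in millimeters (hypothetical values)
    label: str         # initial label: "OK" or "NG"


def find_label_conflicts(samples):
    """Return (ng_name, ok_name) pairs where an NG sample has a
    shorter scratch than a sample that was labeled OK."""
    conflicts = []
    for ng in samples:
        if ng.label != "NG":
            continue
        for ok in samples:
            if ok.label == "OK" and ng.scratch_mm < ok.scratch_mm:
                conflicts.append((ng.name, ok.name))
    return conflicts


# Illustrative data only; these are not the measurements behind Figure 3.
samples = [
    Sample("pill_a", 1.0, "OK"),
    Sample("pill_b", 2.0, "NG"),
    Sample("pill_c", 2.5, "OK"),
    Sample("pill_d", 4.0, "NG"),
    Sample("pill_e", 5.5, "NG"),
]

conflicts = find_label_conflicts(samples)
print(conflicts)  # [('pill_b', 'pill_c')]
```

Any pair returned here means that no single length threshold could reproduce the current labels, which is a sign that the defect definition needs tightening.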

You can also plot the quality of the samples in Figure 3 against scratch length to get a numerical picture of how the samples were labeled:


Figure 4. Sample quality versus scratch length.

Once you visualize the severity of the defect on a spectrum, you can decide on a threshold above which a case is considered a defect. You might, after some thought about your product or after consulting your internal team, decide that the threshold is three millimeters. If you relabel the samples according to this threshold, your spectrum will exhibit a clear delineation between acceptable and defective cases:


Figure 5. Pill scratch abnormalities ordered by increasing scratch length. Those below the threshold of three millimeters are relabeled as “OK” and those above are labeled as “NG”.
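The relabeling step can be sketched in a few lines of code. This is an illustration under stated assumptions: the three-millimeter threshold comes from the text, but the `label_for` function, the sample names, and the measurements are hypothetical. The text does not say how to label a scratch of exactly three millimeters; this sketch assumes it counts as “NG”, and your defect book should state whichever choice you make.

```python
THRESHOLD_MM = 3.0  # threshold chosen in the text


def label_for(scratch_mm, threshold_mm=THRESHOLD_MM):
    """Map a scratch length to a label.

    Assumption: a scratch exactly at the threshold is treated as NG.
    """
    return "NG" if scratch_mm >= threshold_mm else "OK"


# Hypothetical measurements in millimeters, for illustration only.
measurements = {
    "pill_a": 1.0,
    "pill_b": 2.0,
    "pill_c": 2.5,
    "pill_d": 4.0,
    "pill_e": 5.5,
}

relabeled = {name: label_for(mm) for name, mm in measurements.items()}
print(relabeled)
```

Because the label now depends only on the measured length, two labelers who measure the same scratch will always arrive at the same label.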

Clear delineation reduces labeling errors as well as false negatives and false positives. Replotting the graph in Figure 4 with these new labels shows that there is now a clear, deterministic function that maps a defect to a label by a metric, such as scratch length:


Figure 6. Relabeled sample quality versus scratch length. Any scratch that falls in the blue region will be labeled as NG.

A quality defect book will:

  • Include an image spectrum of cases for each defect. This spectrum will help labelers differentiate between “OK” and “NG” cases, so be sure to include difficult or borderline cases, not just obvious ones!
  • Determine a measurable threshold that samples can be evaluated against. Labelers should be able to easily decide if a case is “OK” or “NG” based on this threshold.


Take a second to think about how you would improve this scratch defect section in the defect book to help labelers handle borderline cases.

Solution:
If your instinct was to display a spectrum of cases and include a measurable threshold in the description, you are correct! Machine learning engineers who include borderline cases and keep updating their defect book as they dive deeper into their project will see more consistent labeling and better model performance.

You can apply these tips to this stage of the ML lifecycle: