
Tip 1: Describe your defect’s visual features

A detailed description of each defect you want to detect is the first step towards helping your labelers understand what they should label. Vague, nonspecific descriptions can confuse a labeler. Take, for example, the scratch defect definition in this bare-bones pill defect book:


scratch_defect

Figure 1. Scratch defect definition in bare-bones pill defect book.


It is a good start, but the vague description and lack of real labeled examples leave a lot of room for interpretation of what the defect is. With this defect book, labelers are forced to guess the defining characteristics of the class and interpret the images themselves. For example, labelers might interpret ‘scratch defect’ to mean any white section that appears on the orange pill coating. This can lead to images like the one in Figure 2 being labeled as a scratch defect when the correct label is actually chip defect. Inconsistently labeled images will cause your model to learn the wrong signals for predicting defects.


chip_defect

Figure 2. Chip defect, often mislabeled as scratch defect.


A better defect book will:

  • Have clear and practical labeling rules. Avoid words such as "big" or "small". Instead, describe the defect in numerical terms, such as a number of pixels, or relative to a reference object. It is especially helpful to write descriptions that give labelers visual clues in these categories: color, size/length, shape, texture, density, contrast, and location.
  • Include real labeled examples. Avoid examples in which the defects are only pointed to or highlighted; labelers need to see what a correctly labeled defect looks like.
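One way to keep a defect book honest about these two rules is to store each definition as a structured record and run a simple check over it. The sketch below is only an illustration, not part of any particular labeling tool: the names `DefectEntry` and `review_entry`, the extra vague words beyond "big" and "small", the example measurements, and the image path are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass, field

# Visual categories named in the rules above.
VISUAL_CATEGORIES = (
    "color", "size/length", "shape", "texture", "density", "contrast", "location",
)

# "big" and "small" come from the rules above; "large" and "tiny" are
# assumed additions for illustration. Extend this set for your own project.
VAGUE_WORDS = {"big", "small", "large", "tiny"}


@dataclass
class DefectEntry:
    """Hypothetical record for one defect class in a defect book."""
    name: str
    # Maps a visual category to the description written for it.
    features: dict[str, str] = field(default_factory=dict)
    # Paths to images in which the defect is actually labeled.
    labeled_examples: list[str] = field(default_factory=list)


def review_entry(entry: DefectEntry) -> list[str]:
    """Return a list of problems found in a defect book entry."""
    problems = []
    # Flag visual categories that have no description at all.
    for category in VISUAL_CATEGORIES:
        if not entry.features.get(category, "").strip():
            problems.append(f"missing category: {category}")
    # Flag vague words that should be replaced by a measurement.
    for category, text in entry.features.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        for vague in sorted(words & VAGUE_WORDS):
            problems.append(f"vague word '{vague}' in {category}")
    # Flag entries that have no real labeled examples.
    if not entry.labeled_examples:
        problems.append("no real labeled examples")
    return problems


# A bare-bones entry, similar in spirit to Figure 1.
vague_entry = DefectEntry(
    name="scratch",
    features={"size/length": "a small white line"},
)

# A fuller entry. All values here are invented placeholders.
specific_entry = DefectEntry(
    name="scratch",
    features={
        "color": "white mark on the orange coating",
        "size/length": "at least 20 pixels long, under 5 pixels wide",
        "shape": "thin straight or slightly curved line",
        "texture": "coating removed along the line",
        "density": "one to three lines per pill",
        "contrast": "clearly lighter than the surrounding coating",
        "location": "anywhere on the pill surface",
    },
    labeled_examples=["examples/scratch_01_labeled.png"],
)

for problem in review_entry(vague_entry):
    print(problem)
```

A check like this cannot judge whether a description is actually clear to a labeler, and not every defect needs all seven categories. It only catches the mechanical gaps: missing categories, vague size words, and missing labeled examples.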

Quiz:

Which description of the scratch defect gives clear and practical labeling rules and real labeled examples?


A. 1a
B. 1b
C. 1c
D. 1d

Solution:
A. This defect description fails to describe the defect in terms of the visual categories: color, size/length, shape, texture, density, contrast, and location. Pro tip: When writing your defect description, you can check whether you have sufficient detail by asking yourself: ‘Can someone with no prior knowledge of the project read my description and understand how to label a defect?’

B. Uh oh! This example has real labeled examples but uses unnecessarily complicated language like ‘patina’ when the word ‘coating’ would do. Pro tip: Avoid using acronyms or specialized terminology without explaining them.

C. If you chose C, you are correct! This example features a clear description of the distinguishing visual features and real labeled examples, which removes ambiguity for labelers.

D. This description mentions some visual features but does not use measurable criteria that help the labeler distinguish the scratch defect from other defects or from non-defective examples. Pro tip: Avoid using the name of the defect in the description.



You can apply these tips to this stage of the ML lifecycle:

data_lifecycle