
Common Use Cases of Metadata

You can use these metadata tips to get the most out of your data!

Tip 1: Use metadata to tag ambiguous images

You can use metadata to mark images that you are not sure how to label. During the labeling process, you or your labelers may find edge cases that the existing defect book does not cover and that you are not confident how to label. In these cases, you will usually want to mark those images as “ambiguous” and set aside time to confirm the labels with subject matter experts (SMEs).

For instance, going back to the pill defect project, if you notice your model struggles to decide whether a scratch on a pill is OK or NG, you would want to look into the scratch section of your dataset and check for labeling inconsistencies. Looking through your entire dataset, however, is inefficient:


Figure 1. A small sample of the pill defect dataset

Instead, try tagging all confusing scratch defects with the ‘ambiguous’ metadata tag so you can focus on the slice of data that is causing your model issues.


Figure 2. Pill scratch defects with the ‘ambiguous’ metadata tag

From there, you can better examine how you have labeled your data by using Tip 3 from the Defect Book tips and trying to create a clear rule for borderline cases:


Figure 3. Pill scratch defects data slice

You can also pull out this data slice and ask a subject matter expert for their opinion. A subject matter expert’s time is usually limited, so presenting them with a specific data slice can make your labeling review process more efficient.
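As a rough illustration of pulling out such a slice, the sketch below filters a list of image records down to the scratch defects tagged ‘ambiguous’. It is plain Python, not the LandingLens API: the record layout, the file names, and the `select_slice` helper are all hypothetical and stand in for whatever export format you actually work with.

```python
# Hypothetical image records; in practice these would come from your own export.
images = [
    {"file": "pill_001.png", "defect": "scratch", "label": "NG", "tags": {"ambiguous"}},
    {"file": "pill_002.png", "defect": "scratch", "label": "OK", "tags": set()},
    {"file": "pill_003.png", "defect": "chip", "label": "NG", "tags": {"ambiguous"}},
    {"file": "pill_004.png", "defect": "scratch", "label": "OK", "tags": {"ambiguous"}},
]


def select_slice(records, defect, tag):
    """Return the records of the given defect type that carry the given tag."""
    return [r for r in records if r["defect"] == defect and tag in r["tags"]]


# Build a short review queue to hand to a subject matter expert.
review_queue = select_slice(images, defect="scratch", tag="ambiguous")
for record in review_queue:
    print(record["file"], record["label"])
```

Keeping the filter as a small function makes it easy to reuse the same review step for other defect types or tags.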

Tip 2: Use metadata to mark a high priority data slice during evaluation

Let’s say you deploy your model to the production line and discover 17 false negative examples. You can upload them to LandingLens and tag them with the “leak-online” metadata tag. In subsequent iterations, you can use this tag to keep track of this important subset and make sure your new model makes accurate predictions on it.

Conversely, there might be a subset of images where the background is noisy or the region of interest differs from the majority of your samples. You can tag them with ‘noise’ or ‘anomaly’ so you can show these oddities to your customers and/or try excluding them from model iterations.
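One way to keep track of a tagged subset such as the “leak-online” images across model iterations is to recompute recall on just that subset for each new model. The sketch below assumes you have exported ground-truth labels and per-model predictions yourself; the record layout, file names, prediction dictionaries, and the `recall_on_tag` helper are hypothetical and not part of LandingLens.

```python
# Hypothetical ground truth; the "leak-online" images are production false negatives.
records = [
    {"file": "line_001.png", "label": "NG", "tags": {"leak-online"}},
    {"file": "line_002.png", "label": "NG", "tags": {"leak-online"}},
    {"file": "line_003.png", "label": "NG", "tags": {"leak-online"}},
    {"file": "line_004.png", "label": "NG", "tags": {"leak-online"}},
    {"file": "lab_001.png", "label": "NG", "tags": set()},
]

# Hypothetical predictions from two model iterations, keyed by file name.
predictions_v1 = {"line_001.png": "NG", "line_002.png": "OK", "line_003.png": "OK",
                  "line_004.png": "OK", "lab_001.png": "NG"}
predictions_v2 = {"line_001.png": "NG", "line_002.png": "NG", "line_003.png": "NG",
                  "line_004.png": "OK", "lab_001.png": "NG"}


def recall_on_tag(records, predictions, tag, positive="NG"):
    """Fraction of tagged positive images that the model also predicts as positive."""
    tagged = [r for r in records if tag in r["tags"] and r["label"] == positive]
    if not tagged:
        return None  # no tagged positives, so recall is undefined
    hits = sum(1 for r in tagged if predictions.get(r["file"]) == positive)
    return hits / len(tagged)


recall_v1 = recall_on_tag(records, predictions_v1, "leak-online")
recall_v2 = recall_on_tag(records, predictions_v2, "leak-online")
print(f"leak-online recall: v1={recall_v1:.2f}, v2={recall_v2:.2f}")
```

Reporting this number alongside overall metrics makes a regression on the high priority slice visible even when the aggregate score improves.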

Tip 3: Use metadata to track performance of data slices

Tagging your images with metadata helps you retain information about them and track the performance of different production lines or camera setups. It also lets you calculate your model’s performance on different data slices.

You can try separating your dataset into specific subgroups, for example:

- Tagging the production line that the data were collected from
- Tagging the configuration of the imaging system used to capture the data: exposure time, background, lighting, etc.
- Tagging different products (blue pills versus orange pills)

This will ultimately give you more insight into the model performance in different data slices.
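A minimal sketch of computing per-slice performance is shown below. It groups evaluation results by a metadata key and reports accuracy for each value. The result records, the metadata keys (`line`, `product`), and the `accuracy_by_slice` helper are assumptions made for illustration; substitute the fields and metric that fit your own project.

```python
from collections import defaultdict

# Hypothetical evaluation results: ground truth, prediction, and metadata per image.
results = [
    {"label": "NG", "prediction": "NG", "metadata": {"line": "A", "product": "blue"}},
    {"label": "OK", "prediction": "OK", "metadata": {"line": "A", "product": "orange"}},
    {"label": "NG", "prediction": "OK", "metadata": {"line": "B", "product": "blue"}},
    {"label": "OK", "prediction": "OK", "metadata": {"line": "B", "product": "orange"}},
]


def accuracy_by_slice(records, key):
    """Group records by the value of one metadata key and compute accuracy per group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        slice_name = r["metadata"][key]
        totals[slice_name] += 1
        if r["label"] == r["prediction"]:
            correct[slice_name] += 1
    return {name: correct[name] / totals[name] for name in totals}


# Compare slices by production line, then by product.
print(accuracy_by_slice(results, "line"))
print(accuracy_by_slice(results, "product"))
```

A gap between slices, such as one production line scoring well below another, points to where more data or relabeling is likely to help.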


You can apply these tips to this stage of the ML lifecycle:

[Figure: ML lifecycle diagram]