Product detection in movies and TV shows using machine learning – part 4: Start a new dataset

Public image datasets are very handy when it comes to ML training but at some point you’ll face a product/logo that are not covered by any existing dataset. In our case, we are experimenting with detecting Cadbury Roses and Cadbury Heroes products. We need to construct an image dataset to cover these two products.

Two steps to put the elephant in the fridge:

Open the fridge

There are a few sources for in-the-wild images:

  1. Google Images – search “Cadbury Heroes” and “Cadbury Roses”, then use downloading tool such as Download All Images (https://download-all-images.mobilefirst.me/) to fetch all image results.
  2. Flickr – same process as above.
  3. Instagram – use image acquisition tool such as Instaloader (https://instaloader.github.io) to fetch images based on hashtags (#cadburyheroes and #cadburyroses).

The three sources provide around 5,000 raw images with a significant amount of duplicates and unrelated items. A manual process is needed to filter the dataset. Going through thousands of files is tedious, so to make things slightly easier, I made a small GUI application. When you first start the application, it prompts for your image directory. Then it loads the first image. You then use Left and Right arrow key to decide whether to keep the image for ML training or discard it (LEFT to skip and RIGHT to keep). No files are deleted and instead they are moved to corresponding sub-directories “skip” and “keep”. Once one of the two arrow keys is pressed, the application loads the next image. It’s pretty much a one-hand operation so you have the other hand free to feed yourself coffee/tea… The tool is available on Github. It’s based on wxPython and I’ve only tested it on Mac (pythonw).

Insert elephant

Labelling the dataset requires manual input of bounding box coordinates and label. A few tools are available including: LabelImg and Yolo_mark. I also set up “Video labelling tool” as one of the assignment topics for my CS module Media Technology. So hopefully we’ll see some interesting designs for video labelling. In this case we use Yolo_mark as it directly exports in the labelling format required by our framework.

Depending on the actual product and packaging, the logo layout and typeface varies. I am separating out as four classes Cadbury logo, “Heroes” (including the one with the star), “Roses” in precursive (new), and “Roses” in cursive (old) and code them as cadbury_logo, cadbury_heroes, cadbury_roses_precursive, and cadbury_roses_cursive.

Product detection in movies and TV shows using machine learning – part 1: Background

Product detection in movies and TV shows using machine learning – part 2: Dataset and implementations

Product detection in movies and TV shows using machine learning – part 3: Training and Results

One thought on “Product detection in movies and TV shows using machine learning – part 4: Start a new dataset

Leave a comment