🧩 Purpose of Our Cropping Method

In our Turntable Video Dataset Generator, we needed a reliable way to cleanly crop the object in each selected video frame.
While YOLO is powerful, the bounding boxes it returns often contain too much background, especially on one side. So we introduced a custom post-detection cropping method that trims the crop closer to the product's true edges.
Fig. 1.a Cropped images by YOLOv5
Fig. 1.b Cropped images by my method
The Python code is shared here:

Here's how our refined crop logic works, step by step:

✨ Key Assumption

We assume that inside the actual product region, the edge map (generated by Canny edge detection) will have a higher density of white pixels (1s), due to textures, label text, contours, etc.
So, if we remove low-density edge areas from the outer regions, what remains is a cleaner, tighter box around the product.
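To make this assumption concrete, here is a minimal sketch (assuming a NumPy/OpenCV binary edge map where edge pixels are non-zero) of the per-column edge-density measure the later steps rely on; the helper name `column_edge_density` is just illustrative:

```python
import numpy as np

def column_edge_density(edges, x):
    """Fraction of white (edge) pixels in column x of a binary Canny edge map.

    Columns that pass through the product typically score noticeably higher
    than background columns, which is what the cropping logic exploits.
    """
    return np.count_nonzero(edges[:, x]) / edges.shape[0]
```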

1. YOLO Detection (Optional Pre-step)

We first use YOLOv5 ONNX to detect a rough bounding box of the product.
But YOLO often includes too much background, especially with round or shiny objects.
✅ That’s why we follow it with a custom crop step.
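As a rough illustration of this pre-step, here is a minimal sketch that runs a YOLOv5 ONNX export through OpenCV's DNN module and keeps the highest-confidence box. It assumes a standard `yolov5s.onnx` export with a `(1, 25200, 85)` output, a plain 640×640 resize (no letterboxing), and placeholder thresholds; it is not our exact detection code.

```python
import cv2
import numpy as np

def rough_bbox_yolov5(image, onnx_path="yolov5s.onnx", conf_thres=0.25, input_size=640):
    """Return the highest-confidence YOLOv5 box as (x1, y1, x2, y2), or None."""
    net = cv2.dnn.readNetFromONNX(onnx_path)
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (input_size, input_size), swapRB=True)
    net.setInput(blob)
    preds = net.forward()[0]                  # (25200, 85): cx, cy, bw, bh, obj, class scores
    preds = preds[preds[:, 4] > conf_thres]   # keep confident candidates only
    if len(preds) == 0:
        return None
    cx, cy, bw, bh = preds[np.argmax(preds[:, 4])][:4]
    sx, sy = w / input_size, h / input_size   # map back to original image size
    x1, y1 = int((cx - bw / 2) * sx), int((cy - bh / 2) * sy)
    x2, y2 = int((cx + bw / 2) * sx), int((cy + bh / 2) * sy)
    return max(x1, 0), max(y1, 0), min(x2, w), min(y2, h)
```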

2. Grayscale + Canny Edge Detection

We convert the image to grayscale and then apply Canny edge detection, which:
  • Detects sharp intensity changes (gradients) in the image.
  • Highlights edges — including product outlines, label text, folds, etc.
📌 Canny Threshold Tuning:
  • We tune the lower threshold carefully to make sure faint product details aren’t missed.
  • We visualize the gradient histogram (via CSV or plots) to decide this threshold.
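A minimal sketch of this step, assuming OpenCV and placeholder Canny thresholds (40/120); the gradient histogram is dumped to a CSV so the lower threshold can be picked by inspection, roughly as described above:

```python
import cv2
import numpy as np

def edge_map(image_bgr, low_thresh=40, high_thresh=120):
    """Grayscale + Canny edge detection; thresholds are placeholders to tune."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, low_thresh, high_thresh)

def gradient_histogram_csv(image_bgr, csv_path="gradient_hist.csv", bins=64):
    """Write a histogram of Sobel gradient magnitudes to CSV.

    Looking at where background gradients tail off helps choose a Canny lower
    threshold that still keeps faint product details.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    counts, bin_edges = np.histogram(mag, bins=bins)
    np.savetxt(csv_path, np.column_stack([bin_edges[:-1], counts]),
               delimiter=",", header="bin_start,count", comments="")
```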

3. Noise Filtering – Remove Weak Edge Bands

To remove background lines or shadows, we check edge density in small local bands.
Assume the image is 1000×800 pixels (1000 wide, 800 tall):
For each column x, we look at a band around it:
• Band width = ±5% of the image width → about 100 columns in total
• For each column in that band, we count how many white (edge) pixels it has, then average those counts over the ~100 columns.
• If the average is less than 3% of the image height (≈ 24 pixels), we remove column x from the edge map.
We repeat the same check for horizontal rows, as shown in the sketch below.
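Here is a minimal sketch of the band filter described above, assuming a binary edge map from Canny; the function name `filter_weak_bands` and its defaults (±5% band, 3% density) mirror the example numbers but are otherwise illustrative:

```python
import numpy as np

def filter_weak_bands(edges, band_frac=0.05, min_density=0.03):
    """Zero out columns/rows whose surrounding band has too few edge pixels.

    band_frac   - half-width of the band as a fraction of the image size (±5%)
    min_density - minimum average edge pixels per column (fraction of image
                  height) or per row (fraction of image width)
    """
    h, w = edges.shape
    out = edges.copy()
    binary = (edges > 0).astype(np.uint8)

    col_counts = binary.sum(axis=0)            # edge pixels in each column
    half_band = max(1, int(band_frac * w))
    for x in range(w):
        lo, hi = max(0, x - half_band), min(w, x + half_band)
        if col_counts[lo:hi].mean() < min_density * h:
            out[:, x] = 0                      # wipe weak column

    row_counts = binary.sum(axis=1)            # edge pixels in each row
    half_band = max(1, int(band_frac * h))
    for y in range(h):
        lo, hi = max(0, y - half_band), min(h, y + half_band)
        if row_counts[lo:hi].mean() < min_density * w:
            out[y, :] = 0                      # wipe weak row

    return out
```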

4. Walk-Inward Edge Cropping

After filtering, the image is cleaner, but some tiny edge dots still float around outside the product.
So we use a walk-inward strategy to scan inward from the left, right, top, and bottom.
At each step, we count how many white (edge) pixels are in the current column or row. For example, with min_ones_ratio = 0.05 and the same 1000×800 image:
• Image height = 800 → column threshold = 0.05 × 800 = 40
• Image width = 1000 → row threshold = 0.05 × 1000 = 50
We keep scanning until we find a column with more than 40 edge pixels, or a row with more than 50. This avoids being fooled by small noise dots.

The first rows/columns that pass the threshold define a tight bounding box.
We then crop the image accordingly.
To be safe, we add ~10 pixels of padding on all sides, in case edges got trimmed too close.

✅ This step removes thin lines or weak shadows that could mislead the cropping logic.
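And a minimal sketch of the walk-inward crop, assuming the filtered edge map from the previous step; `walk_inward_crop` and its defaults (min_ones_ratio = 0.05, ~10 px padding) follow the example values above, not necessarily our exact code:

```python
import numpy as np

def walk_inward_crop(image, edges, min_ones_ratio=0.05, padding=10):
    """Scan inward from each side until a row/column has enough edge pixels,
    then crop the original image with a small safety padding."""
    h, w = edges.shape
    binary = (edges > 0)
    col_thresh = min_ones_ratio * h            # e.g. 0.05 * 800 = 40
    row_thresh = min_ones_ratio * w            # e.g. 0.05 * 1000 = 50

    col_counts = binary.sum(axis=0)
    row_counts = binary.sum(axis=1)

    left = next((x for x in range(w) if col_counts[x] > col_thresh), 0)
    right = next((x for x in range(w - 1, -1, -1) if col_counts[x] > col_thresh), w - 1)
    top = next((y for y in range(h) if row_counts[y] > row_thresh), 0)
    bottom = next((y for y in range(h - 1, -1, -1) if row_counts[y] > row_thresh), h - 1)

    # add ~10 px of padding on all sides, clamped to the image borders
    x1, y1 = max(0, left - padding), max(0, top - padding)
    x2, y2 = min(w, right + padding + 1), min(h, bottom + padding + 1)
    return image[y1:y2, x1:x2]
```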