
Introduction

To reveal some of the dataset’s features, which consist mostly of linear “zones” parallel to the y-axis (in computer vision the axes are flipped relative to those of Euclidean geometry), we can apply a range of well-known feature extraction methods to the original images and produce new images that highlight simple features such as edges and line segments. In the literature such feature extraction methods are divided into transform-based and backbone-based. Transform-based methods are simpler and reveal features either at the raw pixel level, such as edges, or at the parametric level, such as line segments. Backbone-based methods rely on a backbone neural network and produce parametric features such as line segments or lines. The reasons we embarked on this exercise are twofold:
  1. We can use the new transform-based images as input to downstream AD tasks. The transformed images can also be used on their own in interpretable AD schemes that characterize, in a discriminative way, the edges or lines of each machine setting.
  2. The images we had at our disposal are blurry, and such transforms may give a downstream task a better chance to discriminate anomalous from nominal images.

Edge Detection - Sobel Transform

The Sobel transform is a well-known edge detection algorithm in computer vision. The Sobel operator is a discrete differentiation operator that computes an approximation of the gradient of the image intensity function. It convolves the image with a small, separable, integer-valued filter in the horizontal and vertical directions and is therefore relatively inexpensive computationally. At each point in the image, the result of the Sobel–Feldman operator is either the corresponding gradient vector or the norm of this vector.
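
As a concrete illustration, the following minimal sketch applies the operator with OpenCV; the file names are placeholders and the 3×3 kernel size is one common choice, not a value tuned for this dataset.

```python
import cv2
import numpy as np

# Load a seam image as grayscale; the path is a placeholder, not a
# real file from the dataset.
img = cv2.imread("seam_image.png", cv2.IMREAD_GRAYSCALE)

# Convolve with the 3x3 Sobel kernels in the horizontal (x) and
# vertical (y) directions; a wider dtype keeps negative gradients.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude (the norm of the gradient vector at each pixel),
# rescaled to 8-bit for viewing.
magnitude = np.sqrt(gx**2 + gy**2)
edges = cv2.convertScaleAbs(magnitude)
cv2.imwrite("seam_sobel.png", edges)
```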

Transformation Results

The Sobel transform can be applied to the Seamagine dataset images to highlight edges in the seam zones.

Edge and Line Detection - Canny & Hough Transforms

In this section we look at another edge detection algorithm, Canny edge detection, paired with the Hough Transform for line detection. These two algorithms are typically applied together and yield a parametric representation of the edges in the image. The Canny edge detection algorithm is a widely used technique in computer vision; it works by identifying areas of rapid intensity change, which typically correspond to edges. The process involves several steps (a minimal code sketch follows the list):
  1. Applying Gaussian smoothing to reduce noise
  2. Calculating intensity gradients
  3. Applying non-maximum suppression to thin the edges
  4. Using double thresholding to identify strong and weak edges
  5. Edge tracking by hysteresis to connect weak edges to strong ones
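
The sketch below runs these steps with OpenCV; note that cv2.Canny itself only performs steps 2–5, so the Gaussian smoothing is applied explicitly. The kernel size and the two thresholds are illustrative starting points, not values tuned for the dataset.

```python
import cv2

# Grayscale input; the path is a placeholder.
img = cv2.imread("seam_image.png", cv2.IMREAD_GRAYSCALE)

# Step 1: Gaussian smoothing to reduce noise (cv2.Canny only computes
# gradients internally, so we smooth explicitly first).
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

# Steps 2-5: gradient computation, non-maximum suppression, double
# thresholding, and hysteresis edge tracking are handled by cv2.Canny.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)
```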
Once edges are detected, the Hough Transform is often applied to detect lines within the image. The Hough Transform maps points in image space to curves in a parameter space, enabling the detection of geometric shapes such as lines. By analyzing the accumulation of votes in this parameter space, the algorithm identifies line segments characterized by their orientation and position.
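
One common way to realize the pairing is OpenCV's probabilistic Hough variant, which returns finite segments rather than infinite lines; all parameter values below are illustrative assumptions.

```python
import cv2
import numpy as np

# Probabilistic Hough transform on the Canny edge map from the
# previous sketch.
segments = cv2.HoughLinesP(
    edges,
    rho=1,                  # distance resolution in pixels
    theta=np.pi / 180,      # angular resolution in radians
    threshold=80,           # minimum accumulator votes for a line
    minLineLength=30,       # discard segments shorter than this
    maxLineGap=10,          # bridge small gaps between collinear segments
)

# Each detected segment is parameterized by its two endpoints.
vis = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
```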

Parametric Line Detection - SOLD²

SOLD² is a state-of-the-art framework for detecting and describing line segments in images. Unlike traditional methods that focus solely on detection, it is designed to simultaneously detect line segments and learn robust descriptors for matching them across different images. SOLD² is a backbone-based feature extraction method and uses a heatmap-based approach to detect line segments, combined with a descriptor learning module for feature matching. The framework is end-to-end trainable and can be fine-tuned to the specific dataset. Its sensitivity and accuracy can be adjusted through parameters such as confidence thresholds and non-maximum suppression.
Due to time constraints, no fine-tuning was performed on the pretrained models.
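
For reference, a pretrained SOLD² model can be exercised through the wrapper shipped with the kornia library. This is a minimal sketch, assuming kornia's documented interface (including the output key names) and using a random tensor as a stand-in for a real seam image; loading the pretrained weights requires an internet connection.

```python
import torch
import kornia.feature as KF

# Pretrained SOLD2 detector/descriptor; no fine-tuning, matching the
# setup described above.
model = KF.SOLD2(pretrained=True)

# SOLD2 expects a batch of single-channel images normalized to [0, 1];
# this random tensor is a stand-in for a real seam image.
img = torch.rand(1, 1, 480, 640)

with torch.no_grad():
    out = model(img)

# The output dictionary exposes the detected segments (endpoint pairs)
# and dense descriptors; key names follow kornia's documented API.
segments = out["line_segments"][0]   # (N, 2, 2) endpoints per image
descriptors = out["dense_desc"]
```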
SOLD² looks promising for further evaluation in a subsequent phase of the project, where we can use the produced line segments to provide a semantic characterization of the seam image. Seams are divided into zones, and these zones lie between lines or, more precisely, splines¹. Therefore, if we have a way to isolate zones, irrespective of the absolute pixel coordinates of each zone, we can produce a disentangled representation of each seam using per-zone features. The feature vector of the seam will be the concatenation of the per-zone feature vectors, which allows tremendous flexibility in all kinds of downstream² tasks. For example, in other sections of this report you can observe image-level representation learning algorithms and their performance. Disentangled representations will perform better almost by definition, especially when coupled with domain-specific, deformation-based anomaly detection models for certain zones.
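
To make the intended representation concrete, here is a purely hypothetical sketch: the zone boundaries, the per-zone descriptor, and the image are all illustrative assumptions, not part of an implemented pipeline.

```python
import numpy as np

def zone_features(zone_pixels: np.ndarray) -> np.ndarray:
    """Hypothetical per-zone descriptor using simple intensity
    statistics; a real pipeline would use domain-specific features."""
    return np.array([zone_pixels.mean(), zone_pixels.std()])

# Suppose the line/spline detector gave us the column boundaries of
# three vertical zones (illustrative values only).
boundaries = [0, 120, 300, 640]
image = np.random.rand(480, 640)  # stand-in for a seam image

per_zone = [
    zone_features(image[:, lo:hi])
    for lo, hi in zip(boundaries[:-1], boundaries[1:])
]

# The seam-level representation is the concatenation of the per-zone
# vectors, independent of each zone's absolute pixel coordinates.
seam_vector = np.concatenate(per_zone)
```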

Footnotes

  1. A spline is a curve defined piecewise by polynomial functions.
  2. Downstream tasks are tasks that are performed after the feature extraction phase. They can be classification, clustering, anomaly detection, etc.