UNC COMP 775 Image Processing and Analysis

This recent survey paper gives an overview of 3DMMs (PCA/ASM/AAM): [1909.01815] 3D Morphable Face Models -- Past, Present and Future

Eigenface - Face Recognition using Principal Component Analysis - MachineLearningMastery.com

Problem setup: Assume we have a set of pictures of human faces, all with the same pixel dimensions (e.g., all are r×c grayscale images). If we collect M different pictures and vectorize each picture into L=r×c pixels, we can represent the entire dataset as an L×M matrix (let’s call it matrix A), where each element of the matrix is a pixel’s grayscale value. Note that each column of A corresponds to one picture (or each row, if the PCA routine expects the transposed M×L layout with one sample per row) - this is important for understanding the following assignments.
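As a minimal sketch of this setup (assuming NumPy, with random data standing in for real face images; the function name `eigenfaces` and the use of an SVD are illustrative choices, not taken from the course material), we can stack one vectorized face per column of A, subtract the mean face, and take the leading left singular vectors as eigenfaces:

```python
import numpy as np

def eigenfaces(A, k):
    """PCA on an (L, M) matrix A that holds one vectorized face per column."""
    mean_face = A.mean(axis=1, keepdims=True)        # (L, 1) average face
    centered = A - mean_face
    # Left singular vectors of the centered data are the eigenvectors of
    # its covariance matrix, i.e. the eigenfaces.
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    basis = U[:, :k]                                 # (L, k), orthonormal columns
    weights = basis.T @ centered                     # (k, M), one column per face
    return mean_face, basis, weights

# Hypothetical data: M = 20 random "images" of size r x c = 8 x 6.
rng = np.random.default_rng(0)
r, c, M = 8, 6, 20
A = rng.random((r * c, M))                           # column j is picture j
mean_face, basis, weights = eigenfaces(A, k=5)
approx = mean_face + basis @ weights                 # rank-5 approximation of A
```

Each column of `weights` is the low-dimensional description of one picture, which is what a recognition step would compare between faces.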

Resources

luchaoqi/UNC-COMP-775-Project: UNC COMP 775 final project

SamuelDGeorge (Samuel George)

Assignment 1 Hough Transform

The Hough transform can be used to find circles/disks in images. At a high level, the steps are as follows:

1) Take the derivatives (gradient) of the image at each pixel. For pixels on the circle, the gradient is normal to the circle, so we get a set of normal vectors pointing toward (or directly away from) the center of the circle.
2) Given a radius (an estimate or the ground truth), follow each normal vector for that distance and vote for the resulting candidate center, i.e. each edge pixel votes for where its center would be, and we record the vote, e.g. +1, in a matrix. This gives an accumulator matrix whose local maxima are the most likely circle centers.
3) Take the local maxima of the accumulator matrix as circle centers and draw a circle with the radius from step 2.

Note that, to save computation, people sometimes compute votes only from edge pixels, found with an edge detector (e.g. Gaussian-smoothed gradients / Laplacian of Gaussian / Canny detection).

Due to noise, a single pixel does not give a reliable estimate of the center, so we accumulate votes from all the pixels on the circle. Also due to noise, during edge detection we take derivatives of the Gaussian-smoothed image (equivalently, convolve with derivatives of a Gaussian) instead of differentiating the raw image data - Introduction
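The voting procedure can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not the assignment's reference solution: the radius is known, there is a single circle, edge pixels are those whose smoothed gradient magnitude exceeds a fraction `edge_frac` of the maximum, and the names `gaussian_smooth`, `hough_circle_center`, `sigma` and `edge_frac` are hypothetical. Because the gradient may point toward or away from the center depending on contrast, each edge pixel votes in both directions.

```python
import numpy as np

def gaussian_smooth(image, sigma=2.0):
    """Separable Gaussian blur implemented with 1-D convolutions."""
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    blur = lambda v: np.convolve(v, kernel, mode="same")
    out = np.apply_along_axis(blur, 0, image.astype(float))
    return np.apply_along_axis(blur, 1, out)

def hough_circle_center(image, radius, sigma=2.0, edge_frac=0.5):
    """Vote for the center of a circle of known radius; return (center, accumulator)."""
    smooth = gaussian_smooth(image, sigma)            # smooth first to suppress noise
    grad_r, grad_c = np.gradient(smooth)              # derivatives along rows and columns
    mag = np.hypot(grad_r, grad_c)
    acc = np.zeros(image.shape, dtype=int)
    if mag.max() == 0:
        return (0, 0), acc                            # flat image: nothing to vote with
    rows, cols = np.nonzero(mag > edge_frac * mag.max())   # keep strong edges only
    for r, c in zip(rows, cols):
        n_r, n_c = grad_r[r, c] / mag[r, c], grad_c[r, c] / mag[r, c]
        for sign in (1, -1):                          # toward and away from the gradient
            vr = int(round(r + sign * radius * n_r))
            vc = int(round(c + sign * radius * n_c))
            if 0 <= vr < acc.shape[0] and 0 <= vc < acc.shape[1]:
                acc[vr, vc] += 1                      # one vote for this candidate center
    center = np.unravel_index(np.argmax(acc), acc.shape)
    return center, acc
```

For several circles one would look for multiple local maxima of `acc` instead of the single global maximum, and for an unknown radius one would add a third accumulator dimension over candidate radii.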

Assignment 2 Segmentation via Deformable Models Methods

Preliminaries / background:

Cootes - overview

Index of /rbf/CVonline/LOCAL_COPIES/COOTES

PDMs

This is important for understanding PDM (point distribution model: landmark points represented via PCA) / ASM (gradient-driven segmentation) / AAM (intensity-driven segmentation)

p119b.pdf

Active Shape Model (ASM)

ASM step by step DIP Lecture 25: Active shape models - YouTube

Cootes' paper

From Cootes's paper

Given an atlas (reference training data), we want to segment test data, i.e. find the landmarks on the test image (represented by $b_{test}$ ).

1) PCA to get the shape space - Given training images $X$ , we manually select landmarks/a mask for each image and perform PCA on those points to get the eigenvectors $P$ (whose eigenvalues give the variance along each eigenvector) and the shape parameters $b$ (the projections onto each eigenvector), such that any shape $Z$ (formed by multiple landmarks) can be represented as $Z \approx \bar{X} + P b$ , where $\bar{X}$ is the mean shape. (This formula represents $Z$ in eigenspace: $P b$ is just the vector pointing from the mean to $Z$ , and we use $\approx$ because of dimension reduction.)

The PCA is performed on all landmarks across all images, not on one image. Imagine stacking all landmarks from one image into one column, so that each column represents a shape. See p119b.pdf above for details.
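A minimal sketch of building such a point distribution model, assuming NumPy and shapes that are already aligned. The synthetic circles, the function name `build_pdm` and the 98% variance cutoff `var_kept` are illustrative assumptions, not part of the source:

```python
import numpy as np

def build_pdm(shapes, var_kept=0.98):
    """shapes: (2n, M) array of aligned landmark vectors, one shape per column."""
    mean = shapes.mean(axis=1)                          # mean shape
    U, s, _ = np.linalg.svd(shapes - mean[:, None], full_matrices=False)
    eigvals = s**2 / (shapes.shape[1] - 1)              # variance along each mode
    explained = np.cumsum(eigvals) / eigvals.sum()
    t = int(np.searchsorted(explained, var_kept)) + 1   # number of modes kept
    return mean, U[:, :t], eigvals[:t]

# Hypothetical training set: M = 30 circles, n = 16 landmarks each, varying radius.
rng = np.random.default_rng(0)
theta = np.linspace(0, 2 * np.pi, 16, endpoint=False)
unit = np.concatenate([np.cos(theta), np.sin(theta)])   # (x_1..x_n, y_1..y_n)
shapes = np.outer(unit, rng.uniform(8, 12, size=30))    # (32, 30), one shape per column

mean, P, eigvals = build_pdm(shapes)
Z = shapes[:, 0]
b = P.T @ (Z - mean)              # shape parameters of Z
Z_approx = mean + P @ b           # Z is approximated by mean + P b
```

In this toy set the only variation is the radius, so a single mode is kept and `b` is one number that makes the circle larger or smaller.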

2) Given a test image $Y$ , we first need to find the target points $\hat{y}$ , i.e. the nose point in $X$ should correspond to the nose point in $Y$ . So we follow the normal vector at each model point from $X$ and find the point in $Y$ that has the maximum gradient along this direction (a large gradient usually indicates an edge), e.g. sample 5 points along the normal direction and 5 points along the opposite direction, and take the maximum among the 11 points (these 10 plus the current point).

This assumes $X$ and $Y$ are in a common coordinate frame, i.e. the images have already been aligned by translation/rotation/scale.

Also, when we follow the normal vectors, we use the mean shape $\bar{X}$ as the initialization ( $\hat{y}_0$ ), i.e. $b$ starts from 0, and then update $\hat{y}$ iteratively.
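The search in step 2 can be sketched as follows, assuming NumPy, a precomputed gradient-magnitude image, landmarks given as (row, col) pairs ordered around a closed contour with no repeated points, and nearest-pixel sampling. The function name `search_along_normals` and the synthetic ring image are hypothetical:

```python
import numpy as np

def search_along_normals(gmag, pts, k=5):
    """Move each landmark to the strongest gradient among 2k+1 samples on its normal.

    gmag: 2-D gradient-magnitude image; pts: (n, 2) array of (row, col)
    landmarks ordered around a closed contour.
    """
    pts = np.asarray(pts, dtype=float)
    # Tangent from the two neighbouring landmarks; the normal is the tangent rotated by 90 degrees.
    tangent = np.roll(pts, -1, axis=0) - np.roll(pts, 1, axis=0)
    normal = np.stack([tangent[:, 1], -tangent[:, 0]], axis=1)
    normal /= np.linalg.norm(normal, axis=1, keepdims=True)
    offsets = np.arange(-k, k + 1)                    # k samples each side plus the point itself
    cand = pts[:, None, :] + offsets[None, :, None] * normal[:, None, :]   # (n, 2k+1, 2)
    # Nearest-pixel lookup, clipped to the image bounds.
    rr = np.clip(np.rint(cand[..., 0]).astype(int), 0, gmag.shape[0] - 1)
    cc = np.clip(np.rint(cand[..., 1]).astype(int), 0, gmag.shape[1] - 1)
    best = np.argmax(gmag[rr, cc], axis=1)            # strongest sample per landmark
    return cand[np.arange(len(pts)), best]            # (n, 2) updated landmarks

# Hypothetical test image: gradient magnitude peaks on a circle of radius 12.
yy, xx = np.mgrid[:64, :64]
gmag = np.exp(-0.5 * (np.hypot(yy - 32, xx - 32) - 12.0) ** 2)
theta = np.linspace(0, 2 * np.pi, 16, endpoint=False)
start = 32 + 10 * np.stack([np.sin(theta), np.cos(theta)], axis=1)   # radius-10 initial shape
y_hat = search_along_normals(gmag, start)
```

With `k=5` each landmark can move at most 5 pixels per iteration, which is one reason the search has to be repeated.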

DIP Lecture 25: Active shape models - YouTube:

3) Now we have an initial target $\hat{y}$ , but this is not yet our final goal, the ground-truth target points $y$ . There are several possible reasons: the 10 sampled points might not reach far enough to hit the ground truth $y$ (sometimes we might need to sample 20 points in step 2), or the gradient might not be a reliable enough cue, etc. So we fit the shape model to the found points (e.g. $b = P^T(\hat{y} - \bar{X})$ ), then repeat step 2 using the resulting $\hat{y}$ as the initialization to get a new $\hat{y}$ , until convergence. The final $b$ that represents $y$ is the result.

Our final goal is to find the $b$ that represents all landmarks $y$ (a $2n$-dimensional vector for $n$ 2D landmarks) such that $y \approx \bar{X} + P b$ .
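Fitting $b$ to a set of found points amounts to a projection onto the eigenvectors. The sketch below assumes NumPy and orthonormal columns in $P$. It also clamps each parameter to $\pm 3\sqrt{\lambda_i}$, a common plausibility limit in the ASM literature that is not stated in these notes. The name `fit_shape_params` and the argument `limit` are hypothetical, and pose (translation/rotation/scale) is assumed to be handled separately.

```python
import numpy as np

def fit_shape_params(y_hat, mean, P, eigvals, limit=3.0):
    """Project found points onto the shape space and keep the result plausible."""
    b = P.T @ (y_hat - mean)                 # least-squares fit, since P has orthonormal columns
    bound = limit * np.sqrt(eigvals)         # assumed limit of +/- 3 standard deviations per mode
    b = np.clip(b, -bound, bound)
    return b, mean + P @ b                   # parameters and the regularized shape

# Hypothetical model with 3 modes and 6 two-dimensional landmarks.
rng = np.random.default_rng(1)
P, _ = np.linalg.qr(rng.normal(size=(12, 3)))
mean = rng.normal(size=12)
eigvals = np.array([4.0, 1.0, 0.25])
b, y_model = fit_shape_params(mean + P @ np.array([1.0, -0.5, 0.2]), mean, P, eigvals)
```

The regularized shape `y_model` is what the next search iteration would start from.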

In summary: ASMs

We assume we have an initial estimate for the pose and shape parameters (eg the mean shape). This is iteratively updated as follows:

Look along normals through each model point to find the best local match for the model of the image appearance at that point (eg strongest nearby edge)

Update the pose and shape parameters to best fit the model instance to the found points

Repeat until convergence

AAM

Perform PCA on the shape and grey-level information separately to get $b_s, b_g$ , then perform PCA again on the concatenated $b_s, b_g$ .

The motivation is that we want to update the PDM under the constraint of intensities (driven by $\triangle I$ instead of by normal vectors).

In ASM we move the PDM using normal vectors, while in AAM we move the PDM using the intensities $I$ .

cootes-eccv-98.pdf

As shown in ASM, the shape of any example (all landmarks) can be summarized by a vector $b_s$ .

Now we consider the grey-level intensities and apply the same procedure as in ASM, so that any example can also be summarized by a vector $b_g$ .

Motivation: since there may be correlation between shape and grey-level variations, we apply a further PCA again to the data as follows

$\mathbf{b}=\left(\begin{array}{c}\mathbf{W}_{s} \mathbf{b}_{s} \\ \mathbf{b}_{g}\end{array}\right)=\left(\begin{array}{c}\mathbf{W}_{s} \mathbf{P}_{s}^{T}(\mathbf{x}-\overline{\mathbf{x}}) \\ \mathbf{P}_{g}^{T}(\mathbf{g}-\overline{\mathbf{g}})\end{array}\right)$

Where $\mathbf{W}_{s}$ is a diagonal matrix of weights for each shape parameter, allowing for the difference in units between the shape and grey models.

We have $b$ that can be represented as follows after PCA

$b=Q c=\left(\begin{array}{l}Q_{s} \\ Q_{g}\end{array}\right) c$

Previously, before the combined PCA, we represented the shape and grey-levels of any example as $\mathbf{x}=\overline{\mathbf{x}}+\mathbf{P}_{s} \mathbf{b}_{s}$ and $\mathbf{g}=\overline{\mathbf{g}}+\mathbf{P}_{g} \mathbf{b}_{g}$ .

Now, for any given example, we can represent the shape as well as the grey-levels as functions of $c$ , which becomes our target instead of the $b$ in ASM. Since $\mathbf{W}_{s} \mathbf{b}_{s}=\mathbf{Q}_{s} \mathbf{c}$ and $\mathbf{b}_{g}=\mathbf{Q}_{g} \mathbf{c}$ :

$\mathbf{x}=\overline{\mathbf{x}}+\mathbf{P}_{s} \mathbf{W}_{s}^{-1} \mathbf{Q}_{s} \mathbf{c}, \quad \mathbf{g}=\overline{\mathbf{g}}+\mathbf{P}_{g} \mathbf{Q}_{g} \mathbf{c}, \quad \mathbf{Q}=\left(\begin{array}{c}\mathbf{Q}_{s} \\ \mathbf{Q}_{g}\end{array}\right)$

Where $Q$ are eigenvectors and $c$ is a vector of appearance parameters controlling both the shape and grey-levels of the model.
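A minimal sketch of this second PCA, assuming NumPy and that $b_s$ and $b_g$ have already been computed for every training example. The weighting $W_s = rI$ with $r^2$ equal to the ratio of total grey-level variance to total shape variance is one simple choice and an assumption here, and the name `combined_model` is hypothetical. Shape and grey-level parameters are recovered by inverting the stacking, i.e. $b_s = W_s^{-1} Q_s c$ and $b_g = Q_g c$.

```python
import numpy as np

def combined_model(b_s, b_g, k):
    """b_s: (t_s, M) shape parameters, b_g: (t_g, M) grey-level parameters, one column per example."""
    t_s = b_s.shape[0]
    # Assumed weighting: W_s = r I, with r^2 = total grey variance / total shape variance.
    r = np.sqrt(b_g.var(axis=1).sum() / b_s.var(axis=1).sum())
    W_s = r * np.eye(t_s)
    b = np.vstack([W_s @ b_s, b_g])                   # concatenated vector b for every example
    # b_s and b_g come from mean-centered data, so b is treated as zero-mean.
    U, _, _ = np.linalg.svd(b, full_matrices=False)
    Q = U[:, :k]                                      # eigenvectors of the combined model
    c = Q.T @ b                                       # appearance parameters, (k, M)
    return W_s, Q[:t_s], Q[t_s:], c

# Hypothetical parameters for M = 40 training examples.
rng = np.random.default_rng(0)
b_s = rng.normal(size=(3, 40))
b_g = 10 * rng.normal(size=(5, 40))
W_s, Q_s, Q_g, c = combined_model(b_s, b_g, k=8)
b_s_rec = np.linalg.solve(W_s, Q_s @ c)               # b_s = W_s^{-1} Q_s c
b_g_rec = Q_g @ c                                     # b_g = Q_g c
```

With `k` smaller than the total number of parameters the reconstruction becomes approximate, which is the dimension reduction the combined model is meant to provide.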

From the lecture slides:

As in the lecture-slide picture above, given a test image we use the mean as the initialization, similar to ASM, and then:

Calculate $\triangle I$ and use linear regression to predict $\triangle c$ from it, which gives $\triangle PDM$ - note that this $\triangle I$ is just the intensity difference between the reference and the target.
Apply $\triangle PDM$ to the test image to move the points.
Apply $\triangle I$ to the reference image (atlas) to update the intensity information.
Repeat the above steps until convergence.
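The regression-driven update can be sketched as below. This is a toy illustration under several assumptions, not the lecture's implementation: NumPy, a hypothetical callable `residual(c)` that returns $\triangle I$ (image sample minus model texture) for the current parameters, a sign convention in which adding the predicted $\triangle c$ reduces the residual, and a regression matrix `R` learned offline from known parameter perturbations. In a real AAM, `residual` would warp the image to the current shape before sampling.

```python
import numpy as np

def learn_update_matrix(delta_c, delta_g):
    """Least-squares R such that delta_c is approximately R @ delta_g.

    delta_c: (k, N) known parameter displacements;
    delta_g: (n_pix, N) intensity residuals they produce.
    """
    X, *_ = np.linalg.lstsq(delta_g.T, delta_c.T, rcond=None)
    return X.T                                        # (k, n_pix)

def aam_search(residual, R, c0, n_iter=20, tol=1e-8):
    """Iterate c <- c + R @ residual(c) until the predicted step is negligible."""
    c = np.array(c0, dtype=float)
    for _ in range(n_iter):
        step = R @ residual(c)                        # predicted correction from Delta I
        c += step
        if np.linalg.norm(step) < tol:
            break
    return c

# Toy linear texture model (hypothetical): residual = G @ (c_true - c).
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 3))
c_true = np.array([0.5, -1.0, 2.0])
residual = lambda c: G @ (c_true - c)
delta_c = rng.normal(size=(3, 200))                   # training perturbations
R = learn_update_matrix(delta_c, G @ delta_c)
c_hat = aam_search(residual, R, np.zeros(3))          # start from the mean (c = 0)
```

In this linear toy problem one step is enough; on real images the relationship is only approximately linear, so several iterations are needed.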

Assignment 3 Registration Methods

The Mahalanobis distance is a measure between a sample point and a distribution.
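Concretely, for a distribution with mean $\mu$ and covariance $\Sigma$ the distance is $D_M(x)=\sqrt{(x-\mu)^T \Sigma^{-1}(x-\mu)}$ . A minimal NumPy sketch, estimating $\mu$ and $\Sigma$ from samples (the function name and the toy samples are illustrative):

```python
import numpy as np

def mahalanobis(x, samples):
    """Distance from point x to the distribution estimated from samples of shape (N, d)."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)               # (d, d) sample covariance
    diff = x - mu
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical samples: the four corners of a square, mean (1, 1), covariance (4/3) I.
samples = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
d = mahalanobis(np.array([3.0, 1.0]), samples)
```

Unlike the Euclidean distance, this rescales each direction by the spread of the distribution, so a displacement along a high-variance direction counts for less.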

Metrics to evaluate image mismatch:

Sum of squared differences (SSD) - fails when contrast is reversed (e.g. a black-and-white image with inverted intensities)
normalized correlation
quantile function - need to choose a feature
mutual information
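Three of these metrics can be sketched in a few lines of NumPy. The histogram-based mutual information estimate and its `bins` parameter are illustrative assumptions, and the example images are random:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: 0 for identical images, larger means worse match."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized correlation in [-1, 1]; insensitive to linear intensity changes."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float(np.sum(a0 * b0) / np.sqrt(np.sum(a0**2) * np.sum(b0**2)))

def mutual_information(a, b, bins=16):
    """Mutual information (in nats) estimated from the joint intensity histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()                           # joint distribution
    px = pxy.sum(axis=1, keepdims=True)               # marginal of a
    py = pxy.sum(axis=0, keepdims=True)               # marginal of b
    nz = pxy > 0                                      # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
image = rng.random((64, 64))
inverted = 1.0 - image                                # contrast-reversed copy
```

On `image` versus `inverted`, SSD reports a large mismatch although the two images are perfectly aligned, normalized correlation is close to -1 (so its magnitude is still informative), and mutual information is about as high as for `image` against itself.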

Registration:

Elastic method
Fluid method
LDDMM
a) What is the main advantage of elastic registration methods over LDDMM methods? Elastic registration is a simple and fast model that enforces smoothness of the displacement field by regularizing $u$ - it works well for small displacements.
b) What are the two main advantages of LDDMM methods over elastic registration methods? LDDMM works when large deformations are required, and it avoids folding: it is well known that elastic deformation models cannot guarantee diffeomorphic solutions (i.e. folding can occur) if large deformations are required within the registration, whereas LDDMM and fluid-based models can avoid folding.
The LDDMM regularizer is an integral involving only L2 norms, which gives a metric whose geodesics may be used in LDDMM. The L2 loss here is the L2 norm of $Lv$ , where $L$ is a differential operator applied to the velocity field $v$ .
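Folding can be checked numerically: the map $\phi(x)=x+u(x)$ stops being invertible wherever the determinant of its Jacobian is not positive. A minimal 2D sketch, assuming NumPy, unit pixel spacing, and a displacement field stored as a `(2, H, W)` array of row and column components (the function names are hypothetical):

```python
import numpy as np

def jacobian_determinant(u):
    """Determinant of the Jacobian of phi(x) = x + u(x) for a (2, H, W) displacement u."""
    du0_d0, du0_d1 = np.gradient(u[0])                # derivatives of the row component
    du1_d0, du1_d1 = np.gradient(u[1])                # derivatives of the column component
    return (1 + du0_d0) * (1 + du1_d1) - du0_d1 * du1_d0

def has_folding(u):
    """True if the deformation folds (non-positive Jacobian determinant) anywhere."""
    return bool(np.any(jacobian_determinant(u) <= 0))

rows, cols = np.mgrid[:10, :12].astype(float)
stretch = np.stack([0.5 * rows, np.zeros_like(rows)])   # mild stretch along rows, det = 1.5
fold = np.stack([-2.0 * rows, np.zeros_like(rows)])     # flips the row axis, det = -1
```

Here `has_folding(stretch)` is false and `has_folding(fold)` is true. A smoothness penalty on $u$ alone does not rule out the second case for large displacements, which is the limitation of elastic models noted above.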