Random Noise: HOG

Histogram of Oriented Gradients.

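Compute image gradients with the derivative mask [1 0 -1] (horizontal) and its transpose [1 0 -1]^T (vertical), in one or both directions of the image. The gradient magnitude at each pixel is what gets used in the later steps.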
A cell (e.g. 6x6 pixels) and a block (e.g. 3x3 cells) are defined as in the following figure.
Create the orientation histogram bins: the bins can be spread over 0-180 degrees if using unsigned gradients, and over 0-360 degrees if using signed gradients. For example, using 20-degree-wide bins over 0-180 degrees, we get 9 bins. Every pixel inside the cell casts a vote for its orientation bin, weighted by its gradient value.
Local normalization within each block. The normalization could be $f = v / \sqrt{\|v\|_2^2 + e^2}$, where v is the non-normalized vector containing all histograms in a given block, $\|v\|_2 = \sqrt{\sum_i v_i^2}$ is its 2-norm, and e is a small constant.
The HOG descriptor is the vector of the components of the normalized cell histograms from all of the block regions […,…,…]. Blocks usually overlap by 1/4 or 1/2.
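The steps above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not a reference implementation: the names `Image`, `gradientAt`, `cellHistogram` and `l2Normalize` are made up for this post, the image is assumed to be a row-major grayscale array of doubles, orientations are unsigned (0-180 degrees) with 9 bins, each pixel votes into a single bin with no interpolation, and the normalization is the L2 form v / sqrt(||v||^2 + e^2).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major grayscale image (an assumed layout for this sketch).
struct Image {
    int w, h;
    std::vector<double> px;
    double at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Centered differences, i.e. the [1 0 -1] mask along x and along y
// (right minus left, below minus above). Valid for interior pixels only.
void gradientAt(const Image& im, int x, int y, double& dx, double& dy) {
    assert(x >= 1 && x < im.w - 1 && y >= 1 && y < im.h - 1);
    dx = im.at(x + 1, y) - im.at(x - 1, y);
    dy = im.at(x, y + 1) - im.at(x, y - 1);
}

// 9-bin histogram of unsigned orientations (20 degrees per bin) for the
// cell whose top-left pixel is (x0, y0). Each pixel adds its gradient
// magnitude to exactly one bin.
std::vector<double> cellHistogram(const Image& im, int x0, int y0, int cell) {
    const double kPi = 3.14159265358979323846;
    std::vector<double> hist(9, 0.0);
    for (int y = y0; y < y0 + cell; ++y) {
        for (int x = x0; x < x0 + cell; ++x) {
            double dx, dy;
            gradientAt(im, x, y, dx, dy);
            const double mag = std::sqrt(dx * dx + dy * dy);
            double deg = std::atan2(dy, dx) * 180.0 / kPi;  // in [-180, 180]
            if (deg < 0.0) deg += 180.0;                    // drop the sign
            const int bin = static_cast<int>(deg / 20.0) % 9;  // 180 wraps to 0
            hist[bin] += mag;
        }
    }
    return hist;
}

// L2 block normalization: v / sqrt(||v||^2 + e^2), with e a small constant.
std::vector<double> l2Normalize(const std::vector<double>& v, double e = 1e-3) {
    double sumSq = 0.0;
    for (double x : v) sumSq += x * x;
    const double norm = std::sqrt(sumSq + e * e);
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] / norm;
    return out;
}
```

To build a block vector, the histograms of the cells in the block would be concatenated and passed to `l2Normalize`; the full descriptor is then the concatenation of all normalized block vectors.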

HOG is a widely used descriptor because it is fast to compute, inherently robust to slight object variations and deformations, and able to capture a coarse spatial layout of features.

Update: 4/17/2012
Some implementation details:
Suppose the image size is 320x240 and the cell size is 8x8, so we have 40x30 cells. Suppose 18 orientations are used; we define two vectors for the x and y directions:

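// x components: cos of 0, 20, ..., 160 degrees
double uu[9] = {1.0000, 0.9397, 0.7660, 0.5000, 0.1736, -0.1736, -0.5000, -0.7660, -0.9397};

// y components: sin of 0, 20, ..., 160 degrees
double vv[9] = {0.0000, 0.3420, 0.6428, 0.8660, 0.9848, 0.9848, 0.8660, 0.6428, 0.3420};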
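The two tables are the cosines and sines of nine directions spaced 20 degrees apart (0, 20, ..., 160 degrees), rounded to four decimals. A small sketch that regenerates them, in case a different number of orientations is wanted; the function name `unitDirections` is only illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Fills u with cos(k * step) and v with sin(k * step) for k = 0 .. n-1,
// where step = 180 / n degrees. With n = 9 the step is 20 degrees.
void unitDirections(int n, std::vector<double>& u, std::vector<double>& v) {
    assert(n > 0);
    const double kPi = 3.14159265358979323846;
    u.resize(n);
    v.resize(n);
    for (int k = 0; k < n; ++k) {
        const double a = k * kPi / n;  // angle in radians
        u[k] = std::cos(a);
        v[k] = std::sin(a);
    }
}
```

Since sin(80 degrees) equals sin(100 degrees), the value 0.9848 shows up twice in the middle of the sine table.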

To check which orientation the current pixel falls in:

// for each pixel in the image, with gradient (dx, dy):
double best_dot = 0;
int best_o = 0;
for (int o = 0; o < 9; o++) {
    // projection of the gradient onto direction o
    double dot = uu[o]*dx + vv[o]*dy;
    if (dot > best_dot) {
        best_dot = dot;
        best_o = o;
    } else if (-dot > best_dot) {
        // the opposite direction (o + 180 degrees) matches better
        best_dot = -dot;
        best_o = o+9;
    }
}
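Wrapped as a function, the loop above might look like the sketch below. `snapOrientation` is a name chosen here for illustration. Bins 0 to 8 correspond to directions 0 to 160 degrees and bins 9 to 17 to the opposite directions (180 to 340 degrees); a zero gradient falls through to bin 0.

```cpp
#include <cassert>

// Returns the index (0..17) of the 20-degree direction that has the
// largest dot product with the gradient (dx, dy).
int snapOrientation(double dx, double dy) {
    static const double uu[9] = {1.0000, 0.9397, 0.7660, 0.5000, 0.1736,
                                 -0.1736, -0.5000, -0.7660, -0.9397};
    static const double vv[9] = {0.0000, 0.3420, 0.6428, 0.8660, 0.9848,
                                 0.9848, 0.8660, 0.6428, 0.3420};
    double best_dot = 0;
    int best_o = 0;
    for (int o = 0; o < 9; o++) {
        const double dot = uu[o] * dx + vv[o] * dy;
        if (dot > best_dot) {
            best_dot = dot;
            best_o = o;
        } else if (-dot > best_dot) {
            best_dot = -dot;
            best_o = o + 9;
        }
    }
    assert(best_o >= 0 && best_o < 18);
    return best_o;
}
```

Because the comparison is strict, a gradient that lies exactly between two directions (for example straight up, between 80 and 100 degrees) goes to the lower-numbered bin.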

Create the histograms for the 18 orientations: within each cell, the gradient values of the pixels that fall into each orientation are added up.

Compute the energy in each cell by summing over orientations.

Output features. The features could be:

contrast-sensitive features (size 18): the energy in each of the 18 signed orientations

contrast-insensitive features (size 9): the energy in each orientation with the sign ignored, so opposite directions share a bin

texture features (size 4): computed from the energy of the 4 neighbors

So the output feature size is 40x30x31 (31 = 18 + 9 + 4).
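A rough sketch of the bookkeeping behind these numbers is given below. The `CellGrid` type and its members are hypothetical names for this post. Two things are assumptions that the steps above do not spell out: each pixel votes into a single cell and bin (no interpolation), and the cell energy is taken as the sum over the 9 unsigned orientations of (bin o + bin o+9) squared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 18 contrast-sensitive + 9 contrast-insensitive + 4 texture features.
constexpr int kFeaturesPerCell = 18 + 9 + 4;

// Per-cell histograms, stored as hist[(cy * cellsX + cx) * 18 + o].
struct CellGrid {
    int cell, cellsX, cellsY;
    std::vector<double> hist;

    CellGrid(int imgW, int imgH, int cellSize)
        : cell(cellSize), cellsX(imgW / cellSize), cellsY(imgH / cellSize),
          hist(static_cast<std::size_t>(cellsX) * cellsY * 18, 0.0) {}

    // Adds the gradient magnitude of pixel (x, y) to orientation bin o
    // of the cell that contains the pixel.
    void vote(int x, int y, int o, double mag) {
        assert(o >= 0 && o < 18);
        const int cx = x / cell, cy = y / cell;
        assert(cx >= 0 && cx < cellsX && cy >= 0 && cy < cellsY);
        hist[(static_cast<std::size_t>(cy) * cellsX + cx) * 18 + o] += mag;
    }

    // Assumed energy definition: opposite directions are merged first,
    // then the squared sums are accumulated over the 9 orientations.
    double energy(int cx, int cy) const {
        const std::size_t base = (static_cast<std::size_t>(cy) * cellsX + cx) * 18;
        double e = 0.0;
        for (int o = 0; o < 9; ++o) {
            const double s = hist[base + o] + hist[base + o + 9];
            e += s * s;
        }
        return e;
    }
};
```

For a 320x240 image with 8x8 cells, `CellGrid(320, 240, 8)` gives a 40x30 grid, and with `kFeaturesPerCell` values per cell the output has 40x30x31 entries.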

Wednesday, April 04, 2012
