## Wednesday, April 04, 2012

### HOG

• Compute the image gradients with the derivative masks [1 0 -1] and [1 0 -1]^T in one or both directions of the image. The gradient magnitude is used as the vote weight.
• A Cell (e.g. 6x6 pixels) and a Block (e.g. 3x3 cells) are defined as shown in the following figure.
• Create the orientation histogram bins: the bins can be spread over 0-180 degrees if using unsigned gradients, or over 0-360 degrees if using signed gradients. For example, with 20-degree-wide bins over 0-180 degrees, we get 9 bins. Every pixel inside the cell casts a vote for its bin, weighted by its gradient value.
• Local normalization within each block. The normalization could be, for example, $v \rightarrow v / \sqrt{\|v\|_2^2 + e^2}$, where $v$ is the non-normalized vector containing all histograms in a given block, $\|v\|_2$ is the 2-norm, $\|v\|_2 = \sqrt{\sum_i v_i^2}$, and $e$ is a small constant.
• The HOG descriptor is the vector of the components of the normalized cell histograms from all of the block regions […,…,…]. The blocks usually overlap by 1/4 or 1/2.
HOG is a widely used descriptor because it is fast to compute, is inherently robust to slight object variations and deformations, and captures a coarse spatial layout of features.
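The steps above can be sketched in C++ roughly as follows. This is a minimal illustration, not a reference implementation: the function names `cell_histogram` and `l2_normalize`, the row-major grayscale image layout, the hard (non-interpolated) voting, the skipping of border pixels, and the default value of `eps` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: 9-bin unsigned (0-180 degree) orientation histogram
// for one cell of a row-major grayscale image. Each pixel casts a single
// vote weighted by its gradient magnitude (no interpolation between bins).
std::vector<double> cell_histogram(const std::vector<double>& img,
                                   int width, int height,
                                   int x0, int y0, int cell_size) {
    assert(static_cast<int>(img.size()) == width * height);
    const double kPi = 3.14159265358979323846;
    const int kBins = 9;
    std::vector<double> hist(kBins, 0.0);
    for (int y = y0; y < y0 + cell_size; ++y) {
        for (int x = x0; x < x0 + cell_size; ++x) {
            // Skip pixels where the centered mask does not fit in the image.
            if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) continue;
            // Masks [1 0 -1] and [1 0 -1]^T, applied as correlations.
            double dx = img[y * width + (x - 1)] - img[y * width + (x + 1)];
            double dy = img[(y - 1) * width + x] - img[(y + 1) * width + x];
            const double mag = std::sqrt(dx * dx + dy * dy);
            // Unsigned gradient: flip vectors pointing into the lower
            // half-plane so the angle lies in [0, 180) degrees.
            if (dy < 0.0 || (dy == 0.0 && dx < 0.0)) { dx = -dx; dy = -dy; }
            const double angle = std::atan2(dy, dx) * 180.0 / kPi;
            int bin = static_cast<int>(angle / 20.0);  // 20-degree bins
            if (bin < 0) bin = 0;                      // guard against rounding
            if (bin >= kBins) bin = kBins - 1;
            hist[bin] += mag;
        }
    }
    return hist;
}

// Block normalization: v / sqrt(||v||_2^2 + eps^2), where v concatenates
// the cell histograms of one block.
std::vector<double> l2_normalize(const std::vector<double>& v,
                                 double eps = 1e-3) {
    assert(eps > 0.0);
    double sum_sq = 0.0;
    for (double x : v) sum_sq += x * x;
    const double norm = std::sqrt(sum_sq + eps * eps);
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] / norm;
    return out;
}
```

To build a block descriptor with these helpers, one would concatenate the histograms of the cells in the block into a single vector and pass it to `l2_normalize`.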

Update: 4/17/2012
Some implementation details:
Suppose the image size is 320x240 and the cell size is 8x8, so we have 40x30 cells. Suppose 18 orientations are used; we define two vectors holding the x and y components of 9 unit directions (the cosines and sines of 0, 20, …, 160 degrees):

    double uu[9] = { 1.0000,  0.9397,  0.7660,  0.5000,  0.1736,
                    -0.1736, -0.5000, -0.7660, -0.9397};
    double vv[9] = { 0.0000,  0.3420,  0.6428,  0.8660,  0.9848,
                     0.9848,  0.8660,  0.6428,  0.3420};
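The table entries match the cosines and sines of 0, 20, …, 160 degrees rounded to four decimals, so they can also be generated rather than typed in. A small sketch, where the `OrientationTable` struct and `make_orientation_table` function are hypothetical names introduced only for this example:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for the unit direction vectors.
struct OrientationTable {
    std::vector<double> uu;  // x components (cosines)
    std::vector<double> vv;  // y components (sines)
};

// Builds unit vectors evenly spaced over [0, 180) degrees.
// With half_bins = 9 the spacing is 20 degrees, as in the table above.
OrientationTable make_orientation_table(int half_bins = 9) {
    assert(half_bins > 0);
    const double kPi = 3.14159265358979323846;
    OrientationTable t;
    t.uu.resize(half_bins);
    t.vv.resize(half_bins);
    for (int o = 0; o < half_bins; ++o) {
        const double theta = o * kPi / half_bins;  // angle in radians
        t.uu[o] = std::cos(theta);
        t.vv[o] = std::sin(theta);
    }
    return t;
}
```

Only 9 directions are stored because the opposite 9 directions are simply the negated vectors.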

To find which orientation bin the gradient (dx, dy) of the current pixel falls into:

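    // For each pixel in the image, with gradient (dx, dy):
    double best_dot = 0;
    int best_o = 0;
    for (int o = 0; o < 9; o++) {
        // Project the gradient onto the o-th unit direction.
        double dot = uu[o]*dx + vv[o]*dy;
        if (dot > best_dot) {
            // Best match so far is direction o (0 to 160 degrees).
            best_dot = dot;
            best_o = o;
        } else if (-dot > best_dot) {
            // Best match so far is the opposite direction (180 to 340 degrees).
            best_dot = -dot;
            best_o = o+9;
        }
    }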
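The loop above can be wrapped in a function so it is easy to try on individual gradients. In this sketch, `snap_orientation` and `add_vote` are hypothetical names, and weighting the vote by the gradient magnitude is an assumption carried over from the description of HOG earlier in this post:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

static const double uu[9] = { 1.0000,  0.9397,  0.7660,  0.5000,  0.1736,
                             -0.1736, -0.5000, -0.7660, -0.9397};
static const double vv[9] = { 0.0000,  0.3420,  0.6428,  0.8660,  0.9848,
                              0.9848,  0.8660,  0.6428,  0.3420};

// Returns the index (0..17) of the signed orientation closest to (dx, dy).
// Indices 0..8 cover 0-160 degrees, indices 9..17 the opposite directions.
int snap_orientation(double dx, double dy) {
    double best_dot = 0.0;
    int best_o = 0;
    for (int o = 0; o < 9; ++o) {
        const double dot = uu[o] * dx + vv[o] * dy;
        if (dot > best_dot) {
            best_dot = dot;
            best_o = o;
        } else if (-dot > best_dot) {
            best_dot = -dot;
            best_o = o + 9;
        }
    }
    return best_o;
}

// Hypothetical helper: adds one pixel's vote to an 18-bin histogram,
// weighted by the gradient magnitude (an assumption for this sketch).
void add_vote(std::vector<double>& hist, double dx, double dy) {
    assert(hist.size() == 18);
    hist[snap_orientation(dx, dy)] += std::sqrt(dx * dx + dy * dy);
}
```

For example, a gradient of (1, 0) snaps to bin 0 and (-1, 0) to bin 9. Because the comparison is strict, a tie such as (0, 1), which lies exactly between the 80- and 100-degree directions, goes to the lower index.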

• Create the histograms for the 18 orientations. The gradient values of the pixels falling into each orientation are added up.
• Compute the energy in each cell by summing over the orientations.
• Output the features. The features could be:
  • contrast-sensitive features (size 18): the energy in each of the 18 signed orientations
  • contrast-insensitive features (size 9): the energy in each orientation with the sign ignored
  • texture features (size 4): based on the energy of the 4 neighboring cells

So the output feature size is 40x30x31 (31 = 18 + 9 + 4).
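The size arithmetic, and one plausible way to obtain the 9 contrast-insensitive values from the 18 contrast-sensitive ones, can be sketched as below. Both `fold_to_unsigned` and `feature_count` are hypothetical helpers. Summing each bin with its opposite direction is an assumption about what "no sign" means here, and the texture features are only counted, not computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: merges each signed orientation o (0..8) with its
// opposite direction o + 9, giving 9 contrast-insensitive values.
std::vector<double> fold_to_unsigned(const std::vector<double>& signed_hist) {
    assert(signed_hist.size() == 18);
    std::vector<double> out(9);
    for (int o = 0; o < 9; ++o) {
        out[o] = signed_hist[o] + signed_hist[o + 9];
    }
    return out;
}

// Hypothetical helper: total number of feature values for an image,
// assuming one 31-dimensional vector (18 + 9 + 4) per cell.
std::size_t feature_count(int image_w, int image_h, int cell_size) {
    assert(cell_size > 0);
    const int cells_x = image_w / cell_size;
    const int cells_y = image_h / cell_size;
    const int per_cell = 18 + 9 + 4;
    return static_cast<std::size_t>(cells_x) * cells_y * per_cell;
}
```

With the numbers used in this post, `feature_count(320, 240, 8)` evaluates 40 x 30 x 31.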