Wednesday, April 04, 2012

HOG

Histogram of Oriented Gradients.
  • Compute the image gradients with the derivative mask [1 0 -1] and its transpose [1 0 -1]^T, applied in one or both directions of the image. The gradient magnitude is what gets voted into the histograms.
  • A Cell (e.g. 6x6 pixels) and a Block (e.g. 3x3 cells) are defined as in the following figure. [Figure: HOG_Cell_Block]
  • Create the orientation histogram bins: the bins can be spread over 0-180 degrees if using unsigned gradients, or over 0-360 degrees if using signed gradients. For example, 20-degree-wide bins over 0-180 degrees give 9 bins. Every pixel inside the cell casts a vote for its orientation bin, weighted by its gradient magnitude.
  • Local normalization in each block. The normalization could be, for example, the L2 norm: $f = v / \sqrt{\|v\|_2^2 + e^2}$, where v is the non-normalized vector containing all histograms in a given block, $\|v\|_2 = \sqrt{\sum_i v_i^2}$ is its 2-norm, and e is a small constant.
  • The HOG descriptor is the vector formed by concatenating the normalized cell histograms from all of the block regions […,…,…]. Blocks usually overlap by 1/4 or 1/2.
HOG is a widely used descriptor due to its fast computation, its inherent robustness to slight object variations and deformations, and its ability to capture a coarse spatial layout of features.
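
The steps above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not a reference implementation: the image is assumed to be grayscale and stored row-major as doubles, the function names `cell_histogram` and `l2_normalize` are made up for this post, orientations are unsigned (9 bins of 20 degrees), border pixels are skipped, and each pixel votes into a single bin without interpolation.

```cpp
#include <cmath>
#include <vector>

const int kBins = 9;  // 20-degree bins over 0-180 degrees (unsigned gradients)
const double kPi = 3.14159265358979323846;

// Orientation histogram of one cell whose top-left pixel is (x0, y0).
// Hypothetical helper: img is a row-major grayscale image of size width x height.
std::vector<double> cell_histogram(const std::vector<double>& img, int width, int height,
                                   int x0, int y0, int cell) {
    std::vector<double> hist(kBins, 0.0);
    for (int y = y0; y < y0 + cell; ++y) {
        for (int x = x0; x < x0 + cell; ++x) {
            // Skip pixels where the centered [1 0 -1] mask does not fit.
            if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) continue;
            // Centered differences; the overall sign does not matter for unsigned bins.
            double dx = img[y * width + (x + 1)] - img[y * width + (x - 1)];
            double dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];
            double mag = std::sqrt(dx * dx + dy * dy);
            double angle = std::atan2(dy, dx) * 180.0 / kPi;  // in [-180, 180]
            if (angle < 0.0) angle += 180.0;                  // fold to [0, 180]
            int bin = static_cast<int>(angle / 20.0) % kBins; // 180 wraps to bin 0
            hist[bin] += mag;                                 // magnitude-weighted vote
        }
    }
    return hist;
}

// L2 normalization of a block vector: v / sqrt(||v||_2^2 + eps^2).
void l2_normalize(std::vector<double>& v, double eps = 1e-3) {
    double sum_sq = 0.0;
    for (double x : v) sum_sq += x * x;
    double norm = std::sqrt(sum_sq + eps * eps);
    for (double& x : v) x /= norm;
}
```

To build a block descriptor with these helpers, one would concatenate the histograms of the cells in the block into a single vector and pass it to `l2_normalize`.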

Update: 4/17/2012
Some implementation details:
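Suppose the image size is 320x240 and the cell size is 8x8, so we have 40x30 cells. Suppose 18 orientations are used (9 directions, each with two signs). We define two vectors holding the x and y components of the 9 unit directions at 0, 20, ..., 160 degrees: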

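// x components: cos(0), cos(20), ..., cos(160) degrees
double uu[9] = {1.0000, 0.9397, 0.7660, 0.5000, 0.1736,
        -0.1736, -0.5000, -0.7660, -0.9397};
// y components: sin(0), sin(20), ..., sin(160) degrees
double vv[9] = {0.0000, 0.3420, 0.6428, 0.8660, 0.9848,
        0.9848, 0.8660, 0.6428, 0.3420};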
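
The values in the two tables match the cosines and sines of 0, 20, ..., 160 degrees rounded to four decimals, so they can also be generated instead of typed in. A minimal sketch, where the function name `orientation_table` is just an illustrative choice:

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Fill u[o] = cos(20*o degrees) and v[o] = sin(20*o degrees) for o = 0..8.
void orientation_table(double u[9], double v[9]) {
    for (int o = 0; o < 9; ++o) {
        double theta = o * 20.0 * kPi / 180.0;  // bin direction in radians
        u[o] = std::cos(theta);
        v[o] = std::sin(theta);
    }
}
```

Generating the table this way makes it easy to try a different number of bins, at the cost of a few trigonometric calls at startup.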

To find which of the 18 orientations the gradient (dx, dy) at the current pixel falls into:

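// For each pixel in the image, with (dx, dy) its gradient:
double best_dot = 0;
int best_o = 0;
for (int o = 0; o < 9; o++) {
    // Project the gradient onto the o-th unit direction.
    double dot = uu[o]*dx + vv[o]*dy;
    if (dot > best_dot) {
      // Gradient points along direction o (0-160 degrees).
      best_dot = dot;
      best_o = o;
    } else if (-dot > best_dot) {
      // Gradient points opposite to direction o (180-340 degrees).
      best_dot = -dot;
      best_o = o+9;
    }
}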
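
For experimenting, the same search can be wrapped into a standalone function. This is a sketch only; the name `best_orientation` is an assumption of this post. Note that a zero gradient falls through to orientation 0.

```cpp
// Unit directions at 0, 20, ..., 160 degrees (same tables as above).
static const double uu[9] = {1.0000, 0.9397, 0.7660, 0.5000, 0.1736,
                             -0.1736, -0.5000, -0.7660, -0.9397};
static const double vv[9] = {0.0000, 0.3420, 0.6428, 0.8660, 0.9848,
                             0.9848, 0.8660, 0.6428, 0.3420};

// Return the index (0..17) of the orientation closest to the gradient (dx, dy).
// Indices 0..8 cover 0-160 degrees, indices 9..17 the opposite directions.
int best_orientation(double dx, double dy) {
    double best_dot = 0.0;
    int best_o = 0;
    for (int o = 0; o < 9; ++o) {
        double dot = uu[o] * dx + vv[o] * dy;
        if (dot > best_dot) {
            best_dot = dot;
            best_o = o;
        } else if (-dot > best_dot) {
            best_dot = -dot;
            best_o = o + 9;
        }
    }
    return best_o;
}
```

For example, the gradient (1, 1) points at 45 degrees and lands in orientation 2 (the 40-degree direction), while (-1, -1) lands in orientation 11, its opposite.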

Create the histograms for the 18 orientations: within each cell, the gradient magnitudes of the pixels assigned to each orientation are added up.
Compute the energy in each cell by summing over orientations.
Output features. The features could be:
contrast-sensitive features (size 18): the energy in each of the 18 signed orientations
contrast-insensitive features (size 9): the energy in each of the 9 unsigned orientations (opposite directions combined, so the sign is ignored)
texture features (size 4): derived from the energy of the 4 neighbors

So the output feature size is 40x30x31 (31 = 18 + 9 + 4).
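
As a rough sketch of how the per-cell pieces relate, the code below folds an 18-bin histogram into the 9 contrast-insensitive values, computes a cell energy, and checks the size arithmetic. Everything here is an assumption for illustration: the function names are made up, the energy is taken to be the sum of squares of the folded histogram, and normalization and the 4 texture features are left out.

```cpp
const int kFeaturesPerCell = 18 + 9 + 4;  // contrast-sensitive + insensitive + texture

// Combine opposite directions: folded[o] = hist[o] + hist[o + 9].
void fold_orientations(const double hist[18], double folded[9]) {
    for (int o = 0; o < 9; ++o) folded[o] = hist[o] + hist[o + 9];
}

// Cell energy, assumed here to be the sum of squares of the folded histogram.
double cell_energy(const double hist[18]) {
    double energy = 0.0;
    for (int o = 0; o < 9; ++o) {
        double s = hist[o] + hist[o + 9];
        energy += s * s;
    }
    return energy;
}

// Total number of output values for an image split into cell x cell pixel cells.
long feature_count(int width, int height, int cell) {
    return static_cast<long>(width / cell) * (height / cell) * kFeaturesPerCell;
}
```

With the numbers used above, `feature_count(320, 240, 8)` gives 40 * 30 * 31 = 37200 values.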


