The prediction is formed from one or more previously encoded video frames. Features that set H.264 apart from earlier standards include support for a range of block sizes (from 16x16 down to 4x4) and sub-pixel motion vectors.
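The standard does not say how an encoder should find a motion vector; motion estimation is left to the encoder. One simple (if slow) strategy is an exhaustive integer-pixel full search that minimizes the sum of absolute differences (SAD). The sketch below assumes frames are plain Python lists of integer sample rows; `sad`, `full_search` and the synthetic test pattern are hypothetical illustrations, not part of any H.264 API.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block at (bx, by)
    in the current frame and the block displaced by (dx, dy) in the reference."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Exhaustive integer-pixel search: return ((dx, dy), cost) for the
    candidate with the lowest SAD within +/- search_range. This is one simple
    encoder strategy; H.264 does not mandate any particular search."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates whose block would fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def pattern(x, y):
    # Linear test pattern: any other displacement in range shifts every
    # sample by the same nonzero amount, so only the true vector gives SAD 0.
    return 7 * x + 13 * y


ref = [[pattern(x, y) for x in range(16)] for y in range(16)]
# Current frame: the same pattern displaced so that cur[y][x] == ref[y + 1][x + 2].
cur = [[pattern(x + 2, y + 1) for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, size=4, search_range=3))  # ((2, 1), 0)
```

Full search is easy to reason about but its cost grows with the square of the search range, which is why practical encoders usually use faster heuristic searches.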
Large block sizes (16x16, 8x16, …) need only a small number of motion vectors but tend to leave a high-energy residual. Small block sizes (4x4, 8x4, …) work the other way round: the residual energy is lower, but more motion vectors have to be coded. Usually, large blocks suit homogeneous areas of the frame and small blocks suit detailed areas.
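To make the trade-off concrete, this sketch counts the motion vectors needed when a 16x16 macroblock is split uniformly into partitions of a single size, assuming one vector per partition and a single reference list. Real H.264 encoders can mix sizes (for example, different sub-partitions inside each 8x8 block), so `motion_vectors_per_macroblock` is a hypothetical helper for illustration only.

```python
MB_SIZE = 16  # a luma macroblock is 16x16 samples

# Partition sizes as (width, height): macroblock partitions first,
# then the sub-macroblock partitions of an 8x8 block.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def motion_vectors_per_macroblock(width, height):
    """Hypothetical helper: vectors needed if the whole macroblock is split
    uniformly into width x height partitions, one vector each."""
    return (MB_SIZE // width) * (MB_SIZE // height)


for w, h in PARTITIONS:
    print(f"{w}x{h}: {motion_vectors_per_macroblock(w, h)} motion vector(s)")
```

Going from 16x16 to 4x4 partitions multiplies the number of vectors to code by 16, which only pays off where the finer prediction saves more residual bits than the extra vectors cost.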
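Chroma blocks are partitioned in the same way as the luma component, except that (with 4:2:0 sampling) each partition has half the horizontal and vertical resolution; for example, a 16x8 luma partition corresponds to an 8x4 chroma partition. The horizontal and vertical components of each motion vector are halved too when applied to chroma, so a quarter-pixel luma vector addresses chroma samples with 1/8-pixel precision.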
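Assuming 4:2:0 sampling and frame (not field) coding, the halving can be done in integer arithmetic: a luma vector component in quarter-luma-sample units has the same numeric value in eighth-chroma-sample units, so it splits into an integer chroma offset (`>> 3`) and a 1/8 fraction (`& 7`). The helper names below are hypothetical.

```python
def chroma_partition(luma_w, luma_h):
    """4:2:0 chroma partition size for a luma partition of luma_w x luma_h."""
    return luma_w // 2, luma_h // 2


def chroma_offset(mv_component):
    """Split one luma motion-vector component (quarter-luma-sample units) into
    an integer chroma-sample offset and a 1/8-chroma-sample fraction.

    Halving the vector for 4:2:0 chroma turns quarter-luma units into
    eighth-chroma units, so the same integer is reused. Python's >> floors,
    so a negative vector gives a negative offset plus a fraction in 0..7.
    """
    return mv_component >> 3, mv_component & 7


print(chroma_partition(16, 8))  # (8, 4)
print(chroma_offset(10))        # (1, 2): 10/4 = 2.5 luma samples = 1 + 2/8 chroma samples
print(chroma_offset(-3))        # (-1, 5): -3/8 = -1 + 5/8
```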
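Each motion vector must be signalled to the decoder. Because the vectors of neighboring partitions are usually highly correlated, H.264 does not code a vector directly: a predicted vector is formed from previously coded neighboring partitions, and only the difference between the current vector and the predicted vector is encoded and transmitted.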
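In the basic case, the H.264 predictor is the component-wise median of the vectors of the left (A), above (B) and above-right (C) neighbours. The sketch below implements only that case and assumes all three neighbours are available; it ignores the directional rules for 16x8 and 8x16 partitions, reference-index matching, substitution of unavailable neighbours, and skipped macroblocks. Vectors are (x, y) tuples in quarter-pixel units, and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the left (A), above (B) and above-right (C)
    neighbour vectors. Simplified: assumes all three are available."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def encode_mvd(mv, mvp):
    """Encoder side: only this difference is written to the bitstream."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvd, mvp):
    """Decoder side: forms the same prediction and adds the difference back."""
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])


mvp = predict_mv((4, -2), (6, 0), (5, 8))   # -> (5, 0)
mvd = encode_mvd((7, 1), mvp)               # -> (2, 1)
assert decode_mv(mvd, mvp) == (7, 1)
```

Since the decoder can form the same prediction from data it has already decoded, it only needs the difference; with variable-length codes, a small difference such as (2, 1) costs fewer bits than the full vector (7, 1).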
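In H.264, the offset between the current block and its prediction area in the reference frame (the motion vector) has 1/4-pixel resolution for luma. Samples at half-pixel positions are interpolated from nearby integer-pixel samples using a 6-tap Finite Impulse Response (FIR) filter with weights (1, -5, 20, 20, -5, 1)/32. This means each half-pixel sample lying between two integer samples is a weighted sum of the 6 nearest integer samples in that row or column. Once the half-pixel samples are available, each quarter-pixel sample is produced by bilinear interpolation (a rounded average) between neighboring half- or integer-pixel samples.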
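The luma interpolation can be sketched in one dimension. The sketch uses the H.264 luma filter taps (1, -5, 20, 20, -5, 1), divides by 32 with rounding, and forms quarter positions as a rounded average; it assumes 8-bit samples and covers only half-pixel positions lying between two integer samples in a row (or column), not the central position that is filtered from intermediate half-pixel values. Function names are hypothetical, and the caller must keep `i - 2` and `i + 3` inside the row (real decoders pad the reference frame by repeating edge samples).

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-pixel filter taps (sum = 32)


def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1].

    Uses the 6 integer samples row[i - 2] .. row[i + 3], then divides by 32
    with rounding and clips, since the negative taps can overshoot.
    """
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
    return clip8((acc + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pixel sample: rounded average of two neighbouring samples
    (integer or half-pixel)."""
    return (p + q + 1) >> 1


ramp = [10, 20, 30, 40, 50, 60, 70]
h = half_pel(ramp, 2)           # between 30 and 40 -> 35
q = quarter_pel(ramp[2], h)     # between 30 and 35 -> 33
print(h, q)                     # 35 33

edge = [0, 0, 255, 255, 255, 255]
print(half_pel(edge, 2))        # 255 (the filtered value 287 is clipped)
```

The longer filter keeps more high-frequency detail than a plain two-sample average would, at the cost of more arithmetic per interpolated sample and the need for clipping.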
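Sub-pixel motion compensation can improve compression performance (the prediction is more accurate, so the residual contains less data to encode), but it also increases complexity: the encoder has to interpolate and evaluate the extra sub-pixel positions, and the decoder has to interpolate every sub-pixel prediction.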
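A toy one-dimensional example shows why sub-pixel accuracy can shrink the residual: when the content has moved by half a sample, no integer-pixel candidate matches exactly, while a half-pixel candidate does. To keep the sketch short, a rounded two-sample average stands in for the 6-tap filter (on the linear ramp used here both give the same values); the data are synthetic and the names are hypothetical.

```python
def sad_1d(a, b):
    """Sum of absolute differences between two equal-length sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


ref = [10 * i for i in range(12)]          # reference row: a linear ramp
cur = [10 * i + 5 for i in range(2, 8)]    # current block: ramp moved by half a sample


def integer_candidate(offset):
    """Prediction taken directly from integer positions of the reference."""
    return ref[offset:offset + len(cur)]


def half_candidate(offset):
    """Prediction at half-pixel positions; a rounded average stands in for
    the 6-tap filter (identical results on this linear ramp)."""
    return [(ref[offset + k] + ref[offset + k + 1] + 1) >> 1 for k in range(len(cur))]


offsets = range(0, 6)
best_integer = min(sad_1d(cur, integer_candidate(o)) for o in offsets)
best_half = min(sad_1d(cur, half_candidate(o)) for o in offsets)
print(best_integer, best_half)  # 30 0
```

The half-pixel search evaluated twice as many predictions as the integer-only search would need to build (each candidate also had to be interpolated), which is the complexity side of the trade-off.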