Thursday, May 19, 2011
In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive subsets, or 'folds', D1, D2, ..., Dk, each of approximately equal size. Training and testing are performed k times. In iteration i, partition Di is reserved as the test set, and the remaining k - 1 partitions are collectively used to train the model. Each sample is therefore used k - 1 times for training and exactly once for testing. For classification, the accuracy estimate is the total number of correct classifications over the k iterations, divided by the total number of samples in the initial data.
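Here is a minimal Python sketch of the procedure, assuming the data fit in memory as parallel lists X (feature tuples) and y (labels). The names make_folds, cross_val_accuracy, fit_1nn and predict_1nn are hypothetical. The 1-nearest-neighbour classifier only stands in for whatever learner you are evaluating.

```python
import random


def make_folds(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k mutually exclusive folds
    whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at position i.
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, k, fit, predict, seed=0):
    """Pooled k-fold accuracy: total correct predictions / total samples."""
    folds = make_folds(len(X), k, seed)
    correct = 0
    for i, test_idx in enumerate(folds):
        # Fold D_i is the test set; the other k - 1 folds form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct += sum(predict(model, X[j]) == y[j] for j in test_idx)
    # Every sample is tested exactly once, so the denominator is len(X).
    return correct / len(X)


# Hypothetical stand-in learner: 1-nearest neighbour.
def fit_1nn(X_train, y_train):
    return list(zip(X_train, y_train))


def predict_1nn(model, x):
    # Label of the closest training point (squared Euclidean distance).
    return min(model, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]


# Toy data: two well-separated clusters of 10 points each.
X = [(0.0, 0.1 * i) for i in range(10)] + [(5.0, 0.1 * i) for i in range(10)]
y = [0] * 10 + [1] * 10
print(cross_val_accuracy(X, y, k=10, fit=fit_1nn, predict=predict_1nn))  # 1.0
```

Slicing the shuffled index list with a stride of k yields folds whose sizes differ by at most one. The estimate pools the correct counts across all k iterations rather than averaging per-fold accuracies. The two are identical only when every fold has the same size.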
In general, 10-fold cross-validation is recommended for estimating accuracy due to its relatively low bias and variance.