K-fold cross-validation

The following material is taken from Wikipedia.

http://en.wikipedia.org/wiki/Cross-validation


In K-fold cross-validation, the original sample is partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used.
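As a concrete illustration, here is a minimal 10-fold cross-validation sketch in Python using scikit-learn. The iris dataset, the LogisticRegression classifier, and accuracy scoring are illustrative choices on my part, not something specified by the quoted text.

# Minimal 10-fold cross-validation sketch (illustrative dataset and model).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    # Train on the K - 1 folds, validate on the held-out fold.
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

# Average the K fold results into a single estimate.
print(sum(scores) / len(scores))

Each sample lands in the validation set exactly once, which is the property the paragraph above contrasts with repeated random sub-sampling.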

If a dataset contains many more positive instances than negative instances (or vice versa), there is a chance that a given fold may contain no negative instances at all. To ensure this does not happen, stratified K-fold cross-validation is used, in which each fold contains roughly the same proportion of class labels as the original set of samples.
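With scikit-learn the stratified variant is a one-line change: swap KFold for StratifiedKFold, which preserves the class proportions of y in every fold. Again the dataset and model below are illustrative choices (iris happens to be balanced, but the mechanics are identical for skewed classes).

# Stratified variant: each fold keeps roughly the original class proportions,
# so no fold can end up without instances of a minority class.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=skf)
print(scores.mean())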
