Cross Validation in Weka

So, here is the scenario again: you have 100 labeled instances.

Use training set

  • Weka takes the 100 labeled instances
  • it applies an algorithm to build a classifier from these 100 instances
  • it applies that classifier AGAIN to the same 100 instances
  • it reports the performance of the classifier (measured on the same 100 instances it was built from)
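To see why this number can be misleading, here is a minimal sketch in plain Java (not Weka code). Everything in it is an assumption made for illustration: the class name, the toy 1-nearest-neighbour classifier on a single numeric feature, and the purely random labels. A classifier that memorizes its training data scores perfectly when tested on that same data, even though the labels carry no signal at all.

```java
import java.util.Random;

class TrainingSetEvaluationSketch {
    // Toy classifier (hypothetical): 1-nearest-neighbour on one numeric feature.
    static int predict(double[] trainX, int[] trainY, double x) {
        int best = 0;
        for (int i = 1; i < trainX.length; i++) {
            if (Math.abs(trainX[i] - x) < Math.abs(trainX[best] - x)) {
                best = i;
            }
        }
        return trainY[best];
    }

    // Fraction of test instances whose predicted label matches the true label.
    static double accuracy(double[] trainX, int[] trainY, double[] testX, int[] testY) {
        int correct = 0;
        for (int i = 0; i < testX.length; i++) {
            if (predict(trainX, trainY, testX[i]) == testY[i]) {
                correct++;
            }
        }
        return (double) correct / testX.length;
    }

    // n distinct feature values: offset, offset + 1, offset + 2, ...
    static double[] features(int n, double offset) {
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = i + offset;
        }
        return x;
    }

    // n random 0/1 labels, i.e. pure noise with nothing to learn.
    static int[] randomLabels(int n, Random rnd) {
        int[] y = new int[n];
        for (int i = 0; i < n; i++) {
            y[i] = rnd.nextInt(2);
        }
        return y;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        double[] trainX = features(100, 0.0);
        int[] trainY = randomLabels(100, rnd);
        double[] freshX = features(100, 0.5);
        int[] freshY = randomLabels(100, rnd);

        // "Use training set": test on the very data the classifier was built from.
        System.out.println("training-set accuracy: " + accuracy(trainX, trainY, trainX, trainY));
        // The same classifier on instances it has never seen.
        System.out.println("fresh-data accuracy:   " + accuracy(trainX, trainY, freshX, freshY));
    }
}
```

The training-set accuracy is 1.0 here because every instance is its own nearest neighbour; the accuracy on fresh instances should hover around chance.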

Use 10 fold CV

  • Weka takes the 100 labeled instances

  • it splits them into 10 equal-sized folds of 10 instances each. In each of 10 rounds, one fold (10 instances) is held out for testing and the other nine folds (90 instances) are used for training.

  • in round 1, it builds a classifier with the algorithm from the 90 training instances and applies it to the 10 held-out test instances.

  • it does the same thing for rounds 2 to 10, each time holding out a different fold, and so produces 9 more classifiers

  • it averages the performance of the 10 classifiers, each of which was trained on 90 instances and tested on the 10 it never saw
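The procedure above can be sketched in plain Java. This is not Weka's implementation: the class and method names are hypothetical, the "algorithm" is a toy majority-class learner, and the folds are formed by a simple shuffle, which is an assumption of this sketch rather than a description of how Weka assigns instances to folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class TenFoldSketch {
    // Shuffle the row indices 0..n-1 and deal them into k folds.
    static List<List<Integer>> makeFolds(int n, int k, long seed) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(rows.get(i));
        }
        return folds;
    }

    // Toy learner (hypothetical): the "classifier" is the majority 0/1 label
    // among the training rows.
    static int trainMajority(int[] labels, List<Integer> trainRows) {
        int ones = 0;
        for (int r : trainRows) {
            ones += labels[r];
        }
        return (2 * ones >= trainRows.size()) ? 1 : 0;
    }

    // Train on k-1 folds, test on the held-out fold, average the k accuracies.
    static double crossValidate(int[] labels, int k, long seed) {
        List<List<Integer>> folds = makeFolds(labels.length, k, seed);
        double sum = 0.0;
        for (int f = 0; f < k; f++) {
            List<Integer> test = folds.get(f);
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < k; g++) {
                if (g != f) {
                    train.addAll(folds.get(g));
                }
            }
            int model = trainMajority(labels, train); // built from 90 rows
            int correct = 0;
            for (int r : test) {                      // checked on the other 10
                if (labels[r] == model) {
                    correct++;
                }
            }
            sum += (double) correct / test.size();
        }
        return sum / k;
    }

    public static void main(String[] args) {
        int[] labels = new int[100];
        for (int i = 0; i < 70; i++) {
            labels[i] = 1; // 70 instances of class 1, 30 of class 0
        }
        System.out.println("10-fold CV accuracy: " + crossValidate(labels, 10, 1L));
    }
}
```

With 70 instances of one class and 30 of the other, every round's training set still has class 1 in the majority, so the averaged accuracy comes out at about 0.7 no matter how the shuffle falls.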

Let me know if that answers your question.


I would have answered in a comment but my reputation still doesn't allow me to:

In addition to Rushdi's accepted answer, I want to emphasize that the models which are created for the cross-validation fold sets are all discarded after the performance measurements have been carried out and averaged.

The model Weka finally outputs is always built from the full training set, regardless of your test options. Since M-T-A was asking for an update to the quoted link, here it is: https://web.archive.org/web/20170519110106/http://list.waikato.ac.nz/pipermail/wekalist/2009-December/046633.html/. It's an answer from one of the WEKA maintainers, pointing out just what I wrote.
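A rough sketch of that behaviour in plain Java, again not Weka code: the names are hypothetical, the folds are contiguous blocks of 10 rows for brevity, and the toy "model" is just the mean of the training values rather than a real classifier. The point is only the bookkeeping: the ten fold models exist just long enough to be scored, and the model you keep is trained separately on all 100 rows.

```java
class FinalModelSketch {
    static int modelsBuilt = 0; // counts every call to train()

    // Toy learner (hypothetical): the "model" is the mean of all values whose
    // index lies outside [skipFrom, skipTo). Pass 0, 0 to train on everything.
    static double train(double[] data, int skipFrom, int skipTo) {
        modelsBuilt++;
        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < data.length; i++) {
            if (i < skipFrom || i >= skipTo) {
                sum += data[i];
                n++;
            }
        }
        return sum / n;
    }

    // Returns only the averaged error. Each fold model is a local variable,
    // so it is gone as soon as its round has been scored.
    static double crossValidatedError(double[] data, int k) {
        int foldSize = data.length / k;
        double total = 0.0;
        for (int f = 0; f < k; f++) {
            int from = f * foldSize;
            int to = from + foldSize;
            double foldModel = train(data, from, to); // trained without rows from..to-1
            double err = 0.0;
            for (int i = from; i < to; i++) {
                err += Math.abs(data[i] - foldModel);
            }
            total += err / foldSize;
        }
        return total / k;
    }

    public static void main(String[] args) {
        double[] data = new double[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        double estimate = crossValidatedError(data, 10); // 10 throwaway models
        double finalModel = train(data, 0, 0);           // the one that is kept
        System.out.println("CV error estimate: " + estimate);
        System.out.println("final model (mean of all 100 rows): " + finalModel);
        System.out.println("models built in total: " + modelsBuilt);
    }
}
```

Eleven models are built in total: ten that only serve the performance estimate, and one on the full data that is actually reported.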