How to preserve ignorance in backpropagation learning?

You might be better off using (non-parametric) Bayesian methods such as Gaussian Processes or kernel methods. They provide a posterior distribution that not only gives you a prediction for a new data point, but also a measure of certainty in the form of the predictive variance.
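As a minimal sketch of this idea, here is a Gaussian Process regression using scikit-learn, where `predict(..., return_std=True)` returns both the posterior mean (the prediction) and the posterior standard deviation (the certainty). The toy data and the RBF kernel choice are illustrative assumptions, not prescriptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy training data: noisy samples of sin(x) (made up for illustration).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(20, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(20)

# alpha adds the assumed observation noise to the kernel diagonal.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1**2)
gp.fit(X_train, y_train)

# The posterior gives both a prediction (mean) and its uncertainty (std).
X_test = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
for x, m, s in zip(X_test.ravel(), mean, std):
    print(f"x={x:.2f}  prediction={m:.3f}  uncertainty={s:.3f}")
```

Away from the training points, the predicted standard deviation grows, which is exactly the "ignorance" the question asks to preserve.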

With neural networks this is also possible, but less rigorous. You can use a softmax activation function in the output layer to produce something that resembles a probability distribution over the classes. All outputs lie between zero and one, and they sum to one across all classes.
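A minimal sketch of that softmax behavior, with made-up logits standing in for a network's raw output scores:

```python
import numpy as np

def softmax(logits):
    """Map raw scores to values in (0, 1) that sum to one."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw outputs for three classes
probs = softmax(logits)
print(probs)        # e.g. [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```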