What is the minimum number of images per class needed to train YOLO?

There is no hard minimum number of images per class for training. Of course, the fewer images you have, the more slowly the model will converge and the lower its accuracy will be.

According to Alexey (maintainer of a popular darknet fork and creator of YOLO v4), what matters for improving object detection is:

For each object which you want to detect - there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train 2000*classes iterations or more

https://github.com/AlexeyAB/darknet

So I think you should have at least 2000 images per class if you want optimum accuracy. But 1000 per class is not bad either. Even with a few hundred images per class you can still get a decent (though not optimum) result. Just collect as many images as you can.
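
If you want to check your own dataset against the guideline quoted above, here is a minimal Python sketch. It assumes YOLO-style label files (one .txt file per image, each line starting with an integer class ID) all sitting in one directory. The function names and the "labels" path are my own, hypothetical choices, not part of darknet. It counts how many images contain each class and computes the 2000*classes iteration figure from the quote.

```python
from collections import Counter
from pathlib import Path

# Both numbers come from the guideline quoted above; they are rules of thumb.
RECOMMENDED_IMAGES_PER_CLASS = 2000
ITERATIONS_PER_CLASS = 2000


def images_per_class(label_dir):
    """Count how many label files (one per image) contain each class ID."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        class_ids = set()
        for line in label_file.read_text().splitlines():
            parts = line.split()
            # Skip blank lines and anything that does not start with a class ID
            # (for example a class-names file stored in the same directory).
            if parts and parts[0].isdigit():
                class_ids.add(int(parts[0]))
        # Each image counts once per class, however many boxes it has.
        counts.update(class_ids)
    return counts


def suggested_iterations(num_classes):
    """Apply the 2000*classes rule of thumb from the quote."""
    return ITERATIONS_PER_CLASS * num_classes


def report(label_dir, num_classes):
    """Print per-class image counts and the suggested iteration count."""
    counts = images_per_class(label_dir)
    for class_id in range(num_classes):
        n = counts.get(class_id, 0)
        note = "ok" if n >= RECOMMENDED_IMAGES_PER_CLASS else "below the guideline"
        print(f"class {class_id}: {n} images ({note})")
    print(f"suggested iterations: {suggested_iterations(num_classes)}")
```

For example, report("labels", 3) would list the image count for classes 0 to 2 and suggest 6000 iterations. A class that falls below 2000 images is not an error, it just tells you where collecting more data is likely to help most.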