Why is time complexity for BFS/DFS not simply O(E) instead of O(E+V)?

If it is given that E = kV + c, for some real constants k and c then,
O(E + V) = O(kV + c + V) = O(V) = O(E) and your argument is correct.

An example of this is trees.

In general, however, E = O(V^2), and thus we cannot do better than O(V^2).

Why not write just O(E) always?
Note that there might be cases when there are no edges in the graph at all (i.e. O(E) ~ O(1)). Even for such cases, we'll have to go to each of the vertex (O(V)), we cannot finish in O(1) time.

Thus, only writing O(E) won't do in general.