Difference between Big-O and Little-O Notation

f ∈ O(g) says, essentially:

For at least one choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) <= k g(x) holds for all x > a.

Note that O(g) is the set of all functions for which this condition holds.

f ∈ o(g) says, essentially:

For every choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) < k g(x) holds for all x > a.

Once again, note that o(g) is a set.

In Big-O, it is only necessary that you find a particular multiplier k for which the inequality holds beyond some minimum x.

In Little-o, the inequality must eventually hold no matter how small you make k, as long as it is not negative or zero: for every such k there is a minimum x (which may depend on k) after which the inequality holds.
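
The two definitions differ only in the quantifier on k. As a restatement of the prose above in symbols (same names f, g, k, a; nothing new is assumed):

```latex
\begin{align*}
f \in O(g) &\iff \exists k > 0 \;\; \exists a \;\; \forall x > a : \quad 0 \le f(x) \le k\, g(x) \\
f \in o(g) &\iff \forall k > 0 \;\; \exists a \;\; \forall x > a : \quad 0 \le f(x) < k\, g(x)
\end{align*}
```

In the second line, a is chosen after k, so it is allowed to grow as k shrinks.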

These both describe upper bounds, although somewhat counter-intuitively, Little-o is the stronger statement. There is a much larger gap between the growth rates of f and g if f ∈ o(g) than if f ∈ O(g).

One illustration of the disparity is this: f ∈ O(f) is true, but f ∈ o(f) is false. Therefore, Big-O can be read as "f ∈ O(g) means that f's asymptotic growth is no faster than g's", whereas "f ∈ o(g) means that f's asymptotic growth is strictly slower than g's". It's like <= versus <.

More specifically, if the value of g(x) is a constant multiple of the value of f(x), then f ∈ O(g) is true. This is why you can drop constants when working with big-O notation.

However, for f ∈ o(g) to be true, g must grow strictly faster than f (for polynomials, this means g must include a higher power of x), so the relative separation between f(x) and g(x) must actually get larger as x gets larger: the ratio f(x)/g(x) shrinks toward 0.

To use purely math examples (rather than referring to algorithms):

The following are true for Big-O, but would not be true if you used little-o:

  • x² ∈ O(x²)
  • x² ∈ O(x² + x)
  • x² ∈ O(200 * x²)

The following are true for little-o:

  • x² ∈ o(x³)
  • x² ∈ o(x!)
  • ln(x) ∈ o(x)

Note that f ∈ o(g) implies f ∈ O(g). For example, x² ∈ o(x³), so it is also true that x² ∈ O(x³). (Again, think of O as <= and o as <.)
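
A quick numeric sanity check can make the difference visible. The sketch below samples the ratio f(x)/g(x) at a few points for some of the examples above; the helper name `ratios` and the sample points are made up for illustration. Sampling only suggests the trend, it does not prove membership in O or o.

```python
import math

def ratios(f, g, xs):
    """Return f(x) / g(x) for each sample point x."""
    return [f(x) / g(x) for x in xs]

xs = [10, 100, 1000, 10000]

# x^2 vs 200 * x^2: the ratio stays at 1/200, which is consistent with
# x^2 being in O(200 x^2) but not in o(200 x^2).
constant_case = ratios(lambda x: x**2, lambda x: 200 * x**2, xs)

# x^2 vs x^3: the ratio is 1/x and keeps shrinking, consistent with x^2 in o(x^3).
cubic_case = ratios(lambda x: x**2, lambda x: x**3, xs)

# ln(x) vs x: the ratio also keeps shrinking, consistent with ln(x) in o(x).
log_case = ratios(math.log, lambda x: x, xs)

for name, values in [("x^2 / 200x^2", constant_case),
                     ("x^2 / x^3", cubic_case),
                     ("ln(x) / x", log_case)]:
    print(name, values)
```

A ratio that settles at a positive constant is the Big-O-only situation; a ratio that heads to 0 is the little-o situation.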


Big-O is to little-o as ≤ is to <. Big-O is an inclusive upper bound, while little-o is a strict upper bound.

For example, the function f(n) = 3n is:

  • in O(n²), o(n²), and O(n)
  • not in O(lg n), o(lg n), or o(n)

Analogously, the number 1 is:

  • ≤ 2, < 2, and ≤ 1
  • not ≤ 0, < 0, or < 1

Here's a table, showing the general idea:

[Image: Big-O table]

(Note: the table is a good guide but its limit definition should be in terms of the superior limit instead of the normal limit. For example, 3 + (n mod 2) oscillates between 3 and 4 forever. It's in O(1) despite not having a normal limit, because it still has a lim sup: 4.)
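
The oscillating example from the note can be checked numerically. This is only a sketch: `tail_sup` and `tail_inf` are hypothetical helpers that look at a finite window, which stands in for the infinite tail used in the real definition of lim sup and lim inf. For this particular function the window is enough, because the values repeat with period 2.

```python
def h(n):
    """The example from the note: oscillates between 3 and 4 forever."""
    return 3 + (n % 2)

def tail_sup(f, start, width=1000):
    """Largest value of f on the finite window [start, start + width)."""
    return max(f(n) for n in range(start, start + width))

def tail_inf(f, start, width=1000):
    """Smallest value of f on the finite window [start, start + width)."""
    return min(f(n) for n in range(start, start + width))

starts = (1, 10, 100, 1000)
sups = [tail_sup(h, s) for s in starts]
infs = [tail_inf(h, s) for s in starts]

# The tail maximum stays at 4 and the tail minimum stays at 3, so the
# ordinary limit does not exist, but the lim sup (4) is finite.
print(sups, infs)
```

Since the lim sup of h(n)/1 is finite, h is in O(1) even though h(n) never converges.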

I recommend memorizing how the Big-O notation converts to asymptotic comparisons. The comparisons are easier to remember, but less flexible because you can't say things like n^O(1) = P.


In general

Asymptotic notation is something you can understand as: how do functions compare when zooming out? (A good way to test this is simply to use a tool like Desmos and play with your mouse wheel). In particular:

  • f(n) ∈ o(n) means: at some point, the more you zoom out, the more f(n) will be dominated by n (it will progressively diverge from it).
  • g(n) ∈ Θ(n) means: at some point, zooming out will not change how g(n) compares to n (if we removed the ticks from the axes, you couldn't tell the zoom level).

Finally, h(n) ∈ O(n) means that function h can be in either of these two categories. It can either look a lot like n, or it can get smaller and smaller relative to n as n increases. Basically, both f(n) and g(n) are also in O(n).

I think this Venn diagram (from this course) could help:

[Image: Venn diagram for asymptotic notations]

In computer science

In computer science, people will usually prove both an upper bound (O) and a lower bound (𝛺) for a problem. When the two bounds meet, it means that we have found an asymptotically optimal algorithm for that particular problem, and we write its complexity with Θ.

For example, if we prove that the complexity of an algorithm is in both O(n) and 𝛺(n), it implies that its complexity is in Θ(n). (That's the definition of Θ, and it more or less translates to "asymptotically equal".) When the 𝛺(n) lower bound holds for the problem itself, it also means that no algorithm can solve the given problem in o(n); roughly, "this problem can't be solved in (strictly) fewer than n steps".

Usually, o is used within a lower-bound proof to show a contradiction. For example:

Suppose algorithm A can find the minimum value in an array of size n in o(n) steps. Since A takes o(n) steps, for large enough n it can't look at every item of the input. In other words, there is at least one item x that A never saw. Algorithm A can't tell the difference between two input instances that differ only in x's value. If x is the minimum in one of these instances and not in the other, then A will fail to find the minimum on (at least) one of these instances. In other words, finding the minimum in an array is in 𝛺(n) (no algorithm in o(n) can solve the problem).
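
The adversary argument above can be acted out in code. This is a sketch, not a proof: `lazy_min` is a made-up stand-in for "some algorithm that reads fewer than n items" (here it reads only about sqrt(n) of them), and `adversary` is a hypothetical helper that builds the two nearly identical inputs.

```python
import math

def lazy_min(arr, reads_log):
    """Hypothetical sub-linear 'algorithm': inspects only the first ~sqrt(n) entries.

    Every index it reads is recorded in reads_log.
    """
    budget = max(1, math.isqrt(len(arr)))
    best = arr[0]
    reads_log.add(0)
    for i in range(1, budget):
        reads_log.add(i)
        best = min(best, arr[i])
    return best

def adversary(algorithm, n):
    """Build two inputs that differ only at an index the algorithm never read.

    Assumes the algorithm skipped at least one index (true for lazy_min when n >= 2).
    Returns ((answer, true_min) on the base input, (answer, true_min) on the tweaked input).
    """
    base = list(range(10, 10 + n))      # true minimum is 10, at index 0
    seen = set()
    answer_base = algorithm(base, seen)
    unseen = next(i for i in range(n) if i not in seen)
    tweaked = list(base)
    tweaked[unseen] = 0                 # hide a smaller minimum where nobody looked
    answer_tweaked = algorithm(tweaked, set())
    return (answer_base, min(base)), (answer_tweaked, min(tweaked))

(base_answer, base_truth), (tweaked_answer, tweaked_truth) = adversary(lazy_min, 100)
print(base_answer, base_truth)        # correct on the first input
print(tweaked_answer, tweaked_truth)  # same answer, but the true minimum changed
```

Because the two inputs agree on everything the algorithm read, it must return the same answer on both, and that answer is wrong for one of them.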

Details about lower/upper bound meanings

An upper bound of O(n) simply means that even in the worst case, the algorithm will terminate in at most n steps (ignoring all constant factors, both multiplicative and additive). A lower bound of 𝛺(n) is a statement about the problem itself: it says that we built some example(s) where the given problem couldn't be solved by any algorithm in fewer than n steps (ignoring multiplicative and additive constants). The number of steps is at most n and at least n, so this problem's complexity is "exactly n". Instead of saying "ignoring constant multiplicative/additive factors" every time, we just write Θ(n) for short.
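
For the find-the-minimum problem, the matching upper bound comes from the plain linear scan. The sketch below counts element comparisons (the function name `min_with_count` is made up for this illustration) to show where the "at most n steps" side comes from.

```python
def min_with_count(arr):
    """Return (minimum, number of element comparisons) for a non-empty list."""
    comparisons = 0
    best = arr[0]
    for value in arr[1:]:
        comparisons += 1          # one comparison per remaining element
        if value < best:
            best = value
    return best, comparisons

for n in (1, 10, 1000):
    data = list(range(n, 0, -1))  # descending input: n, n-1, ..., 1
    smallest, steps = min_with_count(data)
    print(n, smallest, steps)     # steps is always n - 1
```

The scan uses n - 1 comparisons on every input, which is O(n); together with the 𝛺(n) lower bound, that places the problem in Θ(n).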


I find that when I can't conceptually grasp something, thinking about why one would use X is helpful to understand X. (Not to say you haven't tried that, I'm just setting the stage.)

Stuff you know: A common way to classify algorithms is by runtime, and by citing the big-Oh complexity of an algorithm, you can get a pretty good estimation of which one is "better" -- whichever has the "smallest" function in the O! Even in the real world, O(N) is "better" than O(N²), barring silly things like super-massive constants and the like.

Let's say there's some algorithm that runs in O(N). Pretty good, huh? But let's say you (you brilliant person, you) come up with an algorithm that runs in O(N / log log log log N). YAY! It's faster! But you'd feel silly writing that over and over again when you're writing your thesis. So you write it once, and you can say "In this paper, I have proven that algorithm X, previously computable in time O(N), is in fact computable in o(N)."

Thus, everyone knows that your algorithm is faster --- by how much is unclear, but they know it's faster. Theoretically. :)