Freakin’ asymptotic notation: How does that work?
OK, I’ll be the first to admit it – math is my favorite subject. I know that makes me weird – even among fellow working software engineers – but it’s the truth. It’s also the truth that while they may be boring to many, knowing even a little bit about the numbers of computer science can help make you a better software engineer. Today, we’re going to look at a little piece of applied discrete math – the analysis of algorithms – and how we can empirically describe the performance of one algorithm with respect to another using something called “asymptotic notation”. If you already know what this is and how to read the various sub-notations, skip the rest of this post: it’s old news. If you do not know what this is, and would like to, read on! If you do not know what this is and don’t care, read on anyway – I promise you it’ll be worth it.
Now, while I’m totally guilty of bandying around big scary terms like “asymptotic notation” (I’m a notorious stickler for nomenclature), what this boils down to is the following mathematical notation (and its related brothers, which we’ll get to later):
[O(g(n))]
Most typically, you’ll see this used and referred to not as (O(g(n))), which doesn’t serve much purpose unless we know already what (g(n)) is, but rather in terms of a specific (g(n)), say, (n^2):
[O(n^2)]
Moreover, for the purposes of this discussion, we’re going to look primarily at the following notation style:
[f(n) = O(n^2)]
So what on earth does this mean? Well, let’s think about algorithms. What is an algorithm? Simply a series of instructions for performing some task, right? In computing specifically an algorithm is implemented as an actual, written program in your language of choice. But under the hood, mathematically, an algorithm is a function (sometimes pure, sometimes not): it takes some input(s) and delivers some output. In the above block o’ math, let us equate (f(n)) with some algorithm we would like to implement, say, sorting an array of integers from least to greatest. With an algorithm such as that, you would give it an array of dubiously sorted values, and it would give you back an array containing exactly the same elements but in sorted order. Represented mathematically, we have:
[s(a)]
where (s) is the algorithm function itself, and (a) is the array to be sorted. Now, let’s keep in mind that there are many different ways of going about sorting an array – naturally, we would like to pick the fastest one for the task at hand. So in order to know anything about the performance of an algorithm, we should probably come up with some way to represent how long it takes to run. To wit:
[T(s(a))]
represents the “time” (T) it takes to run algorithm (s) on input (a). OK. So, how can we begin to reason about what the range of this function might be? Actually, perhaps that’s jumping the gun – what would really be the domain of this function? What measurement of your general work-a-day sorting algorithm would dictate how long it takes to run? The answer, my friend, is the size of the input: it should make sense to anybody who’s actually had to do it that sorting a 10-element array is a heck of a lot faster than sorting a 10,000,000-element array. So, really, our (T) function can be abstracted further as a function of (n), where (n) is the size of the array to be sorted:
[T(n)]
Since the value of (T(n)) will grow as (n) grows (presumably), this seems like a solid footing to start our thinking upon.
Alrighty. Now that we’ve got some notation to start thinking about how we’ll build this bad boy, let’s actually pick a real, concrete algorithm and think through it. For the purposes of simplicity and succinctness (cue laughter), I’m going to go with Bubble Sort. If you want to learn more about the actual algorithm, I suggest you read the wikipedia article linked to above, because I’m gonna skip over a fat exposition on what it actually does and how it does it in favor of thinking about how long it takes to do it, and specifically how long it takes to do it to the worse possible case of input: an array that’s already been sorted, but reversed. Please note I have copied the following pseudocode directly from wikipedia, and credit therefor is due directly to the author of the page linked to above.
(c_{2})
(c_{3}n)
(c_{4}n)
(c_{5}n(n-1))
(c_{6}n(n-1))
(c_{7}n(n-1))
(c_{8}n)
repeat
swapped = false
for i = 1 to length(A) – 1 inclusive do:
if A[i-1] > A[i] then
swap( A[i-1], A[i] )
swapped = true
end if
end for
until not swapped
end procedure
OK, so that’s the algorithm from wikipedia – but what’s all that nasty-looking business on the right? Put quite simply, that’s our best guess as to how long that line of code takes to execute! Note that each line is some function of (n) and some constant (c_{i}). Now if we wanted to figure out exactly how long this algorithm takes to run, we’d need to know the value of those contants, e.g. (c_{1} = 12.2ms) or something. You’ll notice that some lines only seem to take a constant time to run; that’s because those lines are only executed once regardless of how big the input is. Other lines are the associated constant multiplied by (n) – those lines will execute once for every unit of input. Still others are multiplied by (n(n-1)) – these lines are executed (n-1) times for each input. Now, handily, we can figure out our overall formula by just adding all these things together:
[T(n) = c_{1} + c_{2} + c_{3}n + c_{4}n + c_{5}n(n-1) + c_{6}n(n-1) + c_{7}n(n-1) + c_{8}n]
[T(n) = c_{1} + c_{2} + c_{3}n + c_{4}n + c_{5}n^{2}-n + c_{6}n^{2}-n + c_{7}n^{2}-n + c_{8}n]
[T(n) = (c_{5} + c_{6} + c_{7})n^{2} + (c_{3} + c_{4} + c_{8} - 3)n + (c_{1} + c_{2})]
But it turns out that the precise value of each constant is practically useless – the constants in question represent such complex matters and concrete facts of the real world as language of implementation, hardware specs, how many other processes are fighting for CPU time, etc. Remember that we only really care about the performance of the algorithm – not the performance of the hardware or implementation thereof, which is not affected by the algorithm itself. So we can happily simplify our job by just getting rid of all those pesky constants!
[T(n) = n^{2} + n]
Wow, much nicer! But guess what: it gets even simpler. Since, for large enough inputs, the contribution of the lower-order term (n) makes virtually no difference to the output number, we can happily drop that too:
[T(n) = n^{2}]
And voilá! Unfortunately, we’ve boiled this timing function down so far and stripped so much from it that we can’t really say that (T(n)) is exactly (n^{2}), just that (in the worst case) it follows a similar curve. More precisely, we have virtually guaranteed that, for large enough inputs, it can never be bigger than some constant (c) times (n^{2}). This is the crux of asymptotic notation – specifically what is referred to as “Big-O notation”. As such, when we say that:
[T(n) = O(n^{2})]
What we mean is that there are some constants (c) and (a), such that for all (n) where (0 le a < n), we know that (T(n) le cn^{2}).
It should also be pointed out that the notation’s use of the “equals sign” is a little unusual – we’re not saying that (T(n)) is equal to (O(n^{2})), but that it is of equal or lesser order to (n^{2}). This might be a bit of a trick to wrap your head around at first (it certainly was to me). If I’d have written the notation myself, I’d have tried to use some symbol that didn’t already carry all the connotations and denotations the “equals sign” itself does; but c’est la vie – this is the way it’s done. Another way (in fact, the proper way) to think of it is to think of (O(n^{2})) being some set of functions that follow the above definition, and for the notation
[T(n) = O(n^{2})]
to in fact mean
[T(n) in O(n^{2})]
i.e. that (T(n)) is an element of the set (O(n^{2})).
OK, so we’ve been able to put an upper bound on some hypothetical order of our function (T(n)), but what does that really tell us? Not a whole lot, in isolation. But it really comes into play when you consider other algorithms, such as, say Merge Sort, which is bounded on top not by (O(n^{2})), but by (O(nlog_{2}n))! Graph those two functions and here’s what you get ((n^{2}) is in black, (nlog_{2}n) is in red):
This is a bit of an oversimplification, but sufficient to demonstrate the idea – notice that very quickly, (n^{2}) begins to overtake (nlog_{2}n), and continues to do so, the gap getting ever wider as n gets ever bigger; and just like in golf, the lower score is the one you want.
So, in what I hope was nutshell-ish if not perfectly so, that’s the basics of asymptotic notation! There are other, related notations like “Big Theta” (Theta(g(n))), “Big Omega” (Omega(g(n))), and the lower-case versions of omega and o, but I’ll leave those to you to explore.
Finally, as a disclaimer, I have tried to make the math herein as simple as possible for the purposes of the demonstration, and I could easily have borked something up terribly (I’m still getting the hang of LaTeX). If you spot an error, comment and I’ll see that it’s corrected. Thanks!
