Jen Trudell

Algorithms: Big-O Notation

September 17, 2015 | 9 Minute Read

What’s an algorithm?

An algorithm is a hard-to-pronounce word for something simple: a set of steps that can be used to solve a problem. A slightly longer definition is that an algorithm is a finite number of steps or rules that, if followed, give you a result. Algorithms are distinguished by three things: (1) the steps/rules are clear and unambiguous, (2) the steps/rules are finite (they don't go on forever), and (3) following the steps/rules returns a result. Addition is an algorithm, which, given a list of numbers you want to add, can be expressed with the following steps:

1. Given a list of numbers,
2. add the first number to the second number,
3. if there are more than two numbers, add the sum of the first number and second number to the third number,
4. repeat step three for the fourth number (if any) and any subsequent numbers until there are no more numbers in the list,
5. return the final sum.

If you don't like math, here's a peanut butter sandwich algorithm:

1. Get out the peanut butter,
2. get out the bread,
3. get out the knife,
4. spread peanut butter on one slice of bread,
5. put the other slice of bread on top of the slice with the peanut butter on it,
6. return peanut butter sandwich.
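The addition steps can be sketched in Python (a minimal version; the name `add_all` is my own):

```python
def add_all(numbers):
    """Add a list of numbers by following the steps above."""
    if not numbers:
        return 0  # nothing to add
    total = numbers[0]
    # steps 2-4: fold each remaining number into the running sum
    for n in numbers[1:]:
        total = total + n
    # step 5: return the final sum
    return total
```

Each number in the list gets touched once, which is exactly the "one step per item" shape we'll put a name to below.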

What makes a good algorithm? Measuring algorithm complexity.

Algorithms are measured and compared based on how complex they are. The addition algorithm and the peanut butter sandwich algorithm are linear algorithms: the more numbers you have, or the more peanut butter sandwiches you have to make, the more steps you have to take, and the number of steps increases in direct proportion to how many numbers or sandwiches you are working with. Think of a graph: the x-axis is the number of sandwiches/numbers (the size of the input), and the y-axis is the number of steps. With linear algorithms, you end up with a straight diagonal line. If it takes 3 steps (disregarding taking out the knife, peanut butter, and bread for the moment) to make one peanut butter sandwich, it should take 300 steps to make 100 sandwiches. If n is the number of peanut butter sandwiches, or the number of numbers in a list we want to add, we can represent the complexity of the algorithm as O(n).
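That direct proportionality is easy to sketch in Python (3 steps per sandwich, as in the text; the function name is my own):

```python
def sandwich_steps(n, steps_per_sandwich=3):
    """Steps to make n sandwiches by hand.

    The step count grows in direct proportion to n, so plotting
    n against sandwich_steps(n) gives a straight diagonal line.
    """
    return n * steps_per_sandwich
```

One sandwich costs 3 steps, 100 sandwiches cost 300: double the input, double the work.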

Big-O Notation

That O(n) up there is known as Big-O notation. Big-O notation represents the upper bound of an algorithm's complexity, or for our purposes the total number of steps it will take to run the algorithm over n numbers for the addition algorithm or n sandwiches for the sandwich-making algorithm. Big-O represents the worst-case scenario of complexity. When calculating Big-O for a certain algorithm, we only care about the big picture. Big-O notation is not exact; it is an estimate of the upper bound based on the significant steps in the algorithm. Trivial stuff, like getting out the peanut butter and bread and pulling the knife from the drawer, only happens once: it is a constant. You can drop constants when using Big-O, because Big-O assumes n is trending toward infinity. If we included constants, we might write something like O(n + 3), where the +3 is the get-out-bread step, the get-out-peanut-butter step, and the get-out-knife step, each of which we had to do only once before we started making sandwiches. But if we are making 100 sandwiches, or a million sandwiches, do we really care about the extra 3 steps in the complexity of our algorithm (300 steps or 3,000,000 steps vs. 303 or 3,000,003)? Nope. We can leave it off; O(n) is good enough for our purposes. Not all algorithms can be represented by O(n) (although it would be a lot easier if they could be!). You could have an algorithm that is O(n^2), or O(n log n), or O(log n), etc.
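Sticking with 3 steps per sandwich, a quick sketch shows why the setup constant gets dropped:

```python
def steps_with_setup(n):
    """3 one-time setup steps (bread, peanut butter, knife)
    plus 3 steps per sandwich."""
    return 3 * n + 3
```

For 100 sandwiches the +3 is barely visible (303 vs. 300), and for a million it is pure noise, which is exactly why Big-O throws it away.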

There's a difference between complexity and runtime

Big-O is a representation of how complex an algorithm is. It is NOT a measure of how long it would take to run the algorithm. It is not a measure of performance. The time it takes to run the algorithm depends on what we can call K. What is K? It's a constant. K can be you, in the case of how fast you can make sandwiches or add up numbers on a piece of paper, or it can be your computer, if you are using your computer to do the addition, with whatever memory and processor speed it has to churn through the data (or sandwiches) you are processing. Big-O doesn't really care about K, because Big-O doesn't care about constants. Big-O just wants to tell you the upper bound of complexity for a given algorithm so you can compare it to other algorithms that do the same thing.
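A toy model of the difference, assuming K is simply "seconds per step" (my own simplification, not a real benchmark):

```python
def runtime_seconds(n, k):
    """Toy model: runtime = K * number_of_steps.

    The algorithm is the same O(n) sandwich algorithm either way;
    only K (seconds per step) changes with who, or what, runs it.
    """
    steps = 3 * n  # 3 steps per sandwich, as above
    return k * steps
```

A person at 30 seconds per step and a machine at 2 seconds per step are running the exact same O(n) algorithm, but their runtimes differ by a factor of 15. Big-O sees no difference; your stopwatch does.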

Comparing Algorithms using Big-O

So what’s the point of all this? Why do I care if my algorithm for making peanut butter sandwiches is O(n)? Say I own a company that makes gourmet peanut butter sandwiches and I need to make 20,000 peanut butter sandwiches (each sandwich represented by n) a day. By hand, with our 3-steps-a-sandwich algorithm, that’s 60,000 steps. I want to buy a peanut-butter-sandwich-making machine, but only if it makes sense. With the machine, the steps are as follows: 1. press the button, 2. return sandwiches. At the beginning of each day we have to load up the machine with a ton of bread and peanut butter, but we only do that once a day, and we don’t need to include it when calculating Big-O for our algorithm. You can program the machine to make sandwiches in 100-sandwich batches, so for 100 sandwiches it is still only two steps: 1. press the button, 2. return sandwiches. As long as there are 100 or fewer sandwiches, the machine will take 2 steps to make them. So we could represent the complexity of our sandwich-machine algorithm as O(n/100), where n is the number of sandwiches. (Strictly speaking, O(n/100) is the same as O(n), because the 1/100 is a constant and constants drop out; but the raw step count is what matters when we’re counting steps for our business.) There are some situations where this wouldn’t be accurate (if you were only making 50 sandwiches, for example), but as we learned above, we can ignore the outliers. We care about the big picture. If we are making 20,000 peanut butter sandwiches, this algorithm suggests that at most it would take about 400 steps to make them (2 steps * (20,000 sandwiches / 100)). A great improvement over the 60,000 steps it would take me to make the sandwiches by myself with my handmade-sandwich algorithm.
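The comparison is easy to check in code (function names are my own; partial batches are rounded up to a full button press):

```python
import math

def hand_steps(n):
    """3 steps per sandwich, made by hand."""
    return 3 * n

def machine_steps(n):
    """2 steps (press button, return sandwiches) per
    100-sandwich batch; a partial batch still costs a full press."""
    return 2 * math.ceil(n / 100)
```

For the 20,000-sandwich day, that's 60,000 steps by hand vs. 400 with the machine, matching the numbers above.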

The Big BIG Picture

Obviously the sandwich-machine algorithm is less complex than the by-hand sandwich algorithm. We can compare those two algorithms using Big-O because they are doing the same thing. It wouldn’t make sense to say that the sandwich-machine algorithm is less complex than the addition algorithm above. But we could compare different addition or multiplication algorithms (yes, there are more ways to multiply numbers than the one you learned in elementary school; this Stanford video is a great intro to algorithms and uses multiplication as an example). If we are multiplying numbers with hundreds or thousands of digits, we care which set of steps, which algorithm, is more efficient, because our hands are going to hurt like hell multiplying all those numbers and we want to keep the steps to a minimum. Similarly, if we are doing the multiplication on a computer, we care whether one algorithm will process our long list of numbers in 2 steps (20 seconds, if each step takes 10 seconds), while another, more complex algorithm will process it in 1,000 steps and keep us from updating Facebook for 10,000 seconds while it churns through the list.

So that’s Big-O notation. Keep in mind that it is the upper bound, the worst-case scenario: it says “this, at most, is how many steps this algorithm will take, given some input n.” Big-O isn’t telling you anything about time or performance. If you want to figure that out, you will need to throw in some factor K (which could be how fast your processor is / how fast your peanut-butter machine is / how fast you write), and K has nothing to do with the complexity of the algorithm being represented by Big-O. There are other sorts of notation: big-Omega notation represents the lower bound, the best-case scenario for an algorithm: this is, at the very least, how many steps this algorithm will take to run. A gross simplification of big-Theta notation is that it is when the worst case and the best case are the same: if, no matter what, an algorithm given an input n will take a certain number of steps to run, then you can represent it using big-Theta notation.
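A classic example where the best and worst cases differ is linear search, sketched here with a comparison counter (my own illustration, not from the post above):

```python
def linear_search(items, target):
    """Return (index, comparisons_made), or (-1, comparisons_made)
    if target is absent.

    Best case (big-Omega): target is the first item, 1 comparison.
    Worst case (Big-O): target is last or missing, n comparisons.
    """
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            return i, comparisons
    return -1, comparisons
```

Searching a 3-item list takes 1 comparison if you get lucky and 3 if you don't, so linear search is Omega(1) but O(n); because the two bounds differ, you can't describe it with a single big-Theta.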

Caveat

I’m not a math major. I don’t have a CS degree. Half of this is probably wrong. Except I’m 100% positive that Big-O being the upper bound of complexity is right, and anyone who tells you Big-O represents runtime is flat-out wrong. That being said, we all call tissue “kleenex” even though Kleenex is a brand name and not all tissue is Kleenex, and if someone asks you for a kleenex and you say “I don’t have kleenex, but I have a tissue,” you are just being a jerk because you know damn well what they want. Someone at an interview is inevitably going to ask you about Big-O notation and runtime, and you are just going to smile, nod, and pretend they are the same.