Are you “Failing Fast” properly?

What is “Failing Fast”?

You may have heard about “Failing Fast” before! Well, it means exactly what it says: if you are going to fail at something, you had better fail fast, so you can correct your course.

In different fields, “Failing Fast” takes a different cosmetic form, but the concept stays the same.

Failing Fast in Business

For example, in business, “failing fast” usually means that you should make decisions rapidly and test different ideas to find out which ones work and which ones don’t. The goal is to reduce the resources, most importantly time, spent on paths that are doomed to fail. Of course, there are some footnotes attached to this strategy.

For example, at Amazon, we like to divide our decisions into two big categories:

Two-way door decision
One-way door decision

A two-way door decision is one that, just like a two-way door (or revolving door), you can walk back through without any big consequences. A one-way door decision, however, is impossible to undo, or undoing it comes at a heavy cost. So, make a decision fast if it is a two-way door decision, but take as much time as you need when it comes to one-way doors.

Failing Fast in Programming

Failing Fast in programming essentially means the same thing: if something that should not have happened has happened, you want to halt execution as fast as you can. There is no need to do more work if you know that (for example) one of your inputs is wrong. Besides, when an exception is thrown, you want it to be thrown as close as possible to the root cause of the problem, rather than in another section or subsystem a couple of calls and systems away from where the actual error happened. Failing fast makes debugging much easier and saves resources and time that could have been used for something else. Also, depending on the system, continuing with a faulty input or an unstable (or wrong) state can sometimes cause irreversible damage to your logs or to the data you store.
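To make the “close to the root cause” point concrete, here is a small sketch with two hypothetical functions (“cost_per_employee_late” and “cost_per_employee_fast” are illustrative names, not from any library). Both parse a headcount from text; only the second one checks it the moment it enters the function.

```python
def cost_per_employee_late(total_cost: float, raw_headcount: str) -> float:
    # Fail late: nothing checks the headcount, so a bad value flows onward.
    headcount = int(raw_headcount)
    # "0" blows up here with ZeroDivisionError; "-5" silently yields a
    # negative cost that some other subsystem has to notice later.
    return total_cost / headcount


def cost_per_employee_fast(total_cost: float, raw_headcount: str) -> float:
    # Fail fast: validate right where the value enters, with a clear message.
    headcount = int(raw_headcount)
    if headcount <= 0:
        raise ValueError(f"Headcount must be positive, got {headcount}.")
    return total_cost / headcount


if __name__ == "__main__":
    print(cost_per_employee_late(1000.0, "-5"))  # no error, just a wrong number
    try:
        cost_per_employee_fast(1000.0, "-5")
    except ValueError as err:
        print(f"Rejected early: {err}")
```

Passing "-5" to the fail-late version quietly returns -200.0, while the fail-fast version rejects it immediately with a message that names the bad value.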

Using “assert” as part of Fail Fast

One common approach to implementing the Fail Fast philosophy in code is to use assert. For example, let’s say you have a function that accepts the number of employees and does something with that information. We can agree that the number of employees can never be a negative integer. So, you would write something like:

def allocate_resources(n_employee: int) -> float:
    assert n_employee >= 0

    print(f"You will get 42 resources per employee.")

    return n_employee * 42

If you run the above code you get something like this:

> allocate_resources(1);

You will get 42 resources per employee.

And if you pass an invalid input, you get:

> allocate_resources(-10);

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 allocate_resources(-10);

Cell In[2], line 2, in allocate_resources(n_employee)
      1 def allocate_resources(n_employee: int) -> float:
----> 2     assert n_employee >= 0
      4     print(f"You will get 42 resources per employee.")
      6     return n_employee * 42

AssertionError:

Notice that you get “AssertionError:” with no explanation of what’s wrong! Now the only way for someone debugging the problem to find out what went wrong is to look at that assert, check its condition, and then work out why the assertion failed.
In some cases, like this example, it is easy to understand the reason behind the assert. But sometimes it is not that easy, even when you check the condition. For example, I once saw something like this in a code base:

assert flag is False

It was not clear at all what that flag represented; it was literally just named flag. Moreover, earlier in the code there was “assert flag is True”. So why, at one point in the code, were they making sure that the flag was True, and at another point making sure that it was False? Well, the reason became clearer once you looked at another part of the code. That part should only have been executed if the flag was True; executing it while the flag was False risked an unstable numerical result that would make the situation even worse. So, they used a flag to represent certain conditions, and by the time the logic in that code section was over, a False flag meant the condition had been handled properly, while a flag that was still True meant something had gone wrong. BTW, this was about some special conditions in computational geometry calculations.
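One way to make a check like this self-explanatory is to give the flag a name that says what it tracks and to attach a message to the assert. The original computational geometry code is not shown here, so the sketch below uses a hypothetical stand-in: normalizing a 2D vector, assuming the “special condition” is a near-zero length that would make the division numerically unstable. The function and flag names are made up for illustration.

```python
import math


def normalize(vx: float, vy: float) -> tuple[float, float]:
    # Hypothetical stand-in for the geometry code: dividing by a near-zero
    # length would give numerically unstable results, so it is a special case.
    length = math.hypot(vx, vy)
    degenerate_vector_unhandled = length < 1e-12

    if degenerate_vector_unhandled:
        # Handle the special case by falling back to a default direction.
        vx, vy, length = 1.0, 0.0, 1.0
        degenerate_vector_unhandled = False

    # The flag name and the message say what must hold before the division.
    assert not degenerate_vector_unhandled, (
        "Near-zero-length vector reached normalization without being handled."
    )
    return vx / length, vy / length
```

Compared with “assert flag is False”, a reader can tell from the assert alone which condition was supposed to have been handled and why it matters.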

It is considered good practice to always provide an explanation of what an assert is checking. You can do this by adding a message after the condition:

def allocate_resources(n_employee: int) -> float:
    assert n_employee >= 0, "Number of employees cannot be negative."

    print(f"You will get 42 resources per employee.")

    return n_employee * 42

Now, if someone runs that code with an invalid input, they will get:

> allocate_resources(-10);

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 allocate_resources(-10);

Cell In[7], line 2, in allocate_resources(n_employee)
      1 def allocate_resources(n_employee: int) -> float:
----> 2     assert n_employee >= 0, "Number of employees cannot be negative."
      4     print(f"You will get 42 resources per employee.")
      6     return n_employee * 42

AssertionError: Number of employees cannot be negative.

Although it may not make much of a difference in this code, this is much clearer. Furthermore, even in this simple example, the user can understand why the assertion failed without needing to check the original code and the assertion condition.

So, what’s wrong with using assert to fail fast?

You guessed right. There is something wrong with using assert to fail fast, and that’s why I am bringing it up in this post.

Asserts are lightweight and convenient, but in Python an “assert” only takes effect in the default (non-optimized) mode, which is what you typically use while developing, debugging, and testing. The moment you execute your Python code in optimized mode, all assert statements are removed.
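The Python language reference describes “assert condition, message” as equivalent to the check below, guarded by the built-in constant “__debug__”. “__debug__” is True in the default mode and False under “-O”, in which case the compiler drops the guarded block entirely. The name “allocate_resources_expanded” is just for illustration.

```python
def allocate_resources_expanded(n_employee: int) -> int:
    # Roughly what `assert n_employee >= 0, "..."` means, per the language
    # reference. Under `python -O`, __debug__ is False and this whole `if`
    # block is removed at compile time, so the check never runs.
    if __debug__:
        if not n_employee >= 0:
            raise AssertionError("Number of employees cannot be negative.")
    return n_employee * 42


if __name__ == "__main__":
    print(f"__debug__ is {__debug__}")
    print(allocate_resources_expanded(10))
```

Seen this way, the behavior in optimized mode is no surprise: the guard is part of the assert itself.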

Let’s check this out. Assume we have the following script, saved as test_assert.py:

def allocate_resources(n_employee: int) -> float:
    assert n_employee >= 0, "Number of employees cannot be negative."

    return n_employee * 42

if __name__ == "__main__":
    n_employee = 10
    resources = allocate_resources(n_employee=n_employee)

    print(f"{resources} resources were allocation to {n_employee} employee(s).\n")

    n_employee = -10
    resources = allocate_resources(n_employee=n_employee)

    print(f"{resources} resources were allocation to {n_employee} employee(s).")

If you execute this in regular mode you get:

$ python test_assert.py
420 resources were allocation to 10 employee(s).

Traceback (most recent call last):
  File ".../test/test_assert.py", line 13, in <module>
    resources = allocate_resources(n_employee=n_employee)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../test/test_assert.py", line 2, in allocate_resources
    assert n_employee >= 0, "Number of employees cannot be negative."
           ^^^^^^^^^^^^^^^
AssertionError: Number of employees cannot be negative.

As you can see, the first call to “allocate_resources” went through without any issue, but the second call failed because we passed a negative integer for “n_employee”.

Now let’s run the same code but in optimized mode:

$ python -O test_assert.py
420 resources were allocation to 10 employee(s).

-420 resources were allocation to -10 employee(s).

Surprise! Surprise! The assert didn’t do anything, and our code happily allocated -420 resources to -10 employees. That is the danger of relying on asserts in production. You might think that you still have all the guardrails protecting you from bad inputs or wrong states; however, as you can see, when your code runs in optimized mode (as it may in production), the assert statements have absolutely no effect.
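You can also reproduce this inside a single interpreter: the built-in “compile()” accepts an “optimize” argument, where 0 keeps asserts and 1 removes them, as “-O” does. The sketch below compiles two hypothetical versions of “allocate_resources”, one guarded by assert and one by an explicit “if ... raise ValueError”, to show which guard survives optimization. The explicit check is one common alternative, not the only one.

```python
ASSERT_VERSION = '''
def allocate_resources(n_employee: int) -> int:
    assert n_employee >= 0, "Number of employees cannot be negative."
    return n_employee * 42
'''

RAISE_VERSION = '''
def allocate_resources(n_employee: int) -> int:
    if n_employee < 0:
        raise ValueError("Number of employees cannot be negative.")
    return n_employee * 42
'''


def load(source: str, optimize: int):
    # compile(..., optimize=1) behaves like running the module under -O.
    namespace: dict = {}
    exec(compile(source, "<allocate>", "exec", optimize=optimize), namespace)
    return namespace["allocate_resources"]


def try_call(func, n_employee: int) -> str:
    # Return either the result or the error, so both cases can be printed.
    try:
        return str(func(n_employee))
    except (AssertionError, ValueError) as err:
        return f"{type(err).__name__}: {err}"


if __name__ == "__main__":
    for label, source in (("assert", ASSERT_VERSION), ("raise", RAISE_VERSION)):
        for level in (0, 1):
            result = try_call(load(source, level), -10)
            print(f"{label:>6} | optimize={level} | {result}")
```

With -10, the assert version returns -420 at optimize=1, while the explicit-raise version raises ValueError at both levels.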

Is this a Python thing?

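Actually, no! This is not a Python thing. In almost any coding language, asserts are ignored when the code is run (or compiled) in release mode, particularly when optimization is enabled. For example, C and C++ compile “assert” away when “NDEBUG” is defined, which release builds typically do, and Java skips “assert” statements unless assertions are enabled with the “-ea” flag.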

Let’s clarify the “almost any coding language” part of that statement. Every language I have worked with so far behaves the same way and ignores assert in optimized mode. However, I do not claim to have worked with every language out there. So, if you know a programming language that doesn’t get rid of assert, well, that’s why I said almost any coding language.

Tell me more about Python optimized mode

There are two ways to run Python code in optimized mode.

Using -O or -OO

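One method is to pass “-O” for the single optimization level or “-OO” for the double optimization level. According to the Python documentation, “-O” removes assert statements and any code that is conditional on “__debug__”, while “-OO” does the same and also discards docstrings. So, essentially you will use one of these calls: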

python -O script_name.py

Or for double optimization you would do:

python -OO script_name.py
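A quick way to see the extra step that “-OO” takes is to compile the same hypothetical function at each level with “compile(..., optimize=...)”, which mirrors these flags, and inspect its “__doc__”:

```python
SOURCE = '''
def allocate_resources(n_employee: int) -> int:
    """Return 42 resources per employee."""
    return n_employee * 42
'''


def docstring_at(level: int):
    # optimize=0, 1 and 2 correspond to no flag, -O and -OO respectively.
    namespace: dict = {}
    exec(compile(SOURCE, "<allocate>", "exec", optimize=level), namespace)
    return namespace["allocate_resources"].__doc__


if __name__ == "__main__":
    for level in (0, 1, 2):
        print(f"optimize={level}: {docstring_at(level)!r}")
```

At optimize=0 and 1 the docstring is intact; at optimize=2 (the “-OO” equivalent) “__doc__” is None.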

Using the PYTHONOPTIMIZE environment variable

Alternatively, you can set the “PYTHONOPTIMIZE” environment variable to the desired optimization level. If PYTHONOPTIMIZE is not defined or is set to zero, no optimization is used:

$ PYTHONOPTIMIZE=0  python test_assert.py
420 resources were allocation to 10 employee(s).

Traceback (most recent call last):
  File ".../test/test_assert.py", line 13, in <module>
    resources = allocate_resources(n_employee=n_employee)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../test/test_assert.py", line 2, in allocate_resources
    assert n_employee >= 0, "Number of employees cannot be negative."
           ^^^^^^^^^^^^^^^
AssertionError: Number of employees cannot be negative.

To use single optimization, set PYTHONOPTIMIZE to 1:

$ PYTHONOPTIMIZE=1  python test_assert.py
420 resources were allocation to 10 employee(s).

-420 resources were allocation to -10 employee(s).

And finally, for double optimization, set it to 2:

$ PYTHONOPTIMIZE=2  python test_assert.py
420 resources were allocation to 10 employee(s).

-420 resources were allocation to -10 employee(s).
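Whichever method you use, a running program can check which level it actually got: “sys.flags.optimize” holds the optimization level, and “__debug__” is False whenever that level is 1 or higher. The “warn_if_asserts_disabled” helper below is hypothetical, a sketch of how a service could at least report that its assert-based guardrails are gone.

```python
import sys
import warnings


def asserts_enabled() -> bool:
    # __debug__ is False under -O/-OO or PYTHONOPTIMIZE >= 1.
    return __debug__


def warn_if_asserts_disabled() -> None:
    # Hypothetical helper: surface the fact that assert statements were stripped.
    if not asserts_enabled():
        warnings.warn(
            f"Running with optimization level {sys.flags.optimize}: "
            "assert statements are disabled.",
            RuntimeWarning,
        )


if __name__ == "__main__":
    print(f"sys.flags.optimize = {sys.flags.optimize}, __debug__ = {__debug__}")
    warn_if_asserts_disabled()
```

Under PYTHONOPTIMIZE=1 or 2 (or “-O”/“-OO”), the helper emits a RuntimeWarning; in the default mode it stays silent.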

Be careful when using single or double optimization. Code can sometimes behave differently when optimization is enabled at compile time or at runtime. I have personally faced many situations, particularly in C/C++, where a numerical algorithm failed when optimization was enabled.

The good news, though, is that as compilers have become better and smarter, this issue is happening less and less.