Static Type-Checking in Python Using Mypy

Today I want to talk about static type-checking in Python, a dynamically-typed language. Support for static type-checking arguably began in 2006, but developments this year in regard to Python 3.5 have made it easy to introduce opt-in static typing into new and existing Python software. Today I’m going to provide a brief introduction to how you can add such type-checking to your Python code and what I feel are the pros and cons overall.

Writing Python With Type-Checking

Sometimes I will write type information in docstrings for functions. Here is a trivial example for a function that adds two integers:

def add(x, y):
    """Add two integers and return the result.

    Args:
        x (int): The first number.
        y (int): The second number.

    Returns:
        int: The result of adding the two numbers.
    """
    return x + y

That amount of documentation is not necessary for such a function, but it looks nice in bpython, which is my favorite interactive Python interpreter.

Ideally we’d like to express that type information in a way that’s more succinct and, more importantly, syntactically closer to the relevant elements of the function. We can do this by taking advantage of PEP 0484 which introduced ‘Type Hints’ to the language, letting us rewrite the function like so:

def add(x: int, y: int) -> int:
    return x + y

Here I’ve dropped the docstring because the type hints express everything it conveyed. That is, the function is sufficiently self-documenting in this form. We’ve written the fact that the two parameters and the return value are all of type int directly into the function signature.

However, Python does not enforce any type-checking as a result. So if we write add("Foo", "Bar") the language is not going to complain. It will happily return "FooBar" without so much as a warning.
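To make that concrete, here is a minimal sketch (the print calls are my own illustration, not part of the original example). It shows that the interpreter merely stores the hints on the function object and never consults them when the function is called:

```python
def add(x: int, y: int) -> int:
    return x + y

# The hints are kept as plain metadata on the function object.
print(add.__annotations__)

# Nothing checks them at call time, so the strings are concatenated.
print(add("Foo", "Bar"))  # prints FooBar
```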

So what’s the point? How can we benefit from these type annotations?

Enter Mypy

Mypy is an optional type-checker for Python; in fact, you will see the design of PEP 0484 borrowed much from Mypy. There were some differences, e.g. PEP 0484 uses Callable instead of Mypy’s Function, but as of April 2015 Mypy has adopted the majority of those differences.

Note well that Mypy only performs type-checking; it does not execute code, nor is it a Python interpreter or compiler in the traditional sense. Mypy will not ‘run’ your code. Instead Mypy acts solely as a tool to perform type-checking.

For the sake of example, let’s use this code:

import typing

def add(x: int, y: int) -> int:
    return x + y

result = add("Foo", "Bar")

We import the typing module—now part of the Python 3.5 standard library—so that we have access to type-checking tools. If we run Mypy on this code we will see the following output:

error: Argument 1 to "add" has incompatible type "str"; expected "int"
error: Argument 2 to "add" has incompatible type "str"; expected "int"

Nice! Mypy sees the type information in the signature of add() and realizes our call to add() uses the incorrect types.

Aside: At this point I want to mention flycheck-mypy, a package for GNU Emacs which reports errors from Mypy in real time while you write code. Hopefully you can see how useful that can be.

A More Complex Example

Let’s look at how to provide type-checking for variables and more complex function signatures. First we’ll have the full example code and then we’ll walk through it.

from typing import Callable, List, Iterable

def simple_name_source() -> Iterable[str]:
    yield from ["Eric", "Jeff", "Lobby"]

def broken_name_source() -> Iterable[int]:
    yield from [1, 2, 3]

def generate_names(generator: Callable[[], Iterable[str]]) -> List[str]:
    return [name for name in generator()]

names = [] # type: List[str]

names = generate_names(simple_name_source)
names = generate_names(broken_name_source)

print(names)

First let’s take note of the names variable. There is no syntax similar to function arguments that we can use to annotate the type for such variables. So we must resort to using a special type of comment: # type: List[str]. This tells Mypy that names should always be a list of strings. And note that we must import List from the typing module; or we could simply write import typing and then write typing.List[str], but personally I prefer to explicitly import such names as List, Callable, Iterable, and so on.
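As a small sketch of the two import styles just described (other_names is a hypothetical variable added only for comparison), both comments below declare the same type. To the interpreter they are ordinary comments, so they have no effect at runtime:

```python
import typing
from typing import List

# Style used in this article: import the name directly.
names = []  # type: List[str]

# Alternative: qualify the name with the module instead.
other_names = []  # type: typing.List[str]

names.append("Eric")
other_names.append("Jeff")
```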

We are going to populate names by calling generate_names(). From the function signature we can see it returns List[str], exactly the type we give to names. If we make any changes to generate_names() which cause it to not return a list of strings then we’ll see those errors, both in the function code and in the places where we assign the return value to names.

The code for generate_names() accepts a generator and invokes it to build up a list of strings. The type annotation for that generator parameter is the most complex we have seen thus far. When our parameter is a function, generator, etc., we use the Callable annotation. The general syntax is Callable[[argument_types], return_type]. Here are some example annotations along with function signatures that would satisfy the type-checker:

Callable[[int, int], None] ~ def foo(x: int, y: int) -> None: ...

Callable[[], str] ~ def foo() -> str: ...

Callable[ [Callable[[Any], bool], List[Any]], List[Any] ] ~ def filter(predicate: Callable[[Any], bool], items: List[Any]) -> List[Any]: ...

That last one is noisy. Thankfully Mypy allows us to define ‘type aliases’. So we could create an alias for a predicate function like so:

Predicate = Callable[[Any], bool]

Then we could rewrite the last example like so:

Callable[ [Predicate, List[Any]], List[Any] ] ~ def filter(predicate: Predicate, items: List[Any]) -> List[Any]: ...

Aliases are useful when your type annotations contain many uses of Callable, Union, Iterable, et alia.
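Here is a minimal runnable sketch of the alias in use. The names keep_matching and is_even are hypothetical, chosen for illustration and to avoid shadowing the built-in filter:

```python
from typing import Any, Callable, List

# Alias: any callable that takes one value and returns a bool.
Predicate = Callable[[Any], bool]

def keep_matching(predicate: Predicate, items: List[Any]) -> List[Any]:
    """Return the items for which predicate(item) is true."""
    return [item for item in items if predicate(item)]

def is_even(n: Any) -> bool:
    return n % 2 == 0

evens = keep_matching(is_even, [1, 2, 3, 4])
```

The alias is an ordinary assignment, so it can be imported and reused across modules like any other name.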

Now let’s return to our example for this section. Remember that this is the function we will use to populate our names list:

def generate_names(generator: Callable[[], Iterable[str]]) -> List[str]:
    return [name for name in generator()]

These type annotations give us insight into how we must write any function we intend to use as the first parameter for generate_names(): they must be callables which accept no arguments and return an iterable of strings. With that in mind…

def simple_name_source() -> Iterable[str]:
    yield from ["Eric", "Jeff", "Lobby"]

def broken_name_source() -> Iterable[int]:
    yield from [1, 2, 3]

…we can see why one of these two is broken. So when we write names = generate_names(broken_name_source) Mypy will give us an error regarding our use of broken_name_source as a parameter.
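Any callable of the right shape will do, not only generators. As a sketch, list_name_source below is a hypothetical helper of my own that returns a plain list, which also counts as an iterable of strings:

```python
from typing import Callable, Iterable, List

def generate_names(generator: Callable[[], Iterable[str]]) -> List[str]:
    return [name for name in generator()]

def simple_name_source() -> Iterable[str]:
    yield from ["Eric", "Jeff", "Lobby"]

# Hypothetical extra source: no yield, just a list of strings.
def list_name_source() -> Iterable[str]:
    return ["Ada", "Grace"]

names = generate_names(simple_name_source)
more_names = generate_names(list_name_source)
```

Both of these calls match the annotation; only passing broken_name_source produces the error described above.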

But here’s something great: Mypy will detect this error even if broken_name_source() has no type annotation for its return value. Mypy will look at broken_name_source() and infer that it returns the Iterable[int] type. Thanks to that type-inference we could simply write…

def broken_name_source():
    yield from [1, 2, 3]

…and Mypy would still report the same error for generate_names(broken_name_source). This type-inference leads directly into the second part of this article.

The Pros and Cons

Pros

The type-inference in Mypy is a great pro because it allows you to more easily introduce type-checking into existing code, without having to go back and annotate everything before you start to see the benefits. Mypy also has large amounts of type annotations for Python’s standard library, thus helping you catch misuse of those functions without having to first write any type information yourself.

Another pro is, obviously, the pro of static type-checking itself: it helps you more easily find errors in your code that result from accidentally using values of unintended types.

Finally, an important pro is that type-checking is entirely opt-in. Using Mypy doesn’t mean you need to suddenly rewrite your entire code-base. You can gradually introduce type annotations as you go, focusing first on wherever you feel they will be most useful.

Cons

Casts do not perform any runtime type-checking. We can write this…

from typing import List, cast

foo = [1, 2, 3] # type: List[int]
bar = cast(List[str], foo)

…and Mypy won’t complain one bit. Some people may fairly argue that Mypy shouldn’t raise an issue over this; personally I feel like it ought to, because List[str] is not a subtype of List[int]. If we declared the type of foo to be object then I have no objections, because List[str] is a subtype of object. Regardless, if we write bar = foo then Mypy will raise an error over assigning a List[int] to a List[str]. So at least that helps.
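The runtime side of this is easy to check. In the sketch below (the print calls are my own addition), cast() simply hands back its argument, so bar is the very same list and still holds integers:

```python
from typing import List, cast

foo = [1, 2, 3]  # type: List[int]
bar = cast(List[str], foo)

# cast() returns its argument unchanged: no copy, no conversion, no check.
print(bar is foo)
print(type(bar[0]))
```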

Whenever you use a third-party library it’s unlikely the library will have any annotations for type-checking, e.g.

import foo
foo.run()

Mypy will complain about foo. You have two options in this situation. First, you can write a foo.pyi file that provides type-annotated stubs for that module; but depending on the size of the module this could easily be unreasonable. Second, you can tell Mypy to straight-out ignore type issues related to foo like so:

import foo # type: ignore
foo.run()
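For the first option, a stub is just Python syntax with the function bodies elided. Here is a minimal sketch of what a foo.pyi might contain, assuming (hypothetically, since foo is a made-up module) that run() takes no arguments and returns nothing:

```python
# foo.pyi: hypothetical stub for the example module above.
# Only the signatures matter; each body is replaced with "...".

def run() -> None: ...
```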

The last con I want to mention is Mypy’s current inability to infer the type of lists in some common situations. I stress the word ‘current’ because the Mypy developers plan to address this in the future. But for now, let’s say we have this code:

from typing import List

numbers = []

for i in range(10):
    numbers.append(i)

def print_strings(n: List[str]) -> None:
    print(n)

print_strings(numbers)

We can tell that numbers has the type List[int], but Mypy cannot, and thus it will not raise a type-error when we call print_strings(numbers). It does give us a warning, but that warning is ‘Cannot determine the type of numbers’, which is exactly the problem. We can address this in two ways. The first is to declare numbers like so:

numbers = [] # type: List[int]

This will cause Mypy to detect the type error in our call to print_strings(). The second approach is to rewrite numbers as a list comprehension:

numbers = [i for i in range(10)]

When we use a list comprehension, Mypy can automatically and correctly infer the type of numbers, without requiring us to write any explicit annotation like in the previous approach.
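Putting the two fixes side by side, here is a sketch of the running example. The variable inferred is a hypothetical name for the list-comprehension version. Python still executes the final call; it is Mypy that reports it once numbers has a declared type:

```python
from typing import List

# First fix: declare the element type up front.
numbers = []  # type: List[int]

for i in range(10):
    numbers.append(i)

# Second fix: same contents, built in a form that needs no annotation.
inferred = [i for i in range(10)]

def print_strings(n: List[str]) -> None:
    print(n)

# Flagged by Mypy as a List[int] passed where List[str] is expected.
print_strings(numbers)
```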

Conclusion

If you’re an advocate of static type-checking (and personally I am such a programmer) and you miss its benefits when using dynamically-typed languages like Python, well, thanks to PEP 0484 and Mypy you can have many of the benefits of static types in your Python code. I strongly recommend giving it a try sometime, especially for any new Python software where you can introduce type annotations from the get-go.

Further Reading: The Mypy Documentation.


3 thoughts on “Static Type-Checking in Python Using Mypy”

  1. I would argue that the last con isn’t really a con at all. When a programmer reads a variable declaration, they shouldn’t have to look elsewhere in the code to figure out the role of the variable. The declaration site should contain full information about the role of the variable. How that information is conveyed depends on the variable in question. Local variables are often thrown away after a few lines without any mutation; in that case, the type will be inferred, as in the case of your list comprehension. If it’s permanent, mutable, global etc., a strong declaration including a type annotation and a comment should be provided so that the programmer doesn’t have to guess/reverse-engineer its role.

    The problem with opt-in is that it often limits the quality of the static type system. For example, declarations such as “var a = 3” should infer a non-null int by default, but doing so can cause type errors in untyped programs. I guess if opt-in happens at module level granularity then this isn’t an issue.

    1. When a programmer reads a variable declaration, they shouldn’t have to look elsewhere in the code to figure out the role of the variable.

      That’s a great point, and I agree with you. I should have clarified the con by emphasizing that my issue is really more with Mypy’s current behavior in that scenario, particularly not inferring the type based on analysis of the rest of the code.

      The problem with opt-in is that it often limits the quality of the static type system.

      True, but with Python I feel like opt-in is the only realistic choice. And since it happens at the module level, as you say, I personally think it’s a small issue.
