Migrating Python Web Applications to Python 3

Follow along here: w0rp.com/migrating

Who am I?

Andrew Wray, Senior Developer at Wazoku Ltd
(Wazoku is hiring!)
Python programmer since 2008 (Just before 3.0 release)
Passions: Programming, language, music

What do I work on?

During the day ...

Spotlight: a single page web application
Built with Django and Angular
Ran on Python 2.7, now runs on 3.6!

... but at night!

ALE — A Vim plugin for checking and fixing your code
Songs and poems

ALE: https://github.com/w0rp/ale

My demo album: holdon.world

Overview

Motivation for upgrading
Upgrading to Python 3 methodically
How to address common issues

Motivation

Why Python 3?

Performance is not a problem
Bug fixes
Better types (str, bytes, int)
Great new features
Django 2.0 will not support Python 2

Python 3 performance

Probably the same, and going to get better.

Ignore generic benchmarks. Run your own!

Wazoku benchmarks

I submitted some Ideas and averaged the time for POST requests.

~349ms in Python 2, ~330ms in Python 3.

Basically, no meaningful difference.

However, much less memory is now consumed in production.

The Timsort algorithm used in list.sort() and sorted() now runs faster

What's New in Python 3.2?

UTF-8 is now 2x to 4x faster. UTF-16 encoding is now up to 10x faster.

What's New in Python 3.3?

collections.OrderedDict is now implemented in C, which makes it 4 to 100 times faster.

What's New in Python 3.5?

The UTF-8 encoder is now up to 75 times as fast for error handlers ignore, replace, surrogateescape, surrogatepass

What's New in Python 3.6?

Optimized case-insensitive matching and searching of regular expressions. Searching some patterns can now be up to 20 times faster.

What's New in Python 3.7?

Performance improvements go where new development is.

New development means Python 3.

New features in 3.x

Better *x and **y
list comprehensions don't leak variables
Better super() and class declarations
More generators, yield from
async – await
Type annotations for mypy
3.7 adds a UTF-8 mode (PEP 540), finally
Much more!

Don't get ahead of yourself, however.

These features are great, but you can't upgrade all in one go.

Python 2 and 3 behave differently in subtle ways.

You will need to follow a methodical upgrade process.

Upgrading

The upgrade process

At a high level, all you need to do is...

Install Python 3
Cover Python 2 code well with tests (70%, 80%, ...)
Introduce Python 3 testing into CI
Start writing code which works in both versions
Achieve 100% test success in Python 3
Roll out in development, testing, production
Switch to Python 3 only

Install Python 3.6

(or whatever Python 3 version)

Ubuntu or Linux Mint

# Install the version you want via the deadsnakes PPA.
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.6 python3.6-dev python3.6-venv

Mac OSX or Windows

I strongly recommend using the official installer. https://www.python.org/downloads/release/python-362/

You could just compile Python. It's not too hard.

Use Continuous Integration

Configure CI to test both Python 2 and 3
Wazoku used Jenkins and Bash scripts
Travis CI and Tox are common
Use different virtualenvs for local testing

Example here: https://github.com/w0rp/tox-travis-example

Mark progress with tests

Run automated tests to measure Python 3 compatibility.

You can target certain tests with marks.

pytest -m python3_supported

@pytest.mark.python3_supported
def test_really_important_thing(...):
    ...

You can also choose to skip some.

@pytest.mark.skipif(six.PY3, reason='Not ported to Python 3 yet')
def test_really_important_thing(...):
    ...

Use CI to move forward (10% one week, 20% the next...)
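
Newer pytest releases warn about marks they do not know, so a custom mark such as python3_supported may need registering. A minimal sketch of a conftest.py hook, assuming the mark name used above; the description text is made up for illustration.

```python
# conftest.py (sketch)
def pytest_configure(config):
    # Register the custom mark so that `pytest -m python3_supported`
    # does not produce unknown-mark warnings.
    config.addinivalue_line(
        'markers',
        'python3_supported: test is known to pass on Python 3',
    )
```

The same line can live in the markers section of your pytest configuration file instead, if you prefer configuration over code.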

Localise problem imports

Import code which fails on Python 3 inside the functions that use it, not at module level.

Before:

from broken_in_python3 import important_function

def do_something_important():
    important_function()

After:

def do_something_important():
    from broken_in_python3 import important_function

    important_function()

Localising imports will take you from output like this...

$ pytest
E..F.

to output more like this.

$ pytest
......FFFFFFF..FF..

You can get more of your tests to run.
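
Why this helps, as a small sketch: the module below imports cleanly, so every test in it is collected, and only the code path that reaches the broken import fails. It assumes broken_in_python3 (the placeholder name from the slide) is not importable; do_something_else is a hypothetical neighbour function.

```python
def do_something_important():
    # Deferred import: a failure here only breaks callers of this function.
    from broken_in_python3 import important_function

    return important_function()


def do_something_else():
    # Same module, but this never touches the broken import.
    return 42


# The unrelated function works even though the import above is broken.
assert do_something_else() == 42

# Only the function that needs the broken module fails, and only when called.
try:
    do_something_important()
except ImportError:
    import_failed = True
else:
    import_failed = False
```

Move the import back to module level once the dependency works on Python 3.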

Use libraries that work

Over 90% of the top 200 Python packages support Python 3: https://python3wos.appspot.com

Stop using packages and versions that do not support Python 3.

Start using the packages that do.

Adopt new syntax and semantics

First things first. Write this in every Python file. (I mean it!)

from __future__ import absolute_import, division, print_function, unicode_literals

The above will fix problems with...

changes to import rules.
x / y returning different results.
using print as a statement.
strings not being... strings by default.

flake8 plugin: https://github.com/xZise/flake8-future-import

Syntax to avoid

print statements

print x, y  # With the __future__ imports, this is a syntax error
print(x, y) # This works in both versions with the __future__ imports

Old except: syntax

except ExceptionType, e:   # Only valid in Python 2
except ExceptionType as e: # Works in both versions

.next() for iterators

x = iterator.next() # Only valid in Python 2
x = next(iterator)  # Works in both versions.

Use list(x for x in ...), not [x for x in ...]

some_list = [x for x in range(3)]     # x leaks into the enclosing scope in 2.x
some_list = list(x for x in range(3)) # x is not accessible on the outside

Don't get your strings tangled up

Both Python versions offer text sequence types and binary sequence types.

They are named differently in each version.

Text should be your default, not bytes.

from __future__ import ..., unicode_literals # Remember this?

some_text = 'foo'   # This is now `unicode` in Python 2, and `str` in Python 3.
also_text = u'foo'  # Works in 2.7, removed in 3.0, added back in 3.3
some_bytes = b'bar' # This is now `str` in Python 2, and `bytes` in Python 3.

Never mix text and binary sequences.

confusion    = 'foo' + b'bar'                 # This doesn't work in 3
text_result  = 'foo' + b'bar'.decode('utf-8') # This will work
bytes_result = 'foo'.encode('utf-8') + b'bar' # This will also work

Remember: You decode bytes, and encode text.

Never the opposite.

These are correct.

b'xyz'.decode('utf-8') # Correct, you decode some bytes into text
'xyz'.encode('utf-8')  # Correct, you encode text into bytes

These are wrong!

b'xyz'.encode('utf-8') # Wrong! Python 3 will raise AttributeError
'xyz'.decode('utf-8')  # Wrong! Python 3 will raise AttributeError
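
A round trip makes the rule concrete. This is a minimal sketch with a made-up sample string; it runs the same way on Python 2.7 and 3.3+.

```python
text = u'na\u00efve caf\u00e9'      # text: unicode in 2, str in 3

data = text.encode('utf-8')         # text -> bytes
assert isinstance(data, bytes)
assert data == b'na\xc3\xafve caf\xc3\xa9'

round_trip = data.decode('utf-8')   # bytes -> text
assert round_trip == text

# Lengths differ because one character can take several bytes in UTF-8.
assert len(text) == 10
assert len(data) == 12
```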

The textual lifecycle

Think of byte sequences and text sequences like so.

encoded -> decoded -> encoded

bytes -> str -> bytes

HTTP request -> application -> database

Libraries will almost always handle encoding for you.
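
When a library does not do it for you, the lifecycle looks roughly like this. A sketch only: handle_request is a hypothetical function, not part of any framework.

```python
def handle_request(raw_body, encoding='utf-8'):
    """Hypothetical handler: bytes in, bytes out, text in between."""
    name = raw_body.decode(encoding)       # decode at the input boundary
    greeting = u'Hello, ' + name.strip()   # work with text only
    return greeting.encode(encoding)       # encode at the output boundary


response = handle_request(b' zo\xc3\xab ')
assert response == b'Hello, zo\xc3\xab'
```

Keeping the decode and encode calls at the edges means the code in the middle never has to ask which type it was given.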

Gain confidence, and release!

Get your tests passing in Python 3
Gain confidence through quality tests
Roll out Python 3 gradually
Ship it, call it good (Wazoku did)

... You'll have to deal with a number of issues before you're done.

Common issues

Install six for a compatibility layer

https://pythonhosted.org/six/

pip install six

six offers a compatibility layer for many common symbols.

six will fix most of your common standard library issues, so use it.

Handling problems with builtins

Builtins behave differently.

Here is how to fix those issues.

Use range, not xrange.

If you must have a lazy sequence in both versions, use six.moves

from six.moves import range # Make range compatible for this file

lazy_range = range(42) # Same as xrange(42) in 2, range(42) in 3

Use list(range(...)) when you must have a list.

lazy_range = xrange(5)     # Does not exist in Python 3
something = range(5)       # list in 2, but the same as xrange(5) in 3
a_list = list(range(5))    # A list in both

This works for anything which returns an iterable.

some_list = list(may_return_list())

Worst case scenario, you make a redundant copy.

Use generator expressions instead of map or filter.

Don't bother with this:

doubled_values = map(lambda x: x * 2, some_list) # a list in 2, an iterator in 3
odd_values = filter(lambda x: x % 2, some_list)  # a list in 2, an iterator in 3

Do this instead:

doubled_values = list(x * 2 for x in some_list)  # a list in each version
odd_values = list(x for x in some_list if x % 2) # a list in each version

If you must use them, import the iterator versions.

from six.moves import filter, map # itertools.ifilter and itertools.imap in 2.x

reduce is no longer a builtin in 3, for some reason.

Just import it when you use it.

from six.moves import reduce # Redundant in 2, but fixes code in 3

product_result = reduce(lambda x, y: x * y, range(1, 5)) # Now you can use reduce
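
On Python 3, six.moves.reduce resolves to functools.reduce. As a sketch of an alternative, assuming you only need Python 2.6 or later, the import can come straight from the standard library:

```python
from functools import reduce  # available in Python 2.6+ and in Python 3

# Multiply 1 * 2 * 3 * 4 together.
product_result = reduce(lambda x, y: x * y, range(1, 5))
assert product_result == 24
```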

Type checking

Types are different in 2 and 3. Use types from six instead.

Before:

isinstance(value, (int, long)) # Checking for integers
isinstance(value, basestring)  # Checking for str or unicode

After:

isinstance(value, six.integer_types) # Checking for integers, no long in 3
isinstance(value, six.string_types)  # basestring in 2, and str in 3 ...
isinstance(value, six.text_type)     # unicode in 2, and str in 3
isinstance(value, six.binary_type)   # Check for bytes in 3, or str in 2

# If you really must, accept both, and convert your value...
isinstance(value, (six.binary_type, six.text_type))
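
To see what those names stand for, here is a simplified stand-in for the six definitions. It is a sketch for illustration, not a replacement for six; describe is a hypothetical helper.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)
    integer_types = (int,)
    text_type = str
    binary_type = bytes
else:
    string_types = (basestring,)   # noqa: F821 (Python 2 only)
    integer_types = (int, long)    # noqa: F821 (Python 2 only)
    text_type = unicode            # noqa: F821 (Python 2 only)
    binary_type = str


def describe(value):
    """Hypothetical helper: classify a value the same way in 2 and 3."""
    if isinstance(value, integer_types):
        return 'integer'
    if isinstance(value, text_type):
        return 'text'
    if isinstance(value, binary_type):
        return 'bytes'
    return 'other'


assert describe(10 ** 30) == 'integer'   # a long in 2, a plain int in 3
assert describe(u'foo') == 'text'
assert describe(b'foo') == 'bytes'
```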

Iterating through dictionaries

These do not work in Python 3

for key in some_dict.iterkeys(): ...
for value in some_dict.itervalues(): ...
for key, value in some_dict.iteritems(): ...

Use six functions instead.

# You can just do this instead in each version
for key in some_dict: ...
for value in six.itervalues(some_dict): ...
for key, value in six.iteritems(some_dict): ...

There are view functions too.

for key in six.viewkeys(some_dict): ...
for value in six.viewvalues(some_dict): ...
for key, value in six.viewitems(some_dict): ...
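
Roughly what a helper like six.iteritems does, as a simplified stand-in written for illustration (the real function lives in six; use that in real code):

```python
def iteritems(mapping):
    """Simplified stand-in for six.iteritems, not the real implementation."""
    if hasattr(mapping, 'iteritems'):
        return mapping.iteritems()   # Python 2: lazy iterator
    return iter(mapping.items())     # Python 3: iterate over the items view


totals = {'a': 1, 'b': 2}
pairs = sorted(iteritems(totals))
assert pairs == [('a', 1), ('b', 2)]
```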

Dealing with text and bytes

A whole talk could be devoted to this alone.

Try to use text almost everywhere, not bytes.

u'' always gives you text in 2.x and 3.3+

b'' always gives you bytes in 2.6+ and 3.0+

Wazoku first adopted explicit literals, and then unicode_literals

from __future__ import unicode_literals

When you know the types, decode and encode.

a_string = some_data.decode('utf-8')
some_data = a_string.encode('utf-8')

Always explicitly specify the encoding.

The default encoding is often ascii in Python 2.

Fuzzy text conversion

A poor man's version here. (Find a good library instead)

def to_text(value, encoding='utf-8'):
    # Check for str in 2, and bytes in 3
    if isinstance(value, six.binary_type):
        return value.decode(encoding)

    # Use six.text_type(x) instead of unicode(x) or str(x)
    return six.text_type(value)

def to_bytes(value, encoding='utf-8'):
    # Re-encode binary data as utf-8, so we get exceptions for invalid bytes
    return to_text(value, encoding).encode('utf-8')
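
How to_text behaves, as a self-contained sketch. To avoid needing six here, it assumes type(u'') and type(b'') as stand-ins for six.text_type and six.binary_type, which give the same types in both versions.

```python
text_type = type(u'')    # unicode in 2, str in 3
binary_type = type(b'')  # str in 2, bytes in 3


def to_text(value, encoding='utf-8'):
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return text_type(value)


assert to_text(b'caf\xc3\xa9') == u'caf\u00e9'   # bytes are decoded
assert to_text(u'caf\u00e9') == u'caf\u00e9'     # text passes through
assert to_text(42) == u'42'                      # other values are stringified

# Invalid bytes fail loudly instead of producing garbage text.
try:
    to_text(b'\xff')
except UnicodeDecodeError:
    invalid_rejected = True
else:
    invalid_rejected = False
```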

Use force_text or force_bytes instead for Django code.

from django.utils.encoding import force_bytes, force_text

Handling URL functions

Python 2 only:

from urlparse import urlparse, parse_qs
from urllib import urlencode, quote_plus

Python 3 only:

from urllib.parse import urlparse, parse_qs, urlencode, quote_plus

Both with six:

from six.moves.urllib.parse import urlparse, parse_qs, urlencode, quote_plus

See six documentation for other functions.

Use the Django functions if you can, which work better.

from django.utils.http import urlencode, urlquote_plus
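
If you cannot add six to a project, a try/except import gives the same names from the standard library alone. A sketch; the URL is a made-up example.

```python
try:
    # Python 3
    from urllib.parse import parse_qs, quote_plus, urlencode, urlparse
except ImportError:
    # Python 2
    from urllib import quote_plus, urlencode
    from urlparse import parse_qs, urlparse

query = urlencode({'q': 'python 3'})
assert query == 'q=python+3'

parts = urlparse('https://example.com/search?' + query)
assert parts.netloc == 'example.com'
assert parse_qs(parts.query) == {'q': ['python 3']}
assert quote_plus('a b&c') == 'a+b%26c'
```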

Other issues you will face

urlencode handles Unicode poorly in Python 2
Un-orderable types (You can't sort with None)
csv changed to expect text sequences
No cmp argument for sorted or list.sort; use key (also when sorting for itertools.groupby)
__str__: Use @six.python_2_unicode_compatible
__metaclass__: Use six.with_metaclass or @six.add_metaclass
u'%s' % b'foo' produces "b'foo'" in Python 3
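
Two of the sorting issues above can be handled with the standard library alone. A minimal sketch, assuming None should sort last; compare_lengths is a made-up old-style cmp function.

```python
from functools import cmp_to_key  # Python 2.7+ and 3.2+

values = [3, None, 1]

# sorted(values) raises TypeError in Python 3, so sort on a key that
# never compares None with a number. None values end up last.
ordered = sorted(values, key=lambda v: (v is None, v))
assert ordered == [1, 3, None]


def compare_lengths(a, b):
    # Old-style cmp function: negative, zero, or positive.
    return (len(a) > len(b)) - (len(a) < len(b))


# Wrap a legacy cmp function instead of rewriting it straight away.
words = sorted(['ccc', 'a', 'bb'], key=cmp_to_key(compare_lengths))
assert words == ['a', 'bb', 'ccc']
```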

Conclusion

Python 3 is the future
Migrating will make life hard at first
Python 3.6 will make life easier
Start upgrading now, it will take time