Testing and Linting with Jupyter Notebooks

One objection many people have to Jupyter Notebooks is the difficulty of producing clean code in them. Let's look at a few tools that help with producing high-quality Python code in Jupyter.

Commenting and docstrings work just the same in Jupyter as in a Python IDE. The main tools we are likely to be looking for are therefore testing and linting. We might also find timing and memory usage information useful when checking efficiency. Fortunately, all of these can be done within Jupyter Notebooks.

Testing

There are many testing frameworks in Python. Let's discuss two that certainly work in Jupyter: unittest and doctest. First, let's consider applying unittest to the following example case.

# here is some really awful code with a deliberate error

class BadCalculator:

    def __init__(self, num1, num2):
        self.number_one = num1
        self.number_two = num2

    def add(self):
        # maths is correct, so the test will pass
        answer = self.number_one + self.number_two
        return answer

    def multiply(self):
        # note the deliberate maths error: the test will fail
        answer = self.number_one * self.number_two + 1
        return answer

calc = BadCalculator(2, 3)
print(calc.add())
print(calc.multiply())

To use unittest we just define the tests as usual. However, when it comes to running the tests we need to slightly modify our approach, as shown in the following code.

# we import unittest as usual
import unittest

# and even write our tests in the conventional fashion
class TestBadCalculator(unittest.TestCase):
    '''Testing example for the BadCalculator class'''

    def testAdd(self):
        '''Checks the add method'''
        # add maths is correct, so this test will pass
        testCalc = BadCalculator(3, 3)
        self.assertEqual(testCalc.add(), 6)

    def testMultiply(self):
        '''Checks the multiply method'''
        # multiply maths is incorrect, so this test will fail
        testCalc = BadCalculator(3, 3)
        self.assertEqual(testCalc.multiply(), 9)

# note the change to how unittest needs to be called to work in Jupyter
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'],
                  verbosity=2, exit=False)

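Note the argv parameter: it is needed for unittest to work within Jupyter. Without it, unittest.main() would try to parse the command-line arguments the notebook kernel was launched with. Setting exit=False stops unittest from trying to exit the interpreter once the tests have finished. Verbosity can be adjusted as desired.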

The process for doctest is even simpler. The tests to be used need to be set up within the docstrings of the code under test, as shown here.

# here is a second really awful calculator to demonstrate doctest

class OtherCalculator:

    def __init__(self, num1, num2):
        self.number_one = num1
        self.number_two = num2

    def add(self):
        '''Returns the sum of the two numbers of the OtherCalculator item

        >>> check = OtherCalculator(3,3)
        >>> check.add()
        6
        '''
        # maths is correct, so the test will pass
        answer = self.number_one + self.number_two
        return answer

    def multiply(self):
        '''Returns the product of the two numbers of the OtherCalculator item

        >>> check = OtherCalculator(3,3)
        >>> check.multiply()
        9
        '''
        # note the deliberate maths error: the test will fail
        answer = self.number_one * self.number_two + 1
        return answer

other_calc = OtherCalculator(2, 3)
print(other_calc.add())
print(other_calc.multiply())

After this, the tests can easily be run as follows.

# here is the doctest code to check our code
import doctest

if __name__ == '__main__':
    doctest.testmod()
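doctest.testmod() checks every docstring it can find in the notebook's namespace. If you only want to check one function, a possible approach is to drive doctest's finder and runner yourself. This is a minimal sketch: cube is a hypothetical example function, and the globs dictionary supplies the names the docstring examples need.

```python
import doctest


def cube(n):
    '''Returns n cubed (hypothetical example function)

    >>> cube(3)
    27
    '''
    return n ** 3


# module=False tells the finder not to look up the defining module,
# so we supply the names the examples need through globs
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(cube, name='cube', module=False, globs={'cube': cube}):
    runner.run(test)

# summarize() returns the counts of failed and attempted examples
results = runner.summarize(verbose=False)
print('failed:', results.failed, 'attempted:', results.attempted)
```

For a quick interactive check, doctest.run_docstring_examples(cube, globals()) does much the same job in one call, though it only prints failures rather than returning counts.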

Linting

Linting is a little more awkward. Unfortunately, pylint and pyflakes do not work directly with .ipynb files. However, there is a linter called nblint which does work. It uses pycodestyle by default, and it can also be configured to use pyflakes as its linting engine. Sadly, it does not support pylint.

While not currently available via conda install, it is easily installed via pip:

pip install nblint

Once you have installed nblint, search for it and note its location (e.g. using Windows Explorer to search for “nblint” on Windows machines). The linter can then be run directly from your Jupyter notebook using the %run command followed by the full path to nblint. Remember to substitute forward slashes for Windows backslashes in the path. You also need to include the name of the file to be linted, with its path if it is not in the current working directory.

 
# running nblint 
%run C:/Users/Justin/Anaconda3/envs/theano/Scripts/nblint Testing_Notebook.ipynb

# or alternatively running nblint with pyflakes
%run C:/Users/Justin/Anaconda3/envs/theano/Scripts/nblint --linter pyflakes Testing_Notebook.ipynb
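Rather than searching for the script by hand, one possible approach is to ask Python where the current environment keeps its scripts. This sketch assumes that nblint was installed into the same environment the notebook kernel runs in, and that the script is named plain nblint as in the path above. Neither is guaranteed, so treat the result as a starting point.

```python
import sysconfig
from pathlib import Path

# directory where pip places command-line scripts for this environment
# (typically "Scripts" on Windows and "bin" elsewhere)
scripts_dir = Path(sysconfig.get_path('scripts'))

# assumed script name, taken from the path used above
nblint_path = scripts_dir / 'nblint'

# as_posix() gives the path with forward slashes, as recommended above
status = '(found)' if nblint_path.exists() else '(not found here)'
print(nblint_path.as_posix(), status)
```

If the script is reported as not found, it may live in a different environment or carry a file extension, in which case the manual search described above is still the fallback.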

Timing

We might also want to time our code to check its efficiency. This can easily be done in Jupyter using two of its magic timing functions, %%time and %%timeit. Both magic functions apply only to the cell in which they occur.

%%time will give you the time for a single run of your code.

%%time
for i in range(100000):
    i = i**3

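%%timeit runs the code many times and reports a summary of the timings. Older versions of IPython report the best of 3 runs, while more recent versions report the mean and standard deviation over several runs.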

%%timeit
for i in range(100000):
    i = i**3

You can also use %timeit, with a single % sign, to time a single line of code.

%timeit L = [i ** 3 for i in range(100000)]

Note that if you want to make full use of the timeit module’s more advanced options you will still need to import it and use it as usual.
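As an illustration of using the module directly, the sketch below calls timeit.repeat with an explicit setup statement, repeat count and loop count. The statement being timed and the numbers chosen are arbitrary examples.

```python
import timeit

# setup runs once per repeat and is not included in the timing
setup = 'data = list(range(1000))'
stmt = 'cubes = [i ** 3 for i in data]'

# repeat=5 gives five independent timings, each covering number=200 executions
timings = timeit.repeat(stmt=stmt, setup=setup, repeat=5, number=200)

# the minimum is usually the least noisy estimate; divide to get time per execution
best_per_run = min(timings) / 200
print(f'best of {len(timings)}: {best_per_run * 1e6:.1f} microseconds per run')
```

Keeping the data preparation in setup means only the list comprehension itself is measured, which is harder to arrange with the cell magics.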

Memory

Finally we may want to examine memory usage of our variables. To do so the following code snippet is useful.

import sys

# These are the usual Jupyter objects, including this list you are creating
ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

# Get a list of the objects and their sizes in bytes, largest first
sorted([(x, sys.getsizeof(globals().get(x))) for x in dir()
        if not x.startswith('_') and x not in sys.modules and
        x not in ipython_vars], key=lambda x: x[1], reverse=True)
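The same idea can be wrapped in a function so that it is easier to reuse. The sketch below is one possible packaging: largest_variables and its parameters are hypothetical names, and it takes the namespace as an argument (in a notebook you would pass globals()). Bear in mind that sys.getsizeof reports only an object's own size, not the size of anything it refers to, so containers are under-reported.

```python
import sys


def largest_variables(namespace, skip=(), top=10):
    '''Returns (name, size in bytes) pairs, largest first'''
    sizes = [(name, sys.getsizeof(value))
             for name, value in namespace.items()
             if not name.startswith('_') and name not in skip]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


# a small made-up namespace standing in for globals()
example = {'small': 1, 'big': list(range(10000)), '_hidden': 'x' * 100000}
for name, size in largest_variables(example, skip=('In', 'Out')):
    print(name, size)
```

Names starting with an underscore are skipped, just as in the snippet above, so the large '_hidden' string does not appear in the output.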