One objection many people have to Jupyter Notebooks is the difficulty of producing clean code in them. Let's look at a few tools that help with producing high-quality Python code in Jupyter.
Commenting and docstrings work just the same in Jupyter as in a Python IDE. The main tools we are therefore likely to be looking for are testing and linting. We might also find timing and memory usage information useful for checking efficiency. Fortunately, all of these can be done within Jupyter Notebooks.
Testing
There are many testing frameworks in Python. Let's discuss two that certainly work in Jupyter: unittest and doctest. First, let's consider applying unittest to the following example case.
# here is some really awful code with errors
class BadCalculator:
    def __init__(self, num1, num2):
        self.number_one = num1
        self.number_two = num2

    def add(self):
        # maths is correct test will pass
        answer = self.number_one + self.number_two
        return answer

    def multiply(self):
        # note maths error - test will fail
        answer = self.number_one * self.number_two + 1
        return answer

calc = BadCalculator(2, 3)
print(calc.add())
print(calc.multiply())
To use unittest, we just define the tests as usual. However, when it comes to running the tests, we need to modify our approach slightly, as shown in the following code:
# we import unittest as usual
import unittest

# and even write our tests in the conventional fashion
class TestBadCalculator(unittest.TestCase):
    '''Testing example for the BadCalculator class'''

    def testAdd(self):
        '''Checks the add method'''
        # add maths is correct test will pass
        testCalc = BadCalculator(3, 3)
        self.assertEqual(testCalc.add(), 6)

    def testMultiply(self):
        '''Checks the multiply method'''
        # multiply maths is incorrect test will fail
        testCalc = BadCalculator(3, 3)
        self.assertEqual(testCalc.multiply(), 9)

# note the change to how unittest needs to be called to work in Jupyter
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], verbosity=2, exit=False)
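Note the argv parameter: it is needed for unittest to work within Jupyter, because otherwise unittest would try to parse the command-line arguments of the notebook kernel itself. Setting exit=False stops unittest from calling sys.exit() once the tests have finished. Verbosity can be adjusted as desired.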
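If you only want to run one test class rather than everything unittest can discover in the notebook, one alternative is to build a suite and run it yourself. This is a minimal sketch and not part of the original example; the class name TestArithmetic is made up for illustration, and you would substitute your own test class.

```python
import unittest


# Hypothetical stand-in for a test class defined elsewhere in the notebook
class TestArithmetic(unittest.TestCase):
    def test_sum(self):
        self.assertEqual(2 + 3, 5)


# Load only the tests from this one class and run them explicitly
suite = unittest.TestLoader().loadTestsFromTestCase(TestArithmetic)
result = unittest.TextTestRunner(verbosity=2).run(suite)

# The returned result object can be inspected in later cells
print(result.wasSuccessful())
```

Because nothing here reads the command line, no argv workaround is required with this approach.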
The process for doctest is even simpler. The tests to be used need to be set up within the docstrings of the code under test, as shown here:
# here is a second really awful calculator to demonstrate doctest
class OtherCalculator:
    def __init__(self, num1, num2):
        self.number_one = num1
        self.number_two = num2

    def add(self):
        '''Returns the sum of the two numbers of the OtherCalculator item
        >>> check = OtherCalculator(3,3)
        >>> check.add()
        6
        '''
        # maths is correct test will pass
        answer = self.number_one + self.number_two
        return answer

    def multiply(self):
        '''Returns the product of the two numbers of the OtherCalculator item
        >>> check = OtherCalculator(3,3)
        >>> check.multiply()
        9
        '''
        # note maths error - test will fail
        answer = self.number_one * self.number_two + 1
        return answer

other_calc = OtherCalculator(2, 3)
print(other_calc.add())
print(other_calc.multiply())
After this, the tests can easily be run as follows:
# here is the doctest code to check our code
import doctest

if __name__ == '__main__':
    doctest.testmod()
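doctest.testmod() checks every docstring it can find in the notebook. If you would rather check a single function and get the pass and fail counts back as values, one possible approach is to use doctest's finder and runner classes directly. The sketch below assumes a hypothetical function called cube, which is not part of the original example.

```python
import doctest


# Hypothetical function used only to illustrate the idea
def cube(n):
    """Return n cubed.

    >>> cube(3)
    27
    """
    return n ** 3


# Collect the doctests attached to this one function and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(cube, name="cube", globs={"cube": cube}):
    runner.run(test)

# summarize() returns a named tuple with 'failed' and 'attempted' counts
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

The globs dictionary supplies the names the docstring examples are allowed to use, so each name an example refers to has to be listed there.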
Linting
Linting is a little more awkward. Unfortunately, pylint and pyflakes do not work with .ipynb files. However, there is a linter called nblint which does work. It uses pycodestyle by default, and it can also be configured to use pyflakes as its linting engine. Sadly, it does not support pylint.
While not currently available via conda install, it is easily installed via pip:
pip install nblint
Once you have installed nblint, find and note its location (e.g. by using Windows Explorer to search for “nblint” on Windows machines). The linter can then be run directly from your Jupyter notebook using the %run command followed by the full path to nblint. Remember to substitute forward slashes for the Windows backslashes in the path. Also include the name of the file to be linted, with its path if it is not in the current working directory.
# running nblint
%run C:/Users/Justin/Anaconda3/envs/theano/Scripts/nblint Testing_Notebook.ipynb

# or alternatively running nblint with pyflakes
%run C:/Users/Justin/Anaconda3/envs/theano/Scripts/nblint --linter pyflakes Testing_Notebook.ipynb
Timing
We might also want to time our code to check its efficiency. This can easily be done in Jupyter using two of its magic timing functions, %%time and %%timeit. Both magic functions apply only to the cell in which they occur.
%%time will give you the time for a single run of your code:
%%time
for i in range(100000):
    i = i**3
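%%timeit runs the code a large number of times, repeats that whole measurement several times, and reports a summary. The exact form of the summary depends on your IPython version: older versions report the best of 3 runs, while newer ones report the mean and standard deviation over 7 runs.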
%%timeit
for i in range(100000):
    i = i**3
You can also use %timeit, with a single % sign, to time a single line of code:
%timeit L = [i ** 3 for i in range(100000)]
Note that if you want to make full use of the timeit module’s more advanced options you will still need to import it and use it as usual.
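As a rough illustration of using the timeit module directly, the sketch below times a function with timeit.repeat and reports the fastest run. The function name cube_all and the repeat and number values are arbitrary choices for this example, not recommendations.

```python
import timeit


# Hypothetical function to be timed
def cube_all(n=100000):
    return [i ** 3 for i in range(n)]


# Call the function 5 times per measurement and take 3 measurements
runs_per_measurement = 5
timings = timeit.repeat(cube_all, repeat=3, number=runs_per_measurement)

# Each entry is the total time for 5 calls, so divide to get a per-call time
best = min(timings) / runs_per_measurement
print(f"best of 3: {best * 1e3:.3f} ms per call")
```

Taking the minimum rather than the mean is a common choice here, since slower runs usually reflect interference from other processes rather than the code itself.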
Memory
Finally we may want to examine memory usage of our variables. To do so the following code snippet is useful.
import sys

# These are the usual Jupyter objects, including this one you are creating
ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

# Get a sorted list of the objects and their sizes
sorted([(x, sys.getsizeof(globals().get(x))) for x in dir()
        if not x.startswith('_') and x not in sys.modules and x not in ipython_vars],
       key=lambda x: x[1], reverse=True)
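One caveat worth keeping in mind is that sys.getsizeof reports only the size of the object itself, not of the objects it refers to. For a container such as a list, this means the figures above can understate the real footprint. The sketch below shows the difference for a flat list of integers; the variable names are made up for illustration, and the simple sum is only a rough estimate that suits flat containers, not nested ones.

```python
import sys

# A flat list of integers used only for illustration
numbers = list(range(1000))

# Shallow size: the list object and its array of references
shallow = sys.getsizeof(numbers)

# Rough deeper estimate: add the size of each element the list refers to
deep = shallow + sum(sys.getsizeof(n) for n in numbers)

print(shallow, deep)
```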
Further Information
As usual, you can find the code on GitHub. Below are some links to further information.
Using unittest in Jupyter:
https://medium.com/@vladbezden/using-python-unittest-in-ipython-or-jupyter-732448724e31
Information about testing options:
https://docs.python-guide.org/writing/tests/
https://pymotw.com/2/unittest/
nblint project:
https://github.com/alexandercbooth/nblint
General Jupyter hints:
https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
Timeit documentation:
https://docs.python.org/2/library/timeit.html
Memory usage code snippet for Jupyter:
https://stackoverflow.com/questions/40993626/list-memory-usage-in-ipython-and-jupyter