Writing software is among the most complicated endeavors a human can undertake. Brian Kernighan, co-author of the AWK programming language and of The C Programming Language ("K&R C"), summed up the true nature of software development in the book Software Tools when he stated, "Controlling complexity is the essence of computer programming." The harsh reality of real-world software development is that software is often created with intentional or unintentional complexity and a disregard for maintainability, testability, and quality. The end result of this unfortunate reality is software that becomes increasingly difficult and expensive to maintain and that fails sporadically, sometimes spectacularly.
The first step in the process of writing high-quality code is to re-examine the entire thought process of how an individual or team develops software. In failed or troubled software development projects, the software is often developed in a reactionary, stream-of-consciousness manner, where the focus is on getting the problem solved in any way possible. In a successful software project, the developer thinks not only about how to solve the problem at hand, but also about the process involved in solving it.
A successful software developer will devise a way to run tests in an easily automated fashion, so they can continuously prove the software works. They are aware of the dangers of needless complexity. They are humble in their approach, seek critical review, and expect refactoring at every step of the way. They continuously think about how they can ensure their software is testable, readable, and maintainable. Although Python the language, and Python the community, are heavily influenced by a desire to write clean, maintainable code that works, it is still quite easy to do the exact opposite. In this article, we will tackle this problem head on and explore how to write clean, testable, high quality code in Python.
A clean code hypothetical problem
The best way to demonstrate this style of development is to solve a hypothetical problem. Let's suppose you are a back-end web developer at a company that allows users to generate reviews, and you need to come up with a way to show and highlight small snippets of those reviews.
One way to approach the problem would be to write one large function that takes a snippet of text and the query parameters and returns a character-limited snippet with the query parameters highlighted. All of the logic needed to solve the problem would be included in that one "mega" function, and you would simply keep rerunning your script until you got the result you wanted.
The result would probably look like the code example below and would often be developed with a combination of print statements, or logging statements, and an interactive shell.
def my_mega_function(snippet, query):
    """This takes a snippet of text, and a query parameter and returns """
    # Logic goes here, and often runs on for several hundred lines
    # There are often deeply nested conditional statements and loops
    # Function could reach several hundred, if not thousands, of lines
    return result
With a dynamic language like Python, Perl, or Ruby, it is easy to develop software by simply banging away at the problem, often interactively, until you get what seems to be the correct result and calling it a day.
Unfortunately, this approach, while tempting, often leads to a false sense of accomplishment that is fraught with danger. Much of the danger lies in not designing the solution to be testable; the rest lies in not properly controlling the complexity of the software.
How can you say this function even works? You can have faith that it works because it worked the last time you ran it during development, but are you sure it doesn't contain subtle errors of logic or syntax? What happens if you need to change the code? Would it still work, and how would you know it still worked? What if that code needed to be maintained by another developer who had to make changes to it? How would they know their changes didn't cause something subtle to break? How hard would it be for them to understand what the code does?
The short answer is: if you don't have tests, you don't know whether your software works. If you stack together enough guesses, you may eventually build something that appears to function, but that no one could say with certainty ever worked properly. This is a bad place to be, and I have both written this kind of software and helped debug software written this way.
Test-driven development
Fortunately, this condition is easily avoidable. Writing tests before you write your logic, as in test-driven development, or alongside it, actually shapes the way the code is written. It leads to modular, extensible code that is easy to test, understand, and maintain. It is immediately apparent to an experienced developer when software was developed with testing in mind and when it was not; the software itself looks dramatically different to the trained eye.
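To make this concrete, here is a minimal, hypothetical sketch of the test-first workflow; make_snippet and its behavior are invented for illustration and are not part of the example developed later in this article:

import unittest

class TestMakeSnippet(unittest.TestCase):
    """In a strict test-driven cycle, this test is written and run before
    make_snippet exists; its failure then drives the implementation."""

    def test_respects_character_limit(self):
        result = make_snippet("word " * 100, max_characters=175)
        self.assertTrue(len(result) <= 175)

# The minimal implementation, written only after the test above failed:
def make_snippet(text, max_characters=175):
    return text[:max_characters]

if __name__ == '__main__':
    unittest.main()

The key point is the order of operations: the test defines the contract first, which nudges the implementation toward small, independently verifiable functions.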
Measuring cyclomatic complexity of your code
Rather than simply taking my word for it, or visually inspecting code, there are ways to measure the difference between these two styles scientifically. The first way is to measure the lines of code that your tests actually exercise.
Nose is a popular extension of Python's unittest framework that makes it easy to automatically run a batch of tests, along with plug-ins such as code coverage. By measuring code coverage during development, it quickly becomes apparent that it is almost impossible to get 100 percent test coverage for code composed of large, highly nested functions built in an ad hoc manner.
Cyclomatic complexity is a software metric used to determine the complexity of a program. It measures the number of linearly independent paths, or branches, through source code. It is best to keep the complexity of a method below 10, because methods with a higher complexity score are more likely to be fault-prone.
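As a rough illustration (a made-up function, not taken from the code later in this article), each decision point adds one linearly independent path, so counting the decision points and adding one gives the McCabe score:

def shipping_cost(weight, express, international):
    """Cyclomatic complexity of 4: the single base path plus three
    decision points (two if statements and one conditional expression)."""
    cost = 5.0
    if weight > 10:       # decision point 1
        cost += 2.5
    if international:     # decision point 2
        cost *= 3
    return cost * 2 if express else cost  # decision point 3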
The second way to measure the difference is to use static analysis tools. There are several popular Python tools that measure various metrics for Python developers, ranging from general code quality to specific metrics, like duplicate code or complexity. You can measure the cyclomatic complexity of your code with Pygenie or Radon.
Here is an example of what it looks like when we run Pygenie on "clean" code that is relatively simple:
% python pygenie.py complexity --verbose highlight.py
File: /Users/ngift/Documents/src/highlight.py
Type Name Complexity
------------------------------------------------------------------------------
M HighlightDocumentOperations._create_snippit 3
M HighlightDocumentOperations._reconstruct_document_string 3
M HighlightDocumentOperations._doc_to_sentences 2
M HighlightDocumentOperations._querystring_to_dict 2
M HighlightDocumentOperations._word_frequency_sort 2
M HighlightDocumentOperations.highlight_doc 2
X /Users/ngift/Documents/src/highlight.py 1
C HighlightDocumentOperations 1
M HighlightDocumentOperations.__init__ 1
M HighlightDocumentOperations._custom_highlight_tag 1
M HighlightDocumentOperations._score_sentences 1
M HighlightDocumentOperations._multiple_string_replace 1
As you can tell from the example, every method is extremely simple, with a complexity rating under 10, which is desirable according to McCabe's research. In my experience, I have seen "mega" functions written without testing that had complexity ratings over 140 and stretched over 1,200 lines. Suffice it to say, it is effectively impossible to test code like that: there is no way to ever know whether it works, and refactoring it is out of the question. If the author had kept testing in mind and written the same logic with 100 percent test coverage, it is highly unlikely the code would have such a high complexity rating.
A clean code hypothetical solution
Let's now take a look at a complete source code example, with accompanying unit tests and functional tests, and see why this code is considered clean.
One reasonable definition of clean, using strictly metrics, is that it fulfills the following requirements:
It has close to 100 percent test coverage
It has a cyclomatic complexity rating of under 10 for all classes and methods
Here is an example of using nose to measure unit test and doctest coverage for the highlight module:
% nosetests -v --with-coverage --cover-package=highlight --with-doctest \
--cover-erase --exe
Doctest: highlight.HighlightDocumentOperations._custom_highlight_tag ... ok
test_functional.test_snippit_algorithm ... ok
test_custom_highlight_tag (test_highlight.TestHighlight) ... ok
Consumes the generator, and then verifies the result[0] ... ok
Verifies highlighted text is what we expect ... ok
test_multi_string_replace (test_highlight.TestHighlight) ... ok
Verifies the yielded results are what is expected ... ok

Name        Stmts   Exec   Cover   Missing
------------------------------------------
highlight      71     71    100%
------------------------------------------
Ran 7 tests in 4.223s

OK
As you can see from the above snippet, the nosetests command was run with several options, and there was 100 percent test coverage for the highlight.py script. The only option of real note is --cover-package=highlight, which tells nose to show the coverage report only for the specified module. This is very useful for isolating a coverage report to the module or packages you want to observe. One thing you may want to try is to download the source code for this article and comment out some of the tests to see how the coverage reporting mechanism really works.
highlight.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
:mod:`highlight` -- Highlight Methods
=====================================

.. module:: highlight
   :platform: Unix, Windows
   :synopsis: highlight document snippets that match a query.
.. moduleauthor:: Noah Gift

Requirements::

    1. You will need to install the nltk library to run this code:
       http://www.nltk.org/download
    2. You will need to download the data for the nltk, see
       http://www.nltk.org/data::

        import nltk
        nltk.download()

"""
import re
import logging

import nltk

# Globals
logging.basicConfig()
LOG = logging.getLogger("highlight")
LOG.setLevel(logging.INFO)
class HighlightDocumentOperations(object):

    """Highlight Operations for a Document"""

    def __init__(self, document=None, query=None):
        """
        Kwargs:
            document (str):
            query (str):
        """
        self._document = document
        self._query = query
    @staticmethod
    def _custom_highlight_tag(phrase, start="<strong>", end="</strong>"):
        """Injects an open and close highlight tag after a word

        Args:
            phrase (str) - A word or phrase.
        Kwargs:
            start (str) - An opening tag. Defaults to <strong>
            end (str) - A closing tag. Defaults to </strong>
        Returns:
            (str) word or phrase with custom opening and closing tags

        >>> h = HighlightDocumentOperations()
        >>> h._custom_highlight_tag("foo")
        '<strong>foo</strong>'
        >>>
        """
        tagged_phrase = "{0}{1}{2}".format(start, phrase, end)
        return tagged_phrase
    def _doc_to_sentences(self):
        """Takes a string document and converts it into a list of sentences

        Unfortunately, this approach might be a tad naive for production
        because some segments that are split on a period are really an
        abbreviation, and to make things even more complicated, an
        abbreviation can also be the end of a sentence::

            http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html

        Returns:
            (generator) A generator object of a tokenized sentence tuple,
            with the list position of the sentence as the first portion of
            the tuple, such as: (0, "This was the first sentence")
        """
        tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
        sentences = tokenizer.tokenize(self._document)
        for sentence in enumerate(sentences):
            yield sentence
    @staticmethod
    def _score_sentences(sentence, querydict):
        """Creates a scoring system for each sentence by substitution analysis

        Tokenizes each sentence, counts the characters in the sentence,
        and passes it back as a nested tuple

        Returns:
            (tuple) - (score (int), (count (int), position (int),
            raw sentence (str)))
        """
        position, sentence = sentence
        count = len(sentence)
        regex = re.compile('|'.join(map(re.escape, querydict)))
        score = len(re.findall(regex, sentence))
        processed_score = (score, (count, position, sentence))
        return processed_score
    def _querystring_to_dict(self, split_token="+"):
        """Converts query parameters into a dictionary

        Returns:
            (dict) - dparams, a dictionary of query parameters
        """
        params = self._query.split(split_token)
        dparams = dict([(key, self._custom_highlight_tag(key))
                        for key in params])
        return dparams
    @staticmethod
    def _word_frequency_sort(sentences):
        """Sorts sentences by score frequency, yields sorted result

        This will yield the highest score count items first.

        Args:
            sentences (list) - a nested tuple inside of a list, such as:
            (0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))
        """
        sentences.sort()
        while sentences:
            yield sentences.pop()
    def _create_snippit(self, sentences, max_characters=175):
        """Creates a snippet from sentences while keeping it under max_characters

        Returns a sorted list with max characters. The sort is an attempt
        to rebuild the original document structure as closely as possible,
        with the new sorting by scoring and the limitation of max_characters.

        Args:
            sentences (generator) - sorted object to turn into a snippit
            max_characters (int) - optional max characters of snippit

        Returns:
            snippit (list) - returns a sorted list with a nested tuple that
            has the first index holding the original position of the list::

            (0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))
        """
        snippit = []
        total = 0
        for sentence in self._word_frequency_sort(sentences):
            LOG.debug("Creating snippit: %s", sentence)
            score, (count, position, raw_sentence) = sentence
            total += count
            if total < max_characters:
                # position now gets converted to index 0 for sorting later
                snippit.append((position, score, count, raw_sentence))
        # try to reassemble the document in original order with a simple sort
        snippit.sort()
        return snippit
    @staticmethod
    def _multiple_string_replace(string_to_replace, dict_patterns):
        """Performs a multiple replace in a string with dict pattern.

        Borrowed from Python Cookbook.

        Args:
            string_to_replace (str) - String to be multi-replaced
            dict_patterns (dict) - A dict full of patterns

        Returns:
            (str) - Multiple replaced string.
        """
        regex = re.compile('|'.join(map(re.escape, dict_patterns)))

        def one_xlat(match):
            """Closure that is called repeatedly during multi-substitution.

            Args:
                match (SRE_Match object)
            Returns:
                partial string substitution (str)
            """
            return dict_patterns[match.group(0)]

        return regex.sub(one_xlat, string_to_replace)
    def _reconstruct_document_string(self, snippit, querydict):
        """Reconstructs a string snippit, builds tags, and returns a string

        A helper function for highlight_doc.

        Args:
            snippit (list) - A list of nested tuples, containing
            this pattern::

            (0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))

            querydict (dict) - A dict full of patterns

        Returns:
            (str) The most relevant snippet with the query terms highlighted.
        """
        snip = []
        for entry in snippit:
            score = entry[1]
            sent = entry[3]
            # if we have matches, now do the multi-replace
            if score:
                sent = self._multiple_string_replace(sent, querydict)
            snip.append(sent)
        highlighted_snip = " ".join(snip)
        return highlighted_snip
    def highlight_doc(self):
        """Finds the most relevant snippit with the query terms highlighted

        Returns:
            (str) The most relevant snippet with the query terms highlighted.
        """
        # tokenize to sentences, and convert query to a dict
        sentences = self._doc_to_sentences()
        querydict = self._querystring_to_dict()

        # process and score sentences
        scored_sentences = []
        for sentence in sentences:
            scored = self._score_sentences(sentence, querydict)
            scored_sentences.append(scored)

        # fit into max characters, and sort by original position
        snippit = self._create_snippit(scored_sentences)

        # assemble back into a string
        highlighted_snip = self._reconstruct_document_string(snippit,
                                                             querydict)
        return highlighted_snip
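Before moving on to the tests, here is a minimal usage sketch of the module; this driver is my addition rather than part of the original listing, and it assumes the nltk data described in the module docstring is installed:

from highlight import HighlightDocumentOperations

document = "I had some great deep dish pizza downtown. Parking was rough."
query = "deep+dish+pizza"  # query terms are '+'-separated, as in a URL

hdo = HighlightDocumentOperations(document, query)
# Prints the most relevant snippet, capped at roughly 175 characters,
# with each matching query term wrapped in <strong> tags
print(hdo.highlight_doc())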
test_highlight.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Tests this query searches a document, highlights a snippit and returns it

http://www.example.com/search?finddesc=deep+dish+pizza&ns=1&rpp=10&findloc=\
San+Francisco%2C+CA

Contains both unit and functional tests.

"""
import unittest

from highlight import HighlightDocumentOperations
class TestHighlight(unittest.TestCase):

    def setUp(self):
        self.document = """
Review for their take-out only.
Tried their large Classic (sausage, mushroom, peppers and onions) deep dish;\
and their large Pesto Chicken thin crust pizzas.
Pizza = I've had better. The crust/dough was just way too effin' dry for me.\
Yes, I know what 'cornmeal' is, thanks. But it's way too dry.\
I'm not talking about the bottom of the pizza...I'm talking about the dough \
that's in between the sauce and bottom of the pie...it was like cardboard, sorry!
Wings = spicy and good. Bleu cheese dressing only...hmmm, but no alternative\
of ranch dressing, at all. Service = friendly enough at the counters.
Decor = freakin' dark. I'm not sure how people can see their food.
Parking = a real pain. Good luck.
"""
        self.query = "deep+dish+pizza"
        self.hdo = HighlightDocumentOperations(self.document, self.query)
    def test_custom_highlight_tag(self):
        actual = self.hdo._custom_highlight_tag("foo",
                                                start="[BAR]",
                                                end="[ENDBAR]")
        expected = "[BAR]foo[ENDBAR]"
        self.assertEqual(actual, expected)
    def test_query_string_to_dict(self):
        """Verifies the yielded results are what is expected"""
        result = self.hdo._querystring_to_dict()
        expected = {"deep": "<strong>deep</strong>",
                    "dish": "<strong>dish</strong>",
                    "pizza": "<strong>pizza</strong>"}
        self.assertEqual(result, expected)
    def test_multi_string_replace(self):
        query = """pizza = I've had better"""
        expected = """<strong>pizza</strong> = I've had better"""
        query_dict = self.hdo._querystring_to_dict()
        result = self.hdo._multiple_string_replace(query, query_dict)
        self.assertEqual(expected, result)
    def test_doc_to_sentences(self):
        """Consumes the generator, and then verifies the result[0]"""
        results = []
        expected = (0, '\nReview for their take-out only.')
        for sentence in self.hdo._doc_to_sentences():
            results.append(sentence)
        self.assertEqual(results[0], expected)
    def test_highlight(self):
        """Verifies highlighted text is what we expect"""
        expected = """Tried their large Classic (sausage, mushroom, peppers and onions)\
 <strong>deep</strong> <strong>dish</strong>;\
and their large Pesto Chicken thin crust \
<strong>pizza</strong>s."""
        actual = self.hdo.highlight_doc()
        self.assertEqual(expected, actual)
    def tearDown(self):
        del self.query
        del self.hdo
        del self.document

if __name__ == '__main__':
    unittest.main()
test_functional_highlight.py
"""Functional Test That Performs Some Basic Checks"""from highlight import HighlightDocumentOperations
deftestsnippitalgorithm():
document1 = """
This place has awesome deep dish pizza.
I have been getting delivery through Waiters on wheels for years.
It is classic, deep dish Chicago style pizza.
Now I found out they also have half‑baked to pick‑up and cook at home.
This is a great benefit. I am having it tonight. Yum.
"""
document2 = """Review for their take‑out only.
Tried their large Classic (sausage, mushroom, peppers and onions) deep dish;\
and their large Pesto Chicken thin crust pizzas.
Pizza = I've had better. The crust/dough was just way too effin' dry for me.\
Yes, I know what 'cornmeal' is, thanks. But it's way too dry.\
I'm not talking about the bottom of the pizza...I'm talking about the dough \
that's in between the sauce and bottom of the pie...it was like cardboard, sorry!
Wings = spicy and good. Bleu cheese dressing only...hmmm, but no alternative\
of ranch dressing, at all. Service = friendly enough at the counters.
Decor = freakin' dark. I'm not sure how people can see their food.
Parking = a real pain. Good luck."""
h1 = HighlightDocumentOperations(document1, "deep+dish+pizza")
actual = h1.highlight_doc()
print"Raw Document1: %s" % document1
print" Formatted Document1: %s" % actual
assertlen(actual) < 500assert"<strong>"in actual
h2 = HighlightDocumentOperations(document2, "deep+dish+pizza")
actual = h2.highlight_doc()
print"Raw Document2: %s" % document2print" Formatted Document2: %s" % actual
assertlen(actual) < 500assert"<strong>"in actual
if __name == "__main":
test_snippit_algorithm()
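One practical note before you run these listings yourself: the highlight module only loads the punkt sentence tokenizer, so instead of the full nltk.download() shown in the module docstring, fetching just that data set should suffice. This is a minimal sketch; the 'punkt' package name is my inference from the tokenizer path used in _doc_to_sentences:

import nltk

# highlight.py loads 'tokenizers/punkt/english.pickle', so the
# 'punkt' data package is the only download strictly required here.
nltk.download('punkt')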
Running pylint on the above sample code
Concerning the above code sample: if you would like to run it, you will need to download the Natural Language Toolkit (nltk) and its data according to the instructions in the module docstring. Since this article is not about what the code sample does, but about how it was created and how to test it, I won't go into detail explaining the code. Instead, let's finish up by running the static code analysis tool pylint on our source code:
% pylint highlight.py
No config file found, using default configuration
************* Module highlight
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of 'unicode' has no
'tokenize' member (but some types could not be inferred)
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of 'ContextFreeGrammar'
has no 'tokenize' member (but some types could not be inferred)
W:108:HighlightDocumentOperations._score_sentences: Used builtin function 'map'
W:192:HighlightDocumentOperations._multiple_string_replace: Used builtin function 'map'
R: 34:HighlightDocumentOperations: Too few public methods (1/2)

Report
======
69 statements analysed.

Global evaluation
-----------------
Your code has been rated at 8.12/10 (previous run: 8.12/10)
The code scored an 8.12 out of 10 and was docked for a few items. Pylint is configurable, so it is likely that you will need to tune it to meet the needs of your project; refer to the official pylint documentation for the details.
For this specific example, the two errors on line 89 can be attributed to the external nltk library, and the two warnings could be suppressed with a configuration change to pylint.
In general, you should never allow pylint errors to remain in your source code, but there are times, such as in the example above, when you may need to make an executive decision. Pylint isn't a perfect tool, but I have found it to be very useful in the real world.
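For example, if your team makes the executive decision that the two map warnings are acceptable, you can silence that one check rather than let it cloud every report. A sketch, assuming W0141 is the message code your pylint version uses for "Used builtin function" (check pylint --list-msgs on your installation):

% pylint --disable=W0141 highlight.py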
Conclusion
In this article, we explored how merely thinking about testing influences the structure of software, and how a lack of thought toward testing can prove fatally harmful to a project.
We showed a complete code example, including both functional and unit tests, and ran it through code coverage analysis with nose and through two static analysis tools, pylint and Pygenie.
Finally, neither testing nor static analysis is a panacea. Software development is hard work. To even have a chance of being successful, we must remain mindful of the real goal: not only to solve a problem, but to create something we can prove works. If you agree with this premise, then overly complex code, arrogance in design, and a lack of respect for the power of Python all directly interfere with that goal.
Acknowledgements
Thanks to Kennedy Behrman, of ImageMovers Digital, for the technical review of this article.