Python Sets: Part One

Python sets

No, not that kind of set.

One of the core Python data types that we did not mention in earlier articles that perhaps deserves some attention is the set. Sets are a recent addition to Python; they are neither mappings (dictionaries) nor sequences (strings, lists and typles). Sets are created by calling the built-in set function or using new set literals and expressions in 3.0, and they support the usual mathematical set operations.

Introduction to Sets

Creating a set can be done two different ways:

>>> mySet = set(‘hello’)
>>> otherSet = {‘a’,’b’,’c’,’d’,’e’}

len returns the number of unique set items, so we get:
>>> len(otherSet)
5
But if we run the same operation on mySet, we get:
>>> len(mySet)
4

>>> ‘h’ in mySet
True
>>> ‘i’ in mySet
False
>>> mySet.isdisjoint(otherSet)
False
>>> newSet = set(‘he’)
newSet.issubset(mySet)
True

Mathematical set operations are generally valid on sets:

>>> mySet, otherSet
({‘o’, ‘e’, ‘l’, ‘h’}, {‘e’, ‘d’, ‘b’, ‘a’, ‘c’})

>>> mySet & otherSet
{‘e’}

>>> mySet | otherSet
{‘e’, ‘h’, ‘o’, ‘d’, ‘b’, ‘c’, ‘a’, ‘l’}
>>> mySet – otherSet
{‘o’, ‘l’, ‘h’}

There are a few other operations. <= tests whether ever element in the left operand set is in the right operand set. For example: >>> newSet <= mySet
True

But what if we want to return False if the sets are equal? Then we use <:
>> newSet < mySet

True

>>> exactcopy = set(mySet)
>>> exactcopy < mySet

False

>>> exactcopy <= mySet

True

We can flip the operand around, and check to see if the left operand set is a superset of the right operand set:

>>> mySet > newSet
True
>>> mySet >= newSet
True

Sets are mutable; you can add and remove items with the add and remove methods:

>>> otherSet.add(‘f’)
{‘e’, ‘f’, ‘d’, ‘b’, ‘c’, ‘a’}
>>> otherSet.remove(‘c’)
>>> otherSet
{‘e’, ‘f’, ‘d’, ‘b’, ‘a’}

You can also iterate over a set. For example:
>>> for s in mySet:
print(s)

o
e
l
h

Since sets are mutable, there are some things we cannot do with them: for example, we cannot use them as dictionary keys. But objects of type frozenset are immutable. Therefore:

>>> unchangeableSet = frozenset(‘abc’)

Although we can’t add and remove items, we can perform the usual set operations on frozensets, or a combination of sets and frozensets:

>>> mySet | unchangeableSet
{‘o’, ‘e’, ‘b’, ‘h’, ‘c’, ‘a’, ‘l’}

For an example of using frozenset to create keys for a dictionary, here’s a sample:

>>> keySet = frozenset(‘abc’)
>>> names = [ ‘Able’, ‘Baker’, ‘Charlie’ ]
>>> myDict = { }
>>> i = 0
>>> for s in keySet:
myDict[s] = names[i]
i += 1

>>> print(myDict)
{‘a’: ‘Able’, ‘b’: ‘Baker’, ‘c’: ‘Charlie’}

Here, we created an immutable set called keySet, and a list of items to put in our dictionary. We iterate through the set, mapping items in the list to keys in keySet. When we print out the results, we see that each item was successfully mapped to a key.

In the next article, we will continue our look at sets.

External Links:

Set, frozenset at docs.python.org – Official documentation on sets and frozensets for Python 3.4

Python Debugging: An Overview

debugging

The Eclipse debugger in action.

At some point, one of your programs is likely to do something you did not expect it to do, and in such a scenario, you are going to have to debug your code. Fortunately, whether you are using the command line Python interpreter, the IDLE interface, or a third-party IDE, there are multiple options for debugging.

Debugging Options

One option is to rely on the error messages Python provides. If you already know Python, and especially if you understand your own code, this is often enough: you just read the error message and go and fix the tagged line and file. It may not always be a good option, however, for larger programs you did not write.

Another possibility, and probably the main way that Python programmers use for debugging their code is to insert print statements and run the program again. Because Python runs immediately after changes, this is usually the quickest way to get more information than error messages provide. Typically, a display of variable values is enough to provide the information you need.

For larger systems you did not write, and for beginners who want to trace code in more detail, most Python development GUIs have some sort of point-and-click debugging support. IDLE has a debugger as well, but it does not appear to be used very often (possibly because it has no command line, or possibly because adding print statements is usually quicker than setting up an IDLE GUI debugging session. But other IDEs such an Eclipse, NetBeans, Komodo and Wing IDE offer advanced point-and-click debuggers.

Python comes with a source-code debugger named pdb, available as a module in Python’s standard library. In pdb, you type commands to step line by line, display variables, set and clear breakpoints, or continue to a breakpoint or error. It can be launched interactively by importing it, or as a top-level script. Either way, it provides a powerful debugging tool.

To provide a real-world example of debugging to solve a coding problem, I introduced a bug into the binary sort code. The code is provided here; some readers may be able to find the bug just by looking at the code. In any case, when we run the code, we get the following result:

[None, None, 17, 144, 372, 51, 79, 123, 519, 997]

This result is wrong for at least two reasons. First, two list members have been replaced with “None”. Secondly, the list is not sorted. Fortunately, this gives us an opportunity to use our debugging skills. The heapSort function does the following:

  1. Creates a binary heap.
  2. While there is at least one item on the heap, delete the top item from the heap. If there is remaining heap data, rebuild the heap, consolidate the heap, and repeat this step.

A cursory review of the code seems to indicate that we correctly deleted the top item and re-inserted it after the heap. The error must be in heapify, rebuildHeap, or consolidateHeap. For starters, I decided to set a breakpoint after heapify is called and check to see if it returns a valid binary heap. If it does, we can be pretty sure the error is in rebuildHeap or consolidateHeap.

In Eclipse, I set a breakpoint in the second line of heapSort by right mouse-clicking on the left column next to that line of code (the column containing numbers indicating what line of code it is) and selecting “Add Breakpoint). Now I was able to debug the code by pressing “F11”. Eclipse now presents a dialog box asking if I want to switch to the debug perspective. Since I do, I press “Yes” and continue.

In the upper right corner of the Debug perspective, there is a section with two tabs: “Variables” and “Breakpoints”. The “Variables” tab provides us the information about local variables, including the current values stored in L. We can confirm that the list items form a valid binary heap:

        997
      /     \
    519     239
   /  \     / \
  144  372 17 29
  / \   /
79  51 123

It does not appear that the problem is in heapify, so it must be in either rebuildHeap or consolidateHeap. To find the bug, we will first step through the heapSort function line by line, by pressing F6.

As soon as we press F6 the first time, the variable high shows up on the variable list, since it has now made its first appearance in the code. The same goes for temp, when it makes its first appearance two lines later. The next line deletes the top node, and the change is reflected in the values for L shown on the “Variables” tab. All is well so far.

But after we run rebuildHeap for the first time, it is a different story. Our “heap” looks like this:

          519
         /   \
      None    29
      /  \    / \
    144  372 17 None
    / \   /
   79 51 123

It looks like rebuildHeap has failed us. Nodes marked “None” should not have children. The next step will be to remove the breakpoint from heapSort and put a breakpoint in rebuildHeap. Then we will restart the program. We terminate the program by pressing “CTRL-F2” and then pressing “F11” again.

Once we do this, we can continue debugging and begin stepping through the source code again. Looking at the “Variables” tab, we can see that lnode (left node) is initially set to 1, the correct value for the left node of the root node. The next line checks to see if there is at least one heap item and a left node. There is, so we proceed to the next line. We have to promote either the left node or the right node. We promote the left node if it is greater than the right node or if there is no right node. If so, we assign the left node’s value to the root node, mark the left node as empty, and run rebuildHeap recursively on the left node. When we reach the recursive rebuildHeap call, we will use “F5” to step into the second copy of rebuildHeap.

Once we start stepping into the second rebuildHeap, we can see something is wrong. lnode is now 5, but as the lnode of the lnode of the root, it should be 3. This is because root is equal to 2 when it should be 1. Looking at the first recursive rebuildHeap call in the function, we see that the second parameter is lnode+1 (which calls the function recursively on the right node), when it should be lnode (to call the function recursively on the left node). If we keep stepping through the code, rebuildHeap will perform operations on the wrong nodes and tamper with the integrity of heap data that should be kept intact. Here, it will promote one of the children of the right node, and erase data, as we can see by viewing list L’s values just before the next rebuildHeap call:

[519, None, 29, 144, 372, 17, None, 79, 51, 123]

In other words, it looks this is what accounts for the error in rebuildHeap. We could continue to step through the code, but to save time, let’s change the second rebuildHeap parameter at line 56 to lnode, remove the breakpoint, and see what happens.

Running the program after that modification results in the following output:

[17, 29, 51, 79, 123, 144, 239, 372, 519, 997]

Thus, it seems as if we have isolated and fixed the bug that was causing the error we were seeing earlier. This does not guarantee that our code is bug-free, but if there are other bugs to be found, we can have some confidence that the methods used here can be applied to finding them.

In this example, we used the Eclipse debugger for debugging, but don’t let this discourage you from using whatever tools you want. The Winpdb system, for example, is an open source stand-alone debugger with advanced debugging support and cross-platform GUI and console interfaces. These options become more important as you start writing larger scripts, but at least Python’s debugging support makes the process of finding errors a lot easier than they might be otherwise.

External Links:

The official Eclipse web site – An excellent IDE with good debugging capabilities

The official Winpdb web site

Classes and Inheritance: Part Two

classesIn the previous article, we introduced Python classes, discussed some of their features, and some of the similarities and differences from C++ classes. In this article, we will use our knowledge of C++ classes to rewrite our hash functions as a Python class.

Writing the HashObject Class

Our first job is to write the class constructor. Since it makes sense to initialize our list when the class is first instantiated, I decided to incorporate the createBuckets() function as part of the constructor:

class HashObject(object):
    def __init__(self, num):
        self.numBuckets = num
        self.hashSet = []
        for i in range(num):
            self.hashSet.append([])

This constructor takes one parameter (num), and assigns it to numBuckets, and then initializes our list.

Next, we want to define hashElement, which takes a number and hashes it to an index in hashSet. This function is the same as before, except that we do not have to pass the number of buckets to it:

def hashElement(self, elem):
        if type(elem) == int:
            return elem%self.numBuckets

The new insert function requires only one parameter: the list item to be inserted into one of the buckets:

def insert(self, i):
        self.hashSet[self.hashElement(i[0])].append(i)

Note that it is assumed that the first element of the list to be stored is assumed to be the key. Next, we need a remove function. We will take one parameter: the key value of the item to be deleted:

def remove(self, key):
        newElement = []
        for j in self.hashSet[self.hashElement(key)]:
            if j[0] != key:
                newElement.append(j)
        self.hashSet[self.hashElement(key)] = newElement    

Finally, we need to write the function to check to see if an item with a certain key exists:

def member(self, key):
        return i in self.hashSet[hashElement(key)]

One additional member function we may want to define is an accessor function for hashSet so we do not have to directly access the list, a good practice, since data hiding is one of the objectives of object-oriented programming:

def getElement(self, key):
        for j in self.hashSet[self.hashElement(key)]:
            if j[0] == key:
                return j

As you can see, we iterate through each list item in the bucket the key value hashes to (multiple items may be stored in a single bucket). When we find an item whose first value equals the key, we return that item.

With variables and member functions for HashObject defined, we can rewrite hashTest to test our class:

def hashTest():
    h = HashObject(51)
    h.insert([2175,'Homer Simpson','Technician',20])
    print(h.getElement(2175))
    h.insert([4158,'Elmer Higgins','Engineer',17])
    print(h.getElement(4158))
    h.insert([2583,'Waylon Smithers','Assistant',25])
    print(h.getElement(2583))
    h.remove(2175)
    print(h.getElement(2175))
    print(h.getElement(2583))

Running this function results in the following output:

[2175, ‘Homer Simpson’, ‘Technician’, 20]
[4158, ‘Elmer Higgins’, ‘Engineer’, 17]
[2583, ‘Waylon Smithers’, ‘Assistant’, 25]
None
[2583, ‘Waylon Smithers’, ‘Assistant’, 25]

In this function, we first create a hash set with 51 buckets. We insert the first record (Homer Simpson’s), and print it out, confirming its successful insertion. We insert a second record, also printing it out to confirm its insertion. Then we enter a third item (which, incidentally, hashes to the same location as the first item), and print it out, confirming its insertion and also confirming that we can successfully iterate through a bucket. Finally, we remove the first record and try to print it out. Since the getElement function will not return a value for a nonexistent record, the print call prints out “None”, for no value. Finally, we successfully print out the third record.

I have made HashObject and HashTest available for download here, so you can download it, run it and modify it to your heart’s content.

External Links:

Classes Tutorial on python.org

Class (computer programming) on Wikipedia

Python Programming: Part Three (Conditional Statements; Lists, Tuples and Dictionaries)

Python listIn the previous article, we covered variables, and how to save and run Python modules. In this article, we will introduce some more basic concepts including statements that alter the control flow of Python, and lists, tuples and dictionaries.

Up to this point, we have been executing Python statements using a linear control flow. A programming language is of minimal value, however, without conditional statements. Python provides three such statements: if, elif, and else. For example:

>>> if x < 0:
                print(‘x is a negative number’)

In this example, we used the comparison operator “<” (less than) to test to see if x is less than 0. If x is less than zero, we will print a statement that tells us x is a negative number. We might also want to print something out if x is zero or positive:

>>> if x < 0:
               print(‘x is a negative number’)
         elif x == 0:
                print(‘x equals zero’)
         else:
                print(‘x is a positive number’)

Here, the first two lines are the same, but if x is not less than zero, we test it to see if it is equal to zero and if it is, we print a message. If x is not less than zero or zero, we assume it is positive and print another message. Another useful control flow statement is the for statement. The for statement in Python differs a bit from the for statement in C and Pascal. Rather than always iterating over an arithmetic progression of numbers, as in Pascal, or giving the user the ability to define both the iteration step and halting condition as in C, Python’s for statement iterates over the items of any sequence in the order that they appear in the sequence. For example:

>>> word = ‘coffee’
>>> for c in word:
                 print(c)

This will print every letter in the string word on a separate line. If you need to iterate over a sequence of numbers, the built in function range() comes in handy. It generates arithmetic progressions:

>>> for i in range(10):
                 print(i,’ ‘,i**2)

This code will print out two columns: one containing integers 0 through 9, the other containing their squares. If you specify only a single parameter, 0 will be the lower bound and the specified parameter will be the upper bound. But you can specify a lower bound:

>>> for i in range(5,10):
                print(i,’ ‘,i**2)

This code will print out integers 5 through 9 and their squares. We can also specify a step value:

>>> for i in range(1,10,2):
                 print(i,’ ‘,i**2)

This code will print out all odd-numbered integers from 1 through 9 and their squares.

Lists, Tuples and Dictionaries

In C, there are arrays, which act as a compound data type. In Python, there are two sequence types not yet covered: lists and tuples. There are also dictionaries, which is not a sequence type, but is nonetheless very useful.

Lists are collections of some data type. Creating a list can be done quite easily:

>>> mylist = [5, 10, 15]

This will create a list with three items in it. We can iterate through the list as well:

>>> for i in mylist:
                 print(i)

This will print each item in mylist on a separate line. You can use the assignment operator on a list:

>>> otherlist = mylist

Now we will be able to access the list, but it is important to note that otherlist is not a separate copy of mylist. It also points to mylist. So this statement:

>>> otherlist[0] = 100

will alter the contents of mylist:

>>> print(mylist)
[100, 10, 15]

You can also nest a list within a list. For example:

>>> secondlist = [1, 2, 3, 4]
>>> mylist[0] = secondlist

nests secondlist inside mylist as the first element.

You can also create an empty list, by specifying an empty set of brackets in the assignment, like this:

mylist = []

Lists have a built-in method called append that can be used to append an item to the end of a list:

>>> mylist.append(125)

appends 125 to the end of the list.

Whereas a list is a mutable sequence data type, a tuple is an immutable sequence data type. A tuple consists of a number of values separated by commas. For example:

>>> tu = 1, 2, 3

A tuple need not contain all the same data type. The following statement is valid:

>>> tu = 5, 10, 15, ‘twenty’

To confirm that tuples are immutable, we can try to change one of the items:

>>> tu[0] = 121
Traceback (most recent call last):
File “”, line 1, in
TypeError: ‘tuple’ object does not support item assignment

On output, tuples are always enclosed in parentheses, so that nested tuples are interpreted correctly; they may be input with or without surrounding parenthesis, although often parenthesis are necessary anyway if the tuple is part of a larger expression. It is not possible to make an assignment to an individual item of a tuple; however, it is possible to create tuples which contain mutable objects, such as lists.

Another useful data type built into Python is the dictionary. Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type. Strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples. If a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You cannot use lists as keys, since lists can be modified.

You can think of dictionaries as unordered sets of key: value pairs, with the requirement that the keys are unique within one dictionary. A pair of braces creates an empty dictionary. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary.

Here is an example of a dictionary:

>>> generals = { ‘Germany’: ‘Hindenburg’, ‘Russia’: ‘Samsonov’, ‘France’: ‘Nivelle’, ‘United Kingdom’: ‘Haig’ }
>>> print(generals[‘Germany’])
‘Hindenburg’
>>> del generals[‘Russia’]
>>> generals[‘France’] = ‘Foch’
>>> print(generals)
{‘Germany’: ‘Hindenburg’, ‘France’: ‘Foch’, ‘United Kingdom’: ‘Haig’}

You can use for to iterate through a dictionary. For example:

>>> for c in generals.keys()
print(generals[c])

Hindenburg
Foch
Haig

In the next article, we will introduce the concept of modules, and write our first function.

External Links:

Python documentation from the official Python website

How to Install Python

Install Python

Running the Python command line.

If you want to run a Python script and you don’t have Python on your system already, fear not. There are a number of options available to install Python, covering a wide variety of different platforms.

How to Install Python: From python.org

Probably the easiest way to install Python is to go to the official Python site’s download page (http://www.python.org/downloads/). Select the release version you want to download, and select the appropriate operating system. The source is also available as a gzipped tarball. If you are running Linux, you may already have Python installed. Ubuntu, for example, includes Python by default. If not, you should be able to install Python from the repositories by typing:

sudo apt-get install python

When prompted, type the superuser password, and installation will begin.

Install Python

IDLE, the official Python GUI.

Once Python is installed, you should be able to invoke the Python interpreter at the command line by typing “python“. Alternatively, you can create a shortcut for Python on your desktop and click on the icon. Clicking on “pythonw” should start IDLE, the Python GUI. You may, however, have problems running IDLE on certain versions of Windows. I myself could not start pythonw under Windows 8.1. If you have the same problem, you might consider using an alternative method for starting IDLE:

  1. Start the Python command line. You can do this [1] by launching a command prompt and typing “python“; [2] by double-clicking “python.exe” in the File Explorer; [3] by right-mouse clicking on the Windows logo in the lower left corner of the desktop, selecting “Run” and launching python.exe from “Run“, or [4] creating a desktop icon for Python and clicking on it.
  2. In the Python interpreter, type “import idlelib.idle” and press <ENTER>. IDLE should now launch.

Whether you are using the basic Python command interpreter or IDLE, you can start typing in Python code. Typing and executing the Python Hello World program is almost trivial:

>>> print(‘Hello, world!’)
Hello, world!

As you can see, the Python interpreter runs the program without the need to compile the code, one of the advantages of using Python. Using IDLE provides the following advantages over the standard Python shell:

  • The ability to save and load Python source code files
  • The ability to cut, copy and paste text, and search
  • The ability to debug programs, using IDLE’s debugger and stack viewer

How to Install Python: The Eclipse IDE

Install Python

Using the PyDev plugin in the Eclipse IDE to write and run a Python script.

Many users will likely find the stock Python installation more than adequate. For those looking for a more elegant solution, however, I suggest the Eclipse IDE. Eclipse is an integrated development environment originally developed by IBM Canada, and currently being developed by the Eclipse Foundation. It contains a base workspace and an extensible plug-in system for customizing the environment. The base system is only useful for Java development, but with plug-ins, you can extend Eclipse to develop applications in other programming languages, including C/C++, COBOL, JavaScript, Perl, PHP – and Python.

The official Eclipse website can be found at eclipse.org; here you can download the latest version. As of this writing, the current version is 4.4, and there are versions for Windows (32 and 64-bit), Linux and Mac OS X. Under Linux, you can also install Eclipse from the repositories by typing:

sudo apt-get install eclipse

and typing the superuser password when prompted.

Once Eclipse is installed, you still need to install the Python plugin. I use the PyDev Python IDE for Eclipse, which you can find at: http://marketplace.eclipse.org/content/pydev-python-ide-eclipse. Some of the features of PyDev include:

  • Python, Jython and IronPython support
  • Django integration
  • Code completion
  • A debugger
  • Code analysis

To install PyDev, launch Eclipse, and from the top menu, navigate to Help -> Install New Software to launch the Install New Software dialog box. In the “Work with: ” edit box at the top, type:

PyDev and PyDev Extensions – http://pydev.org/updates

and press the “Add” button. “PyDev” and “PyDev Mylyn Integration (optional)” should appear in the list box. Check the software packages you want to install (PyDev is required if you want to use PyDev to use Python with Eclipse; Mylyn is an optional application lifecycle management framework that serves as a task management tool for Eclipse). When you have made your selection, press the “Next” button; it should take a minute for Eclipse to calculate file sizes and dependencies. Review the information on the next page and press “Next” again. On the subsequent page, click on the radio button to agree to the software license terms and press “Finish“. PyDev should now be installed.

Now, you should be able to start a Python project in Eclipse by navigating to File -> New -> PyDev Project. You need to give your project a name, and select a grammar version (default is 2.7), but it should be easy to create a project. Once you have created a project, you can right-mouse click on the project, and select New -> File to create a new Python source code file that will be added to the project.

Although I covered how to install Python using the stock Python installation as well as Eclipse in this article, I barely scratched the surface in terms of what is available for Python development. I prefer Eclipse, but it may behoove you to do your own research and find the tools which best fit your needs.

External Links:

The official Python site

The official Eclipse site