Python Database Programming: Part Two

Using the Eclipse IDE to access and modify a Python persistent dictionary.

In the previous article, we introduced Python database programming, the concept of persistent dictionaries, and database modules such as dbm. In this article, we put it all together and use the dbm module to create, access, and modify a persistent dictionary.

All of the dbm modules provide an open function that creates a new dbm object. Once the object is open, you can store data in the dictionary, read data, remove items, test for the existence of a key, and close the dbm object along with its associated data file or files.

Python Database Programming: Creating a Persistent Dictionary

To open a dbm persistent dictionary, use the open function on the module you choose. For example, we can use this code to create a persistent dictionary with the dbm module:

import dbm

db = dbm.open('payroll', 'c')

# Add some items
db['Orioles'] = '118'
db['Yankees'] = '211'
db['Blue Jays'] = '120'

print(db['Orioles'])

# Close and save to disk
db.close()

When you run this script, you will see output like the following:

b'118'

This example, which creates a ‘payroll’ dictionary with three entries, uses the recommended dbm module. The open function requires the name of the dictionary to open or create. The name is treated as a base file name, including the path, and is translated into the name of the data file or files, which may already be on the disk. Depending on the underlying library, the dbm module may create more than one file (usually one for the data and one for the index of the keys), and it may append a suffix such as .dat for data. You can find the files yourself by looking for names that begin with payroll, most likely in your current working directory.
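Which files actually appear depends on the backend that dbm selects on your system. The following is a minimal sketch for checking this yourself. It assumes a scratch directory created with tempfile (not part of the original script) so that the listing contains only the dbm files, and it uses dbm.whichdb to report the backend that owns them:

```python
import dbm
import os
import tempfile

# Work in a scratch directory so the listing shows only the dbm files.
with tempfile.TemporaryDirectory() as scratch:
    base = os.path.join(scratch, 'payroll')

    db = dbm.open(base, 'c')
    db['Orioles'] = '118'
    db.close()

    # The backend decides the file names, for example a single payroll
    # file, or payroll.dat plus payroll.dir.
    created_files = sorted(os.listdir(scratch))
    backend = dbm.whichdb(base)

    print('Files on disk:', created_files)
    print('Backend module:', backend)
```

The exact names printed will differ from one platform to another, which is why the script only ever refers to the base name.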

There is also an optional flag argument. Flags are lowercase, single-character strings. The following table lists the available flags:

Flag   Usage
'r'    Opens an existing file for reading only. This is the default.
'c'    Opens the data file for reading and writing, creating the file if needed.
'n'    Opens the file for reading and writing, but always creates a new empty file. If one already exists, it will be overwritten and its contents lost.
'w'    Opens the file for reading and writing, but if the file doesn’t exist it will not be created.

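You can also set another optional parameter, the mode. The mode holds a set of UNIX file permissions and is used only when the file has to be created. It defaults to octal 0o666, which is then restricted by the process's umask.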
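The difference between the flags is easiest to see by trying them in turn. This is a small sketch, again assuming a scratch directory so that no real data is overwritten; the variable names are illustrative only:

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    base = os.path.join(scratch, 'payroll')

    # 'w' refuses to create a database that does not exist yet.
    try:
        dbm.open(base, 'w')
        missing_raised = False
    except dbm.error:
        missing_raised = True

    # 'c' creates the database on first use...
    db = dbm.open(base, 'c')
    db['Orioles'] = '118'
    db.close()

    # ...and keeps the existing data when the file is opened again.
    db = dbm.open(base, 'c')
    kept = 'Orioles' in db
    db.close()

    # 'n' always starts from an empty database, discarding old entries.
    db = dbm.open(base, 'n')
    emptied = 'Orioles' not in db
    db.close()

print(missing_raised, kept, emptied)
```

Because 'n' silently discards existing entries, 'c' is usually the safer choice for a script that may be run more than once.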

The script at the start of this section is simple. First, it calls the open function of the dbm module, which returns a new dbm object (db) that we can then use to store and retrieve data.

Once we open a persistent dictionary, we can write values as we normally would with Python dictionaries, as shown in this example:

db['Orioles'] = '118'

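Keys and values are always stored as bytes. You can pass in ordinary strings, which are encoded automatically (this is why the first script printed b'118' rather than '118'), but not numbers or other Python objects. If you want to save an object, you can serialize it using the pickle module, shown here writing to and reading from an ordinary file: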

import pickle

data = {
        'Orioles' : ['118', 'Dan Duquette', 'Buck Showalter', 'Camden Yards'],
        'Yankees' : ['211', 'Brian Cashman', 'Joe Girardi', 'Yankee Stadium III'],
        'Blue Jays' : ['120', 'Alex Anthopoulos', 'John Gibbons', 'Rogers Centre']
        }

with open('data.pickle', 'wb') as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    
with open('data.pickle', 'rb') as f:
    data = pickle.load(f)
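The same idea can be applied to a dbm file directly, because pickle.dumps returns bytes and dbm accepts bytes as values. The following is a sketch under that assumption; the file name teams and the scratch directory are illustrative and do not appear in the scripts above:

```python
import dbm
import os
import pickle
import tempfile

team = ['118', 'Dan Duquette', 'Buck Showalter', 'Camden Yards']

with tempfile.TemporaryDirectory() as scratch:
    db = dbm.open(os.path.join(scratch, 'teams'), 'c')

    # pickle.dumps returns bytes, which dbm can store as a value.
    db['Orioles'] = pickle.dumps(team, pickle.HIGHEST_PROTOCOL)

    # pickle.loads rebuilds the original list from the stored bytes.
    restored = pickle.loads(db['Orioles'])
    db.close()

print(restored)
```

The standard library's shelve module wraps this pattern for you. As with any use of pickle, only load data that comes from a source you trust.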

Finally, the dbm object's close method closes the file or files and saves the data to disk.

Python Database Programming: Accessing and Modifying the Persistent Database

With the dbm modules, you can treat the object you get back from the open function as a dictionary object. You can get and set values using code like the following:

db['key'] = 'value'
value = db['key']

Remember that the key and the value must both be strings or bytes, and that values read back from the dictionary are returned as bytes.

You can delete a value in the dictionary using del:

del db['key']

As with a normal dictionary, the keys method returns all of the keys:

for key in db.keys():
    # do something with each key
    print(key)

The keys method may take a long time to execute if there are a huge number of keys in the file. Also, this method may require a lot of memory to store the potentially large list that it would create with a large file.
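For a small file like the one in this article, that cost is negligible, and it can be convenient to copy everything into an ordinary dictionary of decoded strings. The sketch below assumes the data was stored as UTF-8 text; load_as_dict is a hypothetical helper, not part of the dbm module:

```python
import dbm
import os
import tempfile


def load_as_dict(path):
    """Copy a dbm file into a plain dict of decoded strings (hypothetical helper)."""
    with dbm.open(path, 'r') as db:
        # Keys and values come back as bytes, so decode both.
        return {key.decode('utf-8'): db[key].decode('utf-8') for key in db.keys()}


with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, 'payroll')
    with dbm.open(path, 'c') as db:
        db['Orioles'] = '118'
        db['Yankees'] = '211'

    payroll = load_as_dict(path)

print(sorted(payroll.items()))
```

The with statement closes the dbm object automatically, which avoids forgetting the call to close. For a very large file, this approach has the same memory cost the paragraph above warns about.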

Here’s a script we can use to access the persistent dictionary we created with the first script:

import dbm

# Open existing file
db = dbm.open('payroll', 'w')

# Add another item
db['Rays'] = '67'

# Verify the previous item remains
if 'Blue Jays' in db:
    print('Found Blue Jays')
else:
    print('Error: Missing item')
    
# Iterate over the keys...may be slow
# May use a lot of memory
for key in db.keys():
    print('Key = ', key, ' value = ', db[key])
    
del db['Rays']
print('After deleting Rays, we have:')

for key in db.keys():
    print('Key = ', key, ' value = ', db[key])
    
# Close and save to disk
db.close()

When you run this script, you should see output similar to the following:

Found Blue Jays
Key =  b'Rays'  value =  b'67'
Key =  b'Orioles'  value =  b'118'
Key =  b'Yankees'  value =  b'211'
Key =  b'Blue Jays'  value =  b'120'
After deleting Rays, we have:
Key =  b'Orioles'  value =  b'118'
Key =  b'Yankees'  value =  b'211'
Key =  b'Blue Jays'  value =  b'120'

This script works with a small database of major league baseball teams and their payrolls (in millions of dollars). You need to run the first script in this article first. That example creates the dbm file and stores data in the file. This script then opens the preexisting dbm file.

The script opens the persistent dictionary payroll in read/write mode. The call to the open function will generate an error if the necessary data file or files do not exist on disk in the current directory.

From the previous example, there should be three values in the dictionary (the new script tests to see if one of them exists). This example adds the Tampa Bay Rays, with a payroll of $67 million, as another key.

The script verifies that the ‘Blue Jays’ key exists in the dictionary. A membership test is the reliable way to do this, because indexing a missing key raises a KeyError rather than returning None:

if 'Blue Jays' in db:
    print('Found Blue Jays')
else:
    print('Error: Missing item')
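The ways of looking up a key that might be absent behave differently, which the following sketch illustrates. It assumes a scratch database holding a single entry; the variable names are illustrative:

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    with dbm.open(os.path.join(scratch, 'payroll'), 'c') as db:
        db['Blue Jays'] = '120'

        # A membership test never raises for a missing key.
        has_jays = 'Blue Jays' in db
        has_rays = 'Rays' in db

        # get() returns the supplied default instead of raising.
        rays_payroll = db.get('Rays', b'unknown')

        # Plain indexing raises KeyError when the key is absent.
        try:
            db['Rays']
            raised = False
        except KeyError:
            raised = True

print(has_jays, has_rays, rays_payroll, raised)
```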

Next, the script prints out all of the keys and values in the dictionary:

for key in db.keys():
    print('Key = ', key, ' value = ', db[key])

Note that there should now be four entries.

After printing out all the entries, the script removes one using del:

del db['Rays']

The script then prints out all the keys and values again, which should result in three entries, as shown in the output. Finally, the close method closes the dictionary, which involves saving all the changes to disk, so the next time the file is opened, it will be in the state we left it.

As you can see from these examples, the API for working with persistent dictionaries is very simple: the object is backed by files on disk, but you work with it much as you would an ordinary dictionary.


Python Modules; Introduction to Recursion

If you quit the Python interpreter and enter it again without saving your program to a text file, the definitions you have made will be lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. In doing so, we create scripts. As your program gets longer, you may also want to split it into several files for easier maintenance.

Python Modules

To support this, Python provides a way of putting definitions into a file and using them in a script or in an interactive instance of the interpreter. Such a file is called a module; definitions from a Python module can be imported into other modules or into the main module.

A Python module is a file containing Python definitions and statements; the file name is the module name with the suffix .py appended. Up to this point, we have not introduced function definitions. A function definition in Python starts with the keyword def, followed by the function's name and a parenthesized parameter list. This is the equivalent of a function in C or Pascal. For example:

def hello():
    print('Hello, world!')

is a very simple Python function to print out “Hello, world!”. If this code is saved in a file called hello.py, and is in Python’s path, you can import it at the Python command line:

>>> import hello

Once the Python module is imported, you can invoke the hello function with:

>>> hello.hello()
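If the import fails with ModuleNotFoundError, the directory holding hello.py is probably not on Python's path. The sketch below shows one way to see the mechanism at work: it writes hello.py into a temporary directory (an assumption made purely so the example is self-contained), adds that directory to sys.path, and then imports the module:

```python
import importlib
import os
import sys
import tempfile

source = "def hello():\n    print('Hello, world!')\n"

with tempfile.TemporaryDirectory() as module_dir:
    # Save the function in a file named hello.py.
    with open(os.path.join(module_dir, 'hello.py'), 'w') as f:
        f.write(source)

    # Python searches the directories in sys.path for hello.py.
    sys.path.insert(0, module_dir)
    importlib.invalidate_caches()
    try:
        hello = importlib.import_module('hello')
        hello.hello()
    finally:
        sys.path.remove(module_dir)
```

In everyday use you would simply start the interpreter in the directory that contains hello.py, since the current directory is searched by default in an interactive session.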

As with C and other languages, you can specify parameters. For example:

def isZero(a):
    if a == 0:
        print('a is zero')
    else:
        print('a is a non-zero number')

You can also return a value from the function:

def isZero(a):
    if a == 0:
        return True
    else:
        return False

As in other languages such as C/C++, we can specify default parameters. For example:

def isZero(a=0):

allows us to invoke the function isZero with no arguments; the interpreter will insert a value of 0 for a if no arguments are specified. However, a non-default argument cannot follow a default argument. Thus:

def isZero(a=0,b):

is not allowed, but:

def isZero(a,b=0):

is allowed.
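The following sketch puts both rules together. The function is_zero is an illustrative variant with a second parameter, not the function defined above, and compile is used only to show that the disallowed form is rejected before any code runs:

```python
def is_zero(a, b=0):
    """Return True when a equals b; b defaults to 0 (illustrative variant)."""
    return a == b


print(is_zero(0))       # True: b falls back to its default of 0
print(is_zero(5))       # False
print(is_zero(5, 5))    # True: the default is overridden positionally
print(is_zero(5, b=5))  # True: the default is overridden by keyword

# A default parameter followed by a non-default one is a syntax error.
try:
    compile('def isZero(a=0, b):\n    pass\n', '<example>', 'exec')
    rejected = False
except SyntaxError:
    rejected = True

print(rejected)         # True
```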

This is a decent start, but it would be nice if we came up with a program that can do something useful.

Introduction to Recursion

In computer science, recursion is a method where the solution to a problem depends on solutions to smaller instances of the same problem. Recursion always involves one or more base cases in which an operation can be done directly on the input data. The other cases involve invoking the same function on a subset of the input data in a divide-and-conquer strategy. The approach can be applied to many types of problems, such as finding palindromes. It shouldn’t be too difficult to come up with a Python module to solve this problem.

A palindrome is a word, phrase, number, or other sequence of symbols or elements that reads the same forward or reversed: for example, “Race car”, or “A man, a plan, a canal – Panama”. A recursive solution to the problem of testing for a palindrome can be outlined as follows:

  1. If the input string length is 0 or 1, then we have a palindrome: return True
  2. If the input string length is greater than 1 and the first and last characters match, apply the test recursively to the string minus the first and last characters
  3. If [1] and [2] don’t apply, then we don’t have a palindrome: return False

Successive applications of this process on the input string will eventually yield either a mismatch between the first and last character or an input string of length 0 or 1, and the test will be complete. We can code this algorithm in Python as follows:

def isPalindrome(myString):
    ''' Simple program to find palindromes, part of our Python module
    Parameters: myString => string to perform test on
    If length <= 1, it's a palindrome - return True
    If first char is the same as the last char, apply algorithm recursively
    Otherwise return False '''
    if len(myString) <= 1:
        return True
    elif myString[0].lower() == myString[-1].lower():
        # Recurse on the string minus its first and last characters
        return isPalindrome(myString[1:-1])
    return False
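Each recursive call removes two characters, so a string of n characters needs roughly n/2 nested calls. CPython limits recursion depth (the default limit is typically 1000), so a very long input could raise a RecursionError. As a point of comparison, here is a sketch of the same test written as a loop; is_palindrome_iterative is a hypothetical name and not part of the module described in this article:

```python
def is_palindrome_iterative(text):
    """Two-index version of the same test (hypothetical alternative)."""
    left, right = 0, len(text) - 1
    while left < right:
        # Compare the outermost pair, ignoring case.
        if text[left].lower() != text[right].lower():
            return False
        # Move both indexes one step toward the middle.
        left += 1
        right -= 1
    return True


print(is_palindrome_iterative('Racecar'))     # True
print(is_palindrome_iterative('Python'))      # False
print(is_palindrome_iterative('a' * 100000))  # True, with no deep recursion
```

The recursive form mirrors the three-step outline more closely, which is why it is the one used here to introduce recursion.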
 

Our isPalindrome function takes a single argument, myString. It uses two functions we have not seen before. len() takes a single argument (here, a string) and returns its length. lower() is a string method that returns a lowercase copy of the string, which ensures that our test is not case-sensitive. You may have noticed one shortcoming of this algorithm: if the string contains spaces or any other non-alphanumeric characters, it may return False even for a phrase that reads the same in both directions. I decided it would be easier to write a separate function to strip the non-alphanumeric characters out:

def convertToAlphaNum(myString):
    ''' Iterate through the string and generate an output string with only
    the alphanumeric chars (also part of the palindrome.py Python module)
    Parameters: myString => the string to convert
    Returns: A copy of the string with all non-alphanumeric
    characters removed '''
    retval = ''
    for c in myString:
        # Keep only letters and digits
        if c.isalpha() or c.isdigit():
            retval += c
    return retval

All this function does is iterate through the input string and if a character is a letter or digit, it gets added to a new string. When it is done, the function returns the new string. We still need a function to read input from the user and call these functions, so that will be our next bit of code, and the last function in our Python module:

def testPalindrome():
    ''' Part of the palindrome.py Python module - Prompt user for string input
    Output whether it is a palindrome or not '''
    # input() already returns a string, so no conversion is needed
    myString = input('Enter a string: ')
    if isPalindrome(convertToAlphaNum(myString)):
        print(myString, 'is a palindrome')
    else:
        print(myString, 'is not a palindrome')

This function simply prompts the user to input a string and uses the previous functions to determine whether or not it is a palindrome, and prints the results.
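Because testPalindrome reads from the keyboard, it is awkward to check automatically. The sketch below combines the same two steps in a function that takes the phrase as an argument and returns the result. The snake_case names are hypothetical stand-ins for the functions above, redefined here so the example runs on its own:

```python
def is_palindrome(text):
    """Recursive test, equivalent to isPalindrome above."""
    if len(text) <= 1:
        return True
    if text[0].lower() == text[-1].lower():
        return is_palindrome(text[1:-1])
    return False


def to_alphanumeric(text):
    """Keep only letters and digits, as convertToAlphaNum does."""
    return ''.join(c for c in text if c.isalpha() or c.isdigit())


def check_phrase(text):
    """Return True if text is a palindrome, ignoring case and punctuation."""
    return is_palindrome(to_alphanumeric(text))


samples = ['sample text', 'A man, a plan, a canal - Panama', "No 'x' in Nixon"]
for phrase in samples:
    print(phrase, '->', check_phrase(phrase))
```

Returning a value instead of printing keeps the logic reusable, and the interactive function can then be a thin wrapper around it.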

Once these functions are saved to a file, you can load the file into IDLE using the File -> Open menu option (or CTRL-O). The Python module will load into a separate window; from that window, select Run -> Run Module (or press F5). Then from the main IDLE window, you can run testPalindrome():

>>> testPalindrome()
Enter a string: sample text
sample text is not a palindrome
>>> testPalindrome()
Enter a string: A man, a plan, a canal – Panama
A man, a plan, a canal – Panama is a palindrome
>>> testPalindrome()
Enter a string: No ‘x’ in Nixon
No ‘x’ in Nixon is a palindrome
>>> testPalindrome()
Enter a string: No ‘x’ in Ford
No ‘x’ in Ford is not a palindrome
>>> testPalindrome()
Enter a string: Able I was, ere saw I Elba
Able I was, ere saw I Elba is a palindrome

The source code for these functions is available as a single Python module, via this link.

It’s not the most elegant solution, but it does seem to work. In the next article, we will continue our look at programming in Python, including a second look at using recursion to solve problems.
