Python Database Programming: Part Two


Using the Eclipse IDE to access and modify a Python persistent dictionary.

In the previous article, we introduced Python database programming, the concept of persistent dictionaries, and different database modules such as dbm. In this article, we will put it all together and use the dbm module to create, access and modify a persistent dictionary.

All of the dbm modules support an open function to create a new dbm object. Once opened, you can store data in the dictionary, read data, close the dbm object as well as the associated data file/files, remove items and test for the existence of a key in the dictionary.

Python Database Programming: Creating a Persistent Dictionary

To open a dbm persistent dictionary, use the open function on the module you choose. For example, we can use this code to create a persistent dictionary with the dbm module:

import dbm

db = dbm.open('payroll', 'c')

# Add some items
db['Orioles'] = '118'
db['Yankees'] = '211'
db['Blue Jays'] = '120'

print(db['Orioles'])

# Close and save to disk
db.close()

When you run this script, you will see output like the following:

b'118'

This example, which creates a ‘payroll’ dictionary with three entries, uses the recommended dbm module. The open function requires the name of the dictionary to create. The name is treated as a base file name, including the path, and is translated into the name of the data file or files, which may already exist on disk. Depending on the underlying library, dbm may create a single file or several (usually one for the data and one for the index of the keys), and it may append a suffix such as .dat or .db. You can find the result by looking in your current working directory for a file named payroll, possibly with such a suffix.

There is also an optional flag. The following table lists the available flags:

Flag Usage
'r' Opens an existing data file for reading only. This is the default.
'w' Opens the file for reading and writing, but if the file doesn’t exist it will not be created.
'c' Opens the data file for reading and writing, creating the file if needed.
'n' Opens the file for reading and writing, but always creates a new empty file. If one already exists, it will be overwritten and its contents lost.

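You can also set another optional parameter, the mode. The mode holds a set of UNIX file permissions and is used only when the file has to be created; the default is 0o666, which the process umask then restricts.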
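To make the flags concrete, here is a minimal sketch that opens the same database with 'c', 'w' and 'n' in turn. The file name flags_demo and the temporary directory are assumptions made for illustration, as is the 0o600 mode passed on creation.

```python
import dbm
import os
import tempfile

# Work in a throwaway directory so no real data is touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'flags_demo')   # hypothetical database name

# 'c' creates the database if it does not exist yet.
# The third argument is the optional mode (UNIX permissions on creation).
db = dbm.open(path, 'c', 0o600)
db['Orioles'] = '118'
db.close()

# 'w' reopens the existing database for reading and writing.
db = dbm.open(path, 'w')
count_after_w = len(db)   # the entry written above is still there
db.close()

# 'n' always starts from an empty database, discarding the old contents.
db = dbm.open(path, 'n')
count_after_n = len(db)
db.close()

print('Entries after w:', count_after_w)
print('Entries after n:', count_after_n)
```

The entry survives the 'w' reopen but not the 'n' reopen, which is why 'n' should be used with care on data you want to keep.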

The above code is simple. First, we use the open function of the dbm module, which returns a new dbm object (db) that we can then use to store and retrieve data.

Once we open a persistent dictionary, we can write values as we normally would with Python dictionaries, as shown in this example:

db['Orioles'] = '118'

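Both the key and the value must be strings or bytes objects; they cannot be other objects, like numbers or arbitrary Python objects. The dbm module stores everything as bytes, which is why the value printed by the first script appears as b'118'. If you want to save an object, you can serialize it using the pickle module: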

import pickle

data = {
        'Orioles' : ['118', 'Dan Duquette', 'Buck Showalter', 'Camden Yards'],
        'Yankees' : ['211', 'Brian Cashman', 'Joe Girardi', 'Yankee Stadium III'],
        'Blue Jays' : ['120', 'Alex Anthopoulos', 'John Gibbons', 'Rogers Centre']
        }

with open('data.pickle', 'wb') as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    
with open('data.pickle', 'rb') as f:
    data = pickle.load(f)

Finally, the close method closes the file or files and saves the data to disk.
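The pickle example above writes to a separate file, but the same idea can be combined with dbm: pickle.dumps returns bytes, which dbm accepts as a value. The following is a sketch under that assumption; the database name teams and the temporary directory are made up for illustration.

```python
import dbm
import os
import pickle
import tempfile

# Hypothetical database location in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'teams')

record = ['118', 'Dan Duquette', 'Buck Showalter', 'Camden Yards']

db = dbm.open(path, 'c')
# Serialize the list to bytes and store it under a string key.
db['Orioles'] = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
db.close()

db = dbm.open(path, 'r')
# Reading the value gives the bytes back; pickle.loads rebuilds the list.
restored = pickle.loads(db['Orioles'])
db.close()

print(restored)
```

As with any use of pickle, only unpickle data that comes from a source you trust.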

Python Database Programming: Accessing and Modifying the Persistent Database

With the dbm modules, you can treat the object you get back from the open function as a dictionary object. You can get and set values using code like the following:

db['key'] = 'value'
value = db['key']

Remember that the key and the value must both be strings or bytes, and that values are always returned as bytes.
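Because values come back as bytes, a number stored as text has to be decoded and converted before you can compute with it. A minimal sketch of that round trip, using a hypothetical database named roundtrip in a temporary directory:

```python
import dbm
import os
import tempfile

# Hypothetical database location in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'roundtrip')

# dbm objects can be used in a with statement, which closes them on exit.
with dbm.open(path, 'c') as db:
    db['Yankees'] = '211'            # str is encoded to bytes on the way in

    raw = db['Yankees']              # comes back as bytes: b'211'
    payroll = int(raw.decode('utf-8'))

    # Keys are returned as bytes too, so decode them for display.
    names = [key.decode('utf-8') for key in db.keys()]

print(raw, payroll, names)
```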

You can delete a value in the dictionary using del:

del db['key']

The keys method returns a list of all the keys, each as a bytes object:

for key in db.keys():
    print(key)  # do something with each key

The keys method may take a long time to execute if there are a huge number of keys in the file. Also, this method may require a lot of memory to store the potentially large list that it would create with a large file.

Here’s a script we can use to access the persistent dictionary we created with the first script:

import dbm

# Open existing file
db = dbm.open('payroll', 'w')

# Add another item
db['Rays'] = '67'

# Verify the previous item remains
# (indexing a missing key raises KeyError, so test membership instead)
if 'Blue Jays' in db:
    print('Found Blue Jays')
else:
    print('Error: Missing item')
    
# Iterate over the keys...may be slow
# May use a lot of memory
for key in db.keys():
    print('Key = ', key, ' value = ', db[key])
    
del db['Rays']
print('After deleting Rays, we have:')

for key in db.keys():
    print('Key = ', key, ' value = ', db[key])
    
# Close and save to disk
db.close()

When you run this script, you should see output similar to the following:

Found Blue Jays
Key =  b'Rays'  value =  b'67'
Key =  b'Orioles'  value =  b'118'
Key =  b'Yankees'  value =  b'211'
Key =  b'Blue Jays'  value =  b'120'
After deleting Rays, we have:
Key =  b'Orioles'  value =  b'118'
Key =  b'Yankees'  value =  b'211'
Key =  b'Blue Jays'  value =  b'120'

This script works with a small database of Major League Baseball teams and their payrolls (in millions of dollars). You need to run the first script in this article before this one. That example creates the dbm file and stores data in the file. This script then opens the preexisting dbm file.

The script opens the persistent dictionary payroll in read/write mode. The call to the open function will raise a dbm.error exception if the necessary data file or files do not exist on disk in the current directory.
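If you would rather handle a missing database than let the script stop, the open call can be wrapped in a try block. This is a sketch only: the name no_such_db is hypothetical and is chosen so that the file does not exist.

```python
import dbm
import os
import tempfile

# A path inside a fresh, empty directory, so the database cannot exist.
missing = os.path.join(tempfile.mkdtemp(), 'no_such_db')

try:
    # 'w' never creates the file, so this call is expected to fail.
    db = dbm.open(missing, 'w')
except dbm.error as exc:
    opened = False
    print('Could not open database:', exc)
else:
    opened = True
    db.close()
```

Note that dbm.error is a tuple of the exception classes used by the supported backends, which is why it can be given directly to except.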

From the previous example, there should be three values in the dictionary (the new script tests to see if one of them exists). This example adds the Tampa Bay Rays, with a payroll of $67 million, as another key.

The script verifies that the ‘Blue Jays’ key exists in the dictionary, using the following code:

if db['Blue Jays'] != None:
    print('Found Blue Jays')
else:
    print('Error: Missing item')

Next, the script prints out all of the keys and values in the dictionary:

for key in db.keys():
    print('Key = ', key, ' value = ', db[key])

Note that there should now be four entries.

After printing out all the entries, the script removes one using del:

del db['Rays']

The script then prints out all the keys and values again, which should result in three entries, as shown in the output. Finally, the close method closes the dictionary, which involves saving all the changes to disk, so the next time the file is opened, it will be in the state we left it.

As you can see from these examples, the API for working with persistent dictionaries is very simple: the data lives in files on disk, but you work with it as you would with a dictionary.


Python File I/O

In the previous articles, we covered quite a bit of rudimentary Python programming, but we haven’t covered one of the most important elements of any programming language: the ability to perform file I/O. In this article, we will cover file operations and put them into practice by using Python file I/O to generate a file of numbers to sort with the quicksort algorithm developed in the previous article.

Python File I/O: Simple Commands

To open a file, we use open(), which returns a file object, and is commonly used with two arguments: open(filename, mode).

>>> f = open('mylist.txt', 'w')

The first argument is a string containing the filename. The second argument is another string describing how the file will be used: 'r' when the file will only be read, 'w' for only writing (an existing file with the same name will be erased), and 'a' for appending, in which case any data written to the file is automatically added to the end. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' is the default if it is omitted.
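The difference between the modes is easiest to see by applying them to the same file one after another. The following sketch assumes a scratch file named modes.txt in a temporary directory and uses the with statement so each file is closed automatically.

```python
import os
import tempfile

# Hypothetical scratch file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

# 'w' creates the file, or erases it if it already exists.
with open(path, 'w') as f:
    f.write('first\n')

# 'a' keeps the existing contents and writes at the end.
with open(path, 'a') as f:
    f.write('second\n')

# 'r+' keeps the contents and starts at the beginning,
# so this write replaces the first five characters in place.
with open(path, 'r+') as f:
    f.write('FIRST')

# The mode defaults to 'r' when omitted.
with open(path) as f:
    contents = f.read()

print(repr(contents))
```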

Appending 'b' to the mode opens the file in binary mode. In text mode, Python translates end-of-line characters when data is read or written (on Windows, '\n' becomes '\r\n' on writing and is converted back on reading). This modification is fine for text files, but it will corrupt binary data like that in JPEG or EXE files. In Python 3, binary mode also means that data is read and written as bytes objects rather than strings, so use 'b' for all binary files on every platform.
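As a small illustration of binary mode, the sketch below writes a few bytes that include the line-ending values 10 and 13 and reads them back unchanged. The file name blob.bin is an assumption made for the example.

```python
import os
import tempfile

# Hypothetical scratch file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'blob.bin')

# Bytes 10 and 13 are '\n' and '\r'; text mode could alter them.
payload = bytes([0, 10, 13, 255])

with open(path, 'wb') as f:
    f.write(payload)

with open(path, 'rb') as f:
    data = f.read()          # a bytes object, not a str

print(data == payload, type(data).__name__)
```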

To read a file’s content, call the read(size) method of the file object returned by open() (f in these examples, provided it was opened in a mode that allows reading). read() reads some quantity of data and returns it as a string. When size is omitted or negative, the entire contents of the file will be read and returned, so you probably want to specify a size if the file is large. If the end of the file has been reached, f.read() will return an empty string:

>>> f.read()
'This is the contents of a text file.\n'

readline() reads a single line from the file; a newline character is left at the end of the string. It is only omitted on the last line of the file if the file does not end in a newline. Thus if f.readline() returns an empty string, the end of file has been reached, while a blank line is represented by '\n'.

For reading lines from a file, you can loop over the file object:

>>> for line in f:
...     print(line, end='')

If you want to read all the lines of a file into a list, you can also use list(f) or f.readlines().
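The read operations described above can be tried together on a small file. This sketch assumes a scratch file named lines.txt whose last line has no trailing newline, so that the end-of-file behavior is visible.

```python
import os
import tempfile

# Hypothetical scratch file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

# Three lines: the second is blank and the last has no newline.
with open(path, 'w') as f:
    f.write('alpha\n\ngamma')

with open(path) as f:
    first = f.readline()    # keeps its trailing newline
    blank = f.readline()    # a blank line is just '\n'
    last = f.readline()     # no newline, because the file has none here
    at_eof = f.readline()   # empty string signals end of file

# Looping over the file object yields one line at a time.
with open(path) as f:
    stripped = [line.rstrip('\n') for line in f]

# read(size) returns at most size characters.
with open(path) as f:
    head = f.read(3)

print(repr(first), repr(blank), repr(last), repr(at_eof))
print(stripped, repr(head))
```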

That covers read operations. write(string) writes the contents of string to the file and returns the number of characters written.

To write something other than a string, it needs to be converted to a string first; e.g.:

>>> value = ('The sum is ', 101)
>>> s = str(value)
>>> f.write(s)

f.tell() returns an integer giving the file object’s current position in the file. In binary mode this is measured in bytes from the beginning of the file; in text mode it is an opaque number. To change the file object’s position, we use f.seek(offset, whence). The position is computed by adding offset to a reference point, which is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. The second parameter can be omitted; the default value is 0, using the beginning of the file as a reference point. In text files (those opened without 'b' in the mode), only seeks relative to the beginning of the file are allowed, apart from seeking to the very end with f.seek(0, 2). You may have noticed that the behavior of this function is similar to that of the fseek() function in C/C++.

When you’re done with a file, call f.close() to free up any system resources taken up by the open file. After calling f.close(), attempts to use the file object will fail with a ValueError.
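The three reference points are easiest to follow on a file whose contents are its own offsets. The sketch below assumes a scratch file named digits.bin and uses binary mode, where positions are plain byte counts.

```python
import os
import tempfile

# Hypothetical scratch file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'digits.bin')

with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(3)                 # whence defaults to 0: 3 bytes from the start
    from_start = f.read(2)    # reads bytes 3 and 4
    position = f.tell()       # now just past byte 4

    f.seek(2, 1)              # whence 1: skip 2 bytes from the current position
    from_current = f.read(1)  # reads byte 7

    f.seek(-3, 2)             # whence 2: 3 bytes before the end
    from_end = f.read()       # reads the last three bytes

# Leaving the with block closed the file, so further use fails.
try:
    f.read()
except ValueError:
    closed_error = True
else:
    closed_error = False

print(from_start, position, from_current, from_end, closed_error)
```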

Python File I/O: Serialization

Finally, you can write data to a file easily with serialization. The read() function only returns strings, and when you want to save complex data types like nested lists and dictionaries, it can become unwieldy. Python, however, allows you to use the popular data interchange format called JSON (JavaScript Object Notation). The standard module called json can take Python data hierarchies, and convert them to string representations. This process is called serializing, and reconstructing the data from the string representation is called deserializing. To write an object x to a file object f opened for writing, we use:

json.dump(x, f)

And to decode the object, we use the following:

x = json.load(f)

This simple serialization technique can handle lists and dictionaries without any additional effort, but serializing arbitrary class instances in JSON requires some extra work.
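One common form that extra work takes is a function passed to json.dumps through its default parameter, which converts otherwise unsupported objects into something JSON can represent. The Team class and the encode_team function below are hypothetical names invented for this sketch.

```python
import json


class Team:
    """Hypothetical class used only to illustrate the technique."""

    def __init__(self, name, payroll):
        self.name = name
        self.payroll = payroll


def encode_team(obj):
    """Turn a Team into a plain dictionary that json can serialize."""
    if isinstance(obj, Team):
        return {'name': obj.name, 'payroll': obj.payroll}
    # Anything else is still unsupported, so report it the usual way.
    raise TypeError('Cannot serialize ' + type(obj).__name__)


# json calls encode_team for every object it cannot handle itself.
text = json.dumps([Team('Orioles', 118)], default=encode_team)

# Decoding gives back dictionaries; rebuilding Team objects is up to us.
decoded = [Team(**item) for item in json.loads(text)]

print(text)
print(decoded[0].name, decoded[0].payroll)
```

The decoding side is deliberately manual here: JSON carries no type information, so the program has to know which dictionaries represent teams.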

Here’s a chance to apply what we know about Python file I/O to generate a file containing randomly-generated numbers. We will then read the data into a list, and use quicksort to sort the list. We start with a function to generate the file:

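import json
import random

def generateNumbers(filename, num):
    '''Use Python file I/O to generate a file of name
    filename and fill it with num randomly-generated
    integers of range 1 to num'''
    temp = []
    for i in range(num):
        # int(random.random() * num) + 1 falls in the range 1 to num
        temp.append(int(random.random() * num) + 1)
    # The with statement closes the file even if json.dump fails
    with open(filename, 'w') as f:
        json.dump(temp, f)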

As you can see, I used JSON to save the data to the file, which should make the process of reading the numbers in easier. Now for the function to read in the data:

def readList(filename):
    '''Use Python file I/O to read in a file of name
    filename using serialization'''
    retval = []
    with open(filename, 'r') as f:
        try:
            retval = json.load(f)
        except json.JSONDecodeError:
            # Raised when the file is empty or does not contain valid JSON
            print('Invalid JSON in', filename)
    return retval

The following code uses Python file I/O to generate a list of 1000 integers, reads in the same list, sorts the list and prints out both the size of the list along with the sorted list:

generateNumbers('mylist.lst',1000)
myList = readList('mylist.lst')
quickSort(myList)
print('Size of myList = ',len(myList))
print(myList)

We could save the list of integers as ASCII text, but if we did, we would have to parse the list when we read it back in, which would be somewhat cumbersome. Serialization makes the process easy, and it is undoubtedly a technique we will use in the future.
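For comparison, here is a sketch of what the plain-text approach might look like, with one integer per line. The file name numbers.txt and the sample values are assumptions; the parsing step at the end is the part that JSON spares us.

```python
import os
import tempfile

# Hypothetical scratch file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')

numbers = [5, 3, 8]

# Writing: each integer has to be converted to text by hand.
with open(path, 'w') as f:
    for n in numbers:
        f.write(str(n) + '\n')

# Reading: each line has to be converted back, skipping blank lines.
with open(path) as f:
    parsed = [int(line) for line in f if line.strip()]

print(parsed)
```

This works for a flat list of integers, but nested lists or dictionaries would need a hand-written format and parser, which is where serialization pays off.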
