Python Database Programming: Part One

Python databaseMost large enterprise-level systems use databases for storing data. In order for Python to be capable of handling these types of enterprise applications, the language must be able to access databases.

For Python database programming, Python provides a database Application Programming Interface (API) that enables you to access most databases regardless of the databases’ native API. Although minor differences exist between different implementations of databases, for the most part you can access databases such as Oracle or MySQL from your Python scripts without worrying too much about the details of the specific databases. There are two main database systems supported by Python: dbm persistent dictionaries and relational databases with the DB API. Moreover, you can use add-ons such as MySQL-python to make direct database queries from within your Python scripts.

Python Database Programming: Persistent Dictionaries

A persistent dictionary, as the name suggests, is a Python dictionary that can be saved to disk. You store name/value pairs in the dictionary, which is saved. Thus, if you save data to a dictionary that’s backed by a dbm, the next time you start your program, you can read the value stored under a given key again, once you’ve loaded the dbm file. The dictionaries work like normal Python dictionaries; you might recall that the syntax of a statement creating a dictionary looks something like this:

payroll = { ‘Orioles’: 118, ‘Yankees’: 211, ‘Blue Jays’: 120 }

With a persistent dictionary, the main difference is that the data is written to and read from disk. An additional difference is that the keys and the values must both be strings; therefore our above example would have to be rewritten:

payroll = { ‘Orioles’: ‘118’, ‘Yankees’: ‘211’, ‘Blue Jays’: ‘120’ }

Python Database Programming: Modules

Python supports a number of dbm modules for Python database programming. Each dbm module supports similar interface and uses a particular C library to store the data to disk. The difference is in the underlying binary format of the data files on disk.

DBM, short for database manager, acts as a generic name for a number of C language libraries originally created on UNIX systems. The names of these libraries (e.g. dbm, gdbm, etc.) correspond closely to the available modules that provide the needed functionality within Python.

Python supports a number of dbm modules, each of which supports a similar interface and uses a particular C library to store the data. The underlying binary format of each module is different. As a result, each dbm module creates incompatible files. If you create a dbm persistent dictionary with one dbm module, you must use the same module to read the data. None of the other modules will work with a data file created by another module.

Module Description
dbm Chooses the best dbm module
dbm.dumb Uses a simple, but portable, implementation of the dbm library
dbm.gnu Uses the GNU dbm library

Originally, this library was only available with the commercial versions of UNIX. This led to the creation of alternative libraries: e.g. the Berkeley UNIX library and GNU’s gdbm.

With all the incompatible file formations, all these libraries can be an issue. But by using the dbm module, you can sidestep this issue. The dbm module will choose the best implementation available on your system when creating a new persistent dictionary. When it reads a file, the dbm module uses the whichdb function to make an informed guess as to which library created the database. It is usually good practice to use the dbm module, unless you need to use a specific feature of one of the dbm libraries.

In the next article on Python database programming, we’ll start to cover the nuts and bolts of programming using the dbm module in Python.

External Links:

Python Database Programming at wiki.python.org

Python Database Programming at python.about.com

Databases at docs.python-guide.org