Python Database Programming: Part Six

In the previous article, we covered creating and accessing databases with SQLite. In this article, we continue our look at the Python database APIs.

You must download a separate DB API module for each database you need to access. Modules exist for most major databases with the exception of Microsoft’s SQL Server. You can access SQL Server using an ODBC (Open Database Connectivity) module, though. In fact, the mxODBC module can communicate with most databases using ODBC on Windows or an ODBC bridge on UNIX or Linux.

A complete listing of databases is available at the official Python wiki. Download the modules you need and follow the installation instructions that come with them. Some database modules require a C compiler and build environment to install; if so, the module’s own documentation will say so. For some databases, such as Oracle, you can choose among a number of slightly different modules.

Python Database Programming: Creating a Connection Object 

A Connection object provides the means to communicate from your script to a database program. Note the major assumption here that the database is running in a separate process (or processes). The Python database modules connect to the database. They do not include the database application itself.

Each database module needs to provide a connect function that returns a connection object. The parameters that are passed to connect vary by the module and what is required to communicate with the database.

Parameter   Usage
dsn         Data source name. This usually includes the name of your database and the server where it’s running.
host        Host, or network system name, on which the database runs.
database    Name of the database.
user        User name for connecting to the database.
password    Password for the given user name.

The above table summarizes some of the more common parameters. For example, here’s a typical connect statement:

conn = dbmodule.connect(dsn='localhost:MYDATABASE', user='chris', password='agenda')

With a Connection object, you can work with transactions, close the connection, and get a cursor.

A cursor is a Python object that enables you to work with the database. In database terms, the cursor is positioned at a particular location within a table or tables in the database, much like the cursor on your screen when you’re editing a document, which is positioned at a pixel location.

To get a cursor, you need to call the cursor method on the connection object:

cursor = conn.cursor()

Once you have a cursor, you can perform operations on the database, such as inserting records.

import sqlite3

conn = sqlite3.connect('my_database')
cursor = conn.cursor()

# Create players
cursor.execute("""
insert into player(idnum,lastname,firstname,age,team,lefthanded,totalwar,earliestfreeagent)
values(100,"d'Arnaud",'Travis',26,18,'No',0.0,2020)""")

cursor.execute("""
insert into player(idnum,lastname,firstname,age,team,lefthanded,totalwar,earliestfreeagent)
values(101,'Duda','Lucas',29,18,'Yes',2.9,2018)""") 

cursor.execute("""
insert into player(idnum,lastname,firstname,age,team,lefthanded,totalwar,earliestfreeagent)
values(102,'Harper','Bryce',22,20,'Yes',9.6,2019)""")

# Create teams
cursor.execute("""
insert into team(teamid,teamname,teamballpark)
values(18,'New York Mets','Citi Field')""")

cursor.execute("""
insert into team(teamid,teamname,teamballpark)
values(20,'Washington Nationals','Nationals Park')""")

conn.commit()
cursor.close()
conn.close()

The first few lines of the script set up the database connection and create a cursor object:

import sqlite3
conn = sqlite3.connect('my_database')
cursor = conn.cursor()

Note how we connect to an SQLite database. To connect to a different database, replace this with your database-specific module, and modify the call to use the connect function from that database module, as needed.

The next several lines execute a number of SQL statements to insert rows into the two tables set up earlier: player and team. The execute method on the cursor object executes the SQL statement:

cursor.execute("""
insert into player(idnum,lastname,firstname,age,team,lefthanded,totalwar,earliestfreeagent)
values(100,"d'Arnaud",'Travis',26,18,'No',0.0,2020)""")

This example uses a triple-quoted string to spread the statement across several lines. SQL commands, especially those embedded within Python scripts, are easier to understand when they are formatted over a number of lines, and this becomes more helpful as queries grow more complex. Also, note that we used double quotes around the last name in this example (“d’Arnaud”) so we could include an apostrophe in the name; with single quotes, the apostrophe, identical to a single quote mark, would have denoted the end of the string. SQLite accepts the double-quoted form for compatibility, but standard SQL reserves double quotes for identifiers and writes an apostrophe inside a single-quoted string by doubling it ('d''Arnaud'), so other databases may reject the double-quoted version.
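Quoting problems like the apostrophe in d'Arnaud can also be avoided by passing the values separately from the SQL text. The following is a minimal sketch, assuming the sqlite3 module and an in-memory database (so it does not touch my_database); the variable names new_player and stored_lastname are only for illustration. The sqlite3 module uses ? as its placeholder, while other DB API modules may use a different placeholder style.

```python
import sqlite3

# In-memory database so this sketch leaves my_database untouched.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("""
create table player
    (idnum integer,
    lastname varchar,
    firstname varchar,
    age integer,
    team integer,
    lefthanded varchar,
    totalwar float(5,1),
    earliestfreeagent integer)
""")

# The ? placeholders are filled in by the sqlite3 module, so the
# apostrophe in d'Arnaud needs no special quoting in the SQL text.
new_player = (100, "d'Arnaud", 'Travis', 26, 18, 'No', 0.0, 2020)
cursor.execute("""
insert into player(idnum,lastname,firstname,age,team,lefthanded,totalwar,earliestfreeagent)
values(?,?,?,?,?,?,?,?)""", new_player)
conn.commit()

# Read the row back to confirm the name was stored intact.
cursor.execute("select lastname from player where idnum = ?", (100,))
stored_lastname = cursor.fetchone()[0]
print(stored_lastname)

cursor.close()
conn.close()
```

Passing values this way also keeps user-supplied text from being interpreted as SQL.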

To save your changes to the database, you must commit the transaction:

conn.commit()

Note that this method is called on the connection, not the cursor.
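The counterpart of commit is rollback, which is also called on the connection and discards any changes made since the last commit. Here is a small sketch, assuming the sqlite3 module with an in-memory database; the team_names variable is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("""
create table team
    (teamid integer,
    teamname varchar,
    teamballpark varchar)
""")
conn.commit()

# This insert is committed, so it is saved.
cursor.execute("""
insert into team(teamid,teamname,teamballpark)
values(18,'New York Mets','Citi Field')""")
conn.commit()

# This insert is rolled back, so it is discarded.
cursor.execute("""
insert into team(teamid,teamname,teamballpark)
values(20,'Washington Nationals','Nationals Park')""")
conn.rollback()

cursor.execute("select teamname from team")
team_names = [row[0] for row in cursor.fetchall()]
print(team_names)

cursor.close()
conn.close()
```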

When you are done with the script, close the cursor and then the connection to free up resources. In short scripts like this, it may not seem important, but it helps the database program free its resources, as well as your Python script:

cursor.close()
conn.close()
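If an error occurs partway through a script, the close calls at the end may never run. One way to guard against that, sketched here with the standard library's contextlib.closing helper and an in-memory sqlite3 database, is to let a with block close the cursor and the connection for you. Note that using an sqlite3 connection directly in a with statement only commits or rolls back the transaction; it does not close the connection.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on the object when the block ends,
# even if an exception is raised inside it.
with closing(sqlite3.connect(':memory:')) as conn:
    with closing(conn.cursor()) as cursor:
        cursor.execute("select 1 + 1")
        result = cursor.fetchone()[0]

print(result)
```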

You now have a small amount of sample data to work with while using other parts of the DB API.


Python Database Programming: Part Five

[Figure: Using Eclipse to create an SQLite database.]

In most cases when you are doing database programming, the database will already be up and running. Your website hosting, for example, may have MySQL database access included. Your IT department may have standardized on a particular database, such as Oracle, DB2, Sybase, or Informix.

Python Database Programming: Sqlite

But if you have no database yet, a good starting database is Sqlite. The main advantages of Sqlite are that it comes installed with Python and it is simple, yet functional. Using Sqlite is as simple as importing a module (sqlite3).

If you are working with another database, such as SQL Server, then the chances are good that a database has already been created. If not, follow the instructions from your database vendor.

import sqlite3
conn = sqlite3.connect('my_database')
cursor = conn.cursor()
# Create tables
cursor.execute("""
create table player
    (idnum integer,
    lastname varchar,
    firstname varchar,
    age integer,
    team integer,
    lefthanded varchar,
    totalwar float(5,1),
    earliestfreeagent integer)
""")
cursor.execute("""
create table team
    (teamid integer,
    teamname varchar,
    teamballpark varchar)
""")
# Create indices
cursor.execute("""create index teamid on team (teamid)""")
cursor.execute("""create index idnum on player (idnum)""")
cursor.execute("""create index teamidfk on player(team)""")
conn.commit()
cursor.close()
conn.close()

This script should produce no output unless it raises an error. It uses Python’s sqlite3 module, which follows the DB API. After the import statements, the script creates a database connection with this statement:

conn = sqlite3.connect('my_database')

From there, the script gets a Cursor object. The Cursor object is used to create two tables (the Player and Team tables from the previous two articles) and define indexes on these tables.

The script then calls the commit method on the connection to save all the changes to disk. Sqlite stores all the data in the file my_database, which is created in the current working directory (the directory the script was run from).
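Because the tables are stored on disk, running the script a second time raises an error saying the table already exists. A minimal sketch of one way around this, assuming SQLite's "if not exists" clause and an in-memory database for the demonstration, is shown below; the schema_objects variable is only for illustration, and sqlite_master is SQLite's own catalog table rather than part of the DB API.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Running the statements twice is safe because of "if not exists".
for attempt in range(2):
    cursor.execute("""
    create table if not exists team
        (teamid integer,
        teamname varchar,
        teamballpark varchar)
    """)
    cursor.execute("create index if not exists teamid on team (teamid)")
conn.commit()

# sqlite_master lists the tables and indexes SQLite knows about.
cursor.execute("select type, name from sqlite_master order by type, name")
schema_objects = cursor.fetchall()
print(schema_objects)

cursor.close()
conn.close()
```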

Python’s support for relational databases started with ad hoc solutions, with one solution written to interface with each particular database, such as Oracle. Each database module created its own API, which was highly specific to that database because each database vendor evolved its own API based on its own needs. This can be difficult to support, because moving from one database to another requires that everything be rewritten and retested.

Over the years, Python has come to support a common database API, known as the DB API. Specific modules enable your Python scripts to communicate with different databases, such as DB2, PostgreSQL, and so on. All of these modules support the common API, which makes your job a lot easier when you write scripts to access databases.

The DB API provides a minimal standard for working with databases, using Python structures and syntax wherever possible. This API includes the following:

  • Connections, which cover guidelines for how to connect to databases
  • Executing statements and stored procedures to query, update, insert and delete data with cursors
  • Transactions, with support for committing or rolling back a transaction
  • Examining metadata on the database module as well as on database and table structure
  • Defining the types of errors
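Several of the items in this list can be seen in a few lines with the sqlite3 module. The sketch below is illustrative only: it uses an in-memory database, and the primary key on teamid is an assumption added here so that a duplicate insert triggers an error (the team table in this article has no primary key).

```python
import sqlite3

# Metadata about the module itself.
print(sqlite3.apilevel)      # DB API version the module supports
print(sqlite3.paramstyle)    # placeholder style used for parameters
print(sqlite3.threadsafety)  # level of thread safety

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
# Assumption for this sketch: teamid is declared as a primary key.
cursor.execute("create table team (teamid integer primary key, teamname varchar)")
cursor.execute("insert into team(teamid, teamname) values(?, ?)", (18, 'New York Mets'))
conn.commit()

# Metadata about a query result: one entry per column.
cursor.execute("select teamid, teamname from team")
column_names = [column[0] for column in cursor.description]
print(column_names)

# Errors are reported through exception classes defined by the module.
caught = None
try:
    cursor.execute("insert into team(teamid, teamname) values(?, ?)", (18, 'Duplicate'))
except sqlite3.IntegrityError as exc:
    caught = type(exc).__name__
    conn.rollback()
print(caught)

cursor.close()
conn.close()
```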

In the next article, we will begin our look at the DB API.


Python Database Programming: Part Two

[Figure: Using the Eclipse IDE to access and modify a Python persistent dictionary.]

In the previous article, we introduced Python database programming, the concept of persistent dictionaries, and different database modules such as dbm. In this article, we will put it all together and use the dbm module to create, access and modify a persistent dictionary.

All of the dbm modules support an open function to create a new dbm object. Once opened, you can store data in the dictionary, read data, close the dbm object as well as the associated data file/files, remove items and test for the existence of a key in the dictionary.

Python Database Programming: Creating a Persistent Dictionary

To open a dbm persistent dictionary, use the open function on the module you choose. For example, we can use this code to create a persistent dictionary with the dbm module:

import dbm

db = dbm.open('payroll', 'c')

# Add some items
db['Orioles'] = '118'
db['Yankees'] = '211'
db['Blue Jays'] = '120'

print(db['Orioles'])

# Close and save to disk
db.close()

When you run this script, you will see output like the following:

b'118'

This example, which creates a ‘payroll’ dictionary with three entries, uses the recommended dbm module. The open function requires the name of the dictionary to create. The name gets translated into the name of the data file or files that may already be on the disk. The dbm module may create more than one file (usually a file for the data and one for the index of the keys), but it does not always do this. The name of the dictionary is treated as a base file name, including the path. Usually, the underlying dbm library will append a suffix such as .dat for data. You can find the file yourself by looking for a file whose name starts with payroll, most likely in your current working directory.

There is also an optional flag. The following table lists the available flags:

Flag   Usage
'r'    Opens an existing file for reading only. This is the default.
'c'    Opens the data file for reading and writing, creating the file if needed.
'n'    Opens the file for reading and writing, but always creates a new empty file. If one already exists, it will be overwritten and its contents lost.
'w'    Opens the file for reading and writing, but if the file doesn’t exist it will not be created.

You can also set another optional parameter, the mode. The mode holds a set of UNIX file permissions.
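The effect of the flags can be checked with a short experiment. This is a sketch under a few assumptions: it works in a temporary directory so the payroll file from this article is left alone, and it also uses the 'r' flag, which opens an existing file read-only and is the default in the dbm module. The variable names are only for illustration.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'payroll')

    # 'w' refuses to create a file that does not exist yet.
    try:
        dbm.open(path, 'w')
        missing_file_error = False
    except dbm.error:
        missing_file_error = True

    # 'c' creates the file if needed.
    db = dbm.open(path, 'c')
    db['Orioles'] = '118'
    db.close()

    # 'r' opens the existing file for reading only.
    db = dbm.open(path, 'r')
    orioles = db['Orioles']
    db.close()

    # 'n' always starts from an empty file, discarding old contents.
    db = dbm.open(path, 'n')
    keys_after_n = list(db.keys())
    db.close()

print(missing_file_error, orioles, keys_after_n)
```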

The above code is simple. First, we call the open function of the dbm module, which returns a new dbm object (db) that we can then use to store and retrieve data.

Once we open a persistent dictionary, we can write values as we normally would with Python dictionaries, as shown in this example:

db['Orioles'] = '118'

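Both the key and the value must be strings (or bytes) and cannot be other objects, like numbers or arbitrary Python objects. Strings are encoded when they are stored, and values come back as bytes, which is why the earlier script printed b'118'. But if you want to save an object, you can serialize it using the pickle module: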

import pickle

data = {
        'Orioles' : ['118', 'Dan Duquette', 'Buck Showalter', 'Camden Yards'],
        'Yankees' : ['211', 'Brian Cashman', 'Joe Girardi', 'Yankee Stadium III'],
        'Blue Jays' : ['120', 'Alex Anthopoulos', 'John Gibbons', 'Rogers Centre']
        }

with open('data.pickle', 'wb') as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    
with open('data.pickle', 'rb') as f:
    data = pickle.load(f)

Finally, the close method closes the file or files and saves the data to disk.
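The pickle example above writes to an ordinary file, but the same idea works inside a persistent dictionary, because pickle.dumps returns bytes that a dbm object can store as a value. The following is a minimal sketch, assuming a temporary directory and a hypothetical dictionary named teams; the standard library's shelve module packages up this same combination of dbm and pickle.

```python
import dbm
import os
import pickle
import tempfile

team = ['118', 'Dan Duquette', 'Buck Showalter', 'Camden Yards']

with tempfile.TemporaryDirectory() as tmpdir:
    db = dbm.open(os.path.join(tmpdir, 'teams'), 'c')

    # Serialize the list to bytes before storing it under a key.
    db['Orioles'] = pickle.dumps(team, pickle.HIGHEST_PROTOCOL)

    # Deserialize the stored bytes back into a list.
    restored = pickle.loads(db['Orioles'])
    db.close()

print(restored)
```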

Python Database Programming: Accessing and Modifying the Persistent Database

With the dbm modules, you can treat the object you get back from the open function as a dictionary object. You can get and set values using code like the following:

db['key'] = 'value'
value = db['key']

Remember that the key and the value must both be strings (or bytes).

You can delete a value in the dictionary using del:

del db['key']

As with a normal dictionary, you can loop over the keys; the keys method returns a list of all of them:

for key in db.keys():
    print(key)  # do something with each key

The keys method may take a long time to execute if there are a huge number of keys in the file. Also, this method may require a lot of memory to store the potentially large list that it would create with a large file.

Here’s a script we can use to access the persistent dictionary we created with the first script:

import dbm

# Open existing file
db = dbm.open('payroll', 'w')

# Add another item
db['Rays'] = '67'

# Verify the previous item remains
if 'Blue Jays' in db:
    print('Found Blue Jays')
else:
    print('Error: Missing item')
    
# Iterate over the keys...may be slow
# May use a lot of memory
for key in db.keys():
    print('Key = ', key, ' value = ', db[key])
    
del db['Rays']
print('After deleting Rays, we have:')

for key in db.keys():
    print('Key = ', key, ' value = ', db[key])
    
# Close and save to disk
db.close()

When you run this script, you should see output similar to the following:

Found Blue Jays
Key =  b'Rays'  value =  b'67'
Key =  b'Orioles'  value =  b'118'
Key =  b'Yankees'  value =  b'211'
Key =  b'Blue Jays'  value =  b'120'
After deleting Rays, we have:
Key =  b'Orioles'  value =  b'118'
Key =  b'Yankees'  value =  b'211'
Key =  b'Blue Jays'  value =  b'120'

This script works with a small database of major league baseball teams and their payrolls (in millions of dollars). You need to run the first script in this article first. That example creates the dbm file and stores data in the file. This script then opens the preexisting dbm file.

The script opens the persistent dictionary payroll in read/write mode. The call to the open function will generate an error if the necessary data file or files do not exist on disk in the current directory.

From the previous example, there should be three values in the dictionary (the new script tests to see if one of them exists). This example adds the Tampa Bay Rays, with a payroll of $67 million, as another key.

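The script verifies that the ‘Blue Jays’ key exists in the dictionary. A membership test with the in operator is the reliable way to do this, because looking up a missing key with db['Blue Jays'] raises a KeyError rather than returning None:

if 'Blue Jays' in db:
    print('Found Blue Jays')
else:
    print('Error: Missing item')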

Next, the script prints out all of the keys and values in the dictionary:

for key in db.keys():
    print('Key = ', key, ' value = ', db[key])

Note that there should now be four entries.

After printing out all the entries, the script removes one using del:

del db['Rays']

The script then prints out all the keys and values again, which should result in three entries, as shown in the output. Finally, the close method closes the dictionary, which involves saving all the changes to disk, so the next time the file is opened, it will be in the state we left it.

As you can see from these examples, the API for working with persistent dictionaries is very simple: you open them like files and use them like dictionaries.
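The file-like and dictionary-like sides of the API can be combined in a compact form. The sketch below assumes Python 3.4 or later, where the objects returned by dbm.open can be used in a with statement, and it works in a temporary directory; the variable names are only for illustration.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'payroll')

    # Like a file: the with block closes the dictionary automatically.
    with dbm.open(path, 'c') as db:
        db['Blue Jays'] = '120'

        # Like a dictionary: membership tests and get() with a default.
        has_blue_jays = 'Blue Jays' in db
        has_rays = 'Rays' in db
        rays_payroll = db.get('Rays', b'unknown')

print(has_blue_jays, has_rays, rays_payroll)
```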


Python Database Programming: Part One

Most large enterprise-level systems use databases for storing data. In order for Python to be capable of handling these types of enterprise applications, the language must be able to access databases.

For Python database programming, Python provides a database Application Programming Interface (API) that enables you to access most databases regardless of the databases’ native API. Although minor differences exist between different implementations of databases, for the most part you can access databases such as Oracle or MySQL from your Python scripts without worrying too much about the details of the specific databases. There are two main database systems supported by Python: dbm persistent dictionaries and relational databases with the DB API. Moreover, you can use add-ons such as MySQL-python to make direct database queries from within your Python scripts.

Python Database Programming: Persistent Dictionaries

A persistent dictionary, as the name suggests, is a Python dictionary that can be saved to disk. You store name/value pairs in the dictionary, which is saved. Thus, if you save data to a dictionary that’s backed by a dbm, the next time you start your program, you can read the value stored under a given key again, once you’ve loaded the dbm file. The dictionaries work like normal Python dictionaries; you might recall that the syntax of a statement creating a dictionary looks something like this:

payroll = { 'Orioles': 118, 'Yankees': 211, 'Blue Jays': 120 }

With a persistent dictionary, the main difference is that the data is written to and read from disk. An additional difference is that the keys and the values must both be strings; therefore our above example would have to be rewritten:

payroll = { 'Orioles': '118', 'Yankees': '211', 'Blue Jays': '120' }

Python Database Programming: Modules

Python supports a number of dbm modules for Python database programming. Each dbm module supports a similar interface and uses a particular C library to store the data to disk. The difference is in the underlying binary format of the data files on disk.

DBM, short for database manager, acts as a generic name for a number of C language libraries originally created on UNIX systems. The names of these libraries (e.g. dbm, gdbm, etc.) correspond closely to the available modules that provide the needed functionality within Python.

Because the binary format differs from one library to the next, the dbm modules create mutually incompatible files. If you create a dbm persistent dictionary with one dbm module, you must use the same module to read the data; none of the other modules will work with that data file.

Module     Description
dbm        Chooses the best dbm module
dbm.dumb   Uses a simple, but portable, implementation of the dbm library
dbm.gnu    Uses the GNU dbm library

Originally, the dbm library was only available with the commercial versions of UNIX. This led to the creation of alternative libraries, such as the Berkeley UNIX library and GNU’s gdbm.

With all the incompatible file formats, these libraries can be a headache. But by using the dbm module, you can sidestep this issue. The dbm module will choose the best implementation available on your system when creating a new persistent dictionary. When it reads a file, the dbm module uses the whichdb function to make an informed guess as to which library created the database. It is usually good practice to use the dbm module, unless you need to use a specific feature of one of the dbm libraries.
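The whichdb function can also be called directly. In this sketch the file is deliberately created with dbm.dumb, an assumption made so that the result is the same on every system, and it lives in a temporary directory. The generic dbm.open call then reads the file back without being told which module wrote it.

```python
import dbm
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'payroll')

    # Create the file with a specific module (dbm.dumb).
    db = dbm.dumb.open(path, 'c')
    db['Orioles'] = '118'
    db.close()

    # Ask which module created the file; None means no such file.
    detected = dbm.whichdb(path)
    missing = dbm.whichdb(os.path.join(tmpdir, 'no_such_file'))

    # The generic open function picks the matching module for us.
    db = dbm.open(path, 'r')
    value = db['Orioles']
    db.close()

print(detected, missing, value)
```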

In the next article on Python database programming, we’ll start to cover the nuts and bolts of programming using the dbm module in Python.
