Python Database Programming: Part Four

In the previous article, we introduced the concept of relational databases. In this article, we introduce the Structured Query Language, or SQL.

Python Database Programming: SQL

SQL (pronounced “sequel” or “S-Q-L”) defines a standard language for querying and modifying databases. SQL supports the following basic operations:

Operation   Usage
Select      Perform a query to search the database for specific data.
Update      Modify a row or rows, usually based on a certain condition.
Insert      Create new rows in the database.
Delete      Remove a row or rows from the database.

SQL offers more than these basic operations, but at least initially, these are the operations you will be using in writing Python applications: Query, Update, Insert and Delete (QUID), or Create, Read, Update and Delete (CRUD).

If you are not familiar with SQL, you’re going to want a good SQL book. O’Reilly has a useful book on SQL which can be downloaded here, and if you’re looking for a user-friendly SQL book, one is available here.

SQL is important because when you access databases with the Python DB API, you must first create SQL statements and then execute these statements by having the database evaluate them. Thus you will be using Python statements to execute SQL statements.

The basic SQL syntax for the operations mentioned above is:

SELECT columns FROM tables WHERE condition ORDER BY columns ascending_or_descending

UPDATE table SET new values WHERE condition

INSERT INTO table (columns) VALUES (values)

DELETE FROM table WHERE condition

This covers the basic syntax, but there are many more optional parameters and specifiers available. You can use these options with Python’s DB API.

To insert a new row in the player table from the previous article, you can use an SQL query like this one (even though we are adding data and not getting data, the convention is that all SQL commands or statements are called queries):

insert into player (idnum, lastname, firstname, age, team, lefthanded, totalwar, earliestfreeagent) values (103, 'Murphy', 'Daniel', 30, 18, 'Yes', 10.9, 2016)

In this example, the first tuple holds the names of the columns in the order you are using for inserting your data. The second tuple, after the keyword values, holds the data in the same order. Notice how SQL uses single quotes to delimit strings, and no quotes around numbers.
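As a rough sketch of how such an insert might be run from Python, the example below uses the standard library's sqlite3 module, which follows the DB API. The in-memory database, the column types, and the hyphen-free column name lefthanded are assumptions made for illustration and are not taken from the previous article.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    """create table player (
           idnum integer primary key,
           lastname text, firstname text, age integer, team integer,
           lefthanded text, totalwar real, earliestfreeagent integer)"""
)

# The question marks are sqlite3's placeholder style; other DB API
# drivers may use a different one. The driver quotes the values for us.
conn.execute(
    "insert into player (idnum, lastname, firstname, age, team, "
    "lefthanded, totalwar, earliestfreeagent) "
    "values (?, ?, ?, ?, ?, ?, ?, ?)",
    (103, "Murphy", "Daniel", 30, 18, "Yes", 10.9, 2016),
)
conn.commit()

# Read the new row back to confirm that it was stored.
inserted = conn.execute(
    "select lastname, firstname from player where idnum = 103"
).fetchone()
print(inserted)
```

Passing the values separately from the SQL text means you do not have to build the quoted string yourself.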

With queries, you can use wildcards such as * to say that you want an operation to be performed using all of the columns in a table. For example, to query all of the rows in the team table, showing all of the columns for each row, you can use a query like this:

select * from team

Note that SQL keywords, such as SELECT and FROM, are not case-sensitive. Whether table and column names are case-sensitive depends on the database, and some databases store unquoted names in all uppercase. It is common to see people write SELECT, FROM and other keywords in all capital letters to make them more easily distinguished from other parts of the query.

The select * from team statement omits the names of the columns to read and any conditions that would otherwise narrow down the data that would be returned. Thus the query will return all of the columns (from the *) and all of the rows (because there is no where clause).
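The following sketch shows one way the select * query could be run from Python. It assumes an in-memory sqlite3 database and a team table with the columns teamid, name and ballpark; the column types are guesses made for illustration.

```python
import sqlite3

# Assumption: a throwaway in-memory database with a small team table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table team (teamid integer primary key, name text, ballpark text)"
)
conn.executemany(
    "insert into team (teamid, name, ballpark) values (?, ?, ?)",
    [
        (18, "New York Mets", "Citi Field"),
        (20, "Washington Nationals", "Nationals Park"),
    ],
)

cursor = conn.execute("select * from team")

# cursor.description holds one entry per returned column; the first
# item of each entry is the column name.
column_names = [entry[0] for entry in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
```

Because the query names no columns, cursor.description is a convenient way to find out which columns the * expanded to.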

You can also perform a join with the select command, to query data from more than one table, but present it all in a single response. For example, to extract the team name with each player, you could perform a query like the following:

select player.firstname, player.lastname, team.name from player, team
where player.team = team.teamid order by lastname desc

In this example, the select statement requests two columns from the player table (firstname and lastname) and one from the team table (name). Each column is written as the table name, a period, and the column name, so that the database knows which table it comes from. The where clause matches each player to the row in the team table whose teamid equals the player’s team value. The order by section of the statement tells the database to order the results by the value in the lastname column, in descending order.
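A join like this one might be exercised from Python as sketched below. The sqlite3 in-memory database and the trimmed-down table definitions are assumptions for illustration; only the columns that the query touches are created.

```python
import sqlite3

# Assumption: minimal versions of the player and team tables.
conn = sqlite3.connect(":memory:")
conn.execute("create table team (teamid integer primary key, name text)")
conn.execute(
    "create table player (idnum integer primary key, lastname text, "
    "firstname text, team integer)"
)
conn.executemany(
    "insert into team (teamid, name) values (?, ?)",
    [(18, "New York Mets"), (20, "Washington Nationals")],
)
# Placeholders also take care of the apostrophe in d'Arnaud.
conn.executemany(
    "insert into player (idnum, lastname, firstname, team) values (?, ?, ?, ?)",
    [
        (100, "d'Arnaud", "Travis", 18),
        (101, "Duda", "Lucas", 18),
        (102, "Harper", "Bryce", 20),
    ],
)

# The join from the text: each player row is paired with its team row.
joined = conn.execute(
    "select player.firstname, player.lastname, team.name from player, team "
    "where player.team = team.teamid order by lastname desc"
).fetchall()

for firstname, lastname, team_name in joined:
    print(firstname, lastname, team_name)
```

Each result row combines data from both tables, so the team name appears next to the player without a second query.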

To simplify these queries, you can use aliases for the table names, which make them easier to type and read. For example, to use the alias p with the player table, you can start a query like this:

select p.firstname, p.lastname from player p

In this case, you must place the alias, p, after the table name in the from clause. You can also use the following format with the optional keyword as, which could be easier for you to read:

select p.firstname, p.lastname from player as p
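To check that the two alias forms behave the same way, a small sketch such as the following could be used. It assumes an in-memory sqlite3 database; the order by clause is added here only so that the two result lists can be compared reliably.

```python
import sqlite3

# Assumption: a cut-down player table with two sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table player (idnum integer primary key, lastname text, firstname text)"
)
conn.executemany(
    "insert into player (idnum, lastname, firstname) values (?, ?, ?)",
    [(100, "d'Arnaud", "Travis"), (101, "Duda", "Lucas")],
)

# Alias written directly after the table name.
without_as = conn.execute(
    "select p.firstname, p.lastname from player p order by p.idnum"
).fetchall()

# Same alias introduced with the optional keyword as.
with_as = conn.execute(
    "select p.firstname, p.lastname from player as p order by p.idnum"
).fetchall()

print(without_as == with_as)
```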

To modify or update a row, use an SQL statement like the following:

update player set age=27 where idnum=100

This example modifies the player with an idnum of 100 by setting that player’s age to 27. As with other queries, numbers do not need to have quotes around them; however, strings would need to be quoted with single quotes.
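The update could be issued from Python roughly as follows. This is a sketch against an assumed in-memory sqlite3 database with a two-row player table; the rowcount attribute of the cursor reports how many rows the statement changed.

```python
import sqlite3

# Assumption: a cut-down player table holding only idnum and age.
conn = sqlite3.connect(":memory:")
conn.execute("create table player (idnum integer primary key, age integer)")
conn.executemany(
    "insert into player (idnum, age) values (?, ?)",
    [(100, 26), (101, 29)],
)

# Only the row matching the where clause is modified.
cursor = conn.execute("update player set age = ? where idnum = ?", (27, 100))
updated_rows = cursor.rowcount
conn.commit()

# Map each idnum to its age to see the effect of the update.
ages = dict(conn.execute("select idnum, age from player"))
print(updated_rows, ages)
```

Leaving out the where clause would update every row in the table, so checking rowcount is a cheap safeguard.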

To delete a row, use an SQL statement like the following:

delete from player where idnum=101

This example deletes the player with an idnum of 101, but doesn’t affect anything else in the database.
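A delete can be tried out in the same way. The sketch below assumes an in-memory sqlite3 database and a player table reduced to two columns, and then lists the ID numbers that are left.

```python
import sqlite3

# Assumption: a cut-down player table with three sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table player (idnum integer primary key, lastname text)")
conn.executemany(
    "insert into player (idnum, lastname) values (?, ?)",
    [(100, "d'Arnaud"), (101, "Duda"), (102, "Harper")],
)

# Remove only the row whose idnum is 101.
cursor = conn.execute("delete from player where idnum = ?", (101,))
deleted_rows = cursor.rowcount
conn.commit()

remaining = [
    row[0] for row in conn.execute("select idnum from player order by idnum")
]
print(deleted_rows, remaining)
```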

External Links:

Database Programming at wiki.python.org

Database Programming at python.about.com

Databases at docs.python-guide.org

Python Database Programming: Part Three

In the previous article, we showed how to create, access and modify a persistent dictionary in Python using the dbm module. In this article, we will consider using Python to create, access and modify a relational database.

The dbm modules work well when your data needs to be stored as key/value pairs. You can store more complicated data within key/value pairs with some imagination. For example, you can create formatted strings that use a comma or some other character to delimit items in the strings. This, however, can be difficult to maintain, and it can restrict you because now your data is stored in an inflexible manner. In addition, some dbm libraries limit the amount of space you can use for the values – sometimes to a maximum of 1024 bytes.

The upshot of all this is that if your data needs are simple and you only plan to store a small amount of data, you should use a dbm persistent dictionary. If, on the other hand, you require support for transactions and if you require complex data structures or multiple tables of linked data, you should use a relational database. If you use relational databases, you will also find that they provide a far richer and more complex API than the simple dbm modules.

Python Database Programming: Introducing Relational Databases

In a relational database, data is stored in tables that can be viewed as two-dimensional data structures. Each column, the vertical part of the table, holds a single type of data (e.g. strings, numbers, dates, etc.). Each row, the horizontal part of the table, is also called a record and holds one value for each column. Typically, each record holds the information pertaining to one item.

idnum  last name  first name  age  team  left-handed  total war  earliest free agency
100    d’Arnaud   Travis      26   18    No           0.0        2020
101    Duda       Lucas       29   18    Yes          2.9        2018
102    Harper     Bryce       22   20    Yes          9.6        2019

This table holds eight columns about baseball players:

  • idnum: The player’s ID number. Relational databases make extensive use of ID numbers: the database manages the assignment of unique numbers so that each row can be told apart from every other row, even if two rows otherwise hold identical data. The ID number alone provides enough information to look up the player.
  • lastname: Holds the player’s last name.
  • firstname: Holds the player’s first name.
  • age: Holds the player’s age.
  • team: Holds the ID of the player’s team.
  • left-handed: Holds whether the player is left-handed.
  • total war: Holds the player’s total WAR (Wins Above Replacement).
  • earliest free agent: Holds the earliest year the player will be eligible for free agency.

In this example, the column idnum, the ID number, would be used as the primary key. A primary key is a unique index for a table, where each element has to be unique because the database will use that element as the key to the given row and as a way to refer to the data in that row, in a manner similar to dictionary keys and values in Python. Thus, each player needs to have a unique ID number, and once we have an ID number, we can look up any player. Therefore it makes sense to make idnum the key.
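The uniqueness rule can be observed with a short experiment. The sketch below assumes an in-memory sqlite3 database and a player table reduced to two columns; sqlite3 raises IntegrityError when a second row tries to reuse an existing primary key.

```python
import sqlite3

# Assumption: a cut-down player table with idnum as the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("create table player (idnum integer primary key, lastname text)")
conn.execute(
    "insert into player (idnum, lastname) values (?, ?)", (100, "d'Arnaud")
)

# A second row with the same idnum violates the primary key.
try:
    conn.execute(
        "insert into player (idnum, lastname) values (?, ?)", (100, "Duda")
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Looking a row up by its key, much like a dictionary lookup.
by_id = conn.execute(
    "select lastname from player where idnum = ?", (100,)
).fetchone()
print(duplicate_rejected, by_id)
```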

The team column holds the ID of a team – that is, an ID of a row in another table. This ID could be considered a foreign key, because the ID acts as a key into another table.

For example, here is a possible layout of the teams table:

team id  name                  ballpark
18       New York Mets         Citi Field
20       Washington Nationals  Nationals Park

In these examples, Travis d’Arnaud and Lucas Duda play for team 18, the New York Mets. Bryce Harper plays for team 20, the Washington Nationals.
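To illustrate how a database can enforce a foreign key, here is a sketch using sqlite3. It is only one possible setup: the references clause, the reduced column list, and the teamid column name are assumptions, and SQLite in particular checks foreign keys only after the pragma shown below has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign key checks are off unless enabled per connection.
conn.execute("pragma foreign_keys = on")

# Assumption: cut-down tables; player.team points at team.teamid.
conn.execute(
    "create table team (teamid integer primary key, name text, ballpark text)"
)
conn.execute(
    "create table player (idnum integer primary key, lastname text, "
    "team integer references team (teamid))"
)
conn.execute(
    "insert into team (teamid, name, ballpark) values (?, ?, ?)",
    (18, "New York Mets", "Citi Field"),
)
conn.execute(
    "insert into player (idnum, lastname, team) values (?, ?, ?)",
    (101, "Duda", 18),
)

# Team 99 does not exist, so this row would point at nothing.
try:
    conn.execute(
        "insert into player (idnum, lastname, team) values (?, ?, ?)",
        (999, "Nobody", 99),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

player_count = conn.execute("select count(*) from player").fetchone()[0]
print(orphan_rejected, player_count)
```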

In a large enterprise, there may be hundreds of tables in the database with thousands (or even millions) of records. In the next article, we will cover how to make SQL queries with Python.


Python Database Programming: Part One

Most large enterprise-level systems use databases for storing data. In order for Python to be capable of handling these types of enterprise applications, the language must be able to access databases.

For Python database programming, Python provides a database Application Programming Interface (API) that enables you to access most databases regardless of the databases’ native API. Although minor differences exist between different implementations of databases, for the most part you can access databases such as Oracle or MySQL from your Python scripts without worrying too much about the details of the specific databases. There are two main database systems supported by Python: dbm persistent dictionaries and relational databases with the DB API. Moreover, you can use add-ons such as MySQL-python to make direct database queries from within your Python scripts.

Python Database Programming: Persistent Dictionaries

A persistent dictionary, as the name suggests, is a Python dictionary that can be saved to disk. You store name/value pairs in the dictionary, which is saved. Thus, if you save data to a dictionary that’s backed by a dbm, the next time you start your program, you can read the value stored under a given key again, once you’ve loaded the dbm file. The dictionaries work like normal Python dictionaries; you might recall that the syntax of a statement creating a dictionary looks something like this:

payroll = { 'Orioles': 118, 'Yankees': 211, 'Blue Jays': 120 }

With a persistent dictionary, the main difference is that the data is written to and read from disk. An additional difference is that the keys and the values must both be strings (in Python 3 they are stored as bytes, and read back as bytes); therefore our above example would have to be rewritten:

payroll = { 'Orioles': '118', 'Yankees': '211', 'Blue Jays': '120' }
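As a minimal sketch of what this looks like in practice, the example below stores the payroll figures with the dbm module and reads one back. The temporary directory and the file name payroll are assumptions chosen so that the example cleans up after itself.

```python
import dbm
import os
import tempfile

# Assumption: a temporary directory, so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "payroll")

    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        db["Orioles"] = "118"
        db["Yankees"] = "211"
        db["Blue Jays"] = "120"

    # Reopen the file read-only, as a later run of the program would.
    with dbm.open(path, "r") as db:
        orioles = db["Orioles"]

# Values come back as bytes and must be converted before doing arithmetic.
print(orioles, int(orioles))
```

The conversion step at the end is the price of the strings-only rule: numbers have to be turned into text on the way in and back into numbers on the way out.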

Python Database Programming: Modules

Python supports a number of dbm modules for Python database programming. Each dbm module supports a similar interface and uses a particular C library to store the data to disk. The difference is in the underlying binary format of the data files on disk.

DBM, short for database manager, acts as a generic name for a number of C language libraries originally created on UNIX systems. The names of these libraries (e.g. dbm, gdbm, etc.) correspond closely to the available modules that provide the needed functionality within Python.

Because the underlying binary format differs from module to module, the files the modules create are incompatible. If you create a dbm persistent dictionary with one dbm module, you must use the same module to read the data; none of the other modules will work with that data file.

Module     Description
dbm        Chooses the best dbm module
dbm.dumb   Uses a simple, but portable, implementation of the dbm library
dbm.gnu    Uses the GNU dbm library

Originally, the dbm library was only available with the commercial versions of UNIX. This led to the creation of alternative libraries, such as the Berkeley UNIX library and GNU’s gdbm.

With all the incompatible file formats, all these libraries can be an issue. But by using the dbm module, you can sidestep this issue. The dbm module will choose the best implementation available on your system when creating a new persistent dictionary. When it reads a file, the dbm module uses the whichdb function to make an informed guess as to which library created the database. It is usually good practice to use the dbm module, unless you need to use a specific feature of one of the dbm libraries.
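The guessing step can be tried directly. In the sketch below, a file is deliberately created with dbm.dumb (chosen here only because it is available everywhere) and then handed to dbm.whichdb; the temporary directory and the file name are assumptions for illustration.

```python
import dbm
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "payroll")

    # Create the file with one specific module rather than the generic one.
    with dbm.dumb.open(path, "c") as db:
        db["Orioles"] = "118"

    # whichdb inspects the files on disk and names the module to use.
    detected = dbm.whichdb(path)

    # The generic dbm.open relies on the same detection when reading.
    with dbm.open(path, "r") as db:
        value = db["Orioles"]

print(detected, value)
```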

In the next article on Python database programming, we’ll start to cover the nuts and bolts of programming using the dbm module in Python.
