In the previous article, we showed how to create, access and modify a persistent dictionary in Python using the dbm module. In this article, we will consider using Python to create, access and modify a relational database.
The dbm modules work well when your data needs to be stored as key/value pairs. You can store more complicated data within key/value pairs with some imagination. For example, you can create formatted strings that use a comma or some other character to delimit items in the strings. This, however, can be difficult to maintain, and it can restrict you because now your data is stored in an inflexible manner. In addition, some dbm libraries limit the amount of space you can use for the values – sometimes to a maximum of 1024 bytes.
The upshot of all this is that if your data needs are simple and you only plan to store a small amount of data, you should use a dbm persistent dictionary. If, on the other hand, you require support for transactions and if you require complex data structures or multiple tables of linked data, you should use a relational database. If you use relational databases, you will also find that they provide a far richer and more complex API than the simple dbm modules.
Python Database Programming: Introducing Relational Databases
In a relational database, data is stored in tables that can be viewed as two-dimensional data structures. The columns, or vertical part of the two-dimensional matrix, are all of the same type of data (e.g. strings, numbers, dats, etc.). Each horizontal component of the table is made up of rows, also called records. Each row is made up of columns. Typically, each record holds the information pertaining to one item.
idnum | last name | first name | age | Team | left-handed | total war | earliest free agency |
---|---|---|---|---|---|---|---|
100 | d’Arnaud | Travis | 26 | 18 | No | 0.0 | 2020 |
101 | Duda | Lucas | 29 | 18 | Yes | 2.9 | 2018 |
102 | Harper | Bryce | 22 | 20 | Yes | 9.6 | 2019 |
This table holds seven columns about baseball players:
- idnum: The player’s ID number. Relational databases make extensive use of ID numbers where the database manages the assignment of unique numbers so that each row can be referenced with these numbers to make each row unique, even if they have identical data. We can then refer to the player by the ID number. The ID number alone provides enough information to look up the employee.
- lastname: Holds the person’s last name.
- firstname: Holds the player’s first name.
- age: Holds the player’s age.
- team: Holds ID of the player’s team.
- left-handed: Holds whether the player is left-handed.
- total war: Holds the player’s total WAR (Wins Above Replacement).
- earliest free agent: Holds the earliest year the player will be eligible for free agency.
In this example, the column idnum, the ID number, would be used as the primary key. A primary key is a unique index for a table, where each element has to be unique because the database will use that element as the key to the given row and as a way to refer to the data in that row, in a manner similar to dictionary keys and values in Python. Thus, each player needs to have a unique ID number, and once we have an ID number, we can look up any player. Therefore it makes sense to make idnum the key.
The team column holds the ID of a team – that is, an ID of a row in another table. This ID could be considered a foreign key, because the ID acts as a key into another table.
For example, here is a possible layout of the teams table:
team id | name | ballpark |
---|---|---|
18 | New York Mets | Citi Field |
20 | Washington Nationals | Nationals Park |
In these examples, Travis d’Arnaud and Lucas Duda play for team 18, the New York Mets. Bryce Harper plays for team 20, the Washington Nationals.
In a large enterprise, there may be hundreds of tables in the database with thousands (or even millions) of records. In the next article, we will cover how to make SQL queries with Python.
External Links:
Python Database Programming at wiki.python.org
Recent Comments