Python Iterators: Part Five

Another common use for list comprehensions is with files. A file object has a readlines method that loads the file into a list of line strings all at once:

>>> fp = open('simple.py')
>>> lines = fp.readlines()
>>> lines
['a = [1, 2, 3, 4]\n', 'for x in a:\n', '\tprint(x)\n', '\tprint(x*2)\n', '\n']

This works, but every line in the result still ends with the newline character. A list comprehension can get rid of all the newlines.

One way to accomplish this is to run each line in the list through the string rstrip method, which removes whitespace on the right side. So now we have:

>>> lines = [line.rstrip() for line in lines]
>>> lines
['a = [1, 2, 3, 4]', 'for x in a:', '\tprint(x)', '\tprint(x*2)', '']

This works as expected. Because list comprehensions are an iteration context just like for loop statements, however, we do not even have to open the file ahead of time. If we open the file inside the expression, the list comprehension automatically uses the iteration protocol described earlier: it reads one line from the file at a time by calling the file’s __next__ method, runs the line through the rstrip expression, and adds it to the result list. The code now looks like this:

>>> lines = [line.rstrip() for line in open('simple.py')]
>>> lines
['a = [1, 2, 3, 4]', 'for x in a:', '\tprint(x)', '\tprint(x*2)', '']
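To make the implicit steps visible, here is a minimal sketch of roughly what the comprehension does, written out by hand. It creates its own stand-in copy of simple.py in a temporary directory (an assumption made so the script runs anywhere), and the variable names manual and compact are illustrative only.

```python
import os
import tempfile

# Create a small stand-in for simple.py so the sketch runs anywhere.
source = "a = [1, 2, 3, 4]\nfor x in a:\n\tprint(x)\n\tprint(x*2)\n\n"
path = os.path.join(tempfile.mkdtemp(), "simple.py")
with open(path, "w") as f:
    f.write(source)

# Roughly what the comprehension does, spelled out step by step.
manual = []
fp = open(path)
it = iter(fp)                # a file object is its own iterator
while True:
    try:
        line = next(it)      # calls the file's __next__ method
    except StopIteration:    # raised at end of file
        break
    manual.append(line.rstrip())
fp.close()

# The compact form from the text, for comparison.
with open(path) as fp:
    compact = [line.rstrip() for line in fp]

print(manual == compact)
```

Both versions build the same list; the comprehension simply hides the next calls and the StopIteration handling.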

Again, the code does what we want. The expression does a lot implicitly, but we get that work for free: Python scans the file and builds the list of results automatically. It is also an efficient way to write the code. Because most of the work is done inside the Python interpreter, a list comprehension often runs faster than the equivalent for statement, and the difference can add up for large files.
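The speed claim is easy to check on your own machine. The sketch below is one way to do it with the standard timeit module; it times in-memory strings rather than a real file (an assumption made to keep disk speed out of the measurement), and the function names with_loop and with_comprehension are made up for this example. The numbers will vary by Python version and hardware.

```python
import timeit

# Stand-in data: 1,000 in-memory lines instead of a real file.
sample = ["line %d   \n" % i for i in range(1000)]

def with_loop(lines):
    """Strip each line using an explicit for statement."""
    result = []
    for line in lines:
        result.append(line.rstrip())
    return result

def with_comprehension(lines):
    """Strip each line using a list comprehension."""
    return [line.rstrip() for line in lines]

# Run each version 200 times and report the total elapsed time.
loop_time = timeit.timeit(lambda: with_loop(sample), number=200)
comp_time = timeit.timeit(lambda: with_comprehension(sample), number=200)

print("for loop:      %.4f s" % loop_time)
print("comprehension: %.4f s" % comp_time)
```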

List comprehensions are also expressive: we can run any string operation on a file’s lines as we iterate. For example, we can use a list comprehension to convert the contents of the file to uppercase:

>>> [line.upper() for line in open('simple.py')]
['A = [1, 2, 3, 4]\n', 'FOR X IN A:\n', '\tPRINT(X)\n', '\tPRINT(X*2)\n', '\n']

If we also want to strip out the newline, we can do that too:

>>> [line.rstrip().upper() for line in open('simple.py')]
['A = [1, 2, 3, 4]', 'FOR X IN A:', '\tPRINT(X)', '\tPRINT(X*2)', '']

Beyond this complexity level, however, list comprehension expressions can become too compact for their own good. They are generally intended for simple types of iterations; for more involved programming statements, a simple for statement structure will likely be easier to understand and modify in the future.
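As a rough illustration of where that line falls, the sketch below adds two filter conditions to the uppercase example. The sample lines, including the comment line, are invented for this illustration and stand in for a real file.

```python
# Invented sample data standing in for a file's lines.
lines = ["a = [1, 2, 3, 4]\n", "# a comment\n", "for x in a:\n",
         "\tprint(x)\n", "\n"]

# Getting dense: a chained transformation plus two conditions.
dense = [line.rstrip().upper() for line in lines
         if line.strip() and not line.lstrip().startswith("#")]

# The same logic as a for statement: longer, but each step has
# its own line and is easy to change later.
result = []
for line in lines:
    text = line.rstrip()
    if not text:                        # skip blank lines
        continue
    if text.lstrip().startswith("#"):   # skip comment lines
        continue
    result.append(text.upper())

print(dense == result)
```

Which form reads better is a matter of taste, but the loop version leaves room for extra steps without rewriting the whole expression.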

External Links:

Python Iterators at Python Wiki

Python Iterator tutorial at bogotobogo.com

Python Iterators: Part Three

Besides files and physical sequences such as lists, other types also have useful iterators. In older versions of Python, for example, one would step through the keys of a dictionary by requesting the keys list explicitly:

>>> K = {'alpha':1, 'bravo':2, 'charlie':3}
>>> for key in K.keys():
	print(key, K[key])

alpha 1
bravo 2
charlie 3

In more recent versions of Python, however, dictionaries have an iterator that automatically returns one key at a time in an iteration context:

>>> I = iter(K)
>>> next(I)
'alpha'
>>> next(I)
'bravo'
>>> next(I)
'charlie'
>>> next(I)
Traceback (most recent call last):
  File "<pyshell#88>", line 1, in <module>
    next(I)
StopIteration

The effect here is that we no longer need to call the keys method to step through dictionary keys. The for loop will use the iteration protocol to grab one key each time through.
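Written as a loop, the newer form looks like the sketch below. It reuses the dictionary K from the earlier example; the pairs list is added here only so the result can be inspected afterwards.

```python
K = {'alpha': 1, 'bravo': 2, 'charlie': 3}

pairs = []
for key in K:                  # no call to K.keys() needed
    print(key, K[key])
    pairs.append((key, K[key]))
```

In Python 3.7 and later the keys come back in insertion order, so the loop prints alpha, bravo, and charlie in that order.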

Other Python object types also support the iterator protocol and may therefore be used in for loops as well. For example, a shelf is a persistent, dictionary-like object whose values can be arbitrary Python objects, such as class instances, recursive data types, and objects containing many shared sub-objects. Shelves support the iterator protocol. So does os.popen, a tool for reading the output of shell commands:

>>> import os
>>> I = os.popen('dir')
>>> I.__next__()
' Volume in drive C has no label.\n'
>>> I.__next__()
' Volume Serial Number is 9664-E470\n'
>>> next(I)
Traceback (most recent call last):
  File "<pyshell#93>", line 1, in <module>
    next(I)
TypeError: '_wrap_close' object is not an iterator

Note that the popen objects support a P.next() method in Python 2.6. In 3.0 and later, they support the P.__next__() method, but not the next(P) built-in. It is not clear if this behavior will continue in future releases, but as of Python 3.4.1, it is still the case. This is only an issue for manual iteration; if you iterate over these objects automatically with for loops and other iteration contexts, they return successive lines in either Python version.
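Both objects from this section can be tried in a short script. The sketch below makes a few assumptions: the shelf file name demo_shelf, the stored values, and the echo command are placeholders chosen for illustration. For manual iteration over a popen object, calling iter on it first is one way to get something the next built-in accepts.

```python
import os
import shelve
import tempfile

# --- Shelves: iterating yields the keys, like a dictionary ---
path = os.path.join(tempfile.mkdtemp(), "demo_shelf")
with shelve.open(path) as db:
    db["alpha"] = [1, 2, 3]
    db["bravo"] = {"nested": (4, 5)}

with shelve.open(path) as db:
    keys = sorted(key for key in db)   # key order is not guaranteed
    values = [db[key] for key in keys]

print(keys, values)

# --- os.popen: ask for an iterator explicitly before calling next ---
with os.popen("echo hello") as pipe:
    line_iter = iter(pipe)             # iterator over the command's output
    first_line = next(line_iter)

print(repr(first_line))
```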

The iteration protocol is also the reason we have had to wrap some results in a list call to see their values all at once. Objects that are iterable return results one at a time, not in a physical list:

>>> RG = range(5)
>>> RG 
range(0, 5)
>>> I = iter(RG)
>>> next(I)
0
>>> next(I)
1
>>> list(range(5))
[0, 1, 2, 3, 4]
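A short sketch of the same idea as a script: a range hands out values on request, and each call to iter starts a fresh, independent pass over it. The variable names are illustrative only.

```python
RG = range(5)

first = iter(RG)
second = iter(RG)             # a separate iterator with its own position

a = next(first)               # 0
b = next(first)               # 1
c = next(second)              # 0, unaffected by the calls on first

rest = list(first)            # whatever first has not produced yet
everything = list(RG)         # the range itself is not consumed
exhausted = list(first)       # first has nothing left to give

print(a, b, c, rest, everything, exhausted)
```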

Now that you have a better understanding of this protocol, you should be able to see how it explains why the enumerate tool introduced in the prior chapter works the way it does:

>>> EN = enumerate('quick')
>>> EN
<enumerate object at 0x...>
>>> I = iter(EN)
>>> next(I)
(0, 'q')
>>> next(I)
(1, 'u')
>>> list(enumerate('quick'))
[(0, 'q'), (1, 'u'), (2, 'i'), (3, 'c'), (4, 'k')]
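To see why enumerate behaves this way, it can help to write a rough equivalent. The generator below is a hypothetical stand-in named my_enumerate, not the built-in's actual implementation; it only mimics the visible behavior of producing (index, item) pairs one at a time.

```python
def my_enumerate(iterable, start=0):
    """Yield (index, item) pairs lazily, one per request."""
    index = start
    for item in iterable:     # the for loop drives the iteration protocol
        yield index, item
        index += 1

EN = my_enumerate('quick')
first_pair = next(EN)         # (0, 'q')
second_pair = next(EN)        # (1, 'u')
remaining = list(EN)          # the pairs not yet produced

print(first_pair, second_pair, remaining)
```

Like the real enumerate, nothing is computed until a value is requested, which is why a list call is needed to see all the results at once.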

We don’t normally see what is going on under the hood because for loops run it for us automatically to step through results. In fact, everything that scans left-to-right in Python employs the iteration protocol.
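One way to convince yourself of this is to write a small class that implements only the protocol and hand it to several built-in tools. The Countdown class below is a made-up example for this purpose.

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                    # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration        # signals the end of the iteration
        value = self.current
        self.current -= 1
        return value

# Each of these tools only needs __iter__ and __next__ to work.
as_list = list(Countdown(3))
total = sum(Countdown(4))
ordered = sorted(Countdown(3))
a, b, c = Countdown(3)                 # sequence unpacking
found = 2 in Countdown(3)              # membership test
joined = ", ".join(str(n) for n in Countdown(3))

print(as_list, total, ordered, (a, b, c), found, joined)
```

None of these tools knows anything about Countdown itself; they all go through the same two methods that a for loop uses.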
