Python Iterators: Part Seven

The map, zip, and filter built-in functions return iterators as of Python 3.0, and range became a lazy iterable in the same release. This conserves memory, because the result list does not have to be produced all at once. All three functions process iterables and return iterable results. Unlike range, however, they are their own iterators: once you step through their results they are exhausted, and you cannot have multiple independent iterators over the same results.
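The difference between range and map described above can be checked with a minimal sketch. The variable names are made up for illustration; only the built-ins iter, next, range, map, and abs are used.

```python
# range is an iterable that hands out a fresh, independent iterator on each iter() call.
r = range(3)
r_it1, r_it2 = iter(r), iter(r)
first_from_it1 = next(r_it1)    # 0
second_from_it1 = next(r_it1)   # 1
first_from_it2 = next(r_it2)    # 0, because r_it2 keeps its own position

# A map object is its own iterator, so iter() returns the very same object.
m = map(abs, (-1, -2, -3))
m_it1, m_it2 = iter(m), iter(m)
same_object = m_it1 is m_it2    # True
a = next(m_it1)                 # 1
b = next(m_it2)                 # 2, because both names share one position

print(first_from_it2, same_object, a, b)
```

Because both names refer to one map object, advancing either of them advances the other.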

The basic form of the map function is:

map(function, iterable, ...)

The built-in map function returns an iterator that applies a function to every item of an iterable and yields the results. If additional iterables are passed, the function given as the first argument must accept that many arguments, and it is applied to the items from all the iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For example:

>>> mylist = map(abs, (1, 2, -3))
>>> mylist
<map object at 0x0000000004069D30>
>>> next(mylist)
1
>>> next(mylist)
2
>>> next(mylist)
3
>>> next(mylist)
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    next(mylist)
StopIteration

>>> for x in mylist:
	print(x)

By the time the for loop runs, the map iterator is already exhausted, so no output is produced. To step through the results again, we need to create a new iterator:

>>> mylist = map(abs, (-4, 5, -6))
>>> for x in mylist:
	print(x)
	
4
5
6
>>> list(map(abs, (-4, 5, -6)))
[4, 5, 6]
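The multiple-iterable form of map mentioned earlier can be sketched as follows. The input values are arbitrary and chosen only for illustration; pow is the built-in two-argument power function.

```python
# pow takes two arguments, so map is given two iterables to draw from in parallel.
# The second iterable is shorter, so map stops after two results.
powers = list(map(pow, [2, 3, 4], [3, 2]))
print(powers)  # [8, 9]
```

The third item of the first iterable (4) is never used, because the shorter iterable runs out first.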

The zip built-in function, introduced in the previous article, returns iterators that work the same way:

>>> myzip = zip((1, 2, 3), (4, 5, 6))
>>> myzip
<zip object at 0x...>
>>> list(myzip)
[(1, 4), (2, 5), (3, 6)]
>>> for pair in myzip:
	print(pair)  	# Exhausted after one pass

>>> myzip = zip((1, 2, 3), (4, 5, 6))
>>> for pair in myzip:
	print(pair)

(1, 4)
(2, 5)
(3, 6)

>>> myzip = zip((1, 2, 3), (4, 5, 6))
>>> next(myzip)
(1, 4)
>>> next(myzip)
(2, 5)

The filter built-in function behaves in a similar way. It returns the items of an iterable for which a passed-in function returns a true value:

>>> list(filter(bool, ['not null', '', 'abc', 1, 0, -1]))
['not null', 'abc', 1, -1]

Like map and zip, filter both accepts an iterable to process and returns an iterable that generates its results on demand in Python 3.0.
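As a small sketch of that one-pass behavior, the snippet below runs a filter object twice. Passing None as the function is a documented shorthand for keeping the items that are true; the lambda and the variable names are made up for illustration.

```python
values = ['not null', '', 'abc', 1, 0, -1]

# None as the function means "keep the items that are true".
truthy = filter(None, values)
first_pass = list(truthy)     # consumes the filter object
second_pass = list(truthy)    # nothing left on the second pass

# Any one-argument function works as the test; this one keeps positive integers.
positives = list(filter(lambda v: isinstance(v, int) and v > 0, values))

print(first_pass)   # ['not null', 'abc', 1, -1]
print(second_pass)  # []
print(positives)    # [1]
```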


Python Iterators: Part Five

Another common use for list comprehensions is with files. A file object has a readlines method that loads the file into a list of line strings all at once; e.g.:

>>> fp = open('simple.py')
>>> lines = fp.readlines()
>>> lines
['a = [1, 2, 3, 4]\n', 'for x in a:\n', '\tprint(x)\n', '\tprint(x*2)\n', '\n']

This works, but every line in the result includes its trailing newline character. We can use a list comprehension to get rid of the newlines.

One way to accomplish this is to run each line in the list through the string rstrip method, which removes whitespace on the right side. So now we have:

>>> lines = [line.rstrip() for line in lines]
>>> lines
['a = [1, 2, 3, 4]', 'for x in a:', '\tprint(x)', '\tprint(x*2)', '']

This works as expected. Because a list comprehension is an iteration context just like a for loop statement, however, we do not have to open the file ahead of time. If we open the file inside the expression, the list comprehension automatically uses the iteration protocol described earlier: it reads one line from the file at a time by calling the file’s __next__ method, runs the line through the rstrip expression, and adds it to the result list. The code now looks like this:

>>> lines = [line.rstrip() for line in open('simple.py')]
>>> lines
['a = [1, 2, 3, 4]', 'for x in a:', '\tprint(x)', '\tprint(x*2)', '']
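To make the implicit steps visible, here is a rough manual equivalent of what the comprehension does with the file. It is a sketch only: the temporary file and its contents are stand-ins invented for this example so that the code runs anywhere.

```python
import os
import tempfile

# Create a small stand-in file (hypothetical contents, not the article's simple.py).
source = "a = [1, 2, 3, 4]\nfor x in a:\n\tprint(x)\n"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write(source)
    path = tmp.name

# Manual version: drive the iteration protocol by hand.
manual = []
with open(path) as fp:
    it = iter(fp)                # a file object is its own iterator
    while True:
        try:
            line = next(it)      # calls the file's __next__ method
        except StopIteration:
            break                # end of file
        manual.append(line.rstrip())

# Comprehension version: the same steps, done implicitly.
with open(path) as fp:
    comprehension = [line.rstrip() for line in fp]

os.remove(path)
print(manual == comprehension)   # True
```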

Again, the code does what we want. The expression does a lot implicitly, but we get that work for free: Python scans the file and builds a list of operation results automatically. It is also an efficient way to write the code. Because most of the work is done inside the Python interpreter, it is likely faster than the equivalent for statement, and for large files the speed advantage can be noticeable.

List comprehensions are also expressive: we can run any string operation on a file’s lines as we iterate. For example, we can use a list comprehension to convert the contents of the file to uppercase:

>>> [line.upper() for line in open('simple.py')]
['A = [1, 2, 3, 4]\n', 'FOR X IN A:\n', '\tPRINT(X)\n', '\tPRINT(X*2)\n', '\n']

If we also want to strip out the newline, we can do that too:

>>> [line.rstrip().upper() for line in open('simple.py')]
['A = [1, 2, 3, 4]', 'FOR X IN A:', '\tPRINT(X)', '\tPRINT(X*2)', '']

Beyond this complexity level, however, list comprehension expressions can become too compact for their own good. They are generally intended for simple types of iterations; for more involved programming statements, a simple for statement structure will likely be easier to understand and modify in the future.
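As an illustration of where a plain for statement may read better, the sketch below skips blank lines and comment lines and records line numbers. The task and the in-memory stand-in for a file are invented for this example.

```python
import io

# io.StringIO stands in for an open text file (hypothetical contents).
fake_file = io.StringIO("a = [1, 2, 3, 4]\n\n# a comment\nfor x in a:\n\tprint(x)\n")

numbered = []
for lineno, line in enumerate(fake_file, start=1):
    text = line.rstrip()
    if not text:
        continue                      # skip blank lines
    if text.lstrip().startswith("#"):
        continue                      # skip comment lines
    numbered.append((lineno, text.upper()))

print(numbered)
# [(1, 'A = [1, 2, 3, 4]'), (4, 'FOR X IN A:'), (5, '\tPRINT(X)')]
```

The same logic could be squeezed into one comprehension, but each condition here sits on its own line and is easy to change later.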


Python Iterators: Part Four

To see a practical application of the iteration protocol, let’s consider the case of the list comprehension. You can use range to change a list as you step across it; e.g.:

>>> L = [1, 2, 3, 4, 5]
>>> for i in range(len(L)):
	L[i] += 1

>>> L
[2, 3, 4, 5, 6]

This works, but there is an easier way. We can replace the loop with a single expression that produces the desired list:

>>> L = [1, 2, 3, 4, 5]
>>> L = [x + 1 for x in L]
>>> L
[2, 3, 4, 5, 6]

The final result is the same, but it requires less coding on our part and typically runs faster as well. The list comprehension is not exactly equivalent to the for loop version, however, because it makes a new list object rather than changing the original list in place.
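The new-object difference matters when another name refers to the same list. A minimal sketch, with the alias names invented for illustration:

```python
# In-place update: every name bound to the list sees the change.
L = [1, 2, 3, 4, 5]
alias = L
for i in range(len(L)):
    L[i] += 1
print(alias)       # [2, 3, 4, 5, 6]
print(alias is L)  # True

# Comprehension: M is rebound to a new list, and the old list is untouched.
M = [1, 2, 3, 4, 5]
alias_m = M
M = [x + 1 for x in M]
print(alias_m)       # [1, 2, 3, 4, 5]
print(alias_m is M)  # False
```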

The list comprehension’s syntax is derived from a construct in set theory notation that applies an operation to each item in a set. Alternatively, you can think of it as a backwards for loop.

List comprehensions are written in square brackets because they are ultimately a way to construct a new list. They begin with an arbitrary expression that we make up (x + 1), which uses a loop variable that we also make up (x). That is followed by what you should recognize as the head of a for loop, which names the loop variable and an iterable object (for x in L).

To run the expression, Python executes an iteration across L inside the interpreter, assigning each item to x in turn, and collects the results of running the items through the expression on the left side. The result list we get back is exactly what the comprehension says: a new list containing x + 1 for every x in L.
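Spelled out by hand, those steps look roughly like the loop below. The name res is made up for this sketch.

```python
L = [1, 2, 3, 4, 5]

# Manual equivalent of [x + 1 for x in L]
res = []
for x in L:              # assign each item to x in turn
    res.append(x + 1)    # collect the result of the expression

print(res)                          # [2, 3, 4, 5, 6]
print(res == [x + 1 for x in L])    # True
```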

Technically, we do not need list comprehensions, because we can always build up a list of expression results manually with for loops. List comprehensions, however, are more concise to write, and because this code pattern of building up result lists is so common in Python work, they turn out to be very handy in many contexts. Moreover, list comprehensions can run faster than manual for loop statements because their iterations are performed at C language speed inside the interpreter (rather than with manual Python code). For larger data sets, the performance advantage can be noticeable.
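One way to check the speed claim on your own machine is the standard timeit module. This is only a sketch: the data size and repeat count are arbitrary choices, and the timings will vary with the Python version and hardware.

```python
import timeit

data = list(range(10_000))   # arbitrary test data

def with_loop():
    result = []
    for x in data:
        result.append(x + 1)
    return result

def with_comprehension():
    return [x + 1 for x in data]

# Run each version 200 times and report the total elapsed time.
loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)

print(f"for loop:      {loop_time:.3f} s")
print(f"comprehension: {comp_time:.3f} s")
```

Both functions build the same list, so any difference in the printed times comes from how the iteration is carried out.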
