Guides | PythonScript HQ

Filed Under: Guides, Tutorials Tagged With: file, iterator, list comprehension, Python, Python iterators, rstrip, uppercase

Python Iterators: Part Four

February 20, 2015 by maximumdx Leave a Comment

To see a practical application of the iteration protocol, let’s consider the case of the list comprehension. You can use range to change a list as you step across it; e.g.:

>>> L = [1, 2, 3, 4, 5]
>>> for i in range(len(L)):
	L[i] += 1
>>> L
[2, 3, 4, 5, 6]

This works, but there is an easier way. We can replace the loop with a single expression that produces the desired list:

>>> L = [x + 1 for x in L]
>>> L
[2, 3, 4, 5, 6]

The final result is the same, but it requires less coding on our part and is substantially faster to boot. The list comprehension is not exactly the same as the for loop statement version in that it makes a new list object.

The list comprehension’s syntax is derived from a construct in set theory notation that applies an operation to each item in set. Alternatively, you can think for it as a backwards for loop.

List comprehensions are written in square brackets because they are ultimately a way to construct a new list. They begin with an arbitrary expression that we make up, which uses a loop variable that we make up (x + 1). That is followed by what you should recognize as the head of the for loop, which names the loop variable, and an iterable object (for x in L).

To run the expression, Python executes an iteration across L inside the interpreter, assigning x to each item in turn, and collects the results of running the items through the expression on the left side. The result list we get back is exactly what the last comprehension says: a new list containing x + 1 for every x in L.

Technically, we do not need list comprehensions, because we can always build up a list of expression results manually with for loops. List comprehensions, however, are more concise to write, and because this code pattern of building up result lists is so common in Python work, they turn out to be very handy in many contexts. Moreover, list comprehensions can run much faster than manual for loop statements because their iterations are performed in C language speed inside the interpreter (rather than with manual Python code). Especially for larger data sets, there is a large performance advantage.

External Links:

Filed Under: Guides, Tutorials Tagged With: iterators, list comprehensions, Python, Python iterators

Python Iterators: Part Three

February 17, 2015 by maximumdx Leave a Comment

Besides files and physical sequences such as lists, other types also have useful iterators. In older versions of Python, for example, one would step through the keys of a dictionary by requesting the keys list explicitly:

>>> K = {'alpha':1, 'bravo':2, 'charlie':3}
>>> for key in K.keys():
	print(key,K[key])

alpha 1
bravo 2
charlie 3

In more recent versions of Python, however, dictionaries have an iterator that automatically, returns one key at a time in an iteration context:

>>> I = iter(K)
>>> next(I)
'alpha'
>>> next(I)
'bravo'
>>> next(I)
'charlie'
>>> next(I)
Traceback (most recent call last):
  File "<pyshell#88>", line 1, in 
    next(I)
StopIteration

The effect here as that we no longer need to call the keys method to step through dictionary keys. The for loop will use the iteration protocol to grab one key each time through.

Other Python object types also support the iterator protocol and therefore may be used in for loops as well. For example, a shelf is a persistent, dictionary-like object in which the values can be arbitrary Python objects, such as class instances, recursive data types, and objects. Shelves support the iterator protocol. So does os.popen, a tool for reading the output of shell commands:

>>> import os
>>> I = os.popen('dir')
>>> I.__next__()
' Volume in drive C has no label.\n'
>>> I.__next__()
' Volume Serial Number is 9664-E470\n'
>>> next(I)
Traceback (most recent call last):
  File "<pyshell#93>", line 1, in 
    next(I)
TypeError: '_wrap_close' object is not an iterator

Note that the popen objects support a P.next() method in Python 2.6. In 3.0 and later, they support the P.__next__() method, but not the next(P) built-in. It is not clear if this behavior will continue in future releases, but as of Python 3.4.1, it is still the case. This is only an issue for manual iteration; if you iterate over these objects automatically with for loops and other iteration contexts, they return successive lines in either Python version.

The iteration protocol is also the reason we have had to wrap some results in a list call to see their values all at once. Objects that are iterable return results one at a time, not in a physical list:

>>> RG = range(5)
>>> RG 
range(0, 5)
>>> I = iter(RG)
>>> next(I)
0
>>> next(I)
1
>>> list(range(5))
[0, 1, 2, 3, 4]

Now that you have a better understanding of this protocol, you should be able to see how it explains why the enumerate tool introduced in the prior chapter works the way it does:

>>> EN = enumerate('quick')
>>> EN

>>> I = iter(EN)
>>> next(I)
(0, 'q')
>>> next(I)
(1, 'u')
>>> list(enumerate('quick'))
[(0, 'q'), (1, 'u'), (2, 'i'), (3, 'c'), (4, 'k')]

We don’t normally see what is going on under the hood because for loops run it for us automatically to step through results. In face, everything that scans left-to-right in Python employes the iteration protocol.

External Links:

Filed Under: Guides, Tutorials Tagged With: enumerate, file, iter, iterator, list, popen, Python, Python iterator

Python Iterators: Part Two (The Next Function)

February 13, 2015 by maximumdx Leave a Comment

In the first article in this series, we introduced Python iterators and how they can be used to streamline Python code. In this article, we will continue our look at iterators, beginning with the next function.

To support manual iteration code, Python 3.0 also provides a built-in function, next, that automatically calls an object’s __next__ method. Given an iterable object X, the call next(X) is the same as X.__next__(). With files, for example, either form could be used:

>>> f = open('simple.py')
>>> f.__next__()

>>> f = open('simple.py')
>>> next(f)

Technically, there is one more piece to the iteration protocol. When the for loop begins, it obtains an iterator from the iterable object by passing it to the iter built-in function; the object returned by iter has the required next method. We can illustrate this with the following code:

>>> LS = [1, 2, 3, 4, 5]
>>> myIter = iter(LS)
>>> myIter.next()
1
>>> myIter.next()
2

This initial step is not required for files, because a file object is its own iterator: files have their own __next__ method and so do not need to return a different object that does.

Lists and many other built-in object, are not their own iterators because they support multiple open operations. For such objects, we must call iter to start iterating. For example:

>>> LS = [1, 2, 3, 4, 5]
>>> iter(LS) is LS
False
>>> LS.__next__()
Traceback (most recent call last):
  File "<pyshell#50>", line 1, in 
    LS.__next__()
AttributeError: 'list' object has no attribute '__next__'

>>> myIter = iter(LS)
>>> myIter.__next__()
1
>>> next(myIter)
2

Although Python iteration tools call these functions automatically, we can use them to apply the iteration protocol manually, too. The following demonstrates the equivalence between automatic and manual iteration:

>>> LS = [1, 2, 3, 4, 5]
>>> for X in LS:
	print(X ** 2, end=' ')

1 4 9 16 25

>>> myIter = iter(LS)
>>> while True:
	try:
		X = next(I)
	except StopIteration:
		break
	print(X ** 2, end=' ')

1 4 9 16 25

To understand this code, you need to know that try statements run an action and catch exceptions (we covered that in the series of articles on exceptions). Also, for loops and other iteration contexts can sometimes work differently for user-defined classes, repeatedly indexing an object instead of running the iteration protocol.

External Links:

Filed Under: Guides, Tutorials Tagged With: iter, iterators, list, next function, Python

Python Iterators: Part One

February 9, 2015 by maximumdx Leave a Comment

Now that we have completed our look at strings, we will begin to look at another useful object: Python iterators.

The for loop is often used when we have to iterate over a series of numbers; for example:

>>> for x in range(1,5):
	print(x)

1
2
3
4

The for loop can also work on any sequence type in Python, including lists, tuples and strings. For example:

>>> for x in [1, 2, 3, 4]:
	print(x)

1
2
3
4

>>> for x in 'quick':
	print(x * 2)

qq
uu
ii
cc
kk

Actually, the for loop turns out to be even more generic than this: it works on any iterable object. In fact, this is true of all iteration tools that scan objects from left to right in Python, including for loops, list comprehensions, membership tests, the map built-in function, and more.

Introduction to Python Iterators

The concept of iterable objects is relatively recent in Python, but it has come to permeate the language’s design.

We can get a better understanding of Python iterators and how they have become pervasive in the language’s design by looking at a built-in type such as the file. Open file objects have a method called readline, which reads one line of text from a file at a time. Each time we call the readline method, we advance to the next line. At the send of the file, an empty string is returned, which we can detect to break out of the loop:

>>> f = open('simple.py')
>>> f.readline()
'a = [1, 2, 3, 4]\n'
>>> f.readline()
'for x in a:\n'
>>> f.readline()
'	print(x)\n'
>>> f.readline()
'	print(x*2)\n'
>>> f.readline()
''

However, files also have a method name __next__ that has a nearly identical effect: it returns the next line from a file each time it is called. The only noticeable difference is that __next__ raises a built-in StopIteration exception at the end of file instead of returning an empty string:

>>> f = open('simple.py')
>>> f.__next__()
'a = [1, 2, 3, 4,]\n'
>>> f.__next__()
'for x in a:\n'
>>> f.__next__()
'\tprint(x)\n'
>>> f.__next__()
'\tprint(x*2)\n'
>>> f.__next__()
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in 
    f.__next__()
StopIteration

This interface is exactly what we call the iteration protocol in Python. Any object with a __next__ method to advance to a next result, which raises StopIteration at the end of the series of results, is considered a Python iterator. Any such object may also be stepped through with a for loop or other iteration tool, because all Python iterators tools normally work internally by calling __next__ on each iteration and catching the StopIteration exception to determine when to exit.

Thus, the best way to reach a text file line by line is to allow the for loop to automatically call __next__ to advance to the next line on each iteration. The file object’s Python iterator will do the work of automatically loading lines as you go. For example, if we want to read in simple.py line by line, we can code it as follows:

>>> for line in open('simple.py'):
	print(line, end='')

a = [1, 2, 3, 4]
for x in a:
	print(x)
	print(x*2)

Almost as easily, we can iterate through the file and print it out in uppercase:

>>> for line in open('simple.py'):
	print(line.upper(), end='')

A = [1, 2, 3, 4]
FOR X IN A:
	PRINT(X)
	PRINT(X*2)

Notice that the print uses end=” here to suppress adding a \n, because line strings already have one. This is considered the best way to read text files line by line today for several reasons: it is the simplest to code, it might be the quickest to run, and is the best in terms of memory and usage. The older, original way to achieve the same effect with a for loop was to call the file readlines method to load the file’s content into memory as a list of line strings:

>>> for line in open('simple.py').readlines():
	print(line.upper(), end='')

This readlines technique still works, but it is not considered the best practice today and performs poorly in terms of memory usage. In fact, because this version really does load the entire file into memory all at once, it will not even work for files too big to fit into the memoery space available on your computer. Rather, because it reads one line at a time, the Python iterator-based version is not prone to such memory issues.

The iterator version might run quicker as well, though it can vary by release. It is also possible to read a file line by line with a while loop:

>>> f = open('simple.py')
>>> while True:
	line = f.readline()
	if not line: break
	print(line.upper(), end='')

	
A = [1, 2, 3, 4]
FOR X IN A:
	PRINT(X)
	PRINT(X*2)

However, this may run slower than the iterator-based for loop version, because Python iterators run at C language speed inside Python, whereas the while loop version run Python byte code through the Python virtual machine. Any time you swap Python code, for C code, speed tends to increase, though not in all cases.

External Links:

Filed Under: Guides, Tutorials

Python Strings: Part Six

February 6, 2015 by maximumdx Leave a Comment

In the previous article, we began our look at indexing and slicing. In this article, we will continue our look at slicing and show some practical applications of slicing.

In Python 2.3 and later, there is support for a third index, used as a step. The step is added to the index of each item extracted. The three-index form of a slice is X[I:J:K], which means “extract all the items in X, from offset I through J-1, by K.” The third limit, K, defaults to 1, which is why normally all items in a slice are extracted from left to right. But if you specify an explicit value, you can use the third limit to skip items or to reverse their order.

For instance, a[1:10:2] will fetch every other item in X from offsets 1-9; that is, it will collect the items at offsets 1, 3, 5, 7 and 9. As usual, the first and second limits default to 0 and the length of the sequence, respectively, so a[::2] gets severy other item from the beginning to the end of the sequence:

>>> a = 'nowisthetimeto'
>>> a[1:10:2]
>>> 'oitei'

You can also use a negative stride. For example, the slicing expression “every”[::-1] returns the new string “yreve” – the first two bounds default to 0 and the length of the sequence, as before, and a stride of -1 indicates that the slice should go from right to left instead of the usual left to right. The effect is to reverse the sequence:

>>> a = 'every'
>>> a[::-1]
'yreve'

With a negative stride, the meanings of the first two bounds are essentially reversed. That is, the slice a[5:1:-1] fetches the items from 2 to 5, in reverse order (the result contains items from offsets 5, 4, 3, and 2):

>>> a = 'thequick'
>>> a[5:1:-1]
'iuqe'

Skipping and reverse like this are the most common use cases for three-limit slices, but see Python’s standard library manual for more details.

Slices have many applications. For example, argument words listed on a system command line are made available in the argv attribute of the built-in sys module:

#File command.py - echo command line args
import sys
print(sys.argv)

% python command.py -1 -2 -3
['command.py', '-1', '2', '3']

Usually, however, you’re only interested in inspected the arguments that follow the program name. This leads to a typical application of slices: a single slice expression can be used to return all but the first item of a list. Here, sys.argv[1:] returns the desired list, [‘-1’, ‘-2’, ‘-3’]. You can then process this list without having to accommodate the program name at the front.

External Links:

Python strings tutorial at afterhoursprogramming.com

Filed Under: Guides, Tutorials Tagged With: argv, indexing, Python, Python strings, slicing, strings, sys module, third index

Python Strings: Part Five

February 2, 2015 by maximumdx Leave a Comment

Because strings are defined as ordered collections of characters, we can access their components by position. In Python, characters in a string are fetched by indexing – providing the numeric offset of the desired component in square brackets after the string. When you specify an index, you get back a one-character string at the specified position.

Strings in Python are similar to strings in the C/C++ language in that Python offsets start at 0 and end at one less than the length of the string. Unlike C, however, Python also lets you fetch items from sequences such as strings using negative offsets. Technically, a negative offset is added to the length of a string to derive a positive offset. You can also think of negative offsets as counting backward from the end. For example:

>>> a = 'party'
>>> a[0], a[-2]
>>> ('p', 't')
>>> a[1:3], a[1:], a[:-1]
('ar', 'arty', 'part')

The first line defines a five-character string and assigns it the name a. The next line indexes it in two ways: a[0] gets the item at offset 0 from the left (the one-character string ‘p’), and a[-2] gets the item at offset 2 back from the end.

The last line in the preceding example demonstrates slicing, a generalized form of indexing that returns an entire section, not a single item. Most likely the best way to think of slicing is that it is a type of parsing, especially when applied to strings. It allows us to extract an entire section in a single step. Slices can be used to extract columns of data, chop off leading and trailing text, and more.

The basics of using slicing are fairly simple. When you index a sequence object such as a string on a pair of offsets separated by a colon, Python returns a new object containing the contiguous section identified by the offset pair. The left offset is taken to be the lower bound (which is inclusive), and the right is the upper bound (which is noninclusive). That is, Python fetches all items from the lower bound up to but not including the upper bound, and returns a new object containing the fetched items. If omitted, the left and right bounds default to 0 and the length of the object your are slicing, respectively.

For instance, in the example above, a[1:3] extracts the items at offsets 1 and 2. It grabs the second and third items, and strops before the fourth item and offset 3. Next, a[1:] gets tall the items beyond the first. The upper bound, which is not specified, defaults to the length of the string. Finally, a[:-1] fetches all but the last item. The lower bound defaults to 0, and -1 refers to the last item (noninclusive).

Indexing and slicing are powerful tools, and if you’re not sure about the effects of a slice, you can always try it out at the Python interactive prompt. You can even change an entire section of another object in one step by assigning to a slice, though not for immutables like strings.

External Links:

Python strings tutorial at afterhoursprogramming.com

Filed Under: Guides, Tutorials Tagged With: indexing, Python, Python strings, slicing, strings

Python Strings: Part Four

January 30, 2015 by maximumdx Leave a Comment

In the previous articles, we introduced Python strings and covered escape sequences, raw strings and triple-quoted strings. Now we can cover some basic string operations. Strings can be concatenated using the + operator and repeated using the * operator:

>>> len('string')
6
>>> 'str' + 'ing'
'string'
>>> 'repeat' * 4
'repeatrepeatrepeatrepeat'

Formally, adding two string objects creates a new string object, with the contents of its operands joined. Repetition is like adding a string to itself a number of times. In both cases, Python lets you create arbitrarily-sized strings. There is no need to pre-declare anything in Python, including the sizes of data structures such as strings. The len built-in function returns the length of a string, or any object with a length.

Repetition comes in handy in a number of contexts. For example, if you want to print out 80 asterisks, just do this:

>>> print('*' * 80)

Notice that we are using the same + and * operators that perform addition and multiplication when using numbers, so we are using operator overloading. Python does the correct operation because it knows the types of the objects being added and multiplied. But there’s a limit to what you can do with operator overloading in Python. For example, Python does not allow you to mix numbers and strings in + expressions. ‘repeat’ + 3 will raise an error instead of automatically converting 3 to a string.

You can also iterate over strings in loops using for statements and test membership for both characters and substrings with the in expression operator, which is essentially a search. For substrings, in is much like the str.find() method, but it returns a Boolean result instead of the substring’s position. For example:

>>> mystr = "repeat"
>>> for c in mystr:
	print(c, ' ')

r e p e a t
>>> "p" in mystr
True
>>> "y" in mystr
False
>>> 'straw' in 'strawberry'
True

The for loop assigns a variable to successive items in a sequence and executes one or more statements for each item. In effect, the variable c becomes a cursor stepping across the string here.

External Links:

Python strings tutorial at afterhoursprogramming.com

Filed Under: General, Guides Tagged With: iteration, operators, Python, Python strings, repetition, strings

Python Strings: Part Three

January 26, 2015 by maximumdx Leave a Comment

As we saw in the previous article, escape sequences are handy for embedding special byte codes within strings. Sometimes, however, the special treatment of backslashes can cause problems. For example, let’s assume we want to open a file called thefile.txt for writing in the C directory newdir, and we use a statement like this:

fp = open('C:\newdir\thefile.txt','w')

The problem here is that \n is taken to stand for a newline character, and \t is replaced with a tab. In effect, the call tries to open a file name c:[newline]ew[tab]hefile.txt, which is not what we want.

The solution is to use raw strings. If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism. The result is that Python retains your backslashes literally, exactly as you type them. Therefore, to fix the filename problem, just remember to add the letter r on Windows:

fp = open(r'C:\newdir\thefile.txt','w')

Alternatively, because two backslashes are really an escape sequence for one backslash, you can keep your backslashes by simply doubling them up:

fp = open('C:\\newdir\\thefile.txt','w')

In fact, Python does this sometimes when it prints strings with embedded backslashes:

>>> path = r'C:\newdir\thefile.txt'
>>> path
'C:\\newdir\\thefile.txt'
>>> print(path)
'C:\\newdir\\thefile.txt'

As with numeric representation, the default format at the interactive prompt prints results as if they were code, and therefore escapes backslashes in the output. The print statement provides a more user-friendly format that shows that there is actually only one backslash in each spot. To verify that this is the case, you can check the result of the built-in len function, which returns the number of bytes in the string, independent of display formats. If you count the characters in the print(path) output, you will see that there is really just one character per backslash, for a total of 21.

Besides directory paths on Windows, raw strings are commonly used for regular expressions. Also note that Python scripts can usually use forward slashes in directory paths on Windows and Unix. This is because Python tries to interpret paths portably. Raw strings are useful, however, if you code paths using native Windows backslashes.

Finally, Python also has a triple-quoted string literal format (sometimes called a block string) that is a syntactic convenience for coding multiline text data. This form begins with three quotes of either the single or double variety, is followed by any number of lines of text, and is closed with the same triple-quote sequence that opened it. Single and double quotes embedded in the string’s text may be, but do not have to be, escaped. The string does not end until Python sees three unescaped quotes of the same kind used to start the literal:

>>> mystr = """This is an example
of using triple quotes
to code a multiline string"""
>>> mystr
'This is an example\nof using triple quotes\nto code a multiline string'

This string spans three lines. Python collects all the triple-quoted text into a single multiline string, with embedded newline characters (\n) at the places where your code has line breaks. To see the string with the newlines interpreted, print it instead of echoing:

>>> print(mystr)
This is an example
of using triple quotes
to code a multiline string

Triple-quoted strings are useful any time you need multiline text in your program. You can embed such blocks directly in your scripts without resorting to external text files or explicit concatenation and newline characters.

Triple-quoted strings are also commonly used for documentation strings, which are string literals that are taken as comments when they appear at specific points in your file. They do not have to be triple-quoted blocks, but they usually are to allow for multiline comments.

External Links: