Python Strings: Part Three

Python stringsAs we saw in the previous article, escape sequences are handy for embedding special byte codes within strings. Sometimes, however, the special treatment of backslashes can cause problems. For example, let’s assume we want to open a file called thefile.txt for writing in the C directory newdir, and we use a statement like this:

fp = open('C:\newdir\thefile.txt','w')

The problem here is that \n is taken to stand for a newline character, and \t is replaced with a tab. In effect, the call tries to open a file name c:[newline]ew[tab]hefile.txt, which is not what we want.

The solution is to use raw strings. If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism. The result is that Python retains your backslashes literally, exactly as you type them. Therefore, to fix the filename problem, just remember to add the letter r on Windows:

fp = open(r'C:\newdir\thefile.txt','w')

Alternatively, because two backslashes are really an escape sequence for one backslash, you can keep your backslashes by simply doubling them up:

fp = open('C:\\newdir\\thefile.txt','w')

In fact, Python does this sometimes when it prints strings with embedded backslashes:

>>> path = r'C:\newdir\thefile.txt'
>>> path
'C:\\newdir\\thefile.txt'
>>> print(path)
'C:\\newdir\\thefile.txt'

As with numeric representation, the default format at the interactive prompt prints results as if they were code, and therefore escapes backslashes in the output. The print statement provides a more user-friendly format that shows that there is actually only one backslash in each spot. To verify that this is the case, you can check the result of the built-in len function, which returns the number of bytes in the string, independent of display formats. If you count the characters in the print(path) output, you will see that there is really just one character per backslash, for a total of 21.

Besides directory paths on Windows, raw strings are commonly used for regular expressions. Also note that Python scripts can usually use forward slashes in directory paths on Windows and Unix. This is because Python tries to interpret paths portably. Raw strings are useful, however, if you code paths using native Windows backslashes.

Finally, Python also has a triple-quoted string literal format (sometimes called a block string) that is a syntactic convenience for coding multiline text data. This form begins with three quotes of either the single or double variety, is followed by any number of lines of text, and is closed with the same triple-quote sequence that opened it. Single and double quotes embedded in the string’s text may be, but do not have to be, escaped. The string does not end until Python sees three unescaped quotes of the same kind used to start the literal:

>>> mystr = """This is an example
of using triple quotes
to code a multiline string"""
>>> mystr
'This is an example\nof using triple quotes\nto code a multiline string'

This string spans three lines. Python collects all the triple-quoted text into a single multiline string, with embedded newline characters (\n) at the places where your code has line breaks. To see the string with the newlines interpreted, print it instead of echoing:

>>> print(mystr)
This is an example
of using triple quotes
to code a multiline string

Triple-quoted strings are useful any time you need multiline text in your program. You can embed such blocks directly in your scripts without resorting to external text files or explicit concatenation and newline characters.

Triple-quoted strings are also commonly used for documentation strings, which are string literals that are taken as comments when they appear at specific points in your file. They do not have to be triple-quoted blocks, but they usually are to allow for multiline comments.

External Links:

Strings at docs.python.org

Python Strings at Google for Developers

Python strings tutorial at afterhoursprogramming.com