Python Strings: Part Four

In the previous articles, we introduced Python strings and covered escape sequences, raw strings and triple-quoted strings. Now we can cover some basic string operations. The built-in len function returns a string's length, and strings can be concatenated using the + operator and repeated using the * operator:

>>> len('string')
6
>>> 'str' + 'ing'
'string'
>>> 'repeat' * 4
'repeatrepeatrepeatrepeat'

Formally, adding two string objects creates a new string object, with the contents of its operands joined. Repetition is like adding a string to itself a number of times. In both cases, Python lets you create arbitrarily-sized strings. There is no need to pre-declare anything in Python, including the sizes of data structures such as strings. The len built-in function returns the length of a string, or any object with a length.

Repetition comes in handy in a number of contexts. For example, if you want to print out 80 asterisks, just do this:

>>> print('*' * 80)

Notice that we are using the same + and * operators that perform addition and multiplication when using numbers, so we are using operator overloading. Python does the correct operation because it knows the types of the objects being added and multiplied. But there’s a limit to what you can do with operator overloading in Python. For example, Python does not allow you to mix numbers and strings in + expressions: 'repeat' + 3 will raise an error instead of automatically converting 3 to a string.
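A minimal sketch of that restriction and of the explicit conversions that get around it. The variable names are illustrative only:

```python
# Mixing a string and a number in a + expression raises TypeError
# rather than converting either operand automatically.
try:
    result = 'repeat' + 3
except TypeError as exc:
    result = None
    print('TypeError:', exc)

# Convert explicitly to say which operation you mean.
as_text = 'repeat' + str(3)    # string concatenation
as_number = int('42') + 3      # numeric addition
print(as_text, as_number)
```

Requiring the conversion keeps + unambiguous: the reader can always tell whether concatenation or addition is intended.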

You can also iterate over strings in loops using for statements and test membership for both characters and substrings with the in expression operator, which is essentially a search. For substrings, in is much like the str.find() method, but it returns a Boolean result instead of the substring’s position. For example:

>>> mystr = "repeat"
>>> for c in mystr:
	print(c, end=' ')

r e p e a t
>>> "p" in mystr
True
>>> "y" in mystr
False
>>> 'straw' in 'strawberry'
True

The for loop assigns a variable to successive items in a sequence and executes one or more statements for each item. In effect, the variable c becomes a cursor stepping across the string here.
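To make the comparison between in and str.find() concrete, here is a small sketch. The names word, vowel_count and so on are made up for illustration:

```python
word = 'strawberry'

# `in` answers yes or no; str.find() reports where the match starts, or -1.
has_straw = 'straw' in word
berry_at = word.find('berry')
missing_at = word.find('xyz')

# A for loop visits the characters one at a time, left to right.
vowel_count = 0
for c in word:
    if c in 'aeiou':
        vowel_count += 1

print(has_straw, berry_at, missing_at, vowel_count)
```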

External Links:

Strings at docs.python.org

Python Strings at Google for Developers

Python strings tutorial at afterhoursprogramming.com

Python Strings: Part Three

As we saw in the previous article, escape sequences are handy for embedding special byte codes within strings. Sometimes, however, the special treatment of backslashes can cause problems. For example, let’s assume we want to open a file called thefile.txt for writing in the directory C:\newdir, and we use a statement like this:

fp = open('C:\newdir\thefile.txt','w')

The problem here is that \n is taken to stand for a newline character, and \t is replaced with a tab. In effect, the call tries to open a file named C:(newline)ew(tab)hefile.txt, which is not what we want.

The solution is to use raw strings. If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism. The result is that Python retains your backslashes literally, exactly as you type them. Therefore, to fix the filename problem, just remember to add the letter r on Windows:

fp = open(r'C:\newdir\thefile.txt','w')

Alternatively, because two backslashes are really an escape sequence for one backslash, you can keep your backslashes by simply doubling them up:

fp = open('C:\\newdir\\thefile.txt','w')

In fact, Python does this sometimes when it prints strings with embedded backslashes:

>>> path = r'C:\newdir\thefile.txt'
>>> path
'C:\\newdir\\thefile.txt'
>>> print(path)
C:\newdir\thefile.txt

As with numeric representation, the default format at the interactive prompt prints results as if they were code, and therefore escapes backslashes in the output. The print function provides a more user-friendly format that shows that there is actually only one backslash in each spot. To verify that this is the case, you can check the result of the built-in len function, which returns the number of characters in the string, independent of display formats. If you count the characters in the print(path) output, you will see that there is really just one character per backslash, for a total of 21.

Besides directory paths on Windows, raw strings are commonly used for regular expressions. Also note that Python scripts can usually use forward slashes in directory paths on Windows and Unix. This is because Python tries to interpret paths portably. Raw strings are useful, however, if you code paths using native Windows backslashes.
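The following sketch checks that the raw and doubled-backslash spellings produce the same string, and shows a raw string used as a regular expression pattern. The pattern itself is only an illustration:

```python
import re

raw_path = r'C:\newdir\thefile.txt'
doubled_path = 'C:\\newdir\\thefile.txt'

# Both literals produce the same 21-character string.
same = raw_path == doubled_path
length = len(raw_path)

# In a regular expression, a raw string passes \w and \. to the re module
# untouched instead of letting Python interpret the backslashes first.
match = re.search(r'(\w+)\.txt$', raw_path)
stem = match.group(1)

print(same, length, stem)
```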

Finally, Python also has a triple-quoted string literal format (sometimes called a block string) that is a syntactic convenience for coding multiline text data. This form begins with three quotes of either the single or double variety, is followed by any number of lines of text, and is closed with the same triple-quote sequence that opened it. Single and double quotes embedded in the string’s text may be, but do not have to be, escaped. The string does not end until Python sees three unescaped quotes of the same kind used to start the literal:

>>> mystr = """This is an example
of using triple quotes
to code a multiline string"""
>>> mystr
'This is an example\nof using triple quotes\nto code a multiline string'

This string spans three lines. Python collects all the triple-quoted text into a single multiline string, with embedded newline characters (\n) at the places where your code has line breaks. To see the string with the newlines interpreted, print it instead of echoing:

>>> print(mystr)
This is an example
of using triple quotes
to code a multiline string

Triple-quoted strings are useful any time you need multiline text in your program. You can embed such blocks directly in your scripts without resorting to external text files or explicit concatenation and newline characters.

Triple-quoted strings are also commonly used for documentation strings, which are string literals that are taken as comments when they appear at specific points in your file. They do not have to be triple-quoted blocks, but they usually are to allow for multiline comments.
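As a brief sketch of a documentation string, the hypothetical function below carries a triple-quoted block as its first statement. Python stores it in the function's __doc__ attribute:

```python
def area(width, height):
    """Return the area of a rectangle.

    Both arguments are assumed to be numbers.
    """
    return width * height

# The docstring is kept with the function object and can be read back.
print(area.__doc__)
```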

Python Strings: Part Two

In the first article, we introduced Python strings and covered some of the basics. In this article, we will continue our look at strings.

Escape Sequences

In the last article, we introduced the following example:

>>> 'string\'s', "string\"s"

This example embedded a quote inside a string by preceding it with a backslash. This is representative of a general pattern in strings: backslashes are used to introduce special byte codings known as escape sequences. Escape sequences let us embed byte codes in strings that cannot be easily typed on a keyboard. The character \, and one or more characters following it in the string literal, are replaced with a single character in the resulting string object, which has the binary value specified by the escape sequence. For example, we can embed a newline:

>>> a = 'some\nstring'

We can also embed a tab:

>>> a = 'some\tstring'

The two characters \n stand for a single character: the byte containing the binary value of the newline character in your character set (usually ASCII code 10). Similarly, the sequence \t is replaced with the tab character. If we just type the variable at the Python interpreter command line, it shows the escape sequences:

>>> a
'some\tstring'

But print interprets the escape sequences, so we get a different result:

>>> print(a)
some	string

To be completely sure how many characters are in the string, you can use the built-in len function, which returns the actual number of characters, regardless of how the string is displayed:

>>> len(a)
11

The string is eleven characters long. Note that the original backslash characters are not really stored with the string in memory. Rather, they are used to tell Python to store special values in the string. Apart from \n and \t, here are some of the more interesting escape sequences:

\\ Backslash (stores one \)
\' Single quote (stores ')
\" Double quote (stores ")
\b Backspace
\xhh Character with hex value hh (exactly two digits)
\ooo Character with octal value ooo (up to three digits)
\uhhhh Unicode character with 16-bit hex value
\Uhhhhhhhh Unicode character with 32-bit hex value
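As a quick sketch of the numeric escapes in this list, the four literals below all spell the same one-character string. The letter A (code 65) is chosen only as an example:

```python
from_hex = '\x41'           # hex 41 is decimal 65
from_octal = '\101'         # octal 101 is decimal 65
from_u16 = '\u0041'         # 16-bit Unicode escape
from_u32 = '\U00000041'     # 32-bit Unicode escape

# Each escape sequence is stored as a single character.
print(from_hex, from_octal, from_u16, from_u32, len('\\'))
```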

Note that some escape sequences allow you to embed absolute binary values into the bytes of a string. For example, here’s a string that embeds two binary zero bytes:

>>> a = 'a\0d\0e'

This is a five-character string, as we can see:

>>> len(a)
5

In Python, the zero byte does not terminate a string the way it typically does in C. Instead, Python keeps both the string’s length and text in memory. In fact, no character terminates a string in Python. Notice also that Python displays nonprintable characters in hex, regardless of how they were specified.
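A short sketch of both points, using the same string as above. The zero characters are ordinary members of the string, and repr() shows them in hex form:

```python
a = 'a\0d\0e'

# repr() is what the interactive prompt echoes; the zeros appear as \x00.
shown = repr(a)
print(shown)

# The zeros do not end the string, so we can split on them like any character.
parts = a.split('\0')
print(parts)
```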

If Python does not recognize the character after a \ as being a valid escape code, it simply keeps the backslash in the resulting string:

>>> a = "d:\download\mycode"
>>> a
'd:\\download\\mycode'
>>> len(a)
18

Unless you want to memorize the escape codes, you probably should not rely on this behavior; recent Python versions also issue a warning when they see an unrecognized escape sequence. To code literal backslashes explicitly such that they are retained in your strings, double them up (\\ instead of \) or use raw strings.

Python Strings: Part One

Introduction to Python Strings

A string in Python is an ordered collection of characters used to store and represent text-based information. From a functional perspective, strings can be used to represent just about anything that can be encoded as text. They can also be used to hold the absolute binary values of bytes and multibyte Unicode text.

You may have used strings in other languages, and Python’s strings serve the same role as character arrays in languages such as C. In C, we might see a statement such as this:

char ch = 'a';

If we want to have a string, we would use something like this:

char *str = "Some arbitrary string";

or:

char str[] = "Some arbitrary string";

But in either case, our string is actually an array of characters. Python has no distinct type for individual characters; instead you just use one-character strings. Also, unlike in C, strings in Python are a somewhat higher-level tool and come with a powerful set of processing tools.

Python strings are categorized as immutable sequences, meaning that the characters they contain have a left-to-right positional order and that they cannot be changed in place. In fact, strings are one member of the larger class of objects called sequences.
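Here is a minimal sketch of what "cannot be changed in place" means in practice. Building the replacement with slicing is just one way to do it, and the names are illustrative:

```python
s = 'string'

# Assigning to a position in a string is not allowed.
try:
    s[0] = 'S'
except TypeError as exc:
    print('TypeError:', exc)

# Instead, build a new string object from pieces of the old one.
changed = 'S' + s[1:]
print(s, changed)
```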

There are many ways to write strings in Python. This is a valid Python string:

a = 'string'

But then again, so is this:

a = "string"

You can also use triple quotes:

a = '''...string...'''
a = """...string..."""

Around Python strings, single and double quote characters are interchangeable. That is, string literals can be written enclosed in either two single or two double quotes; the two forms work the same and return the same type of object. The reason for supporting both is that it allows you to embed a quote character of the other variety inside a string without escaping it with a backslash. You may embed a single quote character in a string enclosed in double quote characters, and vice versa:

>>> 'string"s', "string's"
('string"s', "string's")

Incidentally, Python automatically concatenates adjacent string literals in any expression, although it is almost as simple to add a + operator between them to invoke concatenation explicitly:

>>> a = "Some " 'arbitrary' " string"
>>> a
'Some arbitrary string'

Note that adding commas between these strings would result in a tuple, not a string. Also notice in all of these outputs that Python prefers to print strings in single quotes, unless they embed one. You can also embed quotes by escaping them with backslashes:

>>> 'string\'s', "string\"s"
("string's", 'string"s')
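The earlier note about commas can be checked with a short sketch. The same three literals are written once side by side and once with commas, and the variable names are illustrative:

```python
# Adjacent literals are joined into one string.
joined = "Some " 'arbitrary' " string"

# With commas, the same literals form a tuple of three separate strings.
separate = "Some ", 'arbitrary', " string"

print(joined)
print(separate)
```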

Python Exceptions: Part Six

As a special case for debugging purposes, Python includes the assert statement; it can be thought of as a conditional raise statement. A statement of the form:

	assert <test>, <data>

works like the following code:

	if __debug__:
		if not <test>:
			raise AssertionError(<data>)

In other words, if the test evaluates to false, Python raises an exception: the data item is used as the exception’s constructor argument, if a data item is provided. Like all exceptions, the AssertionError exception will kill your program if it’s not caught with a try, in which case the data item shows up as part of the error message. Otherwise, AssertionError exceptions can be caught and handled like any other exception.

As an added feature, assert statements may be removed from a compiled program’s byte code if the -O Python command-line flag (a capital letter O, not a zero) is used, thus optimizing the program (similar to assert statements in C/C++). AssertionError is a built-in exception, and the __debug__ flag is a built-in name that is automatically set to True unless the -O flag is used. You can use a command line like python -O code.py to run in optimized mode and disable asserts.

Assertions are typically used to verify program conditions during development. When displayed, their error message text automatically includes source code line information and the value listed in the assert statement.

As an example, consider a function to convert from Fahrenheit to Celsius. We’ll make it bail out if it sees a temperature less than absolute zero:

def FahrenheitToCelsius(ftemp):
	assert (ftemp >= -460), "Less than absolute zero!"
	return ((ftemp-32)*(5.0/9.0))

FahrenheitToCelsius(32)
FahrenheitToCelsius(55)
FahrenheitToCelsius(-500)

When the above code is executed, it produces the following result:

0.0
12.777777777777779
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    FahrenheitToCelsius(-500)
  File "<pyshell#8>", line 2, in FahrenheitToCelsius
    assert (ftemp >= -460), "Less than absolute zero!"
AssertionError: Less than absolute zero!

It is important to keep in mind that assert is mostly intended for trapping user-defined constraints and not for catching actual programming errors. Because Python traps programming errors itself, there is usually no need to code asserts to catch things like out-of-bounds indexes, type mismatches, and zero divides. Such asserts are generally unnecessary. Because Python raises exceptions on errors automatically, you can let it do the job for you.
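As mentioned earlier, an AssertionError can be caught like any other exception, and the data item becomes its constructor argument. The sketch below restates the conversion function under a hypothetical snake_case name and assumes the script is run without the -O flag, since otherwise the assert is stripped:

```python
def fahrenheit_to_celsius(ftemp):
    # Hypothetical variant of the function above; the assert guards the input.
    assert ftemp >= -460, "Less than absolute zero!"
    return (ftemp - 32) * (5.0 / 9.0)

message = None
try:
    fahrenheit_to_celsius(-500)
except AssertionError as exc:
    # The data item from the assert statement is available in exc.args.
    message = exc.args[0]

print(fahrenheit_to_celsius(32), message)
```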

With/As Clauses

Python 2.6 introduced a new exception-related statement: with, and its optional as clause. This statement is designed to work with context manager objects, which support a new method-based protocol. The with/as statement is designed to be an alternative to common try/finally statements. Like try/finally, with/as is intended for specifying termination-time or cleanup activity that must run regardless of whether an exception occurs in a processing step. Unlike try/finally, the with statement supports a richer object-based protocol for specifying both entry and exit actions around a block of code.

The basic format of the with statement looks like this:

	with expression [as variable]:
		with-block

The expression here is assumed to return an object that supports the context management protocol. This object may also return a value that will be assigned to the name variable if the optional as clause is present.

Note that the variable is not necessarily assigned the result of the expression. The result of the expression is the object that supports the context protocol, and the variable may be assigned something else intended to be used inside the statement. The object returned by the expression may then run startup code before the with-block is started, as well as termination code after the block is done, regardless of whether the block raised an exception or not.
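To illustrate, here is a minimal sketch of a context manager. The protocol consists of the __enter__ and __exit__ methods; the Tracker class and its events list are hypothetical names used only for this example:

```python
class Tracker:
    """Hypothetical context manager that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Startup code: runs before the with-block starts.
        self.events.append('enter')
        # This return value, not the Tracker itself, is bound by `as`.
        return 'value from __enter__'

    def __exit__(self, exc_type, exc_value, traceback):
        # Termination code: runs whether or not the block raised.
        self.events.append('exit')
        return False  # a false result lets any exception propagate

tracker = Tracker()
try:
    with tracker as value:
        tracker.events.append('block: ' + value)
        raise ValueError('oops')
except ValueError:
    tracker.events.append('handled')

print(tracker.events)
```

Note that value is bound to the string returned by __enter__ rather than to the Tracker object, and that the exit step runs before the surrounding handler sees the exception.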

Some built-in Python objects have been augmented to support the context management protocol, and so can be used with the with statement. For example, file objects have a context manager that automatically closes the file after the with block regardless of whether an exception is raised:

	with open(r'file.txt') as myfile:
		for line in myfile:
			print(line)

Here, the call to open returns a simple file object that is assigned to the name myfile. We can use myfile with the usual file tools. In this case, the file iterator reads line by line in the for loop.

But this object also supports the context management protocol used by the with statement. After this with statement has run, the context management machinery guarantees that the file object referenced by myfile is automatically closed, even if the for loop raised an exception while processing the file.

Although file objects are automatically closed on garbage collection, it is not always easy to know when that will occur. The with statement in this role is an alternative that allows us to be sure that the close will occur after execution of a specific block of code. We can accomplish a similar effect with the more general and explicit try/finally statement, but it requires four lines of administrative code instead of one:

	myfile = open(r'file.txt')
	try:
		for line in myfile:
			print(line)
	finally:
		myfile.close()

The lock and condition synchronization objects defined by the threading module may also be used with the with statement, because they support the context management protocol:

	import threading

	lock = threading.Lock()
	with lock:
		# critical code
		...access shared resources here...

Here, the context management machinery guarantees that the lock is automatically acquired before the block is executed and released once the block is complete, regardless of exception outcomes.

External Links:

Assertions in Python at www.tutorialspoint.com

Python Exceptions: Part Five

The Raise Statement

One of the statements not yet covered in this series on exceptions is the raise statement. To trigger exceptions explicitly in Python, you can code raise statements. Their general form is simple: a raise statement consists of the word raise, optionally followed by the class to be raised or an instance of it:

	raise <instance>  # Raise an instance of a class
	raise <class>     # Make and raise instance of a class
	raise             # Re-raise the most recent exception

The first raise form presented here is the most common one. We provide an instance directly, either created before the raise or within the raise statement itself. If we pass a class instead, then Python will call the class with no constructor arguments to create an instance to be raised. This form is equivalent to adding parentheses after the class reference. The last form re-raises the most recently raised exception. It is commonly used in exception handlers to propagate exceptions that have been caught.

With built-in exceptions, the following two forms are equivalent:

	raise IndexError
	raise IndexError()

Both examples raise an instance of the exception class named, although the first creates the instance implicitly. We can also create the instance ahead of time, because the raise statement accepts any kind of object reference:

	my_exc = IndexError()
	raise my_exc

	my_excs = [IndexError, TypeError]
	raise my_excs[0]

When an exception is raised, Python sends the raised instance along with the exception. If a try includes an except name as X: clause, the variable X will be assigned the instance provided in the raise:

	try:
		...
	except IndexError as X: 
		...

The as is optional in the try handler. If it is omitted, the instance is simply not assigned to a name. Including it allows the handler to access both data in the instance and methods in the exception class.

This model works the same for user-defined exceptions coded in classes:

	class MyExc(Exception): pass
	...
	raise MyExc('wooble')
	...
	try:
		...
	except MyExc as X:
		print(X.args)

Regardless of how exceptions are named, they are always identified by instance objects, and at most one is active at any given time. Once caught by an except clause anywhere in the program, an exception dies unless it is re-raised by another raise statement or error.

A raise statement that does not include an exception name or extra data value simply re-raises the current exception. This form is typically used if you need to catch and handle an exception, but you don’t want the exception to die in your code. Running a raise this way re-raises the exception and propagates it to a higher handler.
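The following sketch shows a bare raise passing an exception on to a higher handler after some local work. The function name risky and the log list are illustrative only:

```python
log = []

def risky():
    items = []
    try:
        items[0]  # raises IndexError: the list is empty
    except IndexError:
        log.append('inner handler ran')
        raise  # re-raise the exception currently being handled

try:
    risky()
except IndexError as exc:
    log.append('outer handler caught ' + type(exc).__name__)

print(log)
```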

Finally, Python (3.0 and above) also allows an optional from clause:

	raise exception from otherexception

When the from is used, the second expression specifies another exception class or instance to attach to the raised exception’s __cause__ attribute. If the raised exception is not caught, Python prints both exceptions as part of the standard error message:

>>> try:
... 	1/0
... except Exception as E:
... 	raise TypeError('Not good') from E

When an exception is raised inside an exception handler, a similar procedure is followed implicitly. The previous exception is attached to the new exception’s __context__ attribute and is again displayed in the standard error message if the exception goes uncaught.
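Both attributes can be inspected directly when the new exception is caught. The sketch below, with illustrative variable names, contrasts the explicit from form with the implicit chaining that happens inside a handler:

```python
# Explicit chaining: `from` sets __cause__ on the new exception.
try:
    try:
        1 / 0
    except ZeroDivisionError as E:
        raise TypeError('Not good') from E
except TypeError as exc:
    explicit_cause = exc.__cause__

# Implicit chaining: raising inside a handler sets __context__ only.
try:
    try:
        1 / 0
    except ZeroDivisionError:
        raise TypeError('Also not good')
except TypeError as exc:
    implicit_context = exc.__context__
    implicit_cause = exc.__cause__

print(repr(explicit_cause), repr(implicit_context), implicit_cause)
```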

External Links:

Handling Exceptions at Python Wiki

Built-in Exceptions at docs.python.org

Python Exceptions: Part Four

Python Exceptions: try/except/finally

In all versions of Python prior to Release 2.5, there were two types of try statements. You could either use a finally to ensure that cleanup code was always run, or write except blocks to catch and recover from specific exceptions and optionally specify an else clause to be run if no exceptions occurred. In other words, the finally clause could not be mixed with except and else.

That changed with Python 2.5. Now, the two statements have merged; we can mix finally, except and else clauses in the same statement:

try:
	statements
except Exception1:
	handler1
except Exception2:
	handler2
else:
	else_block
finally:
	finally_block

The code in this statement’s main-action block is executed first, as usual. If that code raises an exception, all the except blocks are tested, one after another, looking for a match to the exception raised. If the exception raised is Exception1, the handler1 block is executed; if it’s Exception2, handler2 is run, etc. If no exception is raised, the else-block is executed.

No matter what’s happened previously, the finally-block is executed once the main action block is complete and any raised exceptions have been handled. In fact, the code in the finally-block will be run even if there is an error in an exception handler or the else-block and a new exception is raised.

As always, the finally clause does not end the exception. If an exception is active when the finally-block is executed, it continues to be propagated after the finally-block runs, and control jumps somewhere else in the program. If no exception is active when the finally is run, control resumes after the entire try statement.

The effect here is that the finally-block is always run, regardless of whether [1] an exception occurred in the main action and was handled; [2] an exception occurred in the main action and was not handled; [3] no exceptions occurred in the main action; and/or [4] a new exception was triggered in one of the handlers. Again, the finally serves to specify cleanup actions that must always occur on the way out of the try, regardless of what exceptions have been raised or handled.
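These cases can be traced with a small sketch. The helper run and the three lists are hypothetical names; each list records which clauses executed for one kind of outcome:

```python
def run(action, steps):
    """Call action() inside a merged try statement, recording the clauses that run."""
    try:
        action()
    except KeyError:
        steps.append('except')
    else:
        steps.append('else')
    finally:
        steps.append('finally')

no_error, handled, unhandled = [], [], []

run(lambda: None, no_error)            # no exception at all
run(lambda: {}['missing'], handled)    # KeyError, matched by the except
try:
    run(lambda: 1 / 0, unhandled)      # ZeroDivisionError, no matching except
except ZeroDivisionError:
    unhandled.append('propagated')

print(no_error, handled, unhandled)
```

In the third call the finally-block still runs first, and only then does the unmatched exception continue up to the enclosing try.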

When try, except, else and finally are combined like this, the order must be like this:

	try -> except -> else -> finally

where the else and finally are optional, and there may be zero or more except blocks, but there must be at least one except if an else appears. The try statement essentially consists of two parts: excepts with an optional else, and/or the finally.

Because of these rules, the else can appear only if there is at least one except, and it is always possible to mix except and finally, regardless of whether an else appears. It is also possible to mix finally and else, but only if an except appears too (though the except can omit an exception name to catch everything and run a raise statement to re-raise the current exception). If you violate any of these rules, Python will raise a syntax error exception before your code runs.

Finally, even prior to Python 2.5, it was possible to combine finally and except clauses in a try by syntactically nesting a try/except in the try block of a try/finally statement. The following has the same effect as the new merged form:

try:
	try:
		main-action
	except Exception1:
		handler1
	except Exception2:
		handler2
	...
	else:
		no-error
finally:
	cleanup

Again, the finally block is always run on the way out, regardless of what happened in the main action and regardless of any exception handlers run in the nested try. Since an else always requires an except, this nested form even sports the same mixing constraints of the unified statement form outlined in the preceding section. But this nested equivalent is more obscure and requires more code than the new merged form. Mixing finally into the same statement makes your code easier to write and read and is thus the generally preferred technique.
