Python Strings: Part Six

In the previous article, we began our look at indexing and slicing. In this article, we will continue our look at slicing and show some practical applications of slicing.

In Python 2.3 and later, there is support for a third index, used as a step. The step is added to the index of each item extracted. The three-index form of a slice is X[I:J:K], which means “extract all the items in X, from offset I through J-1, by K.” The third limit, K, defaults to 1, which is why normally all items in a slice are extracted from left to right. But if you specify an explicit value, you can use the third limit to skip items or to reverse their order.

For instance, a[1:10:2] will fetch every other item in a from offsets 1 through 9; that is, it will collect the items at offsets 1, 3, 5, 7, and 9. As usual, the first and second limits default to 0 and the length of the sequence, respectively, so a[::2] gets every other item from the beginning to the end of the sequence:

>>> a = 'nowisthetimeto'
>>> a[1:10:2]
'oitei'
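To see what the third limit does, a three-limit slice can be compared with an explicit walk over the offsets that range() generates for the same I, J, and K. This is a small sketch; the helper name by_offsets is hypothetical and exists only for the comparison:

```python
a = 'nowisthetimeto'

def by_offsets(s, i, j, k):
    # Hypothetical helper: collect s[i], s[i+k], s[i+2k], ... stopping before j
    return ''.join(s[n] for n in range(i, j, k))

print(a[1:10:2])                 # oitei  (offsets 1, 3, 5, 7, 9)
print(by_offsets(a, 1, 10, 2))   # oitei
print(a[::2])                    # nwshtmt  (every other item, start to end)
```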

You can also use a negative stride. For example, the slicing expression 'every'[::-1] returns the new string 'yreve'. The first two bounds are omitted, so the slice covers the whole sequence (with a negative stride, it starts at the last item and ends with the first), and a stride of -1 indicates that the slice should go from right to left instead of the usual left to right. The effect is to reverse the sequence:

>>> a = 'every'
>>> a[::-1]
'yreve'

With a negative stride, the meanings of the first two bounds are essentially reversed. That is, the slice a[5:1:-1] starts at offset 5 and moves left, stopping before offset 1, so it fetches the items from 2 to 5 in reverse order (the result contains items from offsets 5, 4, 3, and 2):

>>> a = 'thequick'
>>> a[5:1:-1]
'iuqe'
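One way to check how Python fills in the bounds is the built-in slice object's indices() method, which returns the concrete (start, stop, step) triple for a sequence of a given length. A short sketch:

```python
a = 'thequick'

# Explicit bounds: start at offset 5, stop before offset 1
print(slice(5, 1, -1).indices(len(a)))        # (5, 1, -1)
print(a[5:1:-1])                              # iuqe

# Omitted bounds with a negative stride: start at the last item (offset 7)
# and run through offset 0. The -1 returned here is a range() stop value,
# not a negative offset.
print(slice(None, None, -1).indices(len(a)))  # (7, -1, -1)
print(a[::-1])                                # kciuqeht
```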

Skipping and reversing like this are the most common use cases for three-limit slices, but see Python’s standard library manual for more details.

Slices have many applications. For example, argument words listed on a system command line are made available in the argv attribute of the built-in sys module:

#File command.py - echo command line args
import sys
print(sys.argv)

% python command.py -1 -2 -3
['command.py', '-1', '-2', '-3']

Usually, however, you’re only interested in inspecting the arguments that follow the program name. This leads to a typical application of slices: a single slice expression can return all but the first item of a list. Here, sys.argv[1:] returns the desired list, ['-1', '-2', '-3']. You can then process this list without having to account for the program name at the front.
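Here is one possible version of command.py that drops the program name before processing the rest. Passing the argument list into a main() function is an assumption made here so the logic can be tried without a real command line:

```python
# File command.py: echo the command-line arguments after the program name
import sys

def main(argv):
    args = argv[1:]          # everything except argv[0], the script name
    for arg in args:
        print(arg)
    return args

if __name__ == '__main__':
    main(sys.argv)
```

Running python command.py -1 -2 -3 with this version would print -1, -2, and -3 on separate lines.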

External Links:

Strings at docs.python.org

Python Strings at Google for Developers

Python strings tutorial at afterhoursprogramming.com

Python Strings: Part Five

Because strings are defined as ordered collections of characters, we can access their components by position. In Python, characters in a string are fetched by indexing – providing the numeric offset of the desired component in square brackets after the string. When you specify an index, you get back a one-character string at the specified position.

Strings in Python are similar to strings in C and C++ in that Python offsets start at 0 and end at one less than the length of the string. Unlike C, however, Python also lets you fetch items from sequences such as strings using negative offsets. Technically, a negative offset is added to the length of a string to derive a positive offset. You can also think of negative offsets as counting backward from the end. For example:

>>> a = 'party'
>>> a[0], a[-2]
('p', 't')
>>> a[1:3], a[1:], a[:-1]
('ar', 'arty', 'part')

The first line defines a five-character string and assigns it the name a. The next line indexes it in two ways: a[0] gets the item at offset 0 from the left (the one-character string ‘p’), and a[-2] gets the item two positions back from the end (the one-character string ‘t’).
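The rule that a negative offset is added to the length of the string can be checked directly for every valid negative offset:

```python
a = 'party'

# A negative offset is shorthand for len(a) + offset
for offset in range(-len(a), 0):
    assert a[offset] == a[len(a) + offset]

print(a[-2], a[len(a) - 2])   # t t
```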

The last line in the preceding example demonstrates slicing, a generalized form of indexing that returns an entire section, not a single item. Perhaps the best way to think of slicing is as a type of parsing, especially when applied to strings: it allows us to extract an entire section in a single step. Slices can be used to extract columns of data, chop off leading and trailing text, and more.

The basics of using slicing are fairly simple. When you index a sequence object such as a string on a pair of offsets separated by a colon, Python returns a new object containing the contiguous section identified by the offset pair. The left offset is taken to be the lower bound (which is inclusive), and the right is the upper bound (which is noninclusive). That is, Python fetches all items from the lower bound up to but not including the upper bound, and returns a new object containing the fetched items. If omitted, the left and right bounds default to 0 and the length of the object you are slicing, respectively.

For instance, in the example above, a[1:3] extracts the items at offsets 1 and 2. It grabs the second and third items, and stops before the fourth item at offset 3. Next, a[1:] gets all the items beyond the first. The upper bound, which is not specified, defaults to the length of the string. Finally, a[:-1] fetches all but the last item. The lower bound defaults to 0, and -1 refers to the last item (noninclusive).
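Each of the three slices above can be rewritten with its default bounds spelled out, which may help when reading code that omits them:

```python
a = 'party'

# Omitted bounds default to 0 (left) and len(a) (right)
assert a[1:3] == 'ar'                      # offsets 1 and 2
assert a[1:] == a[1:len(a)] == 'arty'
assert a[:-1] == a[0:len(a) - 1] == 'part'
assert a[:] == a                           # both bounds omitted: the whole string

print(a[1:3], a[1:], a[:-1])               # ar arty part
```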

Indexing and slicing are powerful tools, and if you’re not sure about the effects of a slice, you can always try it out at the Python interactive prompt. You can even change an entire section of a mutable object, such as a list, in one step by assigning to a slice, though not an immutable object like a string.
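A short sketch of slice assignment, using a list (mutable) and a string (immutable). For the string, the usual workaround is to build a new string from slices:

```python
letters = list('party')
letters[1:3] = ['i', 'n']        # replace offsets 1 and 2 in place
print(''.join(letters))          # pinty

s = 'party'
try:
    s[1:3] = 'in'                # strings cannot be changed in place
except TypeError as err:
    print('TypeError:', err)

s = s[:1] + 'in' + s[3:]         # build a new string instead
print(s)                         # pinty
```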


Python Strings: Part Four

In the previous articles, we introduced Python strings and covered escape sequences, raw strings and triple-quoted strings. Now we can cover some basic string operations. Strings can be concatenated using the + operator and repeated using the * operator:

>>> len('string')
6
>>> 'str' + 'ing'
'string'
>>> 'repeat' * 4
'repeatrepeatrepeatrepeat'

Formally, adding two string objects creates a new string object, with the contents of its operands joined. Repetition is like adding a string to itself a number of times. In both cases, Python lets you create arbitrarily sized strings. There is no need to pre-declare anything in Python, including the sizes of data structures such as strings. The len built-in function returns the length of a string, or any object with a length.

Repetition comes in handy in a number of contexts. For example, if you want to print out 80 asterisks, just do this:

>>> print('*' * 80)

Notice that we are using the same + and * operators that perform addition and multiplication on numbers; this is an example of operator overloading. Python does the correct operation because it knows the types of the objects being added and multiplied. But there’s a limit to what you can do with operator overloading in Python. For example, Python does not allow you to mix numbers and strings in + expressions: 'repeat' + 3 raises a TypeError instead of automatically converting 3 to a string.
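Here is what mixing the types looks like, along with the explicit conversion Python expects you to make:

```python
try:
    'repeat' + 3                  # mixing str and int in + fails
except TypeError as err:
    print('TypeError:', err)

fixed = 'repeat' + str(3)         # convert the number explicitly instead
print(fixed)                      # repeat3
print('repeat' * 3)               # repeatrepeatrepeat: * with an int is fine
```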

You can also iterate over strings in loops using for statements and test membership for both characters and substrings with the in expression operator, which is essentially a search. For substrings, in is much like the str.find() method, but it returns a Boolean result instead of the substring’s position. For example:

>>> mystr = "repeat"
>>> for c in mystr:
    print(c, end=' ')

r e p e a t
>>> "p" in mystr
True
>>> "y" in mystr
False
>>> 'straw' in 'strawberry'
True

The for loop assigns a variable to successive items in a sequence and executes one or more statements for each item. In effect, the variable c becomes a cursor stepping across the string here.
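To compare in with str.find() directly: find() returns the lowest offset where the substring starts, or -1 if it is absent, so sub in s gives the same answer as s.find(sub) != -1. A small sketch:

```python
mystr = 'repeat'

for sub in ['p', 'y', 'eat', 'pea']:
    found = sub in mystr          # Boolean membership test
    position = mystr.find(sub)    # offset of the first match, or -1
    assert found == (position != -1)
    print(sub, found, position)

# Output:
# p True 2
# y False -1
# eat True 3
# pea True 2
```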


Python Strings: Part Three

As we saw in the previous article, escape sequences are handy for embedding special byte codes within strings. Sometimes, however, the special treatment of backslashes can cause problems. For example, let’s assume we want to open a file called thefile.txt for writing in the directory C:\newdir, and we use a statement like this:

fp = open('C:\newdir\thefile.txt','w')

The problem here is that \n is taken to stand for a newline character, and \t is replaced with a tab. In effect, the call tries to open a file named C:[newline]ew[tab]hefile.txt, which is not what we want.

The solution is to use raw strings. If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism. The result is that Python retains your backslashes literally, exactly as you type them. Therefore, to fix the filename problem, just remember to add the letter r on Windows:

fp = open(r'C:\newdir\thefile.txt','w')

Alternatively, because two backslashes are really an escape sequence for one backslash, you can keep your backslashes by simply doubling them up:

fp = open('C:\\newdir\\thefile.txt','w')

In fact, Python itself doubles backslashes when it echoes strings with embedded backslashes:

>>> path = r'C:\newdir\thefile.txt'
>>> path
'C:\\newdir\\thefile.txt'
>>> print(path)
C:\newdir\thefile.txt

As with numeric representation, the default format at the interactive prompt echoes results as if they were code, and therefore escapes backslashes in the output. The print function provides a more user-friendly format that shows that there is actually only one backslash in each spot. To verify that this is the case, you can check the result of the built-in len function, which returns the number of characters in the string, independent of display formats. If you count the characters in the print(path) output, you will see that there is really just one character per backslash, for a total of 21.
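You can confirm that the raw string and the doubled-backslash string hold exactly the same characters, and that only the echoed (repr) form shows two backslashes:

```python
raw = r'C:\newdir\thefile.txt'
doubled = 'C:\\newdir\\thefile.txt'

assert raw == doubled         # same characters, spelled two ways
print(len(raw))               # 21: each backslash is a single character
print(raw.count('\\'))        # 2
print(repr(raw))              # the echo form, with escaped backslashes
print(raw)                    # the print form: C:\newdir\thefile.txt
```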

Besides directory paths on Windows, raw strings are commonly used for regular expressions. Also note that Python scripts can usually use forward slashes in directory paths on both Windows and Unix, because Windows also accepts / as a path separator. Raw strings are useful, however, if you code paths using native Windows backslashes.
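For regular expressions, the benefit is that the backslashes meant for the re module reach it unchanged. A small sketch using the standard re module; the pattern and sample text are made up for illustration:

```python
import re

# r'\d+' and '\\d+' are the same three characters: backslash, d, plus
assert r'\d+' == '\\d+'

text = 'room 101, floor 3'
print(re.findall(r'\d+', text))   # ['101', '3']
```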

Finally, Python also has a triple-quoted string literal format (sometimes called a block string) that is a syntactic convenience for coding multiline text data. This form begins with three quotes of either the single or double variety, is followed by any number of lines of text, and is closed with the same triple-quote sequence that opened it. Single and double quotes embedded in the string’s text may be, but do not have to be, escaped. The string does not end until Python sees three unescaped quotes of the same kind used to start the literal:

>>> mystr = """This is an example
of using triple quotes
to code a multiline string"""
>>> mystr
'This is an example\nof using triple quotes\nto code a multiline string'

This string spans three lines. Python collects all the triple-quoted text into a single multiline string, with embedded newline characters (\n) at the places where your code has line breaks. To see the string with the newlines interpreted, print it instead of echoing:

>>> print(mystr)
This is an example
of using triple quotes
to code a multiline string

Triple-quoted strings are useful any time you need multiline text in your program. You can embed such blocks directly in your scripts without resorting to external text files or explicit concatenation and newline characters.

Triple-quoted strings are also commonly used for documentation strings (docstrings), which are string literals that appear as the first statement of a module, function, or class. Rather than discarding them as it does comments, Python saves them in the object’s __doc__ attribute. They do not have to be triple-quoted blocks, but they usually are to allow for multiline documentation.
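A minimal docstring example; the greet function is hypothetical. Python stores the string in the function’s __doc__ attribute, which is what the built-in help() displays:

```python
def greet(name):
    """Return a welcome message for name.

    This triple-quoted block is the function's docstring.
    """
    return 'Welcome, ' + name

print(greet.__doc__.splitlines()[0])   # Return a welcome message for name.
```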


Python Strings: Part Two

In the first article, we introduced Python strings and covered some of the basics. In this article, we will continue our look at strings.

Escape Sequences

In the last article, we introduced the following example:

>>> 'string\'s', "string\"s"

This example embedded a quote inside a string by preceding it with a backslash. This is representative of a general pattern in strings: backslashes are used to introduce special byte codings known as escape sequences. Escape sequences let us embed byte codes in strings that cannot be easily typed on a keyboard. The character \, and one or more characters following it in the string literal, are replaced with a single character in the resulting string object, which has the binary value specified by the escape sequence. For example, we can embed a newline:

>>> a = 'some\nstring'

We can also embed a tab:

>>> a = 'some\tstring'

The two characters \n stand for a single character: the byte containing the binary value of the newline character in your character set (usually ASCII code 10). Similarly, the sequence \t is replaced with the tab character. If we just type the variable at the Python interpreter command line, it shows the escape sequences:

>>> a
'some\tstring'

But print interprets the escape sequences, so we get a different result:

>>> print(a)
some	string

To be completely sure how many bytes are in the string, you can use the built-in len function, which returns the actual number of bytes, regardless of how the string is displayed:

>>> len(a)
11

The string is eleven bytes long. Note that the original backslash characters are not really stored with the string in memory. Rather, they are used to tell Python to store special byte values in the string. Apart from \n and \t, here are some of the more interesting escape sequences:

\\ Backslash (stores one \)
\' Single quote (stores ')
\" Double quote (stores ")
\b Backspace
\xhh Character with hex value hh (exactly 2 hex digits)
\ooo Character with octal value ooo (up to 3 digits)
\uhhhh Unicode character with 16-bit hex value hhhh
\Uhhhhhhhh Unicode character with 32-bit hex value hhhhhhhh
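The numeric escapes are easiest to understand side by side. Each literal below stores the same single character, 'A' (code point 65, which is hex 41 and octal 101):

```python
forms = ['A', '\x41', '\101', '\u0041', '\U00000041']

for s in forms:
    assert len(s) == 1 and ord(s) == 65   # all the same one character

print(forms)                              # ['A', 'A', 'A', 'A', 'A']
print(len('\\'), len('\''), len('\"'))    # 1 1 1
```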

Note that some escape sequences allow you to embed absolute binary values into the bytes of a string. For example, here’s a string that embeds two binary zero bytes:

>>> a = 'a\0d\0e'

This is a five-character string, as we can see:

>>> len(a)
5

In Python, the zero byte does not terminate a string the way it typically does in C. Instead, Python keeps both the string’s length and text in memory. In fact, no character terminates a string in Python. Notice also that, when echoing a string, Python displays nonprintable characters in hex (\x00 for the zero byte), regardless of how they were specified.
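Echoing the five-character string from the example above shows the hex display for the zero bytes, and splitting on the zero byte shows it is ordinary data rather than a terminator:

```python
a = 'a\0d\0e'
print(len(a))                      # 5
print(repr(a))                     # 'a\x00d\x00e'
assert a == 'a\x00d\x00e'          # \0 and \x00 are the same character
print(a.split('\0'))               # ['a', 'd', 'e']
```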

If Python does not recognize the character after a \ as being a valid escape code, it simply keeps the backslash in the resulting string:

>>> a = "d:\download\mycode"
>>> a
'd:\\download\\mycode'
>>> len(a)
18

Unless you want to memorize the escape codes, you probably should not rely on this behavior; recent versions of Python also warn about unrecognized escape sequences (a SyntaxWarning as of Python 3.12). To code literal backslashes explicitly such that they are retained in your strings, double them up (\\ instead of \) or use raw strings.


Python Strings: Part One

Introduction to Python Strings

A string in Python is an ordered collection of characters used to store and represent text-based information. From a functional perspective, strings can be used to represent just about anything that can be encoded as text. They can also be used to hold the absolute binary values of bytes and multibyte Unicode text.

You may have used strings in other languages, and Python’s strings serve the same role as character arrays in languages such as C. In C, we might see a statement such as this:

char ch = 'a';

If we want to have a string, we would use something like this:

char *str = "Some arbitrary string";

or:

char str[] = "Some arbitrary string";

But in either case, our string is actually an array of characters. Python has no distinct type for individual characters; instead you just use one-character strings. Also, unlike in C, strings in Python are a somewhat higher-level tool and come with a powerful set of processing tools.
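Python’s lack of a separate character type is easy to see at the prompt: indexing a string gives back another string of length one, and the string itself comes with processing methods such as upper():

```python
s = 'Some arbitrary string'
ch = s[0]                    # indexing yields a one-character string
print(type(ch), len(ch))     # <class 'str'> 1
print(s.upper())             # SOME ARBITRARY STRING
```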

Python strings are categorized as immutable sequences, meaning that the characters they contain have a left-to-right positional order and that they cannot be changed in place. In fact, strings are one member of the larger class of objects called sequences.

There are many ways to write strings in Python. This is a valid Python string:

a = 'string'

But then again, so is this:

a = "string"

You can also use triple quotes:

a = '''…string…'''
a = """…string…"""

Around Python strings, single and double quote characters are interchangeable. That is, string literals can be written enclosed in either two single or two double quotes; the two forms work the same and return the same type of object. The reason for supporting both is that it allows you to embed a quote character of the other variety inside a string without escaping it with a backslash. You may embed a single quote character in a string enclosed in double quote characters, and vice versa:

>>> 'string"s', "string's"
('string"s', "string's")

Incidentally, Python automatically concatenates adjacent string literals in any expression, although it is almost as simple to add a + operator between them to invoke concatenation explicitly:

>>> a = "Some " 'arbitrary' " string"
>>> a
'Some arbitrary string'

Note that adding commas between these strings would result in a tuple, not a string. Also notice in all of these outputs that Python prefers to print strings in single quotes, unless they embed one. You can also embed quotes by escaping them with backslashes:

>>> 'string\'s', "string\"s"
("string's", 'string"s')
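The difference between adjacent literals, an explicit +, and commas can be checked with a few assertions:

```python
implicit = 'Some ' 'arbitrary' ' string'        # adjacent literals are joined
explicit = 'Some ' + 'arbitrary' + ' string'    # explicit concatenation
with_commas = 'Some ', 'arbitrary', ' string'   # commas build a tuple

assert implicit == explicit == 'Some arbitrary string'
print(type(with_commas))                        # <class 'tuple'>
```

Implicit joining applies only to literals written next to each other; to join strings held in variables, use +.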


Python Programming: Part Two (Python IDLE)

In the previous article, we introduced some basic concepts about Python and how to use Python at the interactive command line. In this article, we will introduce variables, and also consider how to save a code module and run it, both from the command line and the Python IDLE interface.

The previous article contained the following Python statement:
>>> a = 3+4

This was our first example of a Python variable. As you may have deduced, in Python, unlike C/C++ and some other languages, you don’t have to declare variables separately: a variable is created the first time you assign a value to it. Python has three distinct numeric types: integers, floating point numbers, and complex numbers. In addition, Booleans are a subtype of integers.

Numbers are created by numeric literals or as the result of built-in functions and operators. Unadorned integer literals yield integers. For example:
>>> x = 10

yields an integer variable x. Numeric literals containing a decimal point or an exponent sign yield floating point numbers. For example:
>>> x = 3.5

yields a floating point variable x.
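The built-in type() function shows which numeric type each literal produces, and issubclass() confirms that Booleans are a kind of integer:

```python
print(type(10))       # <class 'int'>
print(type(3.5))      # <class 'float'>
print(type(1e3))      # <class 'float'>: an exponent also makes a float
print(issubclass(bool, int), True + 1)   # True 2
```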

You can also create a complex number. Appending ‘j’ or ‘J’ to a numeric literal yields an imaginary number to which you can add an integer or float to get a complex number with real and imaginary parts. For example:
>>> x = 10j + 5

creates a complex number with 5 as the real part and 10 as the imaginary part. You can retrieve both parts by typing:
>>> print(x)

or just the real or imaginary parts:
>>> print(x.real)
>>> print(x.imag)

The attributes real and imag, however, are read-only and cannot be used to change the values of the real and imaginary components. The constructors int(), float() and complex() can be used to produce numbers of a specific type. For example:
>>> x = int(23)

creates an integer with a value of 23. Interestingly, calling the constructor with no argument, like this:
>>> x = int()

results in the value 0 being assigned to x.
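A sketch of the complex-number attributes and the constructors mentioned above. Note that real and imag come back as floats, and that int() truncates a float toward zero:

```python
x = 10j + 5
print(x)                  # (5+10j)
print(x.real, x.imag)     # 5.0 10.0

try:
    x.real = 1            # the attributes are read-only
except AttributeError as err:
    print('AttributeError:', err)

# Constructors convert values to (or create) numbers of a specific type
print(int(23), int(), int('42'), int(3.9))   # 23 0 42 3
print(float(), float('2.5'))                 # 0.0 2.5
print(complex(5, 10) == x)                   # True
```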

Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points. String literals can be written in several different ways – single quoted, double quoted, or triple quoted:
>>> text = 'This is a test'
>>> text = "This is a test"
>>> text = '''This is a test'''
>>> text = """This is a test"""

Because strings are immutable, an assignment like this one:
>>> text[0] = 'B'

is not allowed (it raises a TypeError). However, if we want to print a single character from the string, we can do this:
>>> print(text[0])

or, if we want to print multiple characters that represent a subset of the string, we can specify two indices separated by a colon. For example:
>>> print(text[0:4])

will result in 'This' being printed (the characters at offsets 0 through 3).
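Since item assignment fails on a string, the usual approach is to build a new string from slices. A short sketch using the same text:

```python
text = 'This is a test'
print(text[0])              # T
print(text[0:4])            # This: offsets 0-3, stopping before 4

try:
    text[0] = 'B'
except TypeError as err:
    print('TypeError:', err)

new_text = 'B' + text[1:]   # a new string object; text itself is unchanged
print(new_text)             # Bhis is a test
```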

So far, we have been typing in programs at the interactive prompt, which has one big disadvantage: programs you type there go away as soon as the Python interpreter executes them. Because the code typed in interactively is never stored in a file, you cannot run it again without retyping it. To save programs permanently, you need to write your code in files, which are usually called modules. Modules are text files containing Python statements. Once a module is coded and saved in a file, you can ask the Python interpreter to execute the module any number of times.

To code a module, open your favorite text editor (e.g. Notepad in Windows, perhaps vi, emacs or gedit in Linux) and type some statements into a new text file called module01.py:

# My first Python script
name = str(input('Please enter your name: '))  # Read some input
print('Welcome to Python, ' + name)  # Print something out

In this module, we introduced three new concepts. The first line of the module is a comment. A hash sign (#) introduces a comment, which is just a description of what the program is doing; Python ignores everything from the # to the end of the line. Comments can appear on a line by themselves or to the right of a statement. Python has no dedicated multiline comment syntax, but a triple-quoted string standing on its own (using three single or three double quote marks) is often used for the same purpose, like this:
'''This is an example
 of a multi-line comment that can be inserted into a Python script on the command line or in the Python IDLE interface'''

The second concept we introduced was the input function. input() simply reads a line of characters from the standard input device (in this case, a keyboard). We wrapped it in the str() constructor to ensure that name is a string; in Python 3, input() already returns a string, so this call is harmless but not strictly needed.

The third concept we introduced was the plus sign (+) to concatenate the literal string ‘Welcome to Python, ‘ and name. As you probably guessed, the program will read in a string from the keyboard representing the user’s name, and will print out a welcome message containing the user’s name.

Once you have saved this text file, you can ask Python to run it by listing its full filename as the first argument to a python command, typed at the system command prompt; e.g.:
> python module01.py

Using the Python IDLE Interface to Run a Module

Python IDLE interface running our simple script.

Alternatively, from the Python IDLE interface, you can navigate to File -> Open, and then browse to module01.py and open it. A new window should appear with the code for the module in it. Navigate to Run -> Run Module (or just press F5) to run the module, and the output should appear in the main IDLE window, after it prints the “RESTART” banner message:

Please enter your name: Grumbledook
Welcome to Python, Grumbledook

There are other ways we can load a module. We could use the import statement, assuming the module is in a directory on Python’s module search path (sys.path):

>>> import module01

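If we make changes to the code, we would have to use the reload function in order for them to take effect, because importing a module a second time does not rerun its code. In Python 3, reload lives in the importlib module (older material imports it from imp, which was removed in Python 3.12):

>>> from importlib import reload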
>>> reload(module01)

Finally, we could combine exec() with open().read() to read the module’s source code and run it in one step:

>>> exec(open('module01.py').read())
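The one-liner leaves closing the file to Python. A variant that closes the file explicitly with a with block is sketched below. To keep it self-contained, it first writes a hypothetical stand-in for module01.py into a temporary directory (using a fixed name instead of input() so it runs without a keyboard), and it runs the code in a separate namespace dictionary:

```python
import os
import tempfile

# Hypothetical module source standing in for module01.py
source = "name = 'Grumbledook'\nmessage = 'Welcome to Python, ' + name\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'module01.py')
    with open(path, 'w') as f:
        f.write(source)

    namespace = {}
    with open(path) as f:            # the file is closed when the block ends
        exec(f.read(), namespace)    # run the module's code in namespace

print(namespace['message'])          # Welcome to Python, Grumbledook
```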

These three options should work both at the command line and in the Python IDLE interface.

In the next article, we will introduce some additional concepts and also code our first Python function.

External Links:

Wikipedia article on Python IDLE interface

Python IDLE website

Python IDLE wiki

How to Install Python IDLE on Linux

How to Install Python IDLE on eHow