Python Sets: Part Two

Python setsIn part one of this series, we took a first look at sets, a relatively new core data type in Python. In this article, we will continue cover set methods, as well as a new literal format for sets.

In addition to expressions, the set object provides methods that correspond to the set operations. For example, the set add method inserts one item; update provides an in-place union, and remove deletes an item by value. Run a dir call on any set instance or the set type name to see all the available methods. Assuming we have mySet and otherSet from the previous article:

>>> mySet = set(‘hello’)
>>> otherSet = set([‘a’,’b’,’c’,’d’,’e’])

then we can perform these operations:

>>> si = mySet.intersection(otherSet)
>>> si
{‘e’}
si.add(‘MORE’)
si
{‘MORE’, ‘e’}
>>> si.update(set([‘X’,’Y’]))
>>> si
{‘X’, ‘MORE’, ‘e’, ‘Y’}
>>> si.remove(‘e’)
si
{‘X’, ‘MORE’, ‘Y’ }

As iterable containers, sets can also be used in operations such as len, for loops, and list comprehensions. Because they are unordered, however, they do not support sequence operations like indexing and slicing.

>>> for item in set('aeiou'):
        print(item * 4)

uuuu
iiii
oooo
aaaa
eeee

Although these set expressions generally require two sets, they often work with any iterable types, such as lists, tuples, and ranges. For example:

mySet.union([‘a’,’b’,’c’])
{‘b’, ‘l’, ‘c’, ‘o’, ‘a’, ‘e’, ‘h’}
mySet.intersection((‘e’,’l’,’p’))
{‘e’, ‘l’}

Using Curly Braces

Python 3.0 adds a new set literal form, using the curly braces formerly reserved for dictionaries. In 3.0 and above, the following are equivalent:

set([‘a’,’b’,’c’])
{‘a’, ‘b’, ‘c’}

Because sets are unordered, unique, and immutable, a set’s items behave much like a dictionary’s keys, and therefore the new syntax makes sense. Regardless of how a set is made, 3.0 and above displays sets using the new literal format. The set built-in is still required in 3.0 to create empty sets and to build sets from existing iterable objects, but the new literal format is convenient for initializing sets of known structure:

>>> {‘a’,’b’,’c’}
{‘c’, ‘b’, ‘a’}

>>> S = {‘a’,’e’,’i’,’o’,’u’}
>>> S.add(‘MORE’)
>>> S
{‘MORE’, ‘i’, ‘o’, ‘a’, ‘e’, ‘u’}

All the set processing operations discussed in the previous article work the same in 3.0 and above, but the result sets print differently:

>>> S1 = {‘a’,’b’,’c’}
>>> S1 & {‘a’,’c’}
{‘c’,’a’}
>>> {‘a’,’e’,’i’,’o’,’u’} | S1
{‘b’, ‘i’, ‘o’, ‘c’, ‘a’, ‘e’, ‘u’}
>>> S1 – {‘a’,’b’}
{‘c’}
>>> S1 > {‘a’,’c’}
True

Note that {} is still a dictionary in Python. Empty sets must be created with the set built-in, and print the same way:

>>> S1 – {‘a’,’b’,’c’}
set()
>>> type({})
<class ‘dict’>

>>> mySet = set()
>>> mySet.add(‘a’)
>>> mySet
{‘a’}

As in Python 2.6, sets created with 3.0 literals support the same methods, some of which allow general iterable operands that expressions do not:

>>> {‘a’,’b’,’c’} | {‘c’,’d’}
{‘c’, ‘b’, ‘a’, ‘d’}
>>> {‘a’,’b’,’c’} | [‘c’,’d’]
Traceback (most recent call last):
File “<pyshell#34>”, line 1, in
{‘a’,’b’,’c’} | [‘c’,’d’]
TypeError: unsupported operand type(s) for |: ‘set’ and ‘list’

>>> {‘a’,’b’,’c’}.union([‘c’,’d’])
{‘c’, ‘b’, ‘a’, ‘d’}
>>> {‘a’,’b’,’c’}.union([‘c’,’d’])
{‘c’, ‘b’, ‘a’, ‘d’}
>>> {‘a’,’b’,’c’}.union(set([‘c’,’d’]))
{‘c’, ‘b’, ‘a’, ‘d’}
>>> {‘a’,’b’,’c’}.intersection((‘c’,’e’,’f’))
{‘c’}
>>> {2,3,4}.issubset(range(-5,5))
True
>>>

External Links:

Set, frozenset at docs.python.org – Official documentation on sets and frozensets for Python 3.4

Classes and Inheritance: Part One

classesIn the previous article, we introduced some hashing functions, but had to pass the number of buckets (numBuckets) to each function, which was not an ideal solution. In this article, we introduce a programming construct that will provide a better way of working with these hashing functions: classes.

Classes Defined

A class is an extensible program-code-template for creating objects, providing initial values for member variables and implementations of functions. When an object is created by a constructor of the class, the resulting object is called an instance of the class, and the member variables specific to the object are called instance variables (unlike class variables, which are variables that are shared across the class).

Classes are essentially combinations of variables and functions. We use the word “encapsulation” to refer to this combination. The variables either belong to the class or specific instances of the class. The functions are subroutines with the ability to operate on objects or classes. Many kinds of functions exist. In most programming language, there are constructors (functions automatically called when an instance of the class is created), destructors (functions automatically called when an instance of the class is deleted or otherwise destroyed), and conversion operators (functions that define how certain operators, e.g. the equality operator, work with instances of the class).

Classes: Implementation in Python

In Python, the simplest form of a class definition looks like this:

class HashObject:
	
	...
	

Class definition must precede any instantiation of the class to have any effect. The statements inside a class definition will usually be function definitions, but other statements are allowed. The function definitions inside a class normally have a peculiar form of argument list, dictated by the calling convention for functions.

Class objects support attribute references and instantiation. Attribute reference use the standard syntax used for all attribute references in Python: obj.name. If the class definition looked like this:

class HashObject:
	i = 123
	def myfunc(self)
		return 'A simple function definition'

then HashObject.i and HashObject.myfunc are valid attribute references. Class attribute can be assigned, so you can change the value of HashObject.i by assignment.

Class instantiation uses function notation; just pretend that the class object is a parameterless function that returns a new instance of the class. For example,

h = HashObject()

creates a new instance of the class HashObject and assigns this object to the local variable h.

The instantiation operation creates an empty object. Many classes like to create objects with instances customized to a specific initial state. Therefore, a class may define a special method, called a constructor. Here is an example:

def __init__(self):
	self.data = []

When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly-created class instance. In the case of HashObject, a new, initialized instance can be obtained by:

h = HashObject()

The __init__() method may have arguments for greater flexibility. In that case, arguments given to the class instantiation operator are passed on to init. For example:

class HashObject:
   def __init__(self, num):
      self.numBuckets = num

>>> h = HashObject(51)
>>> x.numBuckets
51

With instance objects, there are two kinds of valid attribute names, data attributes and methods. Data attributes correspond to data members in C++. Like local variables, they do not need to be declared and spring into existence when they first have a value assigned to them. For example, if we created an instance of HashObject called h, we can create and print out the value of x like so:
>>> h.x = 123
>>> print(‘h.x = ‘,h.x)
123
The other kind of instance attribute is a method. A method is a function that belongs to an object. Valid method names of an instance object depend on its class. By definition, all attributes of a class that are function objects define corresponding methods of its instances.

Usually, a method is called right after it is bound. Suppose we add a function definition to HashObject:

class HashObject:
    def test(self):
        print('Hello, world!')

We can call the method in object h like so:
>>> h.test()
Hello, world!

Notice that h.test() was called without an argument, even though the function definition for test() specifies an argument. This is where Python deviates from C++. In this example, the call h.test() is exactly equivalent to HashObject.test(h). In general calling a method with a list of n arguments is the same as calling the corresponding function with an argument list that is created by inserting the method’s object before the first argument.

Python classes also supports inheritance. The syntax for a derived class definition looks like this:

class DerivedClass(BaseClass):
	
	...
	

When the class object is constructed, the base class is remembered. If a requested attribute is not found in the class, the search proceeds to look in the base class. This rule is applied recursively if the base class itself is derived from some other class.

If we create an instance of DerivedClass, method references are resolved as follows: the corresponding class attribute is searched, descending down the chain of base classes if necessary, and the method reference is valid if this process yields a function object. Derived classes may also override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it. This is similar to virtual functions in C++, where a function is left undefined in the base class but defined in one or more derived classes.

Python also supports a form of multiple inheritance as well. A class definition with multiple base classes looks like this:

class DerivedClass(BaseClass1, BaseClass2, BaseClass3)
	
	...
	

The search for attributes is depth-first, left to right, so if an attribute is not found in the derived class, it is searched for in BaseClass1, then the base classes of BaseClass1, then BaseClass2, and so on. It should be mentioned that the multiple inheritance method resolution in Python uses a dynamic algorithm to linearize the search order, both preserving the search order, and ensuring that each parent is called only once.

One significant way in which Python classes differ from C++ classes is that there is no means of declaring functions or variables to be private or protected. Thus, there are no private variables in Python. The closest you’ll get are variables that start with a double underscore, like __variable. Such a variable cannot be directly referenced outside the class definition, at least not directly. Thus, if we have something like this:

class HashObject:
	def __init__(self, num):
		__numBuckets = num

and we create an instance of HashObject:

>>> h = HashObject(47)

We can’t access __numBuckets directly:

>>> print(h.__numBuckets)

will result in an error message.

Now that we have a solid foundation for understanding Python classes, we should be able to rewrite our hash functions from the previous article as a class. We will do that in the next article.

External Links:

Classes Tutorial on python.org

Class (computer programming) on Wikipedia