Creating our own classes of objects

The two lists of data values that HQ asked us to get—cheese consumption and W75 deaths—form two objects that are very similar. They seem to be two instances of the same class of things.

In this case, the class of things seems to be annual statistics. They're collections with a consistent structure: a year paired with a measurement. Both of these annual statistics objects share a common set of operations. Indeed, the operations are tightly bound to the measurement, and they are not at all bound to the year number.

Our collection of standalone statistical functions, however, is not tightly bound to our data at all.

We can improve the binding between data structure and processing through a class definition. If we define the general features of a class of objects that we can call annual statistics, we can create two instances of this class and use the defined methods on the unique data of each instance. We can easily reuse our method functions by having multiple objects of the same class.

A class definition in Python is a collection of method functions. Each method function definition has an additional parameter variable, usually named self, which must be the first parameter to each function. The self variable is how we can access the attribute values that are unique to each instance of the class of objects.
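As a minimal sketch of the role self plays, consider the following toy class. The Reading class and its value attribute are hypothetical names invented for this illustration; they aren't part of the HQ assignment.

```python
# Hypothetical toy class, used only to illustrate the role of self.
class Reading:
    def __init__(self, value):
        # self is the instance being initialized; each instance
        # gets its own value attribute.
        self.value = value

    def doubled(self):
        # self gives this method access to the instance's own value.
        return self.value * 2

a = Reading(3)
b = Reading(10)
print(a.doubled())  # 6
print(b.doubled())  # 20
```

Both objects share the doubled() method, but each call sees only the attribute values of the instance it was invoked on.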

Here's how we might define a class for the simple statistics HQ is asking us to get:

from collections import Counter
class AnnualStats:
    def __init__(self, year_measure):
        self.year_measure = list(year_measure)
        self.data = list(v for yr, v in self.year_measure)
        self.counter = Counter(self.data)
    def __repr__(self):
        return repr(self.year_measure)
    def min_year(self):
        return min( yr for yr, v in self.year_measure )
    def max_year(self):
        return max( yr for yr, v in self.year_measure )
    def mean(self):
        return sum(self.data)/len(self.data)
    def median(self):
        mid, odd = divmod( len(self.data), 2 )
        if odd:
            return sorted(self.data)[mid]
        else:
            pair = sorted(self.data)[mid-1:mid+1]
            return sum(pair)/2
    def mode(self):
        value, count = self.counter.most_common(1)[0]
        return value

The class statement provides a name for our definition. Within the indented body of the class statement, we provide def statements for each method function within this class. Each def statement has self as its first parameter; this is the variable that refers to the instance.

We've defined two methods with special names, as shown in the following list. These names begin and end with double underscores and are fixed by Python; we must use exactly these names in order to have objects initialized or printed properly:

  • The __init__() method is used implicitly to initialize the instance when it's created. We'll show an example of instance creation in the following section. When we create an AnnualStats object, three internal attributes are created, as shown in the following list:
    • The self.year_measure instance variable contains the data provided as an argument value
    • The self.data instance variable contains just the data values extracted from the year-data two-tuples
    • The self.counter instance variable contains a Counter object built from the data values
  • The __repr__() method is used implicitly when we attempt to print the object. We returned the representation of the internal self.year_measure instance variable as the representation for the instance as a whole.
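As a quick preview, the following sketch shows both special methods being used implicitly. It repeats only the two special methods from the class shown previously, and the (year, measurement) pairs are made-up values, not HQ's real data.

```python
from collections import Counter

# Trimmed-down version of AnnualStats: only the two special methods.
class AnnualStats:
    def __init__(self, year_measure):
        self.year_measure = list(year_measure)
        self.data = [v for yr, v in self.year_measure]
        self.counter = Counter(self.data)

    def __repr__(self):
        return repr(self.year_measure)

# Made-up (year, measurement) pairs, for illustration only.
# Creating the object is what triggers __init__().
sample = AnnualStats([(2000, 29.8), (2001, 30.1), (2002, 29.8)])

print(sample.year_measure)  # the original pairs, as a list
print(sample.data)          # just the measurements
print(sample.counter)       # how often each measurement occurs
print(sample)               # no __str__() is defined, so __repr__() is used
```

We never call __init__() or __repr__() by name. Python calls __init__() when we create the object, and print() falls back to __repr__() because the class doesn't define a __str__() method.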

The other method functions look similar to the standalone function definitions shown previously. Each of these method functions depends on having the instance variables properly initialized by the __init__() method. The names of these other methods are entirely part of our software design; we can call them anything that's syntactically legal and meaningful.
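One way to gain confidence in these method functions is to compare them with the standard library's statistics module. The following sketch assumes a small, made-up set of (year, measurement) pairs; it repeats the three statistical methods so that it can run on its own.

```python
import statistics
from collections import Counter

class AnnualStats:
    def __init__(self, year_measure):
        self.year_measure = list(year_measure)
        self.data = [v for yr, v in self.year_measure]
        self.counter = Counter(self.data)

    def mean(self):
        return sum(self.data) / len(self.data)

    def median(self):
        mid, odd = divmod(len(self.data), 2)
        if odd:
            return sorted(self.data)[mid]
        # Even number of values: average the two middle values.
        pair = sorted(self.data)[mid-1:mid+1]
        return sum(pair) / 2

    def mode(self):
        value, count = self.counter.most_common(1)[0]
        return value

# Made-up data, chosen so that the mode is unambiguous.
sample = AnnualStats([(2000, 3), (2001, 1), (2002, 4), (2003, 1)])
values = [3, 1, 4, 1]

# Each comparison should print True.
print(sample.mean() == statistics.mean(values))
print(sample.median() == statistics.median(values))
print(sample.mode() == statistics.mode(values))
```

This is only a spot check on one small data set, not a proof that the methods agree with the statistics module in every case.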

Using a class definition

Here's how we can use our AnnualStats class definition:

from ch_5_ex_1 import get_deaths, get_cheese

deaths = AnnualStats( get_deaths() )
cheese = AnnualStats( get_cheese() )

print("Year Range", deaths.min_year(), deaths.max_year())
print("Average W75 Deaths", deaths.mean())

print("Median Cheese Consumption", cheese.median())
print("Mean Cheese Consumption", cheese.mean())

print(deaths)

We built two instances of the AnnualStats class. The deaths object is an AnnualStats object built from the year-death set of data. Similarly, the cheese object is an AnnualStats object built from the cheese consumption set of data.

In both cases, the AnnualStats.__init__() method is evaluated with the given argument value. When we evaluate AnnualStats( get_deaths() ), the result of get_deaths() is provided to AnnualStats.__init__() as the value of the year_measure parameter. The statements of the __init__() method will then set the values of the three instance variables.

When we evaluate deaths.min_year(), this will evaluate the AnnualStats.min_year() method function. The self variable will be deaths. This means that self.year_measure denotes the list built from the object originally created by get_deaths().
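The binding of self can be made visible by calling the method function through the class and passing the instance explicitly. This sketch uses a trimmed-down class and made-up numbers that stand in for the result of get_deaths().

```python
# Trimmed-down version of AnnualStats with a single method.
class AnnualStats:
    def __init__(self, year_measure):
        self.year_measure = list(year_measure)

    def min_year(self):
        return min(yr for yr, v in self.year_measure)

# Made-up stand-in for the result of get_deaths().
deaths = AnnualStats([(2000, 327), (2001, 456), (2002, 509)])

# The usual spelling: Python supplies deaths as self.
print(deaths.min_year())             # 2000
# The explicit spelling: we pass the instance ourselves.
print(AnnualStats.min_year(deaths))  # 2000
```

The two calls are equivalent. The first form is the one we'll normally write.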

When we evaluate deaths.mean(), this will evaluate the AnnualStats.mean() method function with the self variable referring to deaths. This means self.data is the list of measurement values we derived from the object originally created by get_deaths().

Each instance (deaths, cheese) refers to the instance variables created by the __init__() method. A class encapsulates the processing of the method functions with the various instance variables. The encapsulation idea can help us design software that is more tightly focused and less likely to have confusing bugs or inconsistencies.
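The following sketch illustrates this separation with made-up numbers rather than HQ's real data. It uses a trimmed-down class with a single statistical method.

```python
# Trimmed-down version of AnnualStats with a single statistical method.
class AnnualStats:
    def __init__(self, year_measure):
        self.year_measure = list(year_measure)
        self.data = [v for yr, v in self.year_measure]

    def mean(self):
        return sum(self.data) / len(self.data)

# Made-up numbers, for illustration only.
deaths = AnnualStats([(2000, 300), (2001, 400), (2002, 500)])
cheese = AnnualStats([(2000, 29.0), (2001, 30.0), (2002, 31.0)])

# Same method, different instance variables, different results.
print(deaths.mean())  # 400.0
print(cheese.mean())  # 30.0

# The two instances do not share their data lists.
print(deaths.data is cheese.data)  # False
```

Because each instance carries its own data, there's no risk of accidentally passing the cheese values to a computation that was meant for the deaths values.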