On that which I call Semantic Logic

Today I want to talk about something I call semantic logic. This is something I started thinking about after playing with Haskell some months ago. The name probably makes it sound like some sort of esoteric symbolic logic, but it’s really just a practical programming concept.

Disclaimer: I don’t know much about language design or the intricacies of functional programming models, so I may be retreading some basic ideas. However, I haven’t seen anybody talk about these ideas in this way, so here it goes.

The idea is this: write code that visually reflects the logical cases that you wish to cover.

Let’s say we have the following program:

def process_students():
    students = get_list_of_students()
    for student in students:
        if student.first_name.startswith('J'):
            process_j_student(student)
        else:
            process_student(student)

The only remarkable logical feature of this program is the check for names that start with J. Notice where this check and its downstream code live. They’re nested in about the middle of the program. An unremarkable location for a remarkable feature.

Here’s where semantic logic comes in. Our goal is to make the structure of the program reflect the structure of the data. As noted, the logical feature that we need to address is names that start with J. Here’s one way to refactor:

def process_students():
    students = get_list_of_students()
    j_students = [s for s in students if s.first_name.startswith('J')]
    everyone_else = [s for s in students if not s.first_name.startswith('J')]

    for s in j_students:
        process_j_student(s)
    for s in everyone_else:
        process_student(s)

What we did here was essentially invert the logical hierarchy. Instead of progressively drilling down into cases with if statements, we constructed sets that matched our cases up front, and did the processing at the end.
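
One cost of building the sets this way is that the list is walked twice and the test is written twice. Here is a minimal sketch of a helper that builds both sets in one pass. The names `partition` and `Student` are made up for illustration and are not part of the original program.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical stand-in for whatever get_list_of_students() returns.
    first_name: str


def partition(predicate, items):
    """Split items into (matching, non_matching) in a single pass."""
    matching, non_matching = [], []
    for item in items:
        (matching if predicate(item) else non_matching).append(item)
    return matching, non_matching


students = [Student('Jane'), Student('Omar'), Student('Jack')]
j_students, everyone_else = partition(
    lambda s: s.first_name.startswith('J'), students)
```

The two names on the left still announce the two cases up front, which is the part that matters for readability.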

Let’s try something a little more complex to further illustrate the point.

def process_students():
    students = get_list_of_students()
    for student in students:
        if student.gpa < 2.0:
            student.on_probation = True
        else:
            student.on_probation = False
        if student.on_probation and student.credits < 50:
            student.expelled = True
        if student.gpa > 3.0 or student.credits >= 150:
            student.early_reg = True

The business logic here is as follows:

Students with a GPA < 2.0 are put on probation.
Students with a GPA >= 2.0 are taken off probation if they were previously on it.
A student who is put on probation and has completed fewer than 50 credits is expelled.
A student who has a GPA > 3.0 or has completed >= 150 credits is allowed to register early.

These rules aren’t difficult to deduce, but the code isn’t terribly explicit about them either. You essentially need to trace the conditions to the outcome. Here’s an alternative:

def process_students():
    students = get_list_of_students()
    
    bad_grades = [s for s in students if s.gpa < 2.0]
    ok_grades = [s for s in students if s.gpa >= 2.0]
    good_grades = [s for s in students if s.gpa > 3.0]
    freshmen = [s for s in students if s.credits < 50]
    seniors = [s for s in students if s.credits >= 150]
    freshmen_with_bad_grades = list(set(freshmen) & set(bad_grades))
    early_reg_eligible = list(set(good_grades) | set(seniors))
    
    # make the changes
    for s in bad_grades:
        s.on_probation = True
    for s in ok_grades:
        s.on_probation = False
    for s in freshmen_with_bad_grades:
        s.expelled = True
    for s in early_reg_eligible:
        s.early_reg = True

Personally I like this much better. The relevant sets are built up front, and the changes we want to apply to them are very plain to see. Technically we’re not doing the exact same thing here: we walk the list several times instead of once, and the set operations don’t preserve the original order. The filtered lists still hold references to the same student objects, though, so the attribute changes show up in the original students list. Since this is just an example and we don’t have an interface for saving our changes, I’m not sweating the difference.
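
To try the set-based version without a real student database, here is a self-contained sketch. The `Student` dataclass, its default field values, and the sample records are assumptions made for illustration, and the function takes the list as a parameter instead of calling `get_list_of_students()`.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity hashing, so students can go in sets
class Student:
    # Hypothetical record; the real student type is not shown in this post.
    name: str
    gpa: float
    credits: int
    on_probation: bool = False
    expelled: bool = False
    early_reg: bool = False


def process_students(students):
    # Build the sets that match each case up front.
    bad_grades = [s for s in students if s.gpa < 2.0]
    ok_grades = [s for s in students if s.gpa >= 2.0]
    good_grades = [s for s in students if s.gpa > 3.0]
    freshmen = [s for s in students if s.credits < 50]
    seniors = [s for s in students if s.credits >= 150]
    freshmen_with_bad_grades = set(freshmen) & set(bad_grades)
    early_reg_eligible = set(good_grades) | set(seniors)

    # Apply one change per set.
    for s in bad_grades:
        s.on_probation = True
    for s in ok_grades:
        s.on_probation = False
    for s in freshmen_with_bad_grades:
        s.expelled = True
    for s in early_reg_eligible:
        s.early_reg = True


students = [
    Student('Ann', gpa=1.5, credits=30),
    Student('Bob', gpa=1.8, credits=160),
    Student('Cy', gpa=3.5, credits=60),
]
process_students(students)
```

After the call, the objects in `students` carry the new flags, because the comprehensions and sets hold references to the same objects and not copies of them.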

Surely there are some of you objecting at this point, saying that the first version of this function was shorter and more readable. It certainly was shorter, and if you’re used to tracing control flow, maybe even more readable. I think it must be granted, however, that understanding complex control flow can be an error-prone endeavor. I’m sure we’ve all seen much more convoluted code than the trivial examples I’ve produced so far.

Here’s a third and less contrived example:

def load_modules(self, module_paths):
    package_path = os.path.dirname(__file__) + '/modules'
    module_paths.append(package_path)
    for path in module_paths: 
        for file_name in os.listdir(path):
            if file_name == '__init__.py' or file_name[-3:] != '.py':
                continue
            module_name = file_name[:-3]
            if path == package_path:
                load_package_module(module_name)
            else:
                load_custom_module(path, module_name)

To me, it’s hard to tell what’s going on here (and, more importantly, what’s supposed to be going on here) just by looking at the code.

We’re attempting to dynamically load two kinds of modules: those included in the package, and those within directories specified by the user. This is evident in the last four lines of the program. This distinction is the most important high-level distinction being made, and it’s buried three levels deep at the bottom of the program. This is not good for readability.

Let’s try to clean this up using the model we’ve developed. Remember our most important distinction: package and custom modules.

def load_modules(self, module_paths):    
    package_path = os.path.dirname(__file__) + '/modules'
    files = os.listdir(package_path)
    package_modules = [f[:-3] for f in files if is_module(f)]    
    for module in package_modules:
        load_package_module(module)

    for path in module_paths:
        files = os.listdir(path)
        modules = [f[:-3] for f in files if is_module(f)]
        for module in modules:
            load_custom_module(path, module)
        
    
def is_module(file_name):
    return file_name != '__init__.py' and file_name.endswith('.py')

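This is better. We split the function body up into two parts, one to cover package modules and one to cover custom modules. This structure communicates to the reader that there are two distinct things going on, and only two. It also has the benefit of completely decoupling the two parts. The two parts could be dropped into separate functions with no further modification. One behavioral difference is worth noting: package modules now load before the custom ones, where the original loaded them last, and the caller’s module_paths list is no longer modified.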
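
As a rough sketch of that last step, here are the two parts as separate functions. The names `module_names`, `load_package_modules`, and `load_custom_modules` are hypothetical, the loaders are passed in as parameters so the example can run on its own, and the `sorted` call is my addition to make the load order predictable.

```python
import os
import tempfile


def is_module(file_name):
    return file_name != '__init__.py' and file_name.endswith('.py')


def module_names(path):
    """Return the names of the loadable modules found in one directory."""
    return sorted(f[:-3] for f in os.listdir(path) if is_module(f))


def load_package_modules(package_path, load_package_module):
    for name in module_names(package_path):
        load_package_module(name)


def load_custom_modules(module_paths, load_custom_module):
    for path in module_paths:
        for name in module_names(path):
            load_custom_module(path, name)


# Demo with throwaway directories and loaders that only record their calls.
loaded = []
with tempfile.TemporaryDirectory() as pkg, tempfile.TemporaryDirectory() as custom:
    for directory, names in ((pkg, ['__init__.py', 'core.py', 'notes.txt']),
                             (custom, ['extra.py'])):
        for name in names:
            open(os.path.join(directory, name), 'w').close()
    load_package_modules(pkg, lambda name: loaded.append(('package', name)))
    load_custom_modules([custom], lambda path, name: loaded.append(('custom', name)))
```

Each function now covers exactly one of the two cases, and neither needs to know the other exists.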

Hopefully by this point you have some idea of what I’m talking about, even if it’s fuzzy. The point is that semantic logic lets the reader quickly build a mental model of the data.

I’m going to keep working on coming up with more examples and fleshing out the idea more fully. Shoot me an email if you have any thoughts.