CompleteMe

Turing School of Software Design: Module 1, Week 2

Week two of my Turing School Module 1 experience is in the books and I am feeling great. This week we finished a project called CompleteMe and had classes covering topics ranging from Agile Development to Test Driven Design to Ruby's Object Model. Just like my previous post, I want to give readers additional information from a student's perspective about Turing specifically and software bootcamps more generally. I also want to reflect on my own biggest takeaways from the week.

There were three big stand out understandings that came from class time this week for me. First, founder Jeff Casimir, shared one of his favorite lessons - Ruby's Object Model, which I'll discuss below. Second, we looked at Test Driven Development (TDD), which is a methodology that involves writing tests for code before writing code. Third, we talked about Agile Development, which is a big deal in the software development world. Adherents take a cyclic approach to building software that involves rapid iteration and getting feedback from clients early and often. Agile Development on the micro scale is Test Driven Development in a lot of ways. On the macro scale Agile is how Turing is attempting to grow and achieve its mission

Test Driven Development

TDD separates design and implementation in a way that theoretically prioritizes the former. A TDD practitioner asks, 'before I build this thing, what with this thing's form and function be and how will I know that I have been successful when it is built?' It's remarkably similar to what the education world calls backwards planning. When a teacher engages in backwards planning, they write the assessment students will have to complete at the end of a course of study before creating the course. In both TDD and backwards planning, design decisions are made up front so that implementation does not incentivize the creation of an easy to build but poorly designed course or product. This is the same reason architects and general contractors are typically two separate people. General contractors have an incentive to build things as inexpensively as possible, just like coders have an incentive to get code to work as quickly as possible. Architects are incentivized to design buildings that will bring utility to their clients. TDD practitioners take on the architect role thereby prioritizing functionality before engaging with questions of implementation. 'It should do this thing' should not yield to 'How do I make it do that thing?'

I found it particularly noteworthy that one of my classmates, a former student of astrophysics, took to TDD so religiously (I overheard him say that his test file was 400 lines, compared to the actual project file, which was probably fewer than 100 lines of code). TDD is basically the scientific method applied to programming. We ask questions, make observations and then refine the questions we ask based on those observations. I love science.

Ruby's Object Model

Above is Turing Founder, Jeff Casimir's whiteboard notes from his lesson on Ruby's Object Model. My notebook's version is pictured at the top of this post.

Above is Turing Founder, Jeff Casimir's whiteboard notes from his lesson on Ruby's Object Model. My notebook's version is pictured at the top of this post.

Through a combination of lecture and code along we walked down Ruby's object hierarchy with a hypothetical class, Ice Cream. The idea as I currently understand it, is that Object Oriented Programming in Ruby represents ideas with nouns, verbs, and adjectives - just like a regular spoken language. In programs I create with Ruby, I am able to define nouns, like in our classroom example of an object class called 'Ice Cream.' The Class Ice Cream is a template or cookie cutter that explains what things Ice Cream can do (Ice Cream's verbs, or in Ruby speak, methods) and what attributes Ice Cream possesses. For example, Ice Cream can melt. I can instantiate an instance of Ice Cream, called ic1, which takes on all the defined abilities and characteristics of my Ice Cream Class.

class IceCream
  def melt
    "I am melting, better eat faster."
  end
end

ic1 = IceCream.new

ic1.melt

=> "I am melting, better eat faster."

Where things in Ruby's Object Model get especially heady is that my ic1 instance inherits characteristics from other objects that sit above my IceCream class. At the top of the object hierarchy sits a Class poetically named Object, whose methods are available to my ic1 instance of IceCream (I'm told this is a case of information on a need to know basis and right now this is all I need to know). In Ruby the Class Object has a .class method which returns the object's class name. Since ic1 is an instance of an IceCream class that inherits methods from the Object class, ic1 does not exist in a permanent state of existential uncertainty. Instead, we can simply ask ic1 what type of object it is by calling the .class method and it will tell us that it is in fact IceCream.

ic1.class

=> IceCream

Project: CompleteMe

Our second project of Module 1, called CompleteMe, was a rudimentary autocomplete program that first loads in a dictionary of words and then can take in a prefix and return words from the dictionary that begin with the prefix. The project gets a little spicier with the notion of weighted suggestions whereby the program needs to also store the user's preferences regarding a given prefix so that next time that prefix is given, the user's past selected words are prioritized over other words. The full spec can be found here.

My key takeaways from the project were starting to wrap my mind (this is literal in this case) around the idea of recursion, how different variable scopes can cause huge headaches, and how TDD can really improve code design.

I got great feedback about my work from Turing Instructor, Michael Dao, who thought my code in general was well tested. Mike made several recommendations including:

  1. Test for more edge cases. For example, instead of testing to see if my code could suggest the word "pizza" from the prefix "piz" in a one word dictionary, add additional words like "egg" and "pi" to make sure the suggest method is really doing what it needs to do. In this case, "pi" is an edge case because it would be easy to mistakenly return "pi" from a "piz" prefix and that is incorrect behavior we would want to catch through testing.
  2. Employ predicate methods with human readable names. Predicate methods in Ruby are methods that return true or false. For example, in my original code I was using multi-clause conditional statements to figure out it I was at the end of a word suggested by a prefix. Instead, my refactored code uses predicate methods that encapsulate the multi-clause conditionals into a more human readable condition (see below). 
  3. Organize the overall architecture of the code to include three classes: Nodes, Tries, and CompleteMes. In my case I turned in code with Nodes and a CompleteMe, but the big idea is to think about what these actual objects should be able to do. So a CompleteMe can suggest words, select words, and order words by weight, but a Trie can insert words, delete words (well my trie can't). Splitting that functionality into two separate classes makes the code more organized.
#Predicate Method Example

if a && b && c

#becomes

def my_conditions_are_true?(a, b, c)
  a && b && c
end

if my_conditions_are_true?(a, b, c)

GEtting technical

The first challenge of project CompleteMe is to figure out how to store a dictionary. In keeping with Turing's preference for just in time instruction and just too late instruction, our instructor Michael Dao delivered a lesson on the Trie data structure. In the world of project-based learning, which I know intimately from my days as an assistant principal at the NYC iSchool, the idea of just in time instruction is to provide learning at the moment closest to when it will be applied by the student.

A Trie is a data structure that organizes information on a series of branching nodes such that each node is keyed to its values. Meaning is inferred by traversing the tree. For example, a Trie that stores the words "pi", "pizza", and "pizzeria" can do so by using nodes to represent the letters "p", "i", "z", "z", "a", "e", "r", "i", and "a". That is a savings of 6 letters - "p" and "i" twice for "pizza" and "pizzeria", and "z", "z" once for "pizzeria"! By traversing down the tree, the structure builds words from individual nodes. To access the word "pizza" we move down the "p", "i", nodes shared by all three words in our dictionary, and the "z" shared by "pizza" and "pizzeria" and then down the "a" branch unique to "pizza". Tries are similar to linked lists and binary trees, but dissimilar because each node in a Trie can have many child nodes.  

To build a Trie and to traverse it in order to make meaning from the arrangement of nodes requires a little trick called recursion. Recursion in Computer Science is when a function calls itself on its own results. I think about it like folding a piece of paper where folding is the verb (or method in Ruby) and the paper is the input and the output that the method fold is called on. In the physical world paper can only be folded so many times, but in the virtual world of code a method can be called on its own results indefinitely.

def fold(paper)
  paper = "paper is now folded in half"
  fold(paper)
end

The Code

I want to share what I think are the most interesting parts of the code here. My entire project can be found on my GitHub page

class Node

  attr_accessor :flag,
                :children,
                :weight

  def initialize
    @flag = false
    @children = {}
    @weight = {}
  end

  def has_children?
    !@children.empty?
  end

  def does_not_have_children?
    @children.empty?
  end

end

In the above code snippet I define the Node Class. A node begins with three attributes. First, a node has a flag to signify if it is the final node of a word. By default a node's flag is is initialized to false. A node also has a hash of child nodes, which, of course, is empty if the node has no children. Finally, a node has weight, which is a hash of prefix keys and weight values to signify the preference a given word should be awarded. I think about it as if certain branches of the Trie are heavier than others. In my imagining of a node, they can also tell if they have children or not with has_children? and does_not_have_children? methods.

def insert(word)
  node = @root
  word.each_char.map do |letter|
    if !node.children.has_key?(letter)
      node.children[letter] = Node.new
    end
    node = node.children[letter]
  end
  node.flag = true
  return
end

The insert method above is part of the Trie Class. A Trie initially consists of one root node, which has no key because it actually does not represent a letter. It only represents the start of the entire trie. Every time the Trie is told to insert a word, the Trie traverses down itself from its root and checks for each letter in the given word. If the letter is found to exist as a child node, the traversal moves down a level in the Trie. If not, a new node is instantiated and keyed to the given letter. When the final letter's node is located or instantiated, the node's flag is set to true to signify the end of a newly inserted word.

def count(node=@root)
  word_count = 0
  word_count += 1 if node.flag
  if node.has_children?
    node.children.keys.each do |letter|
      child_node = node.children[letter]
      word_count += count(child_node)
    end
  end
  return word_count
end

Like the insert method, the count method is also part of the Trie Class. A Trie can count all of its words by finding all of the nodes with flags set to the boolean true value. The count function is one of a few places in this project where I had to employ recursion. Count takes a node as an input and will count all of the words that are built from the children of that input node. If no node is input, the Count method will take the root node of the Trie and begin its count from there. Everytime count finds a node with a flag set to true, it increments the word_count variable by one. Next, it checks to see if the node has children nodes. If there are no child nodes, the count method has reached the end of the trie and returns the word_count variable. However, if the node has children, the Count method iteratively calls itself on each child node.

This was by far the hardest part of the program to conceptualize for myself. In fact, one of my classmates helped me troubleshoot my count method after it had me stuck for several hours. Now that I've had to use that recursive technique in a few other places, it makes a lot more sense. For readers interested in learning recursion, the trick for me was just write multiple methods that are recursive. Eventually the concept became clear.

def suggest(prefix)
  suggestions = []
  node = @trie.root
  prefix.each_char do |letter|
    return nil if !node.children.has_key?(letter)
    node = node.children[letter]
  end
  find_all_the_words(node, prefix, suggestions)
  suggestions = order_suggestions_by_weight(suggestions, prefix)
  return suggestions
end

The suggest method is part of the CompleteMe class. It takes in a prefix, like "piz". Next it looks at the root of the Trie to see if any of the children nodes are keyed to the first letter of the prefix, such as "p". If it finds an appropriately keyed child node, it moves down the trie node = node.children[lettter]. When the end of the prefix is found, the suggest method calls the find_all_the_words method on the node it traversed down to. Again, recursion is the magic that makes finda_all_the_words return an array of all the words that lie below the starting node. Lastly, the suggestions are organized by weight in the order_suggestions_by_weight method. What is interesting about this snippet of code is how I utilize the idea of scope to hold on to and update variables as I traverse down the Trie. Scope is basically the code's field of vision; the data it knows about and can access. Since I am still beginning my Ruby journey, I often mistakenly believe that my code does or does not have access to particular data. This was a section of my code where that shaky understanding really slowed my progress.

def find_all_the_words(node, prefix, suggestions)
  suggestions << prefix if node.flag
  if node.has_children?
    node.children.keys.each do |letter|
      node_prefix = prefix
      node_prefix += letter
      child_node = node.children[letter]
      find_all_the_words(child_node, node_prefix, suggestions)
    end
  end
  return suggestions
end

Notice how find_all_the_words creates a node_prefix and child_node variables. This is to help me keep track of these values as the method traversed down the Trie and avoid issues of scope conflict.

def test_it_can_find_weight
  c = CompleteMe.new
  t = c.trie
  words = ["pi", "pize", "pizz", "pizza", "pizzeria", "pizzicato"].join("\n")
  c.populate(words)
  expected1 = 1
  expected2 = 2
  expected3 = 4
  expected4 = 1

  c.select("piz", "pizzeria")
  actual1 = c.find_weight("piz", "pizzeria")
  c.select("piz", "pizzeria")
  actual2 = c.find_weight("piz", "pizzeria")
  c.select("piz", "pizzeria")
  c.select("piz", "pizzeria")
  c.select("piz", "pizza")
  actual3 = c.find_weight("piz", "pizzeria")
  actual4 = c.find_weight("piz", "pizza")

  assert_equal expected1, actual1
  assert_equal expected2, actual2
  assert_equal expected3, actual3
  assert_equal expected4, actual4
end

This test method is really when TDD started to click. Like trying to learn any new technique, TDD initially significantly slowed my progress. However, by the time I started to consider how to weight certain words given particular prefixes I actually think TDD if not accelerated my work certainly improved the final product. I wrote the above code prior to writing the find_weight method. I played the part of the architect and imagined in my ideal world what the find_weight method should return. I gave no consideration to implementation or my poor future general contractor self. I imagined a method that would return an integer based on the number of times a given word was selected off of a particular prefix. I designed cases to test whether or not my as-yet-unwritten find_weight method would return the weights I expected given a particular set of selections. For example, "pizzeria" one time from the prefix "piz", I would expect find_weight to return a weight of 1 on the "piz" prefix "pizzeria" suggestion pair. Once my tests were designed my implementation had direction. More importantly, my implementation had a vision of a future state where a find_weight method could return a simple integer corresponding to a prefix-suggestion pair's weight.

Conclusions

There was definitely a point on Wednesday when I expected that I would have to pull an all-nighter on Thursday to meet the Friday deadline. Thankfully that was not the case, but the feeling of self-doubt and dejection was very real and made the completion of the project that much sweeter. At Turing though, there is little time to celebrate as my next project has already begun.