Black Thursday

Turing School of Software Design: Module 1, Weeks 5 & 6

Black Thursday is the final project of Module 1 at the Turing School of Software Design. The goal is to build a system able to load, parse, search and execute business intelligence queries against data from a typical e-commerce business. This was a paired project, and the code my partner and I wrote can be found in her GitHub repository. The project learning goals are to:

  • Use tests to drive both the design and implementation of code
  • Decompose a large application into components
  • Use test fixtures instead of actual data when testing
  • Connect related objects together through references
  • Learn an agile approach to building software

Technical Approaches and Solutions

pushing responsibility down

The early iterations of this project require five classes that have hierarchical relationships with each other. At the top of the pyramid is a SalesEngine that is instantiated with a hash of CSV file paths. The SalesEngine in turn instantiates child repositories. For iteration 0 we worked with a MerchantRepository and an ItemRepository. Each repository has children of its own: Merchant objects and Item objects, respectively. A SalesEngine has two repositories, and each repository can have hundreds or thousands of Merchant or Item children. Each row of the CSV is converted into a single Merchant or Item and stored in its respective parent repository.
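The hierarchy described above can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: the `all` readers, the `load_rows` and `write_csv` helpers, and the sample CSV contents are assumptions made so the example runs on its own.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical, stripped-down children: one object per CSV row.
class Merchant
  attr_reader :id, :name, :parent

  def initialize(data, parent)
    @id = data[:id].to_i
    @name = data[:name]
    @parent = parent
  end
end

class Item
  attr_reader :id, :name, :merchant_id, :parent

  def initialize(data, parent)
    @id = data[:id].to_i
    @name = data[:name]
    @merchant_id = data[:merchant_id].to_i
    @parent = parent
  end
end

# Each repository turns rows into children and remembers its own parent.
class MerchantRepository
  attr_reader :all, :parent

  def initialize(rows, parent)
    @all = rows.map { |row| Merchant.new(row, self) }
    @parent = parent
  end
end

class ItemRepository
  attr_reader :all, :parent

  def initialize(rows, parent)
    @all = rows.map { |row| Item.new(row, self) }
    @parent = parent
  end
end

# Top of the pyramid: built from a hash of CSV file paths.
class SalesEngine
  attr_reader :merchants, :items

  def initialize(paths)
    @merchants = MerchantRepository.new(load_rows(paths[:merchants]), self)
    @items = ItemRepository.new(load_rows(paths[:items]), self)
  end

  private

  # Reads a CSV with a header row; header names become symbol keys.
  def load_rows(path)
    CSV.read(path, headers: true, header_converters: :symbol)
  end
end

# Writes a throwaway CSV so the sketch runs without the project's data files.
def write_csv(name, text)
  file = Tempfile.new([name, '.csv'])
  file.write(text)
  file.close
  file
end

merchants_file = write_csv('merchants', "id,name\n1,Example Shop\n")
items_file = write_csv('items', "id,name,merchant_id\n10,Pencil,1\n11,Pen,1\n")

engine = SalesEngine.new({ merchants: merchants_file.path, items: items_file.path })
engine.items.all.size         # 2 items, one per CSV row
engine.items.all.first.parent # the ItemRepository that created the item
```

Every layer receives its parent at construction time, which is what later lets a child ask a question without knowing who answers it.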

The lesson from this early stage is that it is often beneficial to push responsibility down to the most fundamental building blocks of one's code. To illustrate this idea, let us examine the Item object:

require 'bigdecimal'
require 'time' # provides Time.parse

class Item
  attr_reader :id,
              :name,
              :description,
              :case_insensitive_description,
              :unit_price,
              :created_at,
              :updated_at,
              :merchant_id,
              :parent,
              :unit_price_to_dollars

  def initialize(data, parent)
    # Every CSV value arrives as a string, so convert once, here
    @id = data[:id].to_i
    @name = data[:name]
    @description = data[:description]
    # Instance variable name must match the attr_reader above
    @case_insensitive_description = description.downcase
    # BigDecimal() replaces BigDecimal.new, which newer Rubies no longer provide
    @unit_price = BigDecimal(data[:unit_price]) / 100
    @created_at = Time.parse(data[:created_at])
    @updated_at = Time.parse(data[:updated_at])
    @merchant_id = data[:merchant_id].to_i
    @parent = parent
    @unit_price_to_dollars = unit_price.to_f
  end
end

To create an item we pass in some data and a parent repository (which we'll get to in the next section). CSV data comes into the program as strings. Strings are fine for item attributes like descriptions or names, but unhelpful for unit price. When we get into business intelligence later in the project we will want to do arithmetic, which requires numbers, not strings. By converting and sanitizing the data at the point at which the Item itself is created, we can write clearer methods later on.

Imagine that we need to find the average price of all items in the ItemRepository. If the price of each Item is stored as a string, our averaging method would have to loop through each item and first convert it to a number and then perform the appropriate arithmetic. By creating Items that contain the types of data that will be most useful to us, our actual averaging method does not have to concern itself with data conversion. Its sole responsibility is averaging numbers.
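The averaging case above can be sketched as follows. The `Item` here is a hypothetical one-attribute stand-in and `average_price` is an illustrative name, not a method from the project; the point is only that the method contains no conversion logic.

```ruby
require 'bigdecimal'

# Hypothetical stand-in for Item: only the attribute needed here.
Item = Struct.new(:unit_price)

# Assumes every unit_price is already a BigDecimal,
# so the method's only job is the arithmetic.
def average_price(items)
  return BigDecimal('0') if items.empty?

  items.sum(BigDecimal('0'), &:unit_price) / items.size
end

items = [
  Item.new(BigDecimal('12.00')),
  Item.new(BigDecimal('8.50')),
  Item.new(BigDecimal('9.50'))
]

average_price(items) # equal to BigDecimal('10')
```

Had the prices been stored as strings, the block passed to `sum` would need to parse each one first, and every other method that touches a price would have to repeat that step.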

Incidentally, this was my first experience with BigDecimal, a Ruby library for arbitrary-precision decimal arithmetic that allows code to work reliably with numbers that have any number of significant digits. I found this tutorial really useful when I first started working with BigDecimal.
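A short comparison shows why BigDecimal is a better fit for money than Float. The values are illustrative; the last two lines assume, as the Item class above does, that a price arrives as a string of cents.

```ruby
require 'bigdecimal'

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
float_sum == 0.3 # false

# BigDecimal stores decimal digits, so the same sum is exact.
decimal_sum = BigDecimal('0.1') + BigDecimal('0.2')
decimal_sum == BigDecimal('0.3') # true

# A price given in cents as a string, converted to dollars.
unit_price = BigDecimal('1099') / 100
unit_price == BigDecimal('10.99') # true
```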

items have parents

To start answering useful business intelligence questions, like how many items a given merchant sells, we need a way for child Merchants and Items to talk to each other. One approach would be to have Merchants talk directly to Items, but a far superior solution is to employ the concept of Hide Delegate, from Jay Fields' book Refactoring: Ruby Edition. Preventing Merchants and Items from talking directly to each other is beneficial because we should strive for object encapsulation: the idea that an object has access to data on a need-to-know basis. The benefit of encapsulating objects is that if something changes in the code, fewer objects will be impacted. If we decide that an Item needs a new attribute, like a lowercase version of the description, I may also need to make changes in the Merchant class if the Merchant expects the Item to respond to a certain call to its description. However, if I force my Merchants and Items to communicate with each other indirectly through their parent repositories and the SalesEngine, I am free to make changes in either class without impacting the other.

The vertical lines represent encapsulated responsibilities.
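The indirect route between a Merchant and its Items might look like the sketch below. The class names follow this post, but the method names (`find_items_by_merchant_id`, `find_all_by_merchant_id`, `find_by_id`) and the in-memory rows are assumptions chosen for illustration, not the project's actual code.

```ruby
class Item
  attr_reader :id, :name, :merchant_id, :parent

  def initialize(data, parent)
    @id = data[:id]
    @name = data[:name]
    @merchant_id = data[:merchant_id]
    @parent = parent
  end
end

class Merchant
  attr_reader :id, :name, :parent

  def initialize(data, parent)
    @id = data[:id]
    @name = data[:name]
    @parent = parent
  end

  # The merchant only knows its parent; it never touches ItemRepository.
  def items
    parent.find_items_by_merchant_id(id)
  end
end

class ItemRepository
  attr_reader :repository, :parent

  def initialize(rows, parent)
    @repository = rows.map { |row| Item.new(row, self) }
    @parent = parent
  end

  def find_all_by_merchant_id(merchant_id)
    repository.select { |item| item.merchant_id == merchant_id }
  end
end

class MerchantRepository
  attr_reader :repository, :parent

  def initialize(rows, parent)
    @repository = rows.map { |row| Merchant.new(row, self) }
    @parent = parent
  end

  def find_by_id(id)
    repository.find { |merchant| merchant.id == id }
  end

  # Passes the question one level up, to the SalesEngine.
  def find_items_by_merchant_id(merchant_id)
    parent.find_items_by_merchant_id(merchant_id)
  end
end

class SalesEngine
  attr_reader :items, :merchants

  def initialize(data)
    @items = ItemRepository.new(data[:items], self)
    @merchants = MerchantRepository.new(data[:merchants], self)
  end

  # The only place that knows which repository answers the question.
  def find_items_by_merchant_id(merchant_id)
    items.find_all_by_merchant_id(merchant_id)
  end
end

engine = SalesEngine.new({
  merchants: [{ id: 1, name: 'Example Shop' }],
  items: [
    { id: 10, name: 'Pencil', merchant_id: 1 },
    { id: 11, name: 'Pen', merchant_id: 2 }
  ]
})

engine.merchants.find_by_id(1).items.map(&:name) # ["Pencil"]
```

The cost is a couple of small forwarding methods; the payoff is that Merchant depends only on its parent's interface, so the way items are stored or searched can change without touching Merchant.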

Encapsulating responsibility in the code means that every child Merchant and Item must be instantiated with access to its parent. An ItemRepository creates and stores Items. How can an ItemRepository create and store Items that are aware of the ItemRepository itself? The answer is Ruby's self keyword, which inside an instance method refers to the object the method is running on.

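class ItemRepository
  # Repository is a module from the project that is not shown in this post
  include Repository

  attr_reader :items,
              :contents,
              :parent,
              :repository

  def initialize(contents, parent)
    # self is this ItemRepository, so each Item receives its parent
    @repository = contents.map { |row| Item.new(row, self) }
    @parent = parent
  end
end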

When an ItemRepository is created, it takes the contents of a CSV and loops through each row. The data from each row is used to instantiate a new Item object. The Item object additionally takes in a parent as its second initialize argument. The parent of an Item is an ItemRepository, so the ItemRepository passes self to each new Item. Here self is a reference back to the ItemRepository that is being initialized. I wish I had understood self when writing CompleteMe.
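That self really is the repository itself, and not a copy, can be checked with `equal?`, which compares object identity. The classes below are trimmed-down, hypothetical versions that keep only what the check needs.

```ruby
class Item
  attr_reader :data, :parent

  def initialize(data, parent)
    @data = data
    @parent = parent
  end
end

class ItemRepository
  attr_reader :repository

  def initialize(contents)
    # Inside initialize, self is the ItemRepository being built.
    @repository = contents.map { |row| Item.new(row, self) }
  end
end

repo = ItemRepository.new([{ id: 1 }, { id: 2 }])

# Every child points back at the very same repository object.
repo.repository.all? { |item| item.parent.equal?(repo) } # true
```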

Refactoring Patterns

Thanks to a lot of hard work and a great partner, Jasmin, we were able to finish all of the required functionality for the project with about a day left to refactor the code. One of the refactoring opportunities that we found was in our MerchantAnalyst class, which is responsible for a variety of business intelligence operations. MerchantAnalyst was a beast of a class that we created to encapsulate all of the methods that had one or more merchants as their return value. The responsibilities of MerchantAnalyst grew to include any method that was designed to tell us interesting information about a Merchant. Needless to say, the class needed some help.

One of the business intelligence operations contained within MerchantAnalyst was the ability to query all merchants and find their best item(s) in terms of quantity sold as well as total revenue generated. We built about eight different methods, each tackling a small portion of these larger problems. We noticed that each of these methods took in the Merchant's merchant_id, did a piece of the required process, and then handed the merchant_id to the next method. Our Object Oriented Programming alarms went off and we realized that we could collect all of these methods into a new class, or Extract Class, to use the terminology from Refactoring: Ruby Edition.

The new class, which we called MerchantItemAnalyst, not only contained the methods we had already written and tested, but also preserved state: the merchant_id. By storing the merchant_id as an instance variable in MerchantItemAnalyst, we were able to eliminate all of the ugly method calls that moved merchant_id from operation to operation. I've included the result of this right-out-of-the-book refactoring opportunity below.

class MerchantItemAnalyst
  attr_reader :merchant_id,
              :analyst

  # merchant_id is stored once instead of being passed between methods
  def initialize(merchant_id, analyst)
    @merchant_id = merchant_id
    @analyst = analyst
  end

  # Invoices for this merchant that have been paid in full
  def merchant_paid_in_full_invoices
    analyst.merchants.find_by_id(merchant_id).invoices.find_all do |invoice|
      invoice.is_paid_in_full?
    end
  end

  # Flat list of the invoice items on those paid invoices
  def merchant_paid_in_full_invoice_items
    merchant_paid_in_full_invoices.map do |invoice|
      analyst.invoice_items.find_all_by_invoice_id(invoice.id)
    end.flatten
  end

  # Hash of quantity => invoice items with that quantity
  def group_invoice_items_by_quantity
    merchant_paid_in_full_invoice_items.group_by do |invoice_item|
      invoice_item.quantity
    end
  end

  # Hash of invoice_item.price => invoice items with that value
  def group_invoice_items_by_revenue
    merchant_paid_in_full_invoice_items.group_by do |invoice_item|
      invoice_item.price
    end
  end

  # Invoice items that share the highest quantity
  def max_quantity_invoice_items
    invoice_items = group_invoice_items_by_quantity
    invoice_items[invoice_items.keys.max]
  end

  # Invoice items that share the highest price value
  def max_revenue_invoice_items
    invoice_items = group_invoice_items_by_revenue
    invoice_items[invoice_items.keys.max]
  end

  # Returns an array, because several items can tie on quantity
  def most_sold_item_for_merchant
    max_quantity_invoice_items.map do |invoice_item|
      analyst.items.find_by_id(invoice_item.item_id)
    end
  end

  # Returns a single item: the first of the top earners
  def best_item_for_merchant
    max_revenue_invoice_items.map do |invoice_item|
      analyst.items.find_by_id(invoice_item.item_id)
    end.first
  end
end
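The effect of extracting a class is easier to see on a much smaller example. The sketch below is hypothetical throughout (the `SALES` data, `ThreadedAnalyst`, and `MerchantSalesAnalyst` do not come from the project); it only contrasts threading an id through every call with storing it once as state.

```ruby
# Hypothetical sales records used by both versions.
SALES = [
  { merchant_id: 1, item: 'Pencil', quantity: 3 },
  { merchant_id: 1, item: 'Pen',    quantity: 5 },
  { merchant_id: 2, item: 'Eraser', quantity: 9 }
].freeze

# Before: every method receives merchant_id and hands it to the next one.
module ThreadedAnalyst
  def self.sales_for(merchant_id)
    SALES.select { |sale| sale[:merchant_id] == merchant_id }
  end

  def self.top_sale_for(merchant_id)
    sales_for(merchant_id).max_by { |sale| sale[:quantity] }
  end

  def self.top_item_for(merchant_id)
    top_sale_for(merchant_id)[:item]
  end
end

# After Extract Class: merchant_id lives in one place, as state.
class MerchantSalesAnalyst
  attr_reader :merchant_id, :sales

  def initialize(merchant_id, sales)
    @merchant_id = merchant_id
    @sales = sales
  end

  def merchant_sales
    sales.select { |sale| sale[:merchant_id] == merchant_id }
  end

  def top_sale
    merchant_sales.max_by { |sale| sale[:quantity] }
  end

  def top_item
    top_sale[:item]
  end
end

ThreadedAnalyst.top_item_for(1)             # "Pen"
MerchantSalesAnalyst.new(1, SALES).top_item # "Pen"
```

Both versions give the same answer, but in the second one no method signature mentions merchant_id, so adding another step to the chain does not mean adding another parameter to pass along.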

The Vagaries of Specifications

The most challenging part of this project was the intentional vagueness of the specifications. This was meant to mimic the reality of most projects in the real world. As an example, the specification requires that we have a method that calculates total revenue. When should an invoice be counted as revenue, though? Does revenue mean that the transaction was successful? What if the item was returned later on? Should it still count as revenue? What if the invoice is still pending?
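To see why these questions matter, consider how much the answer moves depending on the reading. The invoices, field names, and statuses below are made up for illustration and are not taken from the project's data or its spec harness.

```ruby
require 'bigdecimal'

# Hypothetical invoice: a status, a total, and whether payment succeeded.
Invoice = Struct.new(:id, :status, :total, :paid)

INVOICES = [
  Invoice.new(1, :shipped,  BigDecimal('20.00'), true),
  Invoice.new(2, :pending,  BigDecimal('15.00'), true),
  Invoice.new(3, :returned, BigDecimal('10.00'), true),
  Invoice.new(4, :shipped,  BigDecimal('5.00'),  false)
].freeze

def total(invoices)
  invoices.sum(BigDecimal('0'), &:total)
end

# Reading 1: any invoice with a successful payment counts.
paid = total(INVOICES.select(&:paid))

# Reading 2: paid invoices, excluding anything later returned.
paid_not_returned = total(INVOICES.select { |i| i.paid && i.status != :returned })

# Reading 3: only paid invoices that have actually shipped.
paid_and_shipped = total(INVOICES.select { |i| i.paid && i.status == :shipped })

[paid, paid_not_returned, paid_and_shipped].map(&:to_i) # [45, 35, 20]
```

Three defensible definitions produce three different totals from the same four invoices, and only one of them will match whatever the grader's tests expect.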

The project came with its own spec harness test suite that the instructors used to judge whether or not our project met the basic requirements. The only way to get clarity on the specifications was to actually look at the instructor-created test files and make guesses as to how they were getting to the answers they were requiring. This was a huge frustration, but it definitely illustrated the point the instructors were making: real-world specifications are often vague, and working out what is actually being asked is part of the job.

Conclusion

Black Thursday felt like a rite of passage. It is a project that predates Turing and serves as the gateway between Module 1 and Module 2. For future Turing students and Black Thursday attempters, the technical approaches and solutions that I discussed above are what I consider to be the keys to successfully completing this behemoth.