Pulling Spreadsheet Data: No Scripts Required

Once a person gets going with spreadsheets, a use case that tends to pop up pretty quickly is the need to combine data from multiple spreadsheets for some kind of analysis. In this post I will share how to combine the arrayformula and importrange functions in Google Spreadsheets to make data from multiple spreadsheets accessible in one place.

Let's imagine that I have two separate spreadsheets for recording evaluations for two separate classes of students in my school. Child Spreadsheet 1 contains a list of six students, the class name, "Michigan", and each student's evaluation (E, S, N, U scale). Child Spreadsheet 2 is the same, but contains the six students in class "Arizona". The question is: how can I see both class lists in one place, my Master Spreadsheet?

Importrange

The importrange function is like a wormhole connecting two points in space. With importrange the two points in space are two Google Spreadsheets and the connection is a one-way data pull. 

The importrange function expects two arguments, spreadsheet_key and range_string. 

A spreadsheet key is the set of alphanumeric characters that can be found in the URL of every Google Document or Spreadsheet. In our example, I want to use the spreadsheet key of the spreadsheet from which I intend to import data. In the URL of Child Spreadsheet 1, the spreadsheet key is the long string between "/d/" and "/edit":

https://docs.google.com/spreadsheets/d/1vIcMh0SRtwXgYRsmCfiPYo-H4_wpA8IGRZ0RVbOsHDg/edit#gid=0

The range string is the section of the spreadsheet to import from. Its format differs from the range notation one would use when typing out a formula in the formula bar: with importrange, sheet names with spaces do not have to be surrounded by single quotes. Instead, the format is the sheet name, an exclamation point, and the range, all surrounded by quotation marks. To start with, I'll just select "Sheet1!A1" as the range I am interested in importing.
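
Assembled from the key and range above, the single-cell test formula would look something like the following. This is a reconstruction for illustration only; which cell of the Master Spreadsheet it goes in is up to you.

```
=importrange("1vIcMh0SRtwXgYRsmCfiPYo-H4_wpA8IGRZ0RVbOsHDg","Sheet1!A1")
```

Both arguments are plain text strings, so each one needs its own pair of double quotes.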

First, Google will require that I allow my Master Spreadsheet to access the child spreadsheet, which I can do by hovering over the error message and then clicking "Allow access". The "Allow access" button and the strange spreadsheet_key argument are the two reasons I start linking spreadsheets by pulling a single cell. Once one cell works, I know that I have the spreadsheet key and permissions set correctly, and I can move on to grabbing the data I'm actually interested in. Some folks will disagree with this approach, opting instead to start with the desired data range, but keeping it simple and building up complexity in steps is generally how I approach spreadsheet work.

Next, I update the range I am pulling from. I will not have to allow access again since I've already connected the sheets. The new formula with updated range should read:
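
Reconstructed from the key above and the range quoted later in this post, that formula is presumably the following. The cell it lives in, A2 of the Master Spreadsheet, is inferred from the C2 error message discussed further down rather than stated outright.

```
=importrange("1vIcMh0SRtwXgYRsmCfiPYo-H4_wpA8IGRZ0RVbOsHDg","Sheet1!A2:C")
```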

It is important to point out that in this case I am using a bottomless range, Sheet1!A2:C, which means that I am going to import every row of the sheet from row 2 down, even if additional rows are added to Sheet1 later on. In comparison, if I use the finite range Sheet1!A2:C6, I will get all the current data, but future data added to rows 7, 8, 9, and so on will not be included in the data pulled by importrange. There is definitely a time and place for finite ranges, but this is not one of them.

At this point I have columns A through C, starting with row 2, from Child Spreadsheet 1 importing into my Master Spreadsheet. If I change the evaluation of Barack to an E in Child Spreadsheet 1, the change will also happen in the Master Spreadsheet. However, if I change Barack's evaluation in the Master Spreadsheet, I will get a reference error that says:

Array result was not expanded because it would overwrite data in C2. 

Here importrange refuses to display its results rather than overwrite the value I typed into cell C2. Child Spreadsheet 1 remains unchanged, because the connection only pulls data one way.

Combining Multiple Spreadsheets

Next, I construct a second importrange formula in cell A7 to bring in the data from Child Spreadsheet 2. The formula is exactly the same as the one I already wrote, except with a new spreadsheet key that points to Child Spreadsheet 2. The range I am interested in pulling remains the same.
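
Using the Child Spreadsheet 2 key that appears in the combined formula later in this post, the formula in cell A7 would presumably read as follows (a reconstruction, not copied from the original sheet):

```
=importrange("1vdooBX7tdPNAa5kn5LlbrJ95V_SjZAly4HW3kp0P8gw","Sheet1!A2:C")
```

As with the first child spreadsheet, the Master Spreadsheet has to be granted access to this second spreadsheet before any data will appear.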

Notice that in the above example the original importrange function outputs an error to prevent it from overwriting the new importrange function. It is important to conceptually understand that even though the first importrange function contains only 6 rows of actual data, it also contains about 1000 rows of empty data because of the bottomless range in the second argument of the function. From the perspective of the Master Spreadsheet, these rows look like normal empty spreadsheet cells, but they actually contain the imported empty data from Child Spreadsheet 1.

Enter Arrayformula

To combine multiple spreadsheets, Google Spreadsheet power users used to have to use a script called vmerge, which, as its name suggests, would vertically merge data with equal numbers of columns. Arranging the data vertically greatly eases the burden of performing analysis on the data in question. The new Google Spreadsheets have the vmerge functionality built right into the product. The syntax is a little wonky, but using the following arrayformula, users can stack multiple arrays on top of each other:

=arrayformula({ARRAY1; ARRAY2; ARRAYn})

The arguments in the above array formula are ranges or functions that return more than one cell (e.g. importrange). The semicolons between them are what stack the arrays vertically.

When nesting functions in other functions, it is good practice to build them in pieces. In cell D2 I input the following:
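
Judging by the two range arguments named in the next step, "A2:C6" and "A7:C11", the test formula is presumably along these lines (a reconstruction; treat the exact ranges as an assumption):

```
=arrayformula({A2:C6;A7:C11})
```

Both ranges are three columns wide, which matters because vertically stacked arrays need the same number of columns.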

This formula outputs the two arrays stacked vertically, as shown below.

Next, I take the two importrange functions I have in cells A2 and A7 and combine them into the arrayformula. I replace the first range argument, "A2:C6", with the first importrange function, IMPORTRANGE("1vIcMh0SRtwXgYRsmCfiPYo-H4_wpA8IGRZ0RVbOsHDg", "sheet1!A2:C"), and the second range argument, "A7:C11", with the second importrange function, importrange("1vdooBX7tdPNAa5kn5LlbrJ95V_SjZAly4HW3kp0P8gw","Sheet1!A2:C"). I then put the entire nested function in cell A2:

=arrayformula({IMPORTRANGE("1vIcMh0SRtwXgYRsmCfiPYo-H4_wpA8IGRZ0RVbOsHDg", "sheet1!A2:C");importrange("1vdooBX7tdPNAa5kn5LlbrJ95V_SjZAly4HW3kp0P8gw","Sheet1!A2:C")})

The result may be surprising. I get the first range, class Michigan, as I expect:

However, the second range, class Arizona, shows up 1001 rows below.

The reason that Arizona shows up on row 1001 goes back to the bottomless ranges I used in my importrange formulas. Importrange takes all of the cells in the given range, both those with data and those that are empty. As far as arrayformula is concerned, there is no difference between full and empty cells: it just sees two ranges, or lists, that are 1000 rows long each, and stacks them vertically.

The final step is to wrap the arrayformula in a sort, which takes three arguments:
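
Using the argument names that appear in the next paragraph, the general shape of the function is roughly:

```
sort(range, sort_column, is_ascending)
```

Here sort_column is the position of the column within the range (1 for the first column, 2 for the second, and so on) and is_ascending is TRUE or FALSE.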

In this case, the range is going to be the entire arrayformula. The sort column will be 2, to correspond with the class column (though 1 or 3 would work as well if I wanted an alphabetical list by name or a list sorted by evaluation score). The is_ascending argument will be TRUE because I want to see the classes listed in ascending alphabetical order. The completed formula should read:

=sort(arrayformula({IMPORTRANGE("1vIcMh0SRtwXgYRsmCfiPYo-H4_wpA8IGRZ0RVbOsHDg", "sheet1!A2:C");importrange("1vdooBX7tdPNAa5kn5LlbrJ95V_SjZAly4HW3kp0P8gw","Sheet1!A2:C")}),2,True)
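
As a hedged aside, and not part of the approach described in this post: the query function can drop the empty imported rows outright instead of relying on sort to move them out of the way. The sketch below assumes the first column (the student name) is never blank on a real data row, and that both child spreadsheets have already been granted access.

```
=query({importrange("1vIcMh0SRtwXgYRsmCfiPYo-H4_wpA8IGRZ0RVbOsHDg","Sheet1!A2:C");importrange("1vdooBX7tdPNAa5kn5LlbrJ95V_SjZAly4HW3kp0P8gw","Sheet1!A2:C")},"select * where Col1 is not null order by Col2",0)
```

The rest of this post sticks with the sort version above.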

The final output will be the two ranges of data that are being pulled from Child Spreadsheet 1 and Child Spreadsheet 2 stacked vertically and ordered by class. Remember, changes to the data in Child Spreadsheet 1 or Child Spreadsheet 2 will be almost instantly apparent in the Master Spreadsheet. The final output can be seen below.