Lab 3, Data Sources: AR, Aging, and Cash Application

Points: 100  |  Lab due: Wed Jun 3, 12:00 PM MT  |  HW due: Wed Jun 3, 12:00 PM MT  |  Submit: in the Lab 3 quiz
Lab instructions updated 2026-06-02. Lab 3 is a single Excel deliverable. You submit one workbook, lab_03.xlsx, to the Lab 3 quiz. If you saw an earlier version of this page that mentioned bullet points in the quiz, a fill-in template, a Colab notebook, or an HTML explainer, those are no longer part of Lab 3. The SQL work moved to HW 3 on the SQL Workbench.
Plan your week. Lab 3 is the most substantial piece of work in the course so far, five parts, all in Excel. Homework 3 picks up where Lab 3 leaves off and recreates the same headline numbers in SQL on the SQL Workbench. Aim to have the lab in working shape by Friday so HW 3 does not pile onto the weekend.

The story of this lab (start here)

Summit Gear Co. just handed you their accounts receivable. About 20,000 invoices, about 18,700 payments, five tables, three transactional (customers, invoices, payments) and two lookup (invoice_terms, invoice_statuses). The analyst's first job is to read it, the second is to compute the AR aging report, and the third is to find the discrepancies, places where the ERP's stored invoice status does not match what the payments actually do.

This week the lab IS the Excel workbook. You'll do the work by hand in Excel, the tool you already know, on the full ~20,000 invoices. The friction you feel at 20K rows is the case for SQL, which is what HW 3 picks up: you'll recreate the same headline numbers as SQL queries in the SQL Workbench, using the summary statistics you build in Part 5 of this lab as your validation target.

This is the close of Part 1 of the course. After Lab 3 + HW 3 you can read a relational schema, fill an aging bucket from a due date in Excel, write the SQL that scales it, and know when to reach for which tool.

Two moves this week: build the workbook, then validate it in SQL on the HW.

What you'll do

  1. Parts 1–4, Excel by hand. Download one pre-loaded template workbook. Fill in the analytical columns on invoices, cash application, balance, derived status, days past due, aging bucket, plus the lookup columns. Then build two pivots (aging on aging_summary, discrepancies on discrepancies), all in Excel, on the full ~20,000-row dataset.
  2. Part 5, Summary statistics. Add a summary sheet with headline numbers (counts, totals, open balance) and build the top-10 customer pivot on the existing top_customers sheet. These are the validation targets your HW 3 SQL queries will need to match.
No spec deliverable this week. Earlier weeks asked for a written spec before you built. This week's lab is non-programmatic, the workbook IS the deliverable, the formulas are the spec, the validation happens via the summary sheet you build in Part 5. (If writing a personal spec helps you organize, the prior weeks' template is still on GitHub for reference, it just isn't a graded artifact this round.)
What you hand in, one file, to the Lab 3 quiz.
  1. Your Excel workbook: lab_03.xlsx, the template you downloaded in Part 1, filled in through Part 5, with the summary sheet and top_customers pivot completed.

Part 0, Framework + pre-lab interactives

0.1, The origin of the data

Summit Gear Co. is a fictional wholesale outdoor-equipment distributor based in Salt Lake City. Four customer segments, Outdoor Retail, School District, Resort, Outfitter. The CFO reads an AR aging report on Monday morning to decide which customers to call. You are building that report.

A few details worth knowing up front:

Using AI on this lab is fine and expected. Direct an LLM to draft an Excel formula, debug a broken one, or explain a pivot step. The skill being graded is whether you can read the result and trust it, not whether you produced it unaided.
Heads up for later weeks. This lab uses a simplified grain: every invoice has at most one payment row (with one designed exception you'll discover). Real B2B AR systems often have one-to-many and many-to-many invoice-to-payment relationships, a deposit split across several invoices, or one invoice paid by several deposits. Handling those properly takes a junction table and more careful JOINs. We'll cover that in Week 5+. For now, you're seeing the simpler case on purpose.
About Void status, and a known limitation of the dataset. Void in this dataset means cancelled, the invoice was withdrawn after issue. The stored row stays in the table for the audit trail; the balance does not. Voided invoices are excluded from the aging report (no money is owed) and from cash-application sums.

One thing the data does not tell you: when the void was posted. Our invoices table records the current status only, there is no voided_at timestamp. That matters in Part 4: when you see a Void invoice with a payment against it, the data alone cannot say whether the payment came before or after the void. You can only say the two disagree. The diagnostic move is to flag the row for the controller and pull the supporting documents.

One more note: this dataset has no effective-dated status log. Stored status is the current value only, if an invoice is voided today, we lose the ability to ask "what status was it on 2025-12-31?" Production AR systems solve this with effective-dated status logs; we don't have those here. The discipline (which the Homework exercises): when you freeze a historical aging snapshot, freeze it as values, not as a recomputing formula, so the number you cited stays cited.

0.2, Pre-lab interactives

Five pieces, all in your browser. Strongly recommended before you start the lab if SQL is new to you.

Tight on time? Minimum on-ramp: walk aging_builder.html (the SQL trainer that lands exactly the query you'll recreate in HW 3) and walk Lessons 1, 3, and 5 of the Excel concepts trainer (SUMIFS, chained XLOOKUP, the cutoff cell). The rest is reference you can pull up as you hit each part.
The 1:n trap. When two rows on the right of a join match the same row on the left, your row count doubles, and so does any SUM you compute. The JOIN trainer demonstrates this with a one-button toggle. You will see it again in HW 3 if you aggregate on the wrong side of a SQL join.

0.3, Cheat sheets

Keep these open in tabs while you work.

0.4, The kickstart prompt (TLDR)

If you'd rather start in the LLM and pull context as you go, the button below copies a minimal-essentials prompt to your clipboard, the lab's idea, the five-table schema in compact form, the single Excel deliverable, and a link to the cheat sheet. Paste it into ChatGPT, Claude, or Gemini and you have enough to land the headline parts of the lab even if you have not walked the trainers.

Use this as an escape valve, not a default path. The trainers and cheat sheets are how the SQL lands; the kickstart prompt is for when you are stuck and need to start somewhere.
The dataset, what you're working with

Summit Gear's AR covers 2024-01-01 → 2026-04-30, about 28 months. Five tables, ~20,000 invoices, ~18,700 payments. You work the whole lab in one pre-loaded Excel template, which you download in Part 1, with the five source tables already loaded as named sheets. The raw CSVs are also hosted if you want to inspect a table on its own, but you do not need them.

The schema

About PK and FK. A primary key (PK) is the column that uniquely identifies each row of a table. customer_id is the PK of customers: every customer has one, and no two customers share it. A foreign key (FK) is a column that points to another table's PK. invoices.customer_id is an FK pointing at customers.customer_id; that is how you know which customer an invoice belongs to. PK and FK are the glue that turns a pile of tables into a relational database. The arrows in the diagram below show those FK relationships.

                    invoice_terms                invoice_statuses
                    +-----------+                +-----------+
                    | terms_id  |                | status_id |
                    | terms_name|                | status_name|
                    | net_days  |                | description|
                    +-----+-----+                +-----+-----+
                          |                            |
                          | (terms_id FK)              | (status_id FK)
                          v                            v
   +-----------+    +-----+-----+              +-----+-----+
   | customers +----> customer_id|              | invoices  |
   +-----------+    | terms_id   |<-------------+ customer_id|
                    | segment    |              | status_id  |
                    +-----+-----+               | amount     |
                          ^                     | due_date   |
                          |                     +-----+-----+
                          |                           |
                          |                           | (invoice_id FK)
                          |                           v
                          |                     +-----+-----+
                          |                     | payments  |
                          +---------------------+ invoice_id|
                                                | amount    |
                                                | pay_date  |
                                                +-----------+
TableGrainKey columns
customers one row per customer customer_id PK; terms_id FK → invoice_terms
invoices one row per invoice (~20,000) invoice_id PK; customer_id FK → customers; status_id FK → invoice_statuses
payments one row per payment (~18,700) payment_id PK; invoice_id FK → invoices
invoice_terms one row per terms code (4 rows) terms_id PK
invoice_statuses one row per stored status (4 rows) status_id PK
Net terms are on customers, not invoices. A given customer always invoices on the same terms. To find an invoice's terms, JOIN invoicescustomersinvoice_terms. Three tables to answer one question.

As-of dates that matter in this lab

DateWhat it's for
2026-04-30The lab's primary as-of date. The aging report references this.
2024-01-01 → 2026-04-30The invoice-date range.

Download the data

One pre-loaded Excel template is all you need. The raw CSVs are also hosted if you want to inspect a table on its own or take the CSV-import path described in Part 1.

How to introspect the tables

In Excel, open the template (or a CSV): the row count is in the bottom-left of the window, the column headers are row 1, and clicking a cell shows the raw value.


Part 1, Get the template

Download lab_03_template.xlsx and open it. Save it as lab_03.xlsx right away, this one workbook is the entire Excel deliverable. Do not put your name or university ID in the filename; Canvas attaches your identity automatically when you upload.

The nine sheets

The template ships with nine sheets, in this order. Do not rename them. The grader reads the workbook by sheet name.

SheetWhat's on it
cutoffParameters. B1 is the cutoff date (text, ISO format) that drives every formula in the workbook. Currently 2026-04-30.
customers~1,500 rows, fully populated. Named table: tbl_customers.
invoices~20,015 rows. The six source columns are populated; the nine analytical columns to the right are empty, you fill them in Part 2. Named table: tbl_invoices.
payments~18,667 rows, fully populated. Named table: tbl_payments.
invoice_terms4 rows. Named table: tbl_invoice_terms.
invoice_statuses4 rows. Named table: tbl_invoice_statuses.
aging_summaryEmpty target. You build a pivot here in Part 3.
discrepanciesEmpty target. You build a pivot here in Part 4.
top_customersEmpty target. You build a pivot here in Part 5.

The cutoff sheet

One cell drives the whole workbook. cutoff!$B$1 is the as-of date. Every formula you write in Part 2 references it. When you change it, every formula on invoices recomputes, and every pivot built on tbl_invoices follows on refresh. Hard-coding 2026-04-30 into formulas instead misses the lesson. In HW 3 you recreate these same as-of numbers in SQL, so a cleanly parameterized workbook is what lets the two tie out.

Optional CSV-import path. If you want to prove you can rebuild this from scratch, for portfolio reasons, or because you prefer Power Query, the five raw CSVs are in the "Download the data" section above. You'd import each as a table, set the same names (tbl_customers, tbl_invoices, etc.), add a cutoff sheet, and add the nine analytical columns to invoices yourself. It is slower, and the rest of the lab assumes you used the template. Pick one path and stick with it.
Part 1 done when: the template is open, saved as lab_03.xlsx, and you have located cutoff!$B$1, the cell every formula in Part 2 will reference.
Part 2, Fill the analytical columns on invoices

On the invoices sheet, columns G through O are headers only, the body is empty. You write one formula per column in the first data row; the Excel Table auto-fills the column down all ~20,000 rows. Use structured references everywhere: tbl_invoices[@[invoice_id]], not A2. Structured references survive table extensions and reorderings; cell references do not. Note the double brackets: [@[column_name]] is the safe form for the current row's value, use it consistently.

Touch each formula in this order: cash application first, then the date math and aging bucket, then the lookup columns. Each block depends on the one above, balance needs paid_to_date, aging_bucket needs days_past_due. The lookup columns are display polish and come last.
Heads up: Excel data type quirks. Depending on your Excel version and import path, some columns in the template can come through as text rather than dates or numbers. If a formula returns #VALUE!, returns 0 when you expect a real total, or your row counts do not match the expected numbers from the summary sheet, this is the most common cause.

Two ways to fix it:

  • Copilot in Excel. Select the misbehaving column and ask Copilot to convert it to a date or a number. It will surface the right Power Query step or formula.
  • The add-column-and-zero trick. In a blank column to the right of the table, enter =[@column] + 0. The +0 forces Excel to coerce text into a number, and text-dates into real dates. Copy the new column, then paste it back over the original column as values (Paste Special, Values). The column now has the correct data type.

Sean will demo both moves in class. If a formula below is not behaving as described, check column data types before chasing the formula itself.

2.1, Cash application

Three columns that turn the raw payments into per-invoice cash applied, outstanding balance, and a derived status. Write the formula once in the first data row; the Excel Table auto-fills the column down all ~20,000 rows.

ColumnFormula + what it does
paid_to_date =SUMIFS(tbl_payments[amount], tbl_payments[invoice_id], [@[invoice_id]], tbl_payments[payment_date], "<="&cutoff!$B$1)

For each invoice, sum every payment row that points back at that invoice and arrived on or before the cutoff date. The "<="&cutoff!$B$1 filter is what lets the as-of-date cell drive the whole report, change the cutoff and every paid_to_date recomputes.
balance =[@amount] - [@[paid_to_date]]

What's still owed. If a customer overpaid, this goes slightly negative (a credit balance), the next column treats anything within a penny as paid.
derived_status =IF([@balance]<0.01, "Paid", IF([@[paid_to_date]]>0, "Partial", "Open"))

Nested IF, evaluated left to right: balance essentially zero → Paid; otherwise some payment exists → Partial; otherwise no payment → Open. The 0.01 is a rounding-tolerance margin (treats $0.003 of float drift as zero). This column drives both the aging filter in Part 3 and the discrepancy hunt in Part 4.

At 20,000 rows the paid_to_date SUMIFS is slow. Expect Excel to pause for several seconds when you fill the column. That pause is the point. You are doing by hand what SQL does in three lines.

If SUMIFS at scale is new to you, walk Excel concepts trainer Lesson 1 first.

2.2, Date math and aging

Two columns that turn the due date into a bucket label. These formulas assume due_date is stored as a real Excel date. If it came through as text, the subtraction will throw #VALUE!. See the data-type heads-up above for the fix.

ColumnFormula + what it does
days_past_due =cutoff!$B$1 - [@[due_date]]

Straight date subtraction. Positive number = past due (and how many days); zero or negative = not yet due.
aging_bucket =IFS([@[days_past_due]]<=0,"0. Current", [@[days_past_due]]<=30,"1. 01-30", [@[days_past_due]]<=60,"2. 31-60", [@[days_past_due]]<=90,"3. 61-90", TRUE,"4. 90+")

IFS walks the conditions top-to-bottom and stops at the first true one; TRUE at the end catches everything else. The "0. " / "1. " / "2. " prefixes make the buckets sort correctly in the pivot table without manually re-ordering rows.
If IFS and absolute references like cutoff!$B$1 are new, walk Lessons 4 and 5 of the Excel concepts trainer.

2.3, Lookup columns (display polish)

Setup step, add payment_terms to the customers sheet

The customers table only stores terms_id (a foreign key into invoice_terms), not the actual terms name. Rather than nesting two lookups every time you want the terms name on an invoice row, add a helper column called payment_terms to the customers sheet first. With it in place, every invoices-side lookup below stays single-hop.

SheetColumn to addFormula + what it does
customerspayment_terms =VLOOKUP([@[terms_id]], tbl_invoice_terms, 2, FALSE)

For each customer row, look up that customer's terms_id in tbl_invoice_terms, return column 2 (the terms name, "Net 30", "Net 45", etc.). The trailing FALSE forces an exact match (VLOOKUP defaults to approximate, which is rarely what you want for an ID lookup).
The helper-column pattern. When a lookup keeps getting nested, materialize the intermediate as a column on the source table. The customers row now carries both terms_id AND payment_terms, the second exists purely to make downstream formulas one hop instead of two. Same pattern shows up in SQL as a view, and in pandas as a derived column.

Now fill the four lookups on the invoices sheet

Four columns to fill via lookups against tbl_customers and tbl_invoice_statuses. Two worked XLOOKUP examples below; you write the other two yourself, following the same pattern with a different return column or lookup table.

ColumnFormula + what it does
customer_name =XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[customer_name])

Look up the invoice's customer_id in the customers table, return the matching customer_name. Single-hop XLOOKUP, this is the pattern segment and status_name below follow.
terms_name =XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[payment_terms], FALSE)

Same XLOOKUP shape as customer_name, but the return column is the payment_terms helper column you just added to customers in the setup step above. The trailing FALSE is XLOOKUP's if_not_found argument, returns FALSE instead of #N/A on a miss, which is defensive polish since every invoice should match a customer.
segment You write this one. Same single-hop XLOOKUP pattern as customer_name, same lookup key and same lookup table, just a different return column on tbl_customers.
status_name You write this one. Same single-hop XLOOKUP shape, but the lookup key is [@[status_id]] and the table is tbl_invoice_statuses.
Why VLOOKUP on customers but XLOOKUP on invoices? VLOOKUP works when the lookup key is the first column of the target table, which is true for tbl_invoice_terms (terms_id is column 1). XLOOKUP handles arbitrary column positions and is more readable for the invoices-side lookups where you're naming columns explicitly. Either function works on either side; the choice here is mostly to show you both in their natural homes.
If single XLOOKUP or VLOOKUP is new to you, walk Excel concepts trainer Lesson 3 first.
Part 2 done when: all nine analytical columns are filled down the table, a spot-check on three invoices reconciles by hand (you can find the matching payment rows and confirm balance and aging_bucket are right), and the formulas all reference cutoff!$B$1 rather than hard-coded dates.
Part 3, AR aging pivot

On the aging_summary sheet, insert a pivot table from tbl_invoices.

Pivot areaField
Rowsaging_bucket
ValuesCOUNT of invoice_id, SUM of balance
Filterderived_status, keep Open and Partial; drop Paid and Void

The buckets will sort themselves, the "0. " / "1. " prefixes from your IFS formula in Part 2.2 mean the rows appear in order automatically (no manual reorder). Format the SUM column as currency, no decimals.

Why filter on derived_status and not on balance > 0? Excel's pivot Filters area doesn't reliably support numerical conditions like "greater than zero", it expects categorical values. Filtering to Open + Partial keeps every invoice that still owes money and drops the noise (Paid, Void) cleanly. Same outcome, a filter the pivot engine actually likes.
Deliverable for Part 3. The aging_summary sheet with the aging pivot at the current cutoff date, ordered Current → Over 90, and the SUM column formatted as currency.

Note for the curious: the workbook's cutoff!$B$1 cell drives every analytical formula in Part 2, so you could change it (say, to 2025-12-31) and watch the pivot recompute. That's the parameterization payoff. The lab doesn't ask you to change it here, but that as-of-date discipline is what keeps your numbers reproducible when you rebuild them in SQL for HW 3.

Part 4, Discrepancy hunt

Discrepancy Check: We often see mismatches between data that can have a material impact. We want to get in the habit of auditing outputs and ensuring they meet expected outcomes. The stored invoice status is what a person in the ERP recorded. The payments table is what actually happened. When the two disagree, the analyst's job is to surface the difference and the dollars at stake.

Real-world note. Summit Gear's books, like every B2B AR system, also have credit memos that reduce a balance without a cash payment. This lab simplifies, there are no credit memos here. When you find a real "paid but no payment row" case in industry, the first thing to check is whether a credit memo was applied. Outside this lab the discrepancy categories are a longer list.

On the discrepancies sheet, insert a pivot from tbl_invoices in a matrix layout: stored status across one axis, derived status across the other:

Pivot areaField
Rowsstatus_name (the stored status from the ERP)
Columnsderived_status (what the payments actually do)
ValuesCOUNT of invoice_id
FilterNone, show every cell

You should see a 4×4 grid: four stored statuses down the rows, four derived statuses across the columns, a count in every cell. The diagonal cells (Open/Open, Paid/Paid, Partial/Partial, Void/Void) are where the stored status agrees with the payment-derived state, most rows live there. The off-diagonal cells are the discrepancies.

The three off-diagonal cells you should expect to see populated:

You do not need to filter the pivot to highlight the mismatches. The grid layout does the work. The diagonal is the agreement. The off-diagonal cells are the discrepancies.

Deliverable for Part 4. The discrepancies sheet with the 4×4 status_name × derived_status grid (COUNT of invoice_id in each cell), built inside your lab_03.xlsx workbook. There is nothing separate to submit for Part 4, the pivot in the workbook is the deliverable. As you read the grid, note which off-diagonal cell is the most concerning and why, that judgment is the analytical point of the part.
Part 5, Summary statistics sheet (your HW 3 validation target)

One more sheet on the workbook before you submit. The summary sheet consolidates your headline numbers in one place, counts, totals, the open-balance figure, the top-10 customers, so you have a validation target when you write SQL for HW 3. Each row of this sheet maps to a homework question: when you run the SQL, your result should match the number on this sheet, or you have a bug somewhere.

5.1, Build a summary sheet

Add a new sheet named summary (right-click any tab, then Insert or New Sheet). Lay out two columns: Label in column A, Value in column B. Use formulas that reference your named tables and the Part 2 columns, not hard-coded numbers. If you change cutoff!$B$1, every value here should follow.

Label (col A)Value (col B), formula shapeValidates HW#
Total invoice count=ROWS(tbl_invoices)HW-01
Unique customer count=ROWS(UNIQUE(tbl_invoices[customer_id]))HW-02
Total payment count=ROWS(tbl_payments)HW-03 (related)
Total invoiced amount=SUM(tbl_invoices[amount])HW-04 (related)
Total payments received=SUM(tbl_payments[amount])HW-03 (related)
Open invoice count=COUNTIF(tbl_invoices[balance], ">0")HW-05
Total open balance=SUMIFS(tbl_invoices[balance], tbl_invoices[balance], ">0")HW-04, HW-05

The per-bucket aging totals (HW-07) and per-flavor discrepancy counts (HW-08) already live on your aging_summary and discrepancies pivots from Parts 3 and 4, no extra cells needed; just know which numbers to cite when validating.

5.2, Build the top_customers pivot

The template ships with an empty top_customers sheet. Build a PivotTable there to give yourself the top 10 customers by open balance, the HW-06 validator.

Why this matters. In HW 3 you'll write SQL against the same dataset on the SQL Workbench. The first thing you do for every SQL result is compare it to your Excel summary sheet (or the relevant pivot). If they match, your SQL is right. If they don't, you have a bug, usually a missing filter, a wrong join, or an aggregation against the wrong column. This reconciliation is the accounting beat dressed as a SQL one.
Deliverable for Part 5. The summary sheet with at least the seven cells above filled in via formulas, plus the top_customers PivotTable populated. Both are part of the single lab_03.xlsx upload.

Submission

One file goes to the Lab 3 quiz in the Week 3 module. Use the plain filename listed below. Do not embed your name, university ID, or any other identifying information in the filename, Canvas attaches your identity automatically when you upload.

SlotFileCovers
1lab_03.xlsxParts 1–5, the filled-in template, including the new summary sheet and the top_customers pivot
The .xlsx is the template, filled in. The grader opens it expecting the original nine sheet names from Part 1 plus your new summary sheet, the named tables intact, the analytical columns on invoices populated, and pivots on the three target sheets (aging_summary, discrepancies, top_customers). Renaming sheets or rebuilding from a blank workbook breaks the deterministic checks.
Save your work as you go. If you're signed into Microsoft 365, Excel autosaves to OneDrive; if not, hit Ctrl+S often. HW 3 (SQL queries on the SQL Workbench) starts where this lab leaves off, your summary sheet is the validation target you'll quote from when you write each query.

Grading

Grading is generous and formative. The skill being checked is whether you can read a relational schema, fill in cash-app formulas at 20K rows, and produce numbers you can defend, not whether your formulas are optimal. AI use is expected, not penalized. Slow Excel is fine. What we look for: every analytical column on invoices referencing cutoff!$B$1 rather than a hard-coded date; the aging pivot present with all five buckets; all three discrepancy flavors surfaced with their counts and dollars; the top-10 customer pivot built; and a summary sheet whose cells point at your tables (not hard-coded values) so the work re-runs cleanly when data changes.


Need help?

Summit Gear Co. is fictional. The dataset and the lab are custom-built for ACCTG 5150.