lab_03.xlsx, to the Lab 3 quiz. If you saw an earlier version of this page that mentioned bullet points in the quiz, a fill-in template, a Colab notebook, or an HTML explainer, those are no longer part of Lab 3. The SQL work moved to HW 3 on the SQL Workbench.
Summit Gear Co. just handed you their accounts receivable. About 20,000 invoices, about 18,700 payments, five tables, three transactional (customers, invoices, payments) and two lookup (invoice_terms, invoice_statuses). The analyst's first job is to read it, the second is to compute the AR aging report, and the third is to find the discrepancies, places where the ERP's stored invoice status does not match what the payments actually do.
This week the lab IS the Excel workbook. You'll do the work by hand in Excel, the tool you already know, on the full ~20,000 invoices. The friction you feel at 20K rows is the case for SQL, which is what HW 3 picks up: you'll recreate the same headline numbers as SQL queries in the SQL Workbench, using the summary statistics you build in Part 5 of this lab as your validation target.
This is the close of Part 1 of the course. After Lab 3 + HW 3 you can read a relational schema, fill an aging bucket from a due date in Excel, write the SQL that scales it, and know when to reach for which tool.
Two moves this week: build the workbook, then validate it in SQL on the HW.
invoices, cash application, balance, derived status, days past due, aging bucket, plus the lookup columns. Then build two pivots (aging on aging_summary, discrepancies on discrepancies), all in Excel, on the full ~20,000-row dataset.summary sheet with headline numbers (counts, totals, open balance) and build the top-10 customer pivot on the existing top_customers sheet. These are the validation targets your HW 3 SQL queries will need to match.summary sheet you build in Part 5. (If writing a personal spec helps you organize, the prior weeks' template is still on GitHub for reference, it just isn't a graded artifact this round.)
lab_03.xlsx, the template you downloaded in Part 1, filled in through Part 5, with the summary sheet and top_customers pivot completed.Summit Gear Co. is a fictional wholesale outdoor-equipment distributor based in Salt Lake City. Four customer segments, Outdoor Retail, School District, Resort, Outfitter. The CFO reads an AR aging report on Monday morning to decide which customers to call. You are building that report.
A few details worth knowing up front:
terms_id lives on customers, not on invoices. To find an invoice's terms you join through the customer.invoices table records the current status only, there is no voided_at timestamp. That matters in Part 4: when you see a Void invoice with a payment against it, the data alone cannot say whether the payment came before or after the void. You can only say the two disagree. The diagnostic move is to flag the row for the controller and pull the supporting documents.
Five pieces, all in your browser. Strongly recommended before you start the lab if SQL is new to you.
SELECT, FROM, WHERE, ORDER BY, LIMIT. Foundation for everything that follows.GROUP BY, SUM, COUNT, AVG, HAVING. Animated row-to-summary collapse.$A$1 cutoff-cell pattern, and pivots. Each lesson shows the formula plus a parallel Copilot-for-Excel prompt. Walk this if any of the Part 3–4 formulas are new to you.SUM you compute. The JOIN trainer demonstrates this with a one-button toggle. You will see it again in HW 3 if you aggregate on the wrong side of a SQL join.
Keep these open in tabs while you work.
If you'd rather start in the LLM and pull context as you go, the button below copies a minimal-essentials prompt to your clipboard, the lab's idea, the five-table schema in compact form, the single Excel deliverable, and a link to the cheat sheet. Paste it into ChatGPT, Claude, or Gemini and you have enough to land the headline parts of the lab even if you have not walked the trainers.
Summit Gear's AR covers 2024-01-01 → 2026-04-30, about 28 months. Five tables, ~20,000 invoices, ~18,700 payments. You work the whole lab in one pre-loaded Excel template, which you download in Part 1, with the five source tables already loaded as named sheets. The raw CSVs are also hosted if you want to inspect a table on its own, but you do not need them.
About PK and FK. A primary key (PK) is the column that uniquely identifies each row of a table. customer_id is the PK of customers: every customer has one, and no two customers share it. A foreign key (FK) is a column that points to another table's PK. invoices.customer_id is an FK pointing at customers.customer_id; that is how you know which customer an invoice belongs to. PK and FK are the glue that turns a pile of tables into a relational database. The arrows in the diagram below show those FK relationships.
invoice_terms invoice_statuses
+-----------+ +-----------+
| terms_id | | status_id |
| terms_name| | status_name|
| net_days | | description|
+-----+-----+ +-----+-----+
| |
| (terms_id FK) | (status_id FK)
v v
+-----------+ +-----+-----+ +-----+-----+
| customers +----> customer_id| | invoices |
+-----------+ | terms_id |<-------------+ customer_id|
| segment | | status_id |
+-----+-----+ | amount |
^ | due_date |
| +-----+-----+
| |
| | (invoice_id FK)
| v
| +-----+-----+
| | payments |
+---------------------+ invoice_id|
| amount |
| pay_date |
+-----------+
| Table | Grain | Key columns |
|---|---|---|
customers |
one row per customer | customer_id PK; terms_id FK → invoice_terms |
invoices |
one row per invoice (~20,000) | invoice_id PK; customer_id FK → customers; status_id FK → invoice_statuses |
payments |
one row per payment (~18,700) | payment_id PK; invoice_id FK → invoices |
invoice_terms |
one row per terms code (4 rows) | terms_id PK |
invoice_statuses |
one row per stored status (4 rows) | status_id PK |
customers, not invoices. A given customer always invoices on the same terms. To find an invoice's terms, JOIN invoices → customers → invoice_terms. Three tables to answer one question.
| Date | What it's for |
|---|---|
| 2026-04-30 | The lab's primary as-of date. The aging report references this. |
| 2024-01-01 → 2026-04-30 | The invoice-date range. |
One pre-loaded Excel template is all you need. The raw CSVs are also hosted if you want to inspect a table on its own or take the CSV-import path described in Part 1.
cutoff), five source tables (customers, invoices, payments, invoice_terms, invoice_statuses), and three empty target sheets for pivots (aging_summary, discrepancies, top_customers).In Excel, open the template (or a CSV): the row count is in the bottom-left of the window, the column headers are row 1, and clicking a cell shows the raw value.
Download lab_03_template.xlsx and open it. Save it as lab_03.xlsx right away, this one workbook is the entire Excel deliverable. Do not put your name or university ID in the filename; Canvas attaches your identity automatically when you upload.
The template ships with nine sheets, in this order. Do not rename them. The grader reads the workbook by sheet name.
| Sheet | What's on it |
|---|---|
cutoff | Parameters. B1 is the cutoff date (text, ISO format) that drives every formula in the workbook. Currently 2026-04-30. |
customers | ~1,500 rows, fully populated. Named table: tbl_customers. |
invoices | ~20,015 rows. The six source columns are populated; the nine analytical columns to the right are empty, you fill them in Part 2. Named table: tbl_invoices. |
payments | ~18,667 rows, fully populated. Named table: tbl_payments. |
invoice_terms | 4 rows. Named table: tbl_invoice_terms. |
invoice_statuses | 4 rows. Named table: tbl_invoice_statuses. |
aging_summary | Empty target. You build a pivot here in Part 3. |
discrepancies | Empty target. You build a pivot here in Part 4. |
top_customers | Empty target. You build a pivot here in Part 5. |
cutoff sheetOne cell drives the whole workbook. cutoff!$B$1 is the as-of date. Every formula you write in Part 2 references it. When you change it, every formula on invoices recomputes, and every pivot built on tbl_invoices follows on refresh. Hard-coding 2026-04-30 into formulas instead misses the lesson. In HW 3 you recreate these same as-of numbers in SQL, so a cleanly parameterized workbook is what lets the two tie out.
tbl_customers, tbl_invoices, etc.), add a cutoff sheet, and add the nine analytical columns to invoices yourself. It is slower, and the rest of the lab assumes you used the template. Pick one path and stick with it.
lab_03.xlsx, and you have located cutoff!$B$1, the cell every formula in Part 2 will reference.
invoicesOn the invoices sheet, columns G through O are headers only, the body is empty. You write one formula per column in the first data row; the Excel Table auto-fills the column down all ~20,000 rows. Use structured references everywhere: tbl_invoices[@[invoice_id]], not A2. Structured references survive table extensions and reorderings; cell references do not. Note the double brackets: [@[column_name]] is the safe form for the current row's value, use it consistently.
balance needs paid_to_date, aging_bucket needs days_past_due. The lookup columns are display polish and come last.
#VALUE!, returns 0 when you expect a real total, or your row counts do not match the expected numbers from the summary sheet, this is the most common cause.
Two ways to fix it:
=[@column] + 0. The +0 forces Excel to coerce text into a number, and text-dates into real dates. Copy the new column, then paste it back over the original column as values (Paste Special, Values). The column now has the correct data type.Sean will demo both moves in class. If a formula below is not behaving as described, check column data types before chasing the formula itself.
Three columns that turn the raw payments into per-invoice cash applied, outstanding balance, and a derived status. Write the formula once in the first data row; the Excel Table auto-fills the column down all ~20,000 rows.
| Column | Formula + what it does |
|---|---|
paid_to_date |
=SUMIFS(tbl_payments[amount], tbl_payments[invoice_id], [@[invoice_id]], tbl_payments[payment_date], "<="&cutoff!$B$1)
For each invoice, sum every payment row that points back at that invoice and arrived on or before the cutoff date. The "<="&cutoff!$B$1 filter is what lets the as-of-date cell drive the whole report, change the cutoff and every paid_to_date recomputes. |
balance |
=[@amount] - [@[paid_to_date]]
What's still owed. If a customer overpaid, this goes slightly negative (a credit balance), the next column treats anything within a penny as paid. |
derived_status |
=IF([@balance]<0.01, "Paid", IF([@[paid_to_date]]>0, "Partial", "Open"))
Nested IF, evaluated left to right: balance essentially zero → Paid; otherwise some payment exists → Partial; otherwise no payment → Open. The 0.01 is a rounding-tolerance margin (treats $0.003 of float drift as zero). This column drives both the aging filter in Part 3 and the discrepancy hunt in Part 4. |
At 20,000 rows the paid_to_date SUMIFS is slow. Expect Excel to pause for several seconds when you fill the column. That pause is the point. You are doing by hand what SQL does in three lines.
Two columns that turn the due date into a bucket label. These formulas assume due_date is stored as a real Excel date. If it came through as text, the subtraction will throw #VALUE!. See the data-type heads-up above for the fix.
| Column | Formula + what it does |
|---|---|
days_past_due |
=cutoff!$B$1 - [@[due_date]]
Straight date subtraction. Positive number = past due (and how many days); zero or negative = not yet due. |
aging_bucket |
=IFS([@[days_past_due]]<=0,"0. Current", [@[days_past_due]]<=30,"1. 01-30", [@[days_past_due]]<=60,"2. 31-60", [@[days_past_due]]<=90,"3. 61-90", TRUE,"4. 90+")
IFS walks the conditions top-to-bottom and stops at the first true one; TRUE at the end catches everything else. The "0. " / "1. " / "2. " prefixes make the buckets sort correctly in the pivot table without manually re-ordering rows. |
IFS and absolute references like cutoff!$B$1 are new, walk Lessons 4 and 5 of the Excel concepts trainer.
payment_terms to the customers sheetThe customers table only stores terms_id (a foreign key into invoice_terms), not the actual terms name. Rather than nesting two lookups every time you want the terms name on an invoice row, add a helper column called payment_terms to the customers sheet first. With it in place, every invoices-side lookup below stays single-hop.
| Sheet | Column to add | Formula + what it does |
|---|---|---|
customers | payment_terms |
=VLOOKUP([@[terms_id]], tbl_invoice_terms, 2, FALSE)
For each customer row, look up that customer's terms_id in tbl_invoice_terms, return column 2 (the terms name, "Net 30", "Net 45", etc.). The trailing FALSE forces an exact match (VLOOKUP defaults to approximate, which is rarely what you want for an ID lookup). |
terms_id AND payment_terms, the second exists purely to make downstream formulas one hop instead of two. Same pattern shows up in SQL as a view, and in pandas as a derived column.
invoices sheetFour columns to fill via lookups against tbl_customers and tbl_invoice_statuses. Two worked XLOOKUP examples below; you write the other two yourself, following the same pattern with a different return column or lookup table.
| Column | Formula + what it does |
|---|---|
customer_name |
=XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[customer_name])
Look up the invoice's customer_id in the customers table, return the matching customer_name. Single-hop XLOOKUP, this is the pattern segment and status_name below follow. |
terms_name |
=XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[payment_terms], FALSE)
Same XLOOKUP shape as customer_name, but the return column is the payment_terms helper column you just added to customers in the setup step above. The trailing FALSE is XLOOKUP's if_not_found argument, returns FALSE instead of #N/A on a miss, which is defensive polish since every invoice should match a customer. |
segment |
You write this one. Same single-hop XLOOKUP pattern as customer_name, same lookup key and same lookup table, just a different return column on tbl_customers. |
status_name |
You write this one. Same single-hop XLOOKUP shape, but the lookup key is [@[status_id]] and the table is tbl_invoice_statuses. |
tbl_invoice_terms (terms_id is column 1). XLOOKUP handles arbitrary column positions and is more readable for the invoices-side lookups where you're naming columns explicitly. Either function works on either side; the choice here is mostly to show you both in their natural homes.
balance and aging_bucket are right), and the formulas all reference cutoff!$B$1 rather than hard-coded dates.
On the aging_summary sheet, insert a pivot table from tbl_invoices.
| Pivot area | Field |
|---|---|
| Rows | aging_bucket |
| Values | COUNT of invoice_id, SUM of balance |
| Filter | derived_status, keep Open and Partial; drop Paid and Void |
The buckets will sort themselves, the "0. " / "1. " prefixes from your IFS formula in Part 2.2 mean the rows appear in order automatically (no manual reorder). Format the SUM column as currency, no decimals.
derived_status and not on balance > 0? Excel's pivot Filters area doesn't reliably support numerical conditions like "greater than zero", it expects categorical values. Filtering to Open + Partial keeps every invoice that still owes money and drops the noise (Paid, Void) cleanly. Same outcome, a filter the pivot engine actually likes.
aging_summary sheet with the aging pivot at the current cutoff date, ordered Current → Over 90, and the SUM column formatted as currency.
Note for the curious: the workbook's cutoff!$B$1 cell drives every analytical formula in Part 2, so you could change it (say, to 2025-12-31) and watch the pivot recompute. That's the parameterization payoff. The lab doesn't ask you to change it here, but that as-of-date discipline is what keeps your numbers reproducible when you rebuild them in SQL for HW 3.
Discrepancy Check: We often see mismatches between data that can have a material impact. We want to get in the habit of auditing outputs and ensuring they meet expected outcomes. The stored invoice status is what a person in the ERP recorded. The payments table is what actually happened. When the two disagree, the analyst's job is to surface the difference and the dollars at stake.
On the discrepancies sheet, insert a pivot from tbl_invoices in a matrix layout: stored status across one axis, derived status across the other:
| Pivot area | Field |
|---|---|
| Rows | status_name (the stored status from the ERP) |
| Columns | derived_status (what the payments actually do) |
| Values | COUNT of invoice_id |
| Filter | None, show every cell |
You should see a 4×4 grid: four stored statuses down the rows, four derived statuses across the columns, a count in every cell. The diagonal cells (Open/Open, Paid/Paid, Partial/Partial, Void/Void) are where the stored status agrees with the payment-derived state, most rows live there. The off-diagonal cells are the discrepancies.
The three off-diagonal cells you should expect to see populated:
stored = Paid, derived = Open, the ERP thinks it's paid; the cash says no payment came in. (Paid-but-not.)stored = Open, derived = Paid, the cash arrived; nobody flipped the status. (Open-but-paid.)stored = Void, derived = Paid (or Partial), the ERP says this invoice is cancelled, yet cash came in against it. (Void-with-payment.) From the data alone we can't say whether the void came before or after the payment (no voided_at timestamp, see the §0.1 callout). The diagnostic move: flag for the controller and pull the supporting documents.You do not need to filter the pivot to highlight the mismatches. The grid layout does the work. The diagonal is the agreement. The off-diagonal cells are the discrepancies.
discrepancies sheet with the 4×4 status_name × derived_status grid (COUNT of invoice_id in each cell), built inside your lab_03.xlsx workbook. There is nothing separate to submit for Part 4, the pivot in the workbook is the deliverable. As you read the grid, note which off-diagonal cell is the most concerning and why, that judgment is the analytical point of the part.
One more sheet on the workbook before you submit. The summary sheet consolidates your headline numbers in one place, counts, totals, the open-balance figure, the top-10 customers, so you have a validation target when you write SQL for HW 3. Each row of this sheet maps to a homework question: when you run the SQL, your result should match the number on this sheet, or you have a bug somewhere.
summary sheetAdd a new sheet named summary (right-click any tab, then Insert or New Sheet). Lay out two columns: Label in column A, Value in column B. Use formulas that reference your named tables and the Part 2 columns, not hard-coded numbers. If you change cutoff!$B$1, every value here should follow.
| Label (col A) | Value (col B), formula shape | Validates HW# |
|---|---|---|
| Total invoice count | =ROWS(tbl_invoices) | HW-01 |
| Unique customer count | =ROWS(UNIQUE(tbl_invoices[customer_id])) | HW-02 |
| Total payment count | =ROWS(tbl_payments) | HW-03 (related) |
| Total invoiced amount | =SUM(tbl_invoices[amount]) | HW-04 (related) |
| Total payments received | =SUM(tbl_payments[amount]) | HW-03 (related) |
| Open invoice count | =COUNTIF(tbl_invoices[balance], ">0") | HW-05 |
| Total open balance | =SUMIFS(tbl_invoices[balance], tbl_invoices[balance], ">0") | HW-04, HW-05 |
The per-bucket aging totals (HW-07) and per-flavor discrepancy counts (HW-08) already live on your aging_summary and discrepancies pivots from Parts 3 and 4, no extra cells needed; just know which numbers to cite when validating.
top_customers pivotThe template ships with an empty top_customers sheet. Build a PivotTable there to give yourself the top 10 customers by open balance, the HW-06 validator.
tbl_invoices (which has customer_name, segment, and balance after Part 2).customer_name (or customer_id if you prefer the ID-driven view).balance.summary sheet (or the relevant pivot). If they match, your SQL is right. If they don't, you have a bug, usually a missing filter, a wrong join, or an aggregation against the wrong column. This reconciliation is the accounting beat dressed as a SQL one.
summary sheet with at least the seven cells above filled in via formulas, plus the top_customers PivotTable populated. Both are part of the single lab_03.xlsx upload.
One file goes to the Lab 3 quiz in the Week 3 module. Use the plain filename listed below. Do not embed your name, university ID, or any other identifying information in the filename, Canvas attaches your identity automatically when you upload.
| Slot | File | Covers |
|---|---|---|
| 1 | lab_03.xlsx | Parts 1–5, the filled-in template, including the new summary sheet and the top_customers pivot |
summary sheet, the named tables intact, the analytical columns on invoices populated, and pivots on the three target sheets (aging_summary, discrepancies, top_customers). Renaming sheets or rebuilding from a blank workbook breaks the deterministic checks.
summary sheet is the validation target you'll quote from when you write each query.
Grading is generous and formative. The skill being checked is whether you can read a relational schema, fill in cash-app formulas at 20K rows, and produce numbers you can defend, not whether your formulas are optimal. AI use is expected, not penalized. Slow Excel is fine. What we look for: every analytical column on invoices referencing cutoff!$B$1 rather than a hard-coded date; the aging pivot present with all five buckets; all three discrepancy flavors surfaced with their counts and dollars; the top-10 customer pivot built; and a summary sheet whose cells point at your tables (not hard-coded values) so the work re-runs cleanly when data changes.
Summit Gear Co. is fictional. The dataset and the lab are custom-built for ACCTG 5150.