Lab 3, Summit Gear AR

0.1, The origin of the data

Summit Gear Co. is a fictional wholesale outdoor-equipment distributor based in Salt Lake City. Four customer segments, Outdoor Retail, School District, Resort, Outfitter. The CFO reads an AR aging report on Monday morning to decide which customers to call. You are building that report.

A few details worth knowing up front:

Net terms are a customer-level attribute. A given customer always invoices on the same terms, terms_id lives on customers, not on invoices. To find an invoice's terms you join through the customer.
Invoice status is stored on the invoice. Someone in accounting flips it when they think an invoice has been paid (or cancelled). This stored status is not always correct. Finding the disagreements is Part 4.
Most payments are 1:1 with an invoice. A handful are edge cases, underpayments, overpayments, and one split, that you discover when something in your aging report looks weird.

Using AI on this lab is fine and expected. Direct an LLM to draft an Excel formula, debug a broken one, or explain a pivot step. The skill being graded is whether you can read the result and trust it, not whether you produced it unaided.

Heads up for later weeks. This lab uses a simplified grain: every invoice has at most one payment row (with one designed exception you'll discover). Real B2B AR systems often have one-to-many and many-to-many invoice-to-payment relationships, a deposit split across several invoices, or one invoice paid by several deposits. Handling those properly takes a junction table and more careful JOINs. We'll cover that in Week 5+. For now, you're seeing the simpler case on purpose.

About Void status, and a known limitation of the dataset. Void in this dataset means cancelled, the invoice was withdrawn after issue. The stored row stays in the table for the audit trail; the balance does not. Voided invoices are excluded from the aging report (no money is owed) and from cash-application sums.

One thing the data does not tell you: when the void was posted. Our invoices table records the current status only, there is no voided_at timestamp. That matters in Part 4: when you see a Void invoice with a payment against it, the data alone cannot say whether the payment came before or after the void. You can only say the two disagree. The diagnostic move is to flag the row for the controller and pull the supporting documents.

One more note: this dataset has no effective-dated status log. Stored status is the current value only, if an invoice is voided today, we lose the ability to ask "what status was it on 2025-12-31?" Production AR systems solve this with effective-dated status logs; we don't have those here. The discipline (which the Homework exercises): when you freeze a historical aging snapshot, freeze it as values, not as a recomputing formula, so the number you cited stays cited.

0.2, Pre-lab interactives

Five pieces, all in your browser. Strongly recommended before you start the lab if SQL is new to you.

SQL 1/4SQL Fundamentals trainer, SELECT, FROM, WHERE, ORDER BY, LIMIT. Foundation for everything that follows.
SQL 2/4JOIN trainer, INNER vs LEFT, the Venn diagram, the VLOOKUP/XLOOKUP parallel, and the 1:n trap demo (see below). Walk this before the Aggregate trainer: the GROUP-BY-through-a-JOIN lesson there assumes the join already makes sense.
SQL 3/4Aggregate trainer, GROUP BY, SUM, COUNT, AVG, HAVING. Animated row-to-summary collapse.
SQL 4/4Aging builder, the headline trainer. Builds on joins and aggregates. Lands exactly the query you'll recreate in HW 3.
ExcelExcel concepts trainer, eight lessons covering SUMIFS at 20K rows, single XLOOKUP, chained XLOOKUP through three tables, IFS for aging buckets, the $A$1 cutoff-cell pattern, and pivots. Each lesson shows the formula plus a parallel Copilot-for-Excel prompt. Walk this if any of the Part 3–4 formulas are new to you.

Tight on time? Minimum on-ramp: walk aging_builder.html (the SQL trainer that lands exactly the query you'll recreate in HW 3) and walk Lessons 1, 3, and 5 of the Excel concepts trainer (SUMIFS, chained XLOOKUP, the cutoff cell). The rest is reference you can pull up as you hit each part.

The 1:n trap. When two rows on the right of a join match the same row on the left, your row count doubles, and so does any SUM you compute. The JOIN trainer demonstrates this with a one-button toggle. You will see it again in HW 3 if you aggregate on the wrong side of a SQL join.

0.3, Cheat sheets

Keep these open in tabs while you work.

HTMLSQLite cheat sheet, one-page reference covering every syntax in this lab.
MDSQLite cheat sheet (Markdown), the same content in a form you can paste into ChatGPT.

0.4, The kickstart prompt (TLDR)

If you'd rather start in the LLM and pull context as you go, the button below copies a minimal-essentials prompt to your clipboard, the lab's idea, the five-table schema in compact form, the single Excel deliverable, and a link to the cheat sheet. Paste it into ChatGPT, Claude, or Gemini and you have enough to land the headline parts of the lab even if you have not walked the trainers.

Use this as an escape valve, not a default path. The trainers and cheat sheets are how the SQL lands; the kickstart prompt is for when you are stuck and need to start somewhere.

Summit Gear's AR covers 2024-01-01 → 2026-04-30, about 28 months. Five tables, ~20,000 invoices, ~18,700 payments. You work the whole lab in one pre-loaded Excel template, which you download in Part 1, with the five source tables already loaded as named sheets. The raw CSVs are also hosted if you want to inspect a table on its own, but you do not need them.

The schema

About PK and FK. A primary key (PK) is the column that uniquely identifies each row of a table. customer_id is the PK of customers: every customer has one, and no two customers share it. A foreign key (FK) is a column that points to another table's PK. invoices.customer_id is an FK pointing at customers.customer_id; that is how you know which customer an invoice belongs to. PK and FK are the glue that turns a pile of tables into a relational database. The arrows in the diagram below show those FK relationships.

                    invoice_terms                invoice_statuses
                    +-----------+                +-----------+
                    | terms_id  |                | status_id |
                    | terms_name|                | status_name|
                    | net_days  |                | description|
                    +-----+-----+                +-----+-----+
                          |                            |
                          | (terms_id FK)              | (status_id FK)
                          v                            v
   +-----------+    +-----+-----+              +-----+-----+
   | customers +----> customer_id|              | invoices  |
   +-----------+    | terms_id   |<-------------+ customer_id|
                    | segment    |              | status_id  |
                    +-----+-----+               | amount     |
                          ^                     | due_date   |
                          |                     +-----+-----+
                          |                           |
                          |                           | (invoice_id FK)
                          |                           v
                          |                     +-----+-----+
                          |                     | payments  |
                          +---------------------+ invoice_id|
                                                | amount    |
                                                | pay_date  |
                                                +-----------+

Table	Grain	Key columns
`customers`	one row per customer	`customer_id` PK; `terms_id` FK → `invoice_terms`
`invoices`	one row per invoice (~20,000)	`invoice_id` PK; `customer_id` FK → `customers`; `status_id` FK → `invoice_statuses`
`payments`	one row per payment (~18,700)	`payment_id` PK; `invoice_id` FK → `invoices`
`invoice_terms`	one row per terms code (4 rows)	`terms_id` PK
`invoice_statuses`	one row per stored status (4 rows)	`status_id` PK

Net terms are on customers, not invoices. A given customer always invoices on the same terms. To find an invoice's terms, JOIN invoices → customers → invoice_terms. Three tables to answer one question.

As-of dates that matter in this lab

Date	What it's for
2026-04-30	The lab's primary as-of date. The aging report references this.
2024-01-01 → 2026-04-30	The invoice-date range.

Date

What it's for

2026-04-30

The lab's primary as-of date. The aging report references this.

2024-01-01 → 2026-04-30

The invoice-date range.

Download the data

One pre-loaded Excel template is all you need. The raw CSVs are also hosted if you want to inspect a table on its own or take the CSV-import path described in Part 1.

XLSXlab_03_template.xlsx, the template workbook you fill in. Nine sheets: a parameters sheet (cutoff), five source tables (customers, invoices, payments, invoice_terms, invoice_statuses), and three empty target sheets for pivots (aging_summary, discrepancies, top_customers).
CSVcustomers.csv · invoices.csv · payments.csv · invoice_terms.csv · invoice_statuses.csv, the raw tables, one per CSV. Not needed if you use the template.

How to introspect the tables

In Excel, open the template (or a CSV): the row count is in the bottom-left of the window, the column headers are row 1, and clicking a cell shows the raw value.

Download lab_03_template.xlsx and open it. Save it as lab_03.xlsx right away, this one workbook is the entire Excel deliverable. Do not put your name or university ID in the filename; Canvas attaches your identity automatically when you upload.

The nine sheets

The template ships with nine sheets, in this order. Do not rename them. The grader reads the workbook by sheet name.

Sheet What's on it

cutoff Parameters. B1 is the cutoff date (text, ISO format) that drives every formula in the workbook. Currently 2026-04-30.

customers ~1,500 rows, fully populated. Named table: tbl_customers.

invoices ~20,015 rows. The six source columns are populated; the nine analytical columns to the right are empty, you fill them in Part 2. Named table: tbl_invoices.

payments ~18,667 rows, fully populated. Named table: tbl_payments.

invoice_terms 4 rows. Named table: tbl_invoice_terms.

invoice_statuses 4 rows. Named table: tbl_invoice_statuses.

aging_summary Empty target. You build a pivot here in Part 3.

discrepancies Empty target. You build a pivot here in Part 4.

top_customers Empty target. You build a pivot here in Part 5.

Sheet	What's on it
`cutoff`	Parameters. `B1` is the cutoff date (text, ISO format) that drives every formula in the workbook. Currently `2026-04-30`.
`customers`	~1,500 rows, fully populated. Named table: `tbl_customers`.
`invoices`	~20,015 rows. The six source columns are populated; the nine analytical columns to the right are empty, you fill them in Part 2. Named table: `tbl_invoices`.
`payments`	~18,667 rows, fully populated. Named table: `tbl_payments`.
`invoice_terms`	4 rows. Named table: `tbl_invoice_terms`.
`invoice_statuses`	4 rows. Named table: `tbl_invoice_statuses`.
`aging_summary`	Empty target. You build a pivot here in Part 3.
`discrepancies`	Empty target. You build a pivot here in Part 4.
`top_customers`	Empty target. You build a pivot here in Part 5.

The `cutoff` sheet

One cell drives the whole workbook. cutoff!$B$1 is the as-of date. Every formula you write in Part 2 references it. When you change it, every formula on invoices recomputes, and every pivot built on tbl_invoices follows on refresh. Hard-coding 2026-04-30 into formulas instead misses the lesson. In HW 3 you recreate these same as-of numbers in SQL, so a cleanly parameterized workbook is what lets the two tie out.

Optional CSV-import path. If you want to prove you can rebuild this from scratch, for portfolio reasons, or because you prefer Power Query, the five raw CSVs are in the "Download the data" section above. You'd import each as a table, set the same names (tbl_customers, tbl_invoices, etc.), add a cutoff sheet, and add the nine analytical columns to invoices yourself. It is slower, and the rest of the lab assumes you used the template. Pick one path and stick with it.

Part 1 done when: the template is open, saved as lab_03.xlsx, and you have located cutoff!$B$1, the cell every formula in Part 2 will reference.

On the invoices sheet, columns G through O are headers only, the body is empty. You write one formula per column in the first data row; the Excel Table auto-fills the column down all ~20,000 rows. Use structured references everywhere: tbl_invoices[@[invoice_id]], not A2. Structured references survive table extensions and reorderings; cell references do not. Note the double brackets: [@[column_name]] is the safe form for the current row's value, use it consistently.

Touch each formula in this order: cash application first, then the date math and aging bucket, then the lookup columns. Each block depends on the one above, balance needs paid_to_date, aging_bucket needs days_past_due. The lookup columns are display polish and come last.

Heads up: Excel data type quirks. Depending on your Excel version and import path, some columns in the template can come through as text rather than dates or numbers. If a formula returns #VALUE!, returns 0 when you expect a real total, or your row counts do not match the expected numbers from the summary sheet, this is the most common cause.

Two ways to fix it:

Copilot in Excel. Select the misbehaving column and ask Copilot to convert it to a date or a number. It will surface the right Power Query step or formula.
The add-column-and-zero trick. In a blank column to the right of the table, enter =[@column] + 0. The +0 forces Excel to coerce text into a number, and text-dates into real dates. Copy the new column, then paste it back over the original column as values (Paste Special, Values). The column now has the correct data type.

Sean will demo both moves in class. If a formula below is not behaving as described, check column data types before chasing the formula itself.

2.1, Cash application

Three columns that turn the raw payments into per-invoice cash applied, outstanding balance, and a derived status. Write the formula once in the first data row; the Excel Table auto-fills the column down all ~20,000 rows.

Column Formula + what it does

paid_to_date =SUMIFS(tbl_payments[amount], tbl_payments[invoice_id], [@[invoice_id]], tbl_payments[payment_date], "<="&cutoff!$B$1)

For each invoice, sum every payment row that points back at that invoice and arrived on or before the cutoff date. The "<="&cutoff!$B$1 filter is what lets the as-of-date cell drive the whole report, change the cutoff and every paid_to_date recomputes.

balance =[@amount] - [@[paid_to_date]]

What's still owed. If a customer overpaid, this goes slightly negative (a credit balance), the next column treats anything within a penny as paid.

derived_status =IF([@balance]<0.01, "Paid", IF([@[paid_to_date]]>0, "Partial", "Open"))

Nested IF, evaluated left to right: balance essentially zero → Paid; otherwise some payment exists → Partial; otherwise no payment → Open. The 0.01 is a rounding-tolerance margin (treats $0.003 of float drift as zero). This column drives both the aging filter in Part 3 and the discrepancy hunt in Part 4.

Column	Formula + what it does
`paid_to_date`	`=SUMIFS(tbl_payments[amount], tbl_payments[invoice_id], [@[invoice_id]], tbl_payments[payment_date], "<="&cutoff!$B$1)` For each invoice, sum every payment row that points back at that invoice and arrived on or before the cutoff date. The `"<="&cutoff!$B$1` filter is what lets the as-of-date cell drive the whole report, change the cutoff and every paid_to_date recomputes.
`balance`	`=[@amount] - [@[paid_to_date]]` What's still owed. If a customer overpaid, this goes slightly negative (a credit balance), the next column treats anything within a penny as paid.
`derived_status`	`=IF([@balance]<0.01, "Paid", IF([@[paid_to_date]]>0, "Partial", "Open"))` Nested IF, evaluated left to right: balance essentially zero → Paid; otherwise some payment exists → Partial; otherwise no payment → Open. The `0.01` is a rounding-tolerance margin (treats $0.003 of float drift as zero). This column drives both the aging filter in Part 3 and the discrepancy hunt in Part 4.

At 20,000 rows the paid_to_date SUMIFS is slow. Expect Excel to pause for several seconds when you fill the column. That pause is the point. You are doing by hand what SQL does in three lines.

If SUMIFS at scale is new to you, walk Excel concepts trainer Lesson 1 first.

2.2, Date math and aging

Two columns that turn the due date into a bucket label. These formulas assume due_date is stored as a real Excel date. If it came through as text, the subtraction will throw #VALUE!. See the data-type heads-up above for the fix.

Column Formula + what it does

days_past_due =cutoff!$B$1 - [@[due_date]]

Straight date subtraction. Positive number = past due (and how many days); zero or negative = not yet due.

aging_bucket =IFS([@[days_past_due]]<=0,"0. Current", [@[days_past_due]]<=30,"1. 01-30", [@[days_past_due]]<=60,"2. 31-60", [@[days_past_due]]<=90,"3. 61-90", TRUE,"4. 90+")

IFS walks the conditions top-to-bottom and stops at the first true one; TRUE at the end catches everything else. The "0. " / "1. " / "2. " prefixes make the buckets sort correctly in the pivot table without manually re-ordering rows.

Column	Formula + what it does
`days_past_due`	`=cutoff!$B$1 - [@[due_date]]` Straight date subtraction. Positive number = past due (and how many days); zero or negative = not yet due.
`aging_bucket`	`=IFS([@[days_past_due]]<=0,"0. Current", [@[days_past_due]]<=30,"1. 01-30", [@[days_past_due]]<=60,"2. 31-60", [@[days_past_due]]<=90,"3. 61-90", TRUE,"4. 90+")` `IFS` walks the conditions top-to-bottom and stops at the first true one; `TRUE` at the end catches everything else. The `"0. " / "1. " / "2. "` prefixes make the buckets sort correctly in the pivot table without manually re-ordering rows.

If IFS and absolute references like cutoff!$B$1 are new, walk Lessons 4 and 5 of the Excel concepts trainer.

2.3, Lookup columns (display polish)

Setup step, add `payment_terms` to the `customers` sheet

The customers table only stores terms_id (a foreign key into invoice_terms), not the actual terms name. Rather than nesting two lookups every time you want the terms name on an invoice row, add a helper column called payment_terms to the customers sheet first. With it in place, every invoices-side lookup below stays single-hop.

Sheet Column to add Formula + what it does

customers payment_terms =VLOOKUP([@[terms_id]], tbl_invoice_terms, 2, FALSE)

For each customer row, look up that customer's terms_id in tbl_invoice_terms, return column 2 (the terms name, "Net 30", "Net 45", etc.). The trailing FALSE forces an exact match (VLOOKUP defaults to approximate, which is rarely what you want for an ID lookup).

Sheet	Column to add	Formula + what it does
`customers`	`payment_terms`	`=VLOOKUP([@[terms_id]], tbl_invoice_terms, 2, FALSE)` For each customer row, look up that customer's `terms_id` in `tbl_invoice_terms`, return column 2 (the terms name, "Net 30", "Net 45", etc.). The trailing `FALSE` forces an exact match (VLOOKUP defaults to approximate, which is rarely what you want for an ID lookup).

The helper-column pattern. When a lookup keeps getting nested, materialize the intermediate as a column on the source table. The customers row now carries both terms_id AND payment_terms, the second exists purely to make downstream formulas one hop instead of two. Same pattern shows up in SQL as a view, and in pandas as a derived column.

Now fill the four lookups on the `invoices` sheet

Four columns to fill via lookups against tbl_customers and tbl_invoice_statuses. Two worked XLOOKUP examples below; you write the other two yourself, following the same pattern with a different return column or lookup table.

Column Formula + what it does

customer_name =XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[customer_name])

Look up the invoice's customer_id in the customers table, return the matching customer_name. Single-hop XLOOKUP, this is the pattern segment and status_name below follow.

terms_name =XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[payment_terms], FALSE)

Same XLOOKUP shape as customer_name, but the return column is the payment_terms helper column you just added to customers in the setup step above. The trailing FALSE is XLOOKUP's if_not_found argument, returns FALSE instead of #N/A on a miss, which is defensive polish since every invoice should match a customer.

segment You write this one. Same single-hop XLOOKUP pattern as customer_name, same lookup key and same lookup table, just a different return column on tbl_customers.

status_name You write this one. Same single-hop XLOOKUP shape, but the lookup key is [@[status_id]] and the table is tbl_invoice_statuses.

Column	Formula + what it does
`customer_name`	`=XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[customer_name])` Look up the invoice's `customer_id` in the customers table, return the matching `customer_name`. Single-hop XLOOKUP, this is the pattern `segment` and `status_name` below follow.
`terms_name`	`=XLOOKUP([@[customer_id]], tbl_customers[customer_id], tbl_customers[payment_terms], FALSE)` Same XLOOKUP shape as `customer_name`, but the return column is the `payment_terms` helper column you just added to customers in the setup step above. The trailing `FALSE` is XLOOKUP's if_not_found argument, returns `FALSE` instead of `#N/A` on a miss, which is defensive polish since every invoice should match a customer.
`segment`	You write this one. Same single-hop XLOOKUP pattern as `customer_name`, same lookup key and same lookup table, just a different return column on `tbl_customers`.
`status_name`	You write this one. Same single-hop XLOOKUP shape, but the lookup key is `[@[status_id]]` and the table is `tbl_invoice_statuses`.

Why VLOOKUP on customers but XLOOKUP on invoices? VLOOKUP works when the lookup key is the first column of the target table, which is true for tbl_invoice_terms (terms_id is column 1). XLOOKUP handles arbitrary column positions and is more readable for the invoices-side lookups where you're naming columns explicitly. Either function works on either side; the choice here is mostly to show you both in their natural homes.

If single XLOOKUP or VLOOKUP is new to you, walk Excel concepts trainer Lesson 3 first.

Part 2 done when: all nine analytical columns are filled down the table, a spot-check on three invoices reconciles by hand (you can find the matching payment rows and confirm balance and aging_bucket are right), and the formulas all reference cutoff!$B$1 rather than hard-coded dates.

On the aging_summary sheet, insert a pivot table from tbl_invoices.

Pivot area Field

Rows aging_bucket

Values COUNT of invoice_id, SUM of balance

Filter derived_status, keep Open and Partial; drop Paid and Void

Pivot area	Field
Rows	`aging_bucket`
Values	`COUNT of invoice_id`, `SUM of balance`
Filter	`derived_status`, keep Open and Partial; drop Paid and Void

The buckets will sort themselves, the "0. " / "1. " prefixes from your IFS formula in Part 2.2 mean the rows appear in order automatically (no manual reorder). Format the SUM column as currency, no decimals.

Why filter on derived_status and not on balance > 0? Excel's pivot Filters area doesn't reliably support numerical conditions like "greater than zero", it expects categorical values. Filtering to Open + Partial keeps every invoice that still owes money and drops the noise (Paid, Void) cleanly. Same outcome, a filter the pivot engine actually likes.

Deliverable for Part 3. The aging_summary sheet with the aging pivot at the current cutoff date, ordered Current → Over 90, and the SUM column formatted as currency.

Note for the curious: the workbook's cutoff!$B$1 cell drives every analytical formula in Part 2, so you could change it (say, to 2025-12-31) and watch the pivot recompute. That's the parameterization payoff. The lab doesn't ask you to change it here, but that as-of-date discipline is what keeps your numbers reproducible when you rebuild them in SQL for HW 3.

Discrepancy Check: We often see mismatches between data that can have a material impact. We want to get in the habit of auditing outputs and ensuring they meet expected outcomes. The stored invoice status is what a person in the ERP recorded. The payments table is what actually happened. When the two disagree, the analyst's job is to surface the difference and the dollars at stake.

Real-world note. Summit Gear's books, like every B2B AR system, also have credit memos that reduce a balance without a cash payment. This lab simplifies, there are no credit memos here. When you find a real "paid but no payment row" case in industry, the first thing to check is whether a credit memo was applied. Outside this lab the discrepancy categories are a longer list.

On the discrepancies sheet, insert a pivot from tbl_invoices in a matrix layout: stored status across one axis, derived status across the other:

Pivot area Field

Rows status_name (the stored status from the ERP)

Columns derived_status (what the payments actually do)

Values COUNT of invoice_id

Filter None, show every cell

Pivot area	Field
Rows	`status_name` (the stored status from the ERP)
Columns	`derived_status` (what the payments actually do)
Values	`COUNT of invoice_id`
Filter	None, show every cell

You should see a 4×4 grid: four stored statuses down the rows, four derived statuses across the columns, a count in every cell. The diagonal cells (Open/Open, Paid/Paid, Partial/Partial, Void/Void) are where the stored status agrees with the payment-derived state, most rows live there. The off-diagonal cells are the discrepancies.

The three off-diagonal cells you should expect to see populated:

stored = Paid, derived = Open, the ERP thinks it's paid; the cash says no payment came in. (Paid-but-not.)
stored = Open, derived = Paid, the cash arrived; nobody flipped the status. (Open-but-paid.)
stored = Void, derived = Paid (or Partial), the ERP says this invoice is cancelled, yet cash came in against it. (Void-with-payment.) From the data alone we can't say whether the void came before or after the payment (no voided_at timestamp, see the §0.1 callout). The diagnostic move: flag for the controller and pull the supporting documents.

You do not need to filter the pivot to highlight the mismatches. The grid layout does the work. The diagonal is the agreement. The off-diagonal cells are the discrepancies.

Deliverable for Part 4. The discrepancies sheet with the 4×4 status_name × derived_status grid (COUNT of invoice_id in each cell), built inside your lab_03.xlsx workbook. There is nothing separate to submit for Part 4, the pivot in the workbook is the deliverable. As you read the grid, note which off-diagonal cell is the most concerning and why, that judgment is the analytical point of the part.

One more sheet on the workbook before you submit. The summary sheet consolidates your headline numbers in one place, counts, totals, the open-balance figure, the top-10 customers, so you have a validation target when you write SQL for HW 3. Each row of this sheet maps to a homework question: when you run the SQL, your result should match the number on this sheet, or you have a bug somewhere.

5.1, Build a `summary` sheet

Add a new sheet named summary (right-click any tab, then Insert or New Sheet). Lay out two columns: Label in column A, Value in column B. Use formulas that reference your named tables and the Part 2 columns, not hard-coded numbers. If you change cutoff!$B$1, every value here should follow.

Label (col A) Value (col B), formula shape Validates HW#

Total invoice count =ROWS(tbl_invoices) HW-01

Unique customer count =ROWS(UNIQUE(tbl_invoices[customer_id])) HW-02

Total payment count =ROWS(tbl_payments) HW-03 (related)

Total invoiced amount =SUM(tbl_invoices[amount]) HW-04 (related)

Total payments received =SUM(tbl_payments[amount]) HW-03 (related)

Open invoice count =COUNTIF(tbl_invoices[balance], ">0") HW-05

Total open balance =SUMIFS(tbl_invoices[balance], tbl_invoices[balance], ">0") HW-04, HW-05

Label (col A)	Value (col B), formula shape	Validates HW#
Total invoice count	`=ROWS(tbl_invoices)`	HW-01
Unique customer count	`=ROWS(UNIQUE(tbl_invoices[customer_id]))`	HW-02
Total payment count	`=ROWS(tbl_payments)`	HW-03 (related)
Total invoiced amount	`=SUM(tbl_invoices[amount])`	HW-04 (related)
Total payments received	`=SUM(tbl_payments[amount])`	HW-03 (related)
Open invoice count	`=COUNTIF(tbl_invoices[balance], ">0")`	HW-05
Total open balance	`=SUMIFS(tbl_invoices[balance], tbl_invoices[balance], ">0")`	HW-04, HW-05

The per-bucket aging totals (HW-07) and per-flavor discrepancy counts (HW-08) already live on your aging_summary and discrepancies pivots from Parts 3 and 4, no extra cells needed; just know which numbers to cite when validating.

5.2, Build the `top_customers` pivot

The template ships with an empty top_customers sheet. Build a PivotTable there to give yourself the top 10 customers by open balance, the HW-06 validator.

Source: tbl_invoices (which has customer_name, segment, and balance after Part 2).
Rows: customer_name (or customer_id if you prefer the ID-driven view).
Values: Sum of balance.
Filter: Use Value Filter → Greater Than → 0 on the row labels so closed invoices don't pull the totals down.
Sort: Descending by Sum of balance.
Top N: Value Filter → Top 10 Filter, by Sum of balance.

Why this matters. In HW 3 you'll write SQL against the same dataset on the SQL Workbench. The first thing you do for every SQL result is compare it to your Excel summary sheet (or the relevant pivot). If they match, your SQL is right. If they don't, you have a bug, usually a missing filter, a wrong join, or an aggregation against the wrong column. This reconciliation is the accounting beat dressed as a SQL one.

Deliverable for Part 5. The summary sheet with at least the seven cells above filled in via formulas, plus the top_customers PivotTable populated. Both are part of the single lab_03.xlsx upload.

Lab 3, Data Sources: AR, Aging, and Cash Application

The story of this lab (start here)

What you'll do

0.1, The origin of the data

0.2, Pre-lab interactives

0.3, Cheat sheets

0.4, The kickstart prompt (TLDR)

The schema

As-of dates that matter in this lab

Download the data

How to introspect the tables

The nine sheets

The `cutoff` sheet

2.1, Cash application

2.2, Date math and aging

2.3, Lookup columns (display polish)

Setup step, add `payment_terms` to the `customers` sheet

Now fill the four lookups on the `invoices` sheet

5.1, Build a `summary` sheet

5.2, Build the `top_customers` pivot

Submission

Grading

Need help?

Lab 3, Data Sources: AR, Aging, and Cash Application

The story of this lab (start here)

What you'll do

0.1, The origin of the data

0.2, Pre-lab interactives

0.3, Cheat sheets

0.4, The kickstart prompt (TLDR)

The schema

As-of dates that matter in this lab

Download the data

How to introspect the tables

The nine sheets

The cutoff sheet

2.1, Cash application

2.2, Date math and aging

2.3, Lookup columns (display polish)

Setup step, add payment_terms to the customers sheet

Now fill the four lookups on the invoices sheet

5.1, Build a summary sheet

5.2, Build the top_customers pivot

Submission

Grading

Need help?

The `cutoff` sheet

Setup step, add `payment_terms` to the `customers` sheet

Now fill the four lookups on the `invoices` sheet

5.1, Build a `summary` sheet

5.2, Build the `top_customers` pivot