Python Foundations for Analytics — Interactive Guide

What Is Python Actually Doing?

Understanding the language behind your analytics

Why This Matters

  • You've used Python all semester for real analytics work
  • But what happens between import pandas as pd and your output?
  • Today: strip away the magic, see how the language works
  • Goal: read code confidently and debug without panic

Variables: Naming Things

  • A variable is a label attached to a value
  • revenue = 50000 stores the number and names it "revenue"
  • company = "Acme Corp" stores text and names it "company"
  • You already do this: df = pd.read_csv("data.csv")
  • df is just a name — you could call it my_data or potato
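
A short sketch of the label idea, using made-up figures (the name q1_revenue is hypothetical): two names can point at the same value, and reassigning one name does not change the other.

```python
# A variable name is a label; the numbers here are made up for illustration.
revenue = 50000
q1_revenue = revenue   # a second label attached to the same value
revenue = 62000        # move the first label to a new value

print(revenue)      # 62000
print(q1_revenue)   # 50000: the second label still points at the old value
```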

Types: Not All Data Is the Same

  • int — whole numbers: units_sold = 142
  • float — decimals: tax_rate = 0.065
  • str — text in quotes: account = "Cash"
  • bool — True or False: is_debit = True
  • The type determines what operations are allowed
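
If you are unsure what type a value has, the built-in type() function can tell you. A minimal sketch using the four values from this slide:

```python
units_sold = 142
tax_rate = 0.065
account = "Cash"
is_debit = True

values = (units_sold, tax_rate, account, is_debit)

# Print each value next to the name of its type
for value in values:
    print(value, type(value).__name__)

type_names = [type(value).__name__ for value in values]
print(type_names)   # ['int', 'float', 'str', 'bool']
```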

Why Types Matter

python
# This crashes — you can't add text to a number
price = "29.99"
total = price + 10
# TypeError: can only concatenate str (not "int") to str

# Fix: convert the string to a number first
price = float("29.99")
total = price + 10  # Works: about 39.99

Lists: Ordered Collections

  • A list holds multiple values in order
  • accounts = ["Cash", "AR", "Revenue", "COGS"]
  • Access by position starting at zero:
  • accounts[0] returns "Cash"
  • accounts[2] returns "Revenue"
  • Zero-indexing is why pandas column positions start at 0
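
A small sketch of zero-based positions with the same list. Because counting starts at 0, the last valid position is always the length minus one; the -1 shortcut is a standard Python feature the slide does not cover.

```python
accounts = ["Cash", "AR", "Revenue", "COGS"]

count = len(accounts)        # 4 items
first = accounts[0]          # "Cash"
last = accounts[count - 1]   # "COGS": the last position is length minus one
also_last = accounts[-1]     # "COGS": negative positions count from the end

print(count, first, last, also_last)
```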

Dictionaries: Key-Value Lookups

  • A dict maps labels to values — like a chart of accounts
  • Use the label (key) to retrieve the value

python
account = {
    "number": 1010,
    "name": "Cash",
    "balance": 50000,
    "type": "Asset"
}
account["name"]     # "Cash"
account["balance"]  # 50000
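
Beyond reading values, the same bracket syntax updates them. The sketch below also checks whether a key exists before looking it up; the "currency" key is a hypothetical example of a key that is not in the dict.

```python
account = {
    "number": 1010,
    "name": "Cash",
    "balance": 50000,
    "type": "Asset",
}

# Update a value through its key
account["balance"] = account["balance"] + 2500
print(account["balance"])    # 52500

# Check whether a key exists before using it
has_currency = "currency" in account
print(has_currency)          # False

keys = list(account.keys())
print(keys)                  # ['number', 'name', 'balance', 'type']
```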

DataFrames Are Just Fancy Dicts

python
import pandas as pd

# A DataFrame can be built from a dict of lists: keys become column names
data = {
    "Account": ["Cash", "AR", "Revenue"],
    "Balance": [50000, 12000, 75000]
}
df = pd.DataFrame(data)

# df["Account"] is dict-style key lookup
# That's why column names must be exact
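
A pandas-free sketch of the same dict-of-lists idea, to show what "column" and "row" mean in plain Python. This is only an analogy; pandas stores its data differently internally.

```python
data = {
    "Account": ["Cash", "AR", "Revenue"],
    "Balance": [50000, 12000, 75000],
}

# A "column" is the list stored under one key
column = data["Account"]
print(column)   # ['Cash', 'AR', 'Revenue']

# A "row" pairs up the items at the same position in every column
rows = list(zip(data["Account"], data["Balance"]))
print(rows)     # [('Cash', 50000), ('AR', 12000), ('Revenue', 75000)]
```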

For Loops: Repeating Actions

  • "For each item in this collection, do something"

python
accounts = ["Cash", "AR", "Revenue"]

for account in accounts:
    print(f"Processing: {account}")

# Output:
# Processing: Cash
# Processing: AR
# Processing: Revenue
  • .iterrows() follows the same pattern: it hands you one DataFrame row at a time, as an (index, row) pair
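
Loops are often used to build up a result one item at a time. A minimal sketch that totals a list of illustrative balances:

```python
balances = [50000, 12000, 75000]

total = 0
for balance in balances:
    total = total + balance   # add each balance to the running total

print(total)   # 137000
```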

If/Else: Making Decisions

python
balance = -15000

if balance > 0:
    entry_type = "Debit"
elif balance < 0:
    entry_type = "Credit"
else:
    entry_type = "Zero"

print(entry_type)  # "Credit"
  • You use this logic when you filter DataFrames
  • df[df["Balance"] > 0] applies the same kind of check to every row and keeps the rows where it is True
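
To make the "check on every row" idea concrete, here is a plain-Python sketch that combines a loop with an if statement. The balances are illustrative, and this shows the concept only, not how pandas performs filtering internally.

```python
balances = [50000, -15000, 0, 12000]

positive = []
for balance in balances:
    if balance > 0:               # the same test as df["Balance"] > 0
        positive.append(balance)  # keep only the values that pass

print(positive)   # [50000, 12000]
```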

Functions: Reusable Recipes

  • A function is a named set of instructions you can reuse

python
def calculate_tax(amount, rate=0.065):
    """Calculate sales tax for a given amount."""
    return amount * rate

tax1 = calculate_tax(1000)        # 65.0
tax2 = calculate_tax(5000, 0.08)  # 400.0
  • rate=0.065 is a default — used when you don't specify one
  • pd.read_csv() is just a function someone else wrote
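
Arguments can also be passed by name, which is how calls like pd.read_csv("data.csv", sep=";") work. A sketch that reuses calculate_tax; results are rounded because floating-point arithmetic can leave tiny rounding differences.

```python
def calculate_tax(amount, rate=0.065):
    """Calculate sales tax for a given amount."""
    return amount * rate

default_tax = calculate_tax(2000)                # uses the default rate
named_tax = calculate_tax(amount=2000, rate=0.08)  # arguments passed by name

print(round(default_tax, 2))   # 130.0
print(round(named_tax, 2))     # 160.0
```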

What import Actually Does

  • import pandas as pd means three things:
  • Find a library called pandas on this computer
  • Load all its functions into memory
  • Let me use the shortcut pd instead of pandas
  • pd.read_csv() calls the read_csv function from pandas
  • Libraries are just collections of functions someone published
  • You could write everything pandas does yourself — it would just take years
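
The same import mechanics can be tried with math, a library that ships with Python, so nothing needs to be installed. The alias m is an arbitrary choice for this sketch.

```python
import math        # load the standard-library math module
import math as m   # the same module under a shorter alias

print(math.sqrt(144))   # 12.0
print(m.sqrt(144))      # 12.0
print(m is math)        # True: both names point to the same module
```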

Errors Are Your Friends

  • Python errors tell you exactly what went wrong
  • Always read from the bottom up:
  • Last line: error type and message
  • Lines above: file name and line number
  • The five you will see most often:
  • NameError — typo in a variable name
  • TypeError — wrong data type
  • KeyError — column or key does not exist
  • IndexError — position out of range
  • FileNotFoundError — wrong file path
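
A sketch that deliberately triggers each of the five errors and records its name. The helper error_name, the variable undefined_total, and the file name are all hypothetical, and the last case assumes no file with that name exists in the working directory.

```python
def error_name(action):
    """Run action and return the name of the error it raises, or None."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None

accounts = ["Cash", "AR"]
account = {"name": "Cash"}

results = [
    error_name(lambda: undefined_total),                    # name never defined
    error_name(lambda: "29.99" + 10),                       # text plus number
    error_name(lambda: account["balance"]),                 # key not in the dict
    error_name(lambda: accounts[5]),                        # position past the end
    error_name(lambda: open("no_such_file_for_demo.csv")),  # file does not exist
]

print(results)
# ['NameError', 'TypeError', 'KeyError', 'IndexError', 'FileNotFoundError']
```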

NameError: The Typo Detector

python
revenue = 75000
profit_margin = 0.15
profit = revnue * profit_margin
# NameError: name 'revnue' is not defined

# Python won't guess what you meant
# Check your spelling — that's the fix

TypeError: Mismatched Types

python
quantity = input("Enter quantity: ")  # returns "10"
price = 5.99
total = quantity * price
# TypeError: can't multiply sequence by non-int of type 'float'

# input() always returns a string!
total = int(quantity) * price  # about 59.9
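
The sketch below separates three cases that are easy to confuse. The quantity is hard-coded as a string here to stand in for what input() would return.

```python
quantity = "10"   # stands in for the string that input() returns
price = 5.99

# Case 1: string times an integer repeats the string (no error)
repeated = quantity * 3
print(repeated)   # 101010

# Case 2: string times a float raises a TypeError
try:
    quantity * price
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# Case 3: convert first, then multiply
total = int(quantity) * price
print(round(total, 2))   # 59.9
```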

KeyError: Column Not Found

python
df["Revnue"]
# KeyError: 'Revnue'

# Step 1: Check what columns actually exist
print(df.columns.tolist())
# ['Revenue', 'Expenses', 'Net Income']

# Step 2: Fix the typo
df["Revenue"]  # Works

What You Now Know

  • Variables name your data so you can reuse it
  • Types determine what operations are allowed
  • Lists and dicts organize multiple values
  • Loops repeat actions, if/else makes decisions
  • Functions package reusable logic
  • Import loads someone else's functions
  • Errors are messages — read them bottom-up

Next Wednesday: Debug Lab

  • You will receive a broken Python script
  • Your job: fix it using an LLM as your assistant
  • For each bug you find, document:
  • What broke and what error you saw
  • What the LLM suggested
  • Whether the suggestion was correct
  • Come prepared with access to ChatGPT or Claude
Key Concept

This guide walks you through Python fundamentals using accounting examples. Work through each section and test yourself with the quizzes. Your progress is saved automatically.

Key Concept

Python is a general-purpose programming language. Pandas, NumPy, and matplotlib are add-on libraries — Python itself is simpler than you think.

Key Concept

Variable names are arbitrary labels you choose. df is just a convention — descriptive names like sales_data are actually better practice.

What does df = pd.read_csv('data.csv') do with the variable name df?

Which Python type would you use to store the value 0.065 representing a tax rate?

Key Concept

When data comes from files or user input, it is usually a string. You must convert it to a number before doing math. This is the number one source of subtle bugs.

What happens when you run: price = '50' + 10?

Key Concept

Lists use zero-based indexing: the first item is position 0, not 1. Many widely used programming languages share this convention.

Given items = ['Cash', 'AR', 'Revenue', 'COGS'], what does items[3] return?

Key Concept

A dictionary is like a lookup table: you provide the label (key) and get back the associated value. DataFrame column access uses the same bracket syntax: df["Balance"] looks up a column by its name.

Given the account dictionary above, what does account['type'] return?

Key Concept

A pandas DataFrame is built on the same dict concept: column names are keys, column data are values (stored as lists/arrays). That is why df["column"] uses the same bracket syntax as a dict.

Why does df['Revnue'] fail with a KeyError when the column is actually named 'Revenue'?

Key Concept

The for loop pattern is: for ITEM in COLLECTION. Python pulls one item at a time, runs the indented code, then moves to the next. The variable name after for is your choice.

Given: for account in ["Cash", "AR", "Revenue"]: print(account) — how many times does print() execute?

Key Concept

if/elif/else checks conditions top to bottom and runs the first matching block. It is the same logic as Excel's IF() function, just written differently.

Given: if balance > 0: "Debit" / elif balance < 0: "Credit" / else: "Zero" — if balance = 0, which branch runs?

Key Concept

Functions let you write logic once and reuse it. Parameters with = have default values. return sends a result back to the caller.

What does calculate_tax(2000) return?

Key Concept

import loads external code (a library) so you can use its functions. as pd creates a shorter alias. Libraries are just organized collections of functions that other developers wrote and shared.

What does the 'as pd' part of 'import pandas as pd' do?

Key Concept

Read error messages from the bottom up. The last line tells you WHAT went wrong. The lines above tell you WHERE it happened (file name and line number). Errors are helpful messages, not punishment.
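
To practice reading bottom-up, this sketch captures a traceback as text and prints its last line. The function compute_profit is hypothetical, and newer Python versions may append a "Did you mean" hint to the message.

```python
import traceback

def compute_profit():
    revenue = 75000
    profit_margin = 0.15
    return revnue * profit_margin   # deliberate typo: revnue

try:
    compute_profit()
except NameError:
    report = traceback.format_exc()   # the full traceback as one string

lines = report.strip().splitlines()

# The last line says WHAT went wrong (error type and message)
print(lines[-1])

# The lines above it say WHERE (file name, line number, function name)
for line in lines[:-1]:
    print(line)
```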

You see: NameError: name 'totl' is not defined. What is the most likely cause?

Key Concept

input() always returns a string, even if the user types a number. A string times an integer repeats the string instead of multiplying, and a string times a float raises a TypeError. Always convert types before doing math.

What does 'ha' * 3 produce in Python?

Key Concept

When you get a KeyError on a DataFrame, run df.columns.tolist() to see the exact column names. Watch for typos, extra spaces, and capitalization differences.
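
Extra spaces and capitalization are hard to spot by eye. In this sketch a plain list stands in for the output of df.columns.tolist(); the column names and their stray spaces are made up for illustration.

```python
# Stand-in for df.columns.tolist(); note the hidden spaces
columns = ["Revenue", " Expenses", "Net Income "]
wanted = "Net income"

print(wanted in columns)   # False: capitalization and a trailing space differ

# repr() shows the quotes, so stray spaces become visible
for name in columns:
    print(repr(name))

# Compare cleaned-up versions: strip spaces and ignore case
cleaned = [name.strip().lower() for name in columns]
found = wanted.strip().lower() in cleaned
print(found)               # True
```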

You get KeyError: 'Net income' but you know the column exists. What should you check first?

A pandas DataFrame is most similar to which Python data structure?

Key Concept

Wednesday's lab combines everything from today: reading errors, understanding types, checking variable names, and verifying column lookups — all with an LLM as your pair programmer.

When debugging with an LLM, why is it important to document whether the suggestion was correct?
