This page explains one idea you meet in Lab 2: a text file can be perfectly good data and still refuse to open correctly, because of its encoding. It is a five-minute read.
A text file is not really stored as letters. It is stored as bytes — numbers. An encoding is the rulebook that maps those bytes back to characters you can read. Open a file with the wrong rulebook and the text comes out wrong: garbled, spaced out with gaps, or it fails to open at all. The data was fine; the rulebook was wrong.
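A minimal sketch of the idea, using the word "Company" as a stand-in for real file contents (the sample text and variable names are illustrative, not taken from the lab file). It shows that the same letters become different bytes under different rulebooks:

```python
# The same text stored under two different encodings.
text = "Company"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian UTF-16, no byte-order mark

print(list(utf8_bytes))   # one number per letter for plain English text
print(list(utf16_bytes))  # two numbers per letter; the second of each pair is 0

# Reading the bytes back with the matching encoding recovers the text.
round_trip = utf16_bytes.decode("utf-16-le")
print(round_trip)
```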
UTF-16 is the encoding that JEA Detail.txt in Lab 2 uses. Encodings are not interchangeable. A file written as UTF-16 has to be read as UTF-16.
Some files begin with a few invisible bytes called a byte-order mark (BOM) that announce the encoding. If your editor reads the file with the wrong rulebook, that BOM is what shows up as odd symbols at the very start of the first line. It is not corruption — it is the file telling you what it is.
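If you want to check for a BOM yourself, something like the following would do it. This is a sketch, not part of the lab: `sniff_bom` is a hypothetical helper name, and whether JEA Detail.txt actually starts with a BOM is something to verify rather than assume. The BOM constants come from Python's standard `codecs` module.

```python
import codecs

# (BOM bytes, encoding name to hand to a reader). UTF-32 is checked before
# UTF-16 because the UTF-32 little-endian BOM begins with the UTF-16 one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(first_bytes):
    """Return an encoding name if the bytes start with a known BOM, else None."""
    for bom, name in _BOMS:
        if first_bytes.startswith(bom):
            return name
    return None

# In-memory stand-in for the start of a UTF-16 file that carries a BOM.
sample = codecs.BOM_UTF16_LE + "Company".encode("utf-16-le")
print(sniff_bom(sample))

# Hypothetical usage on a real file: read the first few bytes in binary mode.
# with open("JEA Detail.txt", "rb") as f:
#     print(sniff_bom(f.read(4)))
```

A result of `None` only means no BOM was found. The file can still be in any encoding.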
In Lab 2, JEA Detail.txt is UTF-16. A tool that assumes UTF-8 hits
the very first bytes, cannot make sense of them, and stops — before it has read a
single row of data. Read the same file with the wrong single-byte encoding instead and
every character comes back with a gap after it (C o m p a n y). The
file is fine. The encoding was guessed wrong.
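Both failure modes can be reproduced without the lab file. The sketch below builds a few UTF-16 bytes in memory (an assumed stand-in for the start of JEA Detail.txt, including a BOM) and decodes them three ways:

```python
import codecs

# Stand-in for the first bytes of a UTF-16 file that begins with a byte-order mark.
raw = codecs.BOM_UTF16_LE + "Company".encode("utf-16-le")

# Failure 1: a UTF-8 reader cannot interpret the opening bytes and gives up.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError as err:
    utf8_failed = True
    print("utf-8 failed:", err)

# Failure 2: a single-byte reader raises no error, but every zero byte
# becomes an extra invisible character after each letter.
wrong = raw.decode("latin-1")
print(repr(wrong))

# The matching encoding reads the text correctly and drops the BOM.
right = raw.decode("utf-16")
print(repr(right))
```

The "gap" you see in an editor is that extra character, which many programs draw as a blank.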
import pandas as pd
# JEA Detail.txt is tab-separated UTF-16, so state both explicitly.
df = pd.read_csv("JEA Detail.txt", sep="\t", encoding="utf-16")
If utf-16 does not work, try utf-16-le, utf-8, or latin-1 until the data reads cleanly. Be aware that latin-1 accepts any bytes without raising an error, so look at the first few rows before trusting the result.
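The trial-and-error above can be sketched as a small loop. `first_working_encoding` is a hypothetical helper, not something the lab provides, and its notion of "reads cleanly" is an assumption: the first line must decode without error, contain a tab, and contain no NUL characters. Adjust that check if your file looks different.

```python
def first_working_encoding(path, candidates=("utf-16", "utf-16-le", "utf-8", "latin-1")):
    """Return the first candidate encoding under which the file looks sane, else None."""
    for enc in candidates:
        try:
            with open(path, encoding=enc) as f:
                text = f.read()
        except UnicodeError:
            # The bytes are not valid under this encoding; try the next one.
            continue
        lines = text.splitlines()
        header = lines[0] if lines else ""
        # Assumed sanity check for a tab-separated file: a wrong guess usually
        # yields either no tabs at all or stray NUL characters.
        if "\t" in header and "\x00" not in header:
            return enc
    return None

# Hypothetical usage with the Lab 2 file:
# import pandas as pd
# enc = first_working_encoding("JEA Detail.txt")
# df = pd.read_csv("JEA Detail.txt", sep="\t", encoding=enc)
```

Checking the header rather than only catching errors matters because some wrong guesses decode without complaint and simply produce nonsense.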