The Problem They Solved
In 2019, Margaret Mitchell, Timnit Gebru, and seven co-authors published “Model Cards for Model Reporting” at the ACM FAT* conference (now FAccT). Their observation was simple but powerful: machine learning models were being shared and deployed with almost no standardized documentation. A model might achieve 95% accuracy on one dataset, but who knew how it performed across different demographics, languages, or edge cases? The paper proposed a simple solution — a short, structured document that accompanies every model, like the nutrition facts label on food packaging.
What They Proposed
A model card should include: model details (who built it, when, what type), intended use (what it’s for and what it’s not for), evaluation data (how it was tested), performance metrics (broken down by relevant subgroups), ethical considerations, and caveats and recommendations. The key insight was disaggregated evaluation — reporting performance separately across demographic groups rather than just one overall number.
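The disaggregated-evaluation idea is easy to show in code: compute the same metric overall and again for each subgroup, so a gap that the aggregate number hides becomes visible. The sketch below is illustrative only — the function name, group labels, and toy data are invented for this example, not taken from the paper:

```python
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    """Return overall accuracy plus accuracy broken down by subgroup."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group

# Toy data (hypothetical): the model looks passable overall,
# yet fails every example in subgroup "B" -- exactly the kind of
# disparity a single aggregate number would hide.
y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

overall, per_group = disaggregated_accuracy(y_true, y_pred, groups)
# overall == 0.625, per_group == {"A": 1.0, "B": 0.0}
```

In a real model card, the per-group numbers (and the choice of which groups are relevant) would appear in the performance-metrics section alongside the overall figure.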
Key insight: Before model cards, sharing a model was like selling a car without a spec sheet. You knew it was a car, but you had no idea about fuel efficiency, safety ratings, or whether it was street-legal in your country.