Fairness Tools
Several open-source tools make fairness measurement practical:

- Fairlearn (Microsoft) — a Python library for assessing and improving fairness. Provides fairness metrics, visualization dashboards, and mitigation algorithms.
- AIF360 (IBM) — AI Fairness 360, a comprehensive toolkit with 70+ fairness metrics and 10+ mitigation algorithms.
- Aequitas (University of Chicago) — a bias-audit toolkit focused on decision-making systems.
- What-If Tool (Google) — a visual interface for exploring model fairness without writing code.

All of these compute the metrics we’ve discussed (demographic parity, equalized odds, calibration) and provide visualizations that help communicate findings to stakeholders.
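To make two of those metrics concrete before reaching for a library, here is a minimal from-scratch sketch in plain Python. The toy y_true/y_pred/groups data is invented for illustration, and the library implementations handle edge cases (empty groups, multiclass labels) that this sketch does not:

```python
# Hand-rolled versions of two fairness metrics, standard library only.

def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between groups."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gr in zip(y_pred, groups) if gr == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())

def equalized_odds_difference(y_true, y_pred, groups):
    """Largest gap in TPR or FPR between groups."""
    def rate(g, label):
        # Mean prediction among examples in group g with true label `label`:
        # label=1 gives the group's TPR, label=0 gives its FPR.
        preds = [yp for yt, yp, gr in zip(y_true, y_pred, groups)
                 if gr == g and yt == label]
        return sum(preds) / len(preds)
    gaps = []
    for label in (0, 1):
        vals = [rate(g, label) for g in set(groups)]
        gaps.append(max(vals) - min(vals))
    return max(gaps)

# Invented toy data: 4 examples per group
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_difference(y_pred, groups))           # 0.25
print(round(equalized_odds_difference(y_true, y_pred, groups), 4))  # 0.3333
```

Here group "a" receives positive predictions at rate 0.5 versus 0.25 for group "b", so demographic parity difference is 0.25; the equalized odds difference is driven by the TPR gap (2/3 vs 1.0).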
Fairlearn Example
# Fairlearn: measure fairness metrics
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
)
from sklearn.metrics import accuracy_score

# Compute accuracy separately for each sensitive group
mf = MetricFrame(
    metrics=accuracy_score,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=gender,
)
print(mf.by_group)
# male:   0.95
# female: 0.87

# Demographic parity difference: gap in selection rates between groups
dp = demographic_parity_difference(
    y_test, y_pred,
    sensitive_features=gender,
)
# dp = 0.15 (15% gap)

# Equalized odds difference: worst gap in TPR or FPR between groups
eo = equalized_odds_difference(
    y_test, y_pred,
    sensitive_features=gender,
)
Key insight: Fairlearn is the most practical starting point for most teams. It integrates with scikit-learn, provides clear visualizations, and includes both measurement and mitigation tools. Start by measuring — you can’t improve what you don’t measure.
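As a taste of what mitigation looks like, here is a toy sketch of the idea behind post-processing with per-group decision thresholds (the approach Fairlearn's ThresholdOptimizer implements in a principled way). The scores, groups, and threshold values below are invented for illustration:

```python
# Toy post-processing sketch: pick a separate decision threshold per
# group so that selection rates line up (demographic parity).

def selection_rate(decisions):
    return sum(decisions) / len(decisions)

def predict(scores, groups, thresholds):
    # thresholds maps group -> cutoff; score >= cutoff means "positive"
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]

# Invented scores: group "b" systematically scores lower
scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.5, 0.3, 0.1]
groups = ["a"] * 4 + ["b"] * 4

# One shared threshold: group "b" is selected half as often (0.5 vs 0.25)
single = predict(scores, groups, {"a": 0.55, "b": 0.55})

# Per-group thresholds: selection rates equalized (0.5 vs 0.5)
adjusted = predict(scores, groups, {"a": 0.55, "b": 0.45})
```

Real post-processors also weigh the accuracy cost: lowering a group's threshold to equalize selection rates trades some precision for parity, and ThresholdOptimizer searches for thresholds that minimize that trade-off.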