Six Stages of Every ML Project
1. Data Collection: Gather raw data. A house-price model needs thousands of sales records with prices, sizes, locations, and dates.
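2. Feature Engineering: Transform raw data into numbers the model can use. For example, convert “3 bedrooms, downtown, built 1990” into a numerical vector such as [3, 0.95, 34], read here as bedroom count, a location score, and age in years. This is often the highest-ROI step, since better features tend to help whichever model you pick.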
3. Model Selection: Choose an algorithm. Linear regression suits roughly linear relationships, random forests handle nonlinear ones and feature interactions, and SVMs can work well on high-dimensional data. Each makes different assumptions about the data.
4. Loss Function: Define what “wrong” means mathematically. For regression, typically Mean Squared Error. For classification, cross-entropy loss. The loss function is the model’s report card.
5. Optimization: Adjust model parameters to minimize the loss. Gradient descent is the workhorse: compute the gradient, take a step downhill, repeat.
6. Evaluation: Test on data the model has never seen. If it performs well on training data but poorly on test data, you’ve overfit.
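Stages 4 and 5 can be sketched in a few lines of plain Python. This is a minimal illustration, not a production optimizer: the toy data, the learning rate `lr`, the step count, and the helper names `mse`, `gradient`, and `gradient_descent` are all assumptions made for the example. It fits a line y = w*x + b by gradient descent on the Mean Squared Error.

```python
# Hypothetical toy data: house size (1000 sq ft) -> price ($100k).
# The points lie exactly on price = 2 * size + 1.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]


def mse(w, b, xs, ys):
    """Stage 4, the loss: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b, xs, ys):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Stage 5, optimization: repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradient(w, b, xs, ys)
        w -= lr * dw  # move downhill along w
        b -= lr * db  # move downhill along b
    return w, b


w, b = gradient_descent(sizes, prices)
print(f"w={w:.3f}, b={b:.3f}, loss={mse(w, b, sizes, prices):.6f}")
```

Because the toy points sit exactly on a line, the fitted values should land very close to w = 2 and b = 1 with a loss near zero. On real data the loss would level off above zero, and the learning rate usually needs tuning: too large and the steps overshoot, too small and convergence is slow.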
The Pipeline in scikit-learn
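from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# 1. Data: X = features, y = target
# Placeholder synthetic data so the example runs; swap in your own dataset.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)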
# 2. Feature engineering: scale features
# Fit the scaler on the training set only, so test-set statistics do not leak in.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
# 3. Model selection
model = LinearRegression()
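# 4-5. Loss + Optimization (fit does both)
# LinearRegression minimizes squared error with a direct least-squares solve,
# not with gradient descent.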
model.fit(X_train_s, y_train)
# 6. Evaluation
y_pred = model.predict(X_test_s)
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.2f}")
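The overfitting check from stage 6 can be sketched by reporting the training error next to the test error. The version below is one possible variant, not the only way to structure the code: it uses synthetic placeholder data from `make_regression` (an assumption for illustration, not real sales records) and wraps scaling and the model in scikit-learn's `make_pipeline`, so the scaler is fit only on whatever data is passed to `fit`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (assumed for illustration only).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Chain scaling and the model: fit() learns the scaler statistics from the
# training data only, and predict() reuses them on new data.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)

# Compare error on seen vs. unseen data.
train_mse = mean_squared_error(y_train, pipe.predict(X_train))
test_mse = mean_squared_error(y_test, pipe.predict(X_test))
print(f"Train MSE: {train_mse:.2f}")
print(f"Test MSE:  {test_mse:.2f}")
```

A test MSE close to the training MSE suggests the model generalizes. A test MSE far above the training MSE is the overfitting signature described in stage 6. With a simple linear model on this kind of data, the two numbers would be expected to stay in the same range.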
Key insight: The ML pipeline is like building a house. Data is the land, features are the blueprints, the model is the construction crew, the loss function is the building inspector, optimization is fixing what the inspector flags, and evaluation is the final walkthrough. Skip any step and the house falls down.