How Do You Evaluate Feature Importance in Machine Learning Models?

Hard · Common · Major: Data Science · Meta, Microsoft

Concept

Feature importance quantifies how much each variable contributes to a model’s predictions.
It helps with interpretability, debugging, and model governance by answering a crucial question:

"Which features actually drive the model’s output?"

Evaluating importance correctly is not trivial — metrics vary by model type, data structure, and the problem domain.


1) Model-Specific vs. Model-Agnostic Approaches

A. Model-Specific Methods

These are built into certain algorithms (a short extraction sketch follows this list):

  • Tree-based models (Random Forest, XGBoost, LightGBM)
    Use metrics such as information gain, Gini decrease, or split frequency.

    • Advantage: efficient and easy to extract.
    • Limitation: biased toward high-cardinality features or correlated variables.
  • Linear / Logistic Regression
    Use standardized coefficients or odds ratios to estimate direction and magnitude.

    • Example: a weight of 0.45 on “credit utilization” indicates a positive effect on default risk.
  • Neural Networks
    Feature importance can be approximated through gradient-based saliency or layer-wise relevance propagation.
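
A minimal Python sketch of the tree-based and linear routes, using scikit-learn on synthetic data; the dataset, feature names, and hyperparameters are illustrative, not taken from any example above:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data with named columns
X_arr, y = make_classification(n_samples=500, n_features=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])

# Tree-based: impurity-based (Gini) importances come for free after fitting
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False))

# Linear: standardize first so coefficient magnitudes are comparable across features
logit = LogisticRegression(max_iter=1000).fit(StandardScaler().fit_transform(X), y)
print(pd.Series(logit.coef_[0], index=X.columns).sort_values(key=abs, ascending=False))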

B. Model-Agnostic Methods

These methods are independent of the model's internal structure and can be applied to any algorithm; a short sketch follows this list.

  1. Permutation Importance
    Randomly shuffle each feature and measure the drop in performance (e.g., AUC or RMSE).
    A larger drop ⇒ greater importance.

  2. Partial Dependence Plots (PDP)
    Visualize how predictions change as one feature varies, averaging over the values of the other features.

  3. SHAP (SHapley Additive exPlanations)
    Based on game theory; fairly distributes the gap between a prediction and the baseline output among the features.

    • Additive and locally accurate.
    • Works for both global and local interpretability.
  4. LIME (Local Interpretable Model-Agnostic Explanations)
    Fits a simple surrogate model (like linear regression) around an individual prediction.
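
A short Python sketch of the first three methods, using scikit-learn plus the third-party shap package; the model and data are placeholders rather than anything from the examples above:

import pandas as pd
import shap  # third-party: pip install shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

X_arr, y = make_classification(n_samples=500, n_features=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# 1) Permutation importance: shuffle each column and measure the drop in AUC
perm = permutation_importance(model, X, y, scoring="roc_auc", n_repeats=10, random_state=0)
print(pd.Series(perm.importances_mean, index=X.columns).sort_values(ascending=False))

# 2) Partial dependence of the prediction on a single feature (plotting needs matplotlib)
PartialDependenceDisplay.from_estimator(model, X, features=["feature_0"])

# 3) SHAP: per-row, per-feature contributions that, with a base value, reconstruct each prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)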


2) Quantitative Example


Permutation Importance (ΔAccuracy)
feature_A :  -0.12
feature_B :  -0.07
feature_C :  -0.01

Interpretation:
Feature A contributes the most; shuffling it reduces accuracy by 12 percentage points.
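
Those ΔAccuracy figures come from a simple loop: hold the fitted model fixed, shuffle one column of the evaluation set at a time, and subtract the baseline accuracy. A rough Python sketch on synthetic data (the feature names only mirror the table above):

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_arr, y = make_classification(n_samples=1000, n_features=3, n_informative=2,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=["feature_A", "feature_B", "feature_C"])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

baseline = model.score(X_test, y_test)              # accuracy before shuffling
rng = np.random.default_rng(0)
for col in X_test.columns:
    shuffled = X_test.copy()
    shuffled[col] = rng.permutation(shuffled[col].values)
    delta = model.score(shuffled, y_test) - baseline    # negative = feature matters
    print(f"{col} : {delta:+.2f}")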

For SHAP values:


Mean(|SHAP|) per feature:
feature_A : 0.36
feature_B : 0.18
feature_C : 0.05

A higher mean absolute SHAP value ⇒ a larger average impact on the model's predictions.
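
Producing that ranking is a one-line aggregation over the SHAP value matrix (rows = samples, columns = features); the values below are illustrative only:

import numpy as np

feature_names = ["feature_A", "feature_B", "feature_C"]
shap_values = np.array([[ 0.41, -0.20,  0.03],      # illustrative per-row SHAP values
                        [-0.31,  0.16, -0.07]])

mean_abs = np.abs(shap_values).mean(axis=0)         # mean(|SHAP|) per feature
for name, value in sorted(zip(feature_names, mean_abs), key=lambda t: -t[1]):
    print(f"{name} : {value:.2f}")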


3) Practical Applications

  • Model Debugging: Detect data leakage or redundant variables (e.g., if “zipcode” dominates income prediction).
  • Feature Engineering: Drop uninformative or correlated features to reduce noise.
  • Fairness Analysis: Reveal proxy variables that indirectly encode sensitive information (e.g., gender inferred through job title).
  • Regulatory Compliance: Explain financial or healthcare models per audit requirements (GDPR, ECOA, HIPAA).

4) Common Pitfalls

  • Multicollinearity: Importance may be distributed arbitrarily among correlated variables; use SHAP interaction values or drop-one retraining (sketched after this list) to test robustness.
  • Feature Scaling: For linear models, compare coefficients only after standardizing the features.
  • Non-stationary data: Importance rankings drift over time — monitor via rolling SHAP averages or retraining diagnostics.
  • Data leakage: Extremely high feature importance may indicate leakage; verify via causal inspection or pipeline audit.
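
A sketch of the drop-one retraining check from the first pitfall: retrain without each feature in turn and compare cross-validated AUC with a baseline. The model and data here are placeholders:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X_arr, y = make_classification(n_samples=800, n_features=6, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(6)])

baseline = cross_val_score(GradientBoostingClassifier(random_state=0),
                           X, y, scoring="roc_auc").mean()
for col in X.columns:
    auc = cross_val_score(GradientBoostingClassifier(random_state=0),
                          X.drop(columns=col), y, scoring="roc_auc").mean()
    # A small change despite a high reported importance hints at correlated substitutes
    print(f"without {col}: AUC change {auc - baseline:+.3f}")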

5) Real-World Example

Case Study: Credit Risk Modeling at a FinTech Firm

  • Used XGBoost for credit scoring.
  • Initial feature importance ranked “account_age_days” and “zip_code” as top drivers.
  • SHAP analysis revealed “zip_code” was a proxy for income — creating geographic bias.
  • After removal and retraining, fairness metrics improved: disparate impact ratio rose from 0.73 → 0.88 while AUC remained constant at 0.81.

This example shows why feature importance must be contextualized, not blindly trusted.
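
For reference, the disparate impact ratio quoted above is the ratio of favorable-outcome rates between an unprivileged and a privileged group; the common rule of thumb flags ratios below 0.80. A minimal sketch with made-up predictions and a hypothetical protected attribute:

import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])     # 1 = approved (illustrative)
group = np.array(["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"])  # hypothetical attribute

rate_unpriv = y_pred[group == "A"].mean()   # approval rate, unprivileged group
rate_priv = y_pred[group == "B"].mean()     # approval rate, privileged group
print(f"disparate impact ratio: {rate_unpriv / rate_priv:.2f}")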


6) Best Practices

  • Combine multiple techniques — tree gain, permutation, SHAP — for a complete picture.
  • Visualize feature impact distributions using SHAP summary or beeswarm plots.
  • Track drift in feature importance over time for deployed models (a sketch follows this list).
  • Document findings in model cards or explainability reports.
  • Never rely on a single global ranking — examine local explanations per user or segment.
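
One lightweight way to track the drift mentioned above is to compare mean(|SHAP|) rankings across time windows with a rank correlation; the window values and alert threshold below are illustrative:

import numpy as np
from scipy.stats import spearmanr

features = ["feature_A", "feature_B", "feature_C", "feature_D"]
importance_q1 = np.array([0.36, 0.18, 0.05, 0.02])   # mean(|SHAP|), window 1 (illustrative)
importance_q2 = np.array([0.12, 0.31, 0.20, 0.02])   # mean(|SHAP|), window 2 (illustrative)

rho, _ = spearmanr(importance_q1, importance_q2)
print(f"rank correlation between windows: {rho:.2f}")
if rho < 0.8:   # arbitrary alert threshold; tune per model
    print("importance drift detected, review:", features)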

Tips for Application

  • When to discuss:
    When explaining model debugging, fairness analysis, or regulatory interpretability.

  • Interview Tip:
    Connect concept to action:

    “We used SHAP analysis to identify a leakage issue — removing one feature reduced AUC slightly but improved fairness compliance and model trustworthiness.”


Key takeaway:
Feature importance isn’t just a ranking — it’s a diagnostic and ethical tool.
Interpreting it properly transforms black-box models into transparent, reliable systems that teams can trust in production.