Why We Should Never Use Predictive Insights as Causation.
We are usually doing data science the wrong way. Most data teams I’ve come across hand data science insights and predictive models to other teams in the business, and those teams may act on the findings incorrectly.
We all know from Stats 101 that correlation does not equal causation.
We also know that we cannot prove a theory right; we can only prove it wrong, or fail to prove it wrong so far.
Most models we use in data science (regression, decision trees, gradient boosting models, etc.) learn from correlations in the data, and we can derive feature importances from them.
Yes, we can optimize for a performance metric in the test set, but a top feature from the model’s feature importances does not necessarily mean it is a cause of the target variable.
Let us say we are trying to figure out what features most impact or cause customer retention. What do we do? Let’s walk through an example of a traditional ML model and why this is not what you want to do to find causation.
For example, suppose we find that the top feature in our customer retention model is the number of bugs reported. This might be because heavy users who value the product are more likely both to report bugs and to renew their subscriptions. But imagine telling a VP to introduce new bugs to increase customer retention. This is the problem with reading correlations from a predictive model when deciding what actions to take to change customer retention.
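The bug-report example above can be reproduced with a small simulation. This is only a sketch on synthetic data: the hidden `engagement` variable, the probabilities, and the function names are all assumptions made up for illustration, not figures from a real retention model. Retention here depends only on engagement, yet bug reports still correlate with it.

```python
import random


def simulate_customers(n=5000, seed=0):
    """Synthetic customers where hidden engagement drives both bug reports and renewal."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        engagement = rng.random()  # hidden confounder in [0, 1), assumed for illustration
        # Engaged users get 10 chances to report a bug, each with probability = engagement.
        bugs_reported = sum(rng.random() < engagement for _ in range(10))
        # Assumed mechanism: renewal depends ONLY on engagement, never on bugs_reported.
        retained = int(rng.random() < 0.2 + 0.7 * engagement)
        rows.append((bugs_reported, retained))
    return rows


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rows = simulate_customers()
bugs = [b for b, _ in rows]
retained = [r for _, r in rows]
corr_bugs_retention = pearson(bugs, retained)
print(f"correlation(bugs_reported, retained) = {corr_bugs_retention:.2f}")
```

A predictive model trained on this data would happily use `bugs_reported`, because it carries information about engagement, even though by construction it has no effect on renewal.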
A feature like this might help us estimate (or predict) customer retention, but it does not help us decide what actions to take to improve it. The question is: “What causes customer retention, so that we can adjust the levers we control to positively impact this metric for the business?”
Again, the number of bugs reported is a helpful feature for prediction. But suppose a team picks up our predictive model with a new goal: determining what actions our company can take to retain more customers.
This team is actually interested in the causal effect of features, not their predictive weight. They care about the counterfactual scenario: what would happen to retention if a variable in the real world were changed.
We can no longer merely identify a strong correlation between variables; we need to know whether manipulating a feature will change the target (customer retention) in a significant way.
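The difference between observing a feature and manipulating it can be sketched with the same kind of synthetic setup. Everything here is an assumption for illustration: the hidden `engagement` variable, the retention probabilities, and the hypothetical `force_bugs` argument, which overrides the number of bugs for every customer to mimic an intervention.

```python
import random


def simulate(n, seed, force_bugs=None):
    """Return (bugs_reported, retained) pairs for n synthetic customers.

    force_bugs=None observes the world as it is; an integer overrides
    bugs_reported for everyone, mimicking an intervention on that feature.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        engagement = rng.random()  # hidden driver, assumed for illustration
        natural_bugs = sum(rng.random() < engagement for _ in range(10))
        bugs = natural_bugs if force_bugs is None else force_bugs
        # Assumed mechanism: renewal depends on engagement only, not on bugs.
        retained = int(rng.random() < 0.2 + 0.7 * engagement)
        rows.append((bugs, retained))
    return rows


def retention_rate(rows):
    """Share of customers in rows who renewed."""
    return sum(r for _, r in rows) / len(rows)


# Observational comparison: customers who happened to report many vs. few bugs.
observed = simulate(20000, seed=1)
high_bug_customers = [row for row in observed if row[0] >= 7]
low_bug_customers = [row for row in observed if row[0] <= 3]
observational_gap = retention_rate(high_bug_customers) - retention_rate(low_bug_customers)

# Interventional comparison: force many vs. few bugs on fresh populations.
forced_high = simulate(20000, seed=2, force_bugs=8)
forced_low = simulate(20000, seed=3, force_bugs=2)
interventional_gap = retention_rate(forced_high) - retention_rate(forced_low)

print(f"observational gap:  {observational_gap:.3f}")
print(f"interventional gap: {interventional_gap:.3f}")
```

In this toy world the observational gap is large while the interventional gap is only sampling noise. In real data we do not get to write the mechanism ourselves, which is why answering the manipulation question takes experiments or causal methods rather than feature importances.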
There are a few methods we can explore to find causality. I will write a follow-up on how to actually do the causal modeling and what libraries are available to us for these tasks today.
Future experimentation can change the conclusion of what is right vs. wrong (see Feynman on the Scientific Method: https://www.youtube.com/watch?v=EYPapE-3FRw).