Given the increasing impact of models on human daily lives, interpretability is a hot keyword in the Machine Learning community. Issues like the “right to explanation” (which a person targeted by the decision or prediction of an automated decision-maker could invoke) make clear that the question of interpretability goes beyond the purely academic or technical community.
A large part of the work on interpretability deals with the interpretability of an ML model. The motivation of this post is that I personally find it quite odd to argue about the interpretability of a model, and a fortiori of a data-driven ML algorithmic model. To motivate my reasoning I will make a rapid digression about what, though almost forgotten nowadays, used to be an important field of AI: qualitative physics.
Qualitative physics (btw my first research topic 🙂 ) was based on the idea that most models used in physics are of no use for explaining physics to laymen. The rationale is that human common sense is quite distant from the complex mathematical formalisms (e.g. differential equations) used to model and predict the behaviour of a physical system. Think about the motion of a pendulum: if you want to explain its behaviour to your 4-year-old child, you will probably be much more successful using some qualitative (and visual) notion of swinging rather than a second-order differential equation with a sinusoidal term…
The model, in this case the differential equation, says very little about the phenomenon to a person without a sophisticated mathematical background; nevertheless, people who know (almost) nothing about mathematics can easily and successfully reason about physical phenomena.
The research approaches discussing interpretable models seem to forget this aspect: by attacking the interpretability issue through the interpretability of the models, they neglect that models have often been conceived for the sake of accuracy or optimisation, and address mathematical minds and/or computational engines, not human common sense.
There is also another disturbing aspect in dissecting models to make automatic predictions (or decisions) interpretable: the fact that (good) models are not unique. The famous saying “all models are wrong, but some are useful” stresses that several different models may be used to describe the same phenomenon, and many of them (though very different from one another) may be chosen for very disparate reasons.
If you want to explain to a bank customer why they did not get credit (and your predictive ML model is a combination of a neural network and a logistic model), it is of little worth to go into the details (and the inductive biases) of those two algorithms. ML models are estimators: specifically, in a supervised learning problem they estimate conditional means (or conditional probabilities) describing the relation between input and output variables. Describing the estimation machinery behind the prediction process won’t be of any use to the final user (or victim) of that process.
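To make the point concrete, here is a minimal sketch (on simulated data, with all choices of my own) of what “ML models are estimators” means: a logistic regression fitted by gradient descent and a crude binned empirical average are radically different “models”, yet both estimate the same conditional probability P(Y=1 | x).

```python
import numpy as np

rng = np.random.default_rng(0)

# The phenomenon: P(Y=1 | x) = sigmoid(2x - 1)
def true_p(x):
    return 1.0 / (1.0 + np.exp(-(2.0 * x - 1.0)))

n = 20000
x = rng.uniform(-2, 2, n)
y = (rng.uniform(size=n) < true_p(x)).astype(float)

# Estimator 1: logistic regression fitted by full-batch gradient descent
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.5 * np.mean((p - y) * x)
    b -= 0.5 * np.mean(p - y)

# Estimator 2: a model-free binned empirical conditional mean
bins = np.linspace(-2, 2, 21)
idx = np.digitize(x, bins) - 1
binned = np.array([y[idx == i].mean() for i in range(20)])
centers = (bins[:-1] + bins[1:]) / 2

# Two very different estimators of the same conditional probability
logit_est = 1.0 / (1.0 + np.exp(-(w * centers + b)))
print(np.max(np.abs(logit_est - true_p(centers))))  # both close to true_p
print(np.max(np.abs(binned - true_p(centers))))
```

The customer cares about the estimated relation P(Y=1 | x), not about which of these two machineries produced it.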
Interpretability should focus on the phenomenon (i.e. the relation between the descriptors used by the predictor and the target variable), not on the multiple (heterogeneous) ways of describing it. The higher the dimensionality of the phenomenon, the more the interpretation should focus on representing (e.g. graphically) the relations of (in)dependence between variables.
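As an illustration of focusing on (in)dependence rather than on models, here is a model-free sketch (simulated data; the histogram-based mutual information estimator is one assumed choice among many) that quantifies which descriptors carry information about the target.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000

# Hypothetical phenomenon: y depends on x1 and is independent of x2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = (x1 + 0.5 * rng.normal(size=n) > 0).astype(int)

def mutual_info(x, y, n_bins=20):
    """Crude histogram estimate of mutual information I(X; Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=[n_bins, 2])
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of X
    py = p.sum(axis=0, keepdims=True)   # marginal of Y
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

print(mutual_info(x1, y))  # clearly positive: x1 is informative about y
print(mutual_info(x2, y))  # near zero: x2 is independent of y
```

Such a dependence summary is a statement about the phenomenon itself, and would read the same whatever predictive model is later fitted on top of it.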
My conclusion is that since humans can reason about complex behaviours, and typically do so without using any mathematical formalism, dissecting complex algorithms to explain phenomena is of very little use. So what? How can we provide insight to humans confronted with black-box-driven decisions?
My suggestion is to forget the model(s) and instead target the object of the modelling effort, i.e. the phenomenon. In supervised learning the phenomenon is a probabilistic (since uncertain) relation between observed variables, which (most of the time) is the consequence of a causal relationship between (some of) the observed variables. And causality is a notion that human beings seem to grasp much better than algorithmic or mathematical details. Observations, experiments, statistical and machine learning models should have a single common objective: to shed light on the dependence and (better still) causal relationships underlying the data. Once such a relationship is discovered (or, more humbly, estimated), it is the information that should be used to explain the decision to the client. “Dear client, our R&D department has established with high confidence that the features x and y are causally related to the non-repayment of loans, and that the higher those values the lower the probability of a default: therefore, given your status… we regret to inform you…”. Would such an explanation convince the customer? Probably not, but it is the most honest interpretation that a machine learner can give a final user. The causal mechanism is supposed to be robust and model-independent.
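The model-independence of such phenomenon-level statements can be illustrated with a toy sketch (simulated data; the income variable and both estimators are hypothetical choices of mine): two very disparate estimators of P(default | income) agree on the statement “the higher the income, the lower the estimated default probability”.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30000

# Hypothetical phenomenon: higher income -> lower probability of default
income = rng.uniform(0, 1, n)
default = (rng.uniform(size=n) < 0.8 - 0.6 * income).astype(float)

grid = np.linspace(0.05, 0.95, 10)

# Model A: logistic regression fitted by gradient descent
w, b = 0.0, 0.0
for _ in range(3000):
    p = 1 / (1 + np.exp(-(w * income + b)))
    w -= 0.5 * np.mean((p - default) * income)
    b -= 0.5 * np.mean(p - default)
pred_a = 1 / (1 + np.exp(-(w * grid + b)))

# Model B: a crude local (windowed) average, a nearest-neighbour-style estimate
pred_b = np.array([default[np.abs(income - g) < 0.05].mean() for g in grid])

# Both disparate models support the same phenomenon-level explanation:
# estimated P(default | income) decreases as income grows.
print(pred_a)
print(pred_b)
```

The explanation given to the client depends only on this shared, phenomenon-level finding, not on whether model A or model B produced the score.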
Of course, using features x and y to justify the decision might sometimes be unethical, but this is another domain where causality might help…