From open data to open decisions

Open data is the next (or already current) big thing. Is it enough?

We should be doing data science not (only) for the sake of having good models or nice predictions, but to provide quantified, data-driven and assessed evidence to decision makers.

Is a good data science process enough? I would say no. Whatever evidence data scientists are able to provide, it will be affected (or, better, annotated) by uncertainty, risk, confidence intervals and variance. The role of the decision maker is not to accept blindly the outcome of the data science process but to weigh the risks and the costs properly.

Let us take an academic example: a doctor deciding whether or not to prescribe a treatment (e.g. chemotherapy) to a patient. It is not only about the potential success (and risk) of the treatment. It is also about the cost of a false positive (prescribing a treatment from which the patient gets only side effects) and of a false negative (withholding the treatment and letting the patient's condition deteriorate).

In the end, it is the doctor who decides, on the basis of

  1. a model (implicit in the doctor's knowledge or made explicit, for instance, in a statistical model)
  2. a measure of utility or cost (associated with false positives and false negatives)

whether it is more beneficial for the patient to administer the drug or not.
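The doctor's choice can be sketched as a minimal expected-cost rule. This is only an illustration under simplifying assumptions that are not in the text above: the model is reduced to a single probability `p_benefit` that the patient would benefit from the treatment, correct decisions are given zero cost, and the names `expected_costs`, `decide` and `treatment_threshold` are hypothetical.

```python
def expected_costs(p_benefit, cost_false_positive, cost_false_negative):
    """Expected cost of each action under a two-outcome model.

    p_benefit: probability (from the model) that the patient would benefit.
    cost_false_positive: cost of treating a patient who gets only side effects.
    cost_false_negative: cost of withholding a treatment that was needed.
    Correct decisions are assumed to cost nothing (a simplification).
    """
    return {
        "treat": (1.0 - p_benefit) * cost_false_positive,
        "withhold": p_benefit * cost_false_negative,
    }


def decide(p_benefit, cost_false_positive, cost_false_negative):
    """Pick the action with the lowest expected cost (ties go to 'treat')."""
    costs = expected_costs(p_benefit, cost_false_positive, cost_false_negative)
    return min(costs, key=costs.get)


def treatment_threshold(cost_false_positive, cost_false_negative):
    """Probability of benefit above which treating has the lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


if __name__ == "__main__":
    # Illustrative numbers only: a false negative is taken to be five times
    # as costly as a false positive.
    print(decide(0.3, 1.0, 5.0))
    print(treatment_threshold(1.0, 5.0))
```

The point of the sketch is that the same model output can lead to opposite decisions once the costs change, which is why the cost function deserves to be written down next to the model.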

If the data that led to the statistical model are (or presumably will be) open, and therefore available in the future for the sake of reproducibility and scientific validation, what about the final choice of the doctor (or, more generally, of the decision maker)?

Decision making is either irrational or rational. In the first case, let us just cross our fingers. In the second case, it deserves a description, documentation and (why not) open sharing. I advocate that, as for open data, a comparable (or greater) effort should be devoted to providing tools, repositories and dashboards to edit, store and disseminate open decision models describing

  1. the decision making setting (date, author, target, expected impact)
  2. the evidence it relied on (informal knowledge, literature, statistical models)
  3. in case of statistical evidence, the (open) data that were used for inferring it
  4. the utility (or cost) function used for the decision
  5. the decision making process, specifically how the material in points 2, 3 and 4 was used to deliver the final decision
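As a rough sketch of what such an open decision model could look like once stored, the five points above can be mapped to a small record that serializes to JSON. Everything here is an assumption made for illustration: the class name `DecisionRecord`, its field names and the JSON layout are hypothetical and do not follow any existing standard or tool.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class DecisionRecord:
    # 1. the decision making setting
    date: str
    author: str
    target: str
    expected_impact: str
    # 2. the evidence the decision relied on
    evidence: List[str]
    # 3. the (open) data behind any statistical evidence
    data_sources: List[str]
    # 4. the utility (or cost) function, here as named costs
    costs: Dict[str, float]
    # 5. how points 2, 3 and 4 were combined, and the outcome
    process: str
    decision: str


def to_json(record: DecisionRecord) -> str:
    """Serialize a record to a stable, human-readable JSON string."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_json(text: str) -> DecisionRecord:
    """Rebuild a record from the JSON produced by to_json."""
    return DecisionRecord(**json.loads(text))


if __name__ == "__main__":
    # Placeholder values only; none of them come from a real case.
    record = DecisionRecord(
        date="YYYY-MM-DD",
        author="(decision maker)",
        target="(patient or population concerned)",
        expected_impact="(what the decision is expected to change)",
        evidence=["(statistical model)", "(literature)"],
        data_sources=["(identifier of the open dataset)"],
        costs={"false_positive": 1.0, "false_negative": 5.0},
        process="minimize expected cost",
        decision="treat",
    )
    print(to_json(record))
```

A plain, text-based format is chosen here only because it is easy to store, diff and reread later; any format that keeps the five elements together would serve the same purpose.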

And what about confidentiality? The decision model (once formalized) could be kept confidential or have restricted access if needed. The issue here is not really the disclosure of sensitive information but rather the degree of reproducibility of a decision. We can only learn from our own (or others') errors. Think of political decision makers, democratically required to document and safely store their decisions, and of the possibility for a citizen to rerun those decisions (once disclosed) in the near (or far) future.


The regularity gamble

All human knowledge relies on a gamble: “regularity exists”. Equivalently, only what is regular (e.g. a pattern), or what seems to be regular, has the right to enter our knowledge and scientific heritage.

Note that regular does not necessarily mean boring (a constant), shallow or deterministic. We can find regularity in the behavior of a spring, as well as in the volatility of the stock market or in the way a complex dynamical system evolves over time.

Nevertheless, humans consider that they know something only once they have placed it within a pattern, a model, a map. All the rest is unknown (or not yet known) and gets labels like noise, error, uncertainty.