GDPR – sounding the death knell for self-learning algorithms?
Posted: 22 March 2017 | By Darcie Thompson-Fields
In just over a year, on 25 May 2018, the European General Data Protection Regulation (GDPR) becomes enforceable. This regulation enshrines in law the rights of EU citizens to have their personal data treated in accordance with their wishes. It applies to any organisation that processes EU citizens’ data, and the UK government has indicated that, irrespective of Brexit, it will implement the regulation in full.
Importantly for data-driven organisations, the GDPR is not just about where personal data are stored and the ability to opt out of spam emails from organisations you bought from years ago. Article 15 of the regulation specifically gives an individual the right to meaningful information about the logic involved in any automated decision concerning them, as well as the significance and the envisaged consequences of such processing for that individual.
Furthermore, Article 22 enshrines the right of an individual not to be subject to an automated decision-making process where those decisions significantly affect the individual.
Organisations therefore need to be able to:
1) provide clear information on how analytical processing is applied at an individual level; and
2) ensure that certain individuals can be exempt from that processing.
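These two requirements could be sketched in code as a gate in front of the automated decision: opted-out individuals are routed to manual review, and every automated outcome carries a recorded explanation. The class and function names below are invented for illustration and do not come from the article.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    opted_out_of_automation: bool  # an Article 22 objection on record

def decide_offer(customer: Customer, score: float) -> dict:
    """Route opted-out customers to manual review; otherwise decide
    automatically and record the logic used, so an Article 15-style
    explanation can be given later."""
    if customer.opted_out_of_automation:
        return {"route": "manual_review", "reason": "customer exemption"}
    decision = "discount" if score > 0.7 else "standard_price"
    return {
        "route": "automated",
        "decision": decision,
        "explanation": f"score {score:.2f} vs threshold 0.70",
    }
```

The key design point is that the exemption check happens before any model is invoked, so an individual's objection is honoured regardless of which model is in production.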
For organisations that provide credit (e.g. mortgages, personal loans, credit cards) this is nothing new; regulations are already in place to prevent discrimination and enforce clarity in data usage. The impact is now spreading into organisations such as retailers, who may make discount offers to some consumers and not others based on an algorithmic decision. If a consumer is disadvantaged as a result (i.e. gets a worse offer), they can request an explanation and demand that the algorithm not be used in the company’s treatment of them.
A “black box” problem
This has profound implications for organisations. Automated decisions need to be made within a structured framework so that how a decision is reached at any given time can be understood. The analytical model involved needs to be sufficiently interpretable to allow an explanation, to the individual, of how a decision was made and what its implications were. Organisations will also need to be able to explain what data were used to reach the decision and, in the case of important decisions, whether the overall decision-making process is properly controlled.
The problem is much worse where “black box” systems are deployed: systems in which how the data are used is opaque. Self-learning algorithms are one example. (A self-learning algorithm is one that adjusts its own parameters on the basis of new data and feedback, with no human intervention.) If the workings of the model cannot be explained, the requirements of the regulation cannot be met.
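The self-learning behaviour described above can be illustrated with a deliberately tiny sketch: a single-weight online model that updates its own parameter from each new observation and feedback signal, with no human in the loop. All numbers and names here are hypothetical.

```python
def online_update(weight: float, x: float, outcome: float,
                  lr: float = 0.1) -> float:
    """One self-adjusting step: nudge the weight towards whatever
    reduces the error on the latest observation (a gradient step
    on squared error)."""
    prediction = weight * x
    error = outcome - prediction
    return weight + lr * error * x

weight = 0.0
for x, outcome in [(1.0, 2.0), (2.0, 4.0), (1.5, 3.0)]:
    weight = online_update(weight, x, outcome)

# After only a few observations the parameter has drifted away from its
# initial value purely on the basis of incoming data. This drift is the
# property that makes such systems hard to explain at any fixed point
# in time: the model that made yesterday's decision no longer exists.
```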
It may be appropriate to use a “black box” solution where there is no significant adverse impact on the individual, for example, to sort the products which are displayed first on a web page. If, however, the website is offering credit at a particular APR, this would be entirely inappropriate.
Another issue with “black box” systems is that they may inadvertently become discriminatory. For example, if a group of postcodes is used as a factor in an automated decision-making algorithm, this may also divide groups along ethnic lines, whether intentionally or not. A transparent “white box” approach would include a review process to ensure that this type of issue does not occur.
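One concrete form that review process could take is a disparity check: before a feature such as a postcode grouping is approved for use, compare outcome rates across a protected attribute and flag large gaps for human review. The data and field names below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical decision records: (postcode_group, protected_group, got_offer)
decisions = [
    ("A", "group_1", True), ("A", "group_1", True),
    ("B", "group_2", False), ("B", "group_2", False),
    ("A", "group_2", True), ("B", "group_1", False),
]

def offer_rates(records):
    """Offer rate per protected group across all decisions."""
    totals, offers = defaultdict(int), defaultdict(int)
    for _, group, got_offer in records:
        totals[group] += 1
        offers[group] += got_offer
    return {g: offers[g] / totals[g] for g in totals}

rates = offer_rates(decisions)
# A large gap between groups flags the feature for human review,
# even if the protected attribute was never used directly.
disparity = max(rates.values()) - min(rates.values())
```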
Best approach checklist
The best approach is to ensure the following are true of your decision-making environment:
- It is clear which data have been used to make a decision on each individual
- Analytical models are open and interpretable
- The process of deploying models into production is clear so that which model is in use at any one time is well understood – this implies clear version control on analytical models and a careful testing process before deployment
- A history of decisions and how they have been made is available through an audit trail
- It is clear which models have the potential for a significant impact upon the individual
- Decisions are consistent across channels (for example the web, email and SMS all provide the same offer)
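Several of the checklist items above (clear data usage, version control on models, an audit trail) come together in the decision log itself. A minimal sketch, with an assumed structure and invented field names, might record each automated decision alongside the model version and the inputs used:

```python
from datetime import datetime, timezone

AUDIT_LOG = []

def log_decision(customer_id: str, model_version: str,
                 inputs: dict, decision: str) -> dict:
    """Append one auditable record per automated decision."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "customer_id": customer_id,
        "model_version": model_version,  # ties back to version control
        "inputs": inputs,                # which data were used
        "decision": decision,
    }
    AUDIT_LOG.append(entry)
    return entry

entry = log_decision("cust-42", "offer-model-v1.3",
                     {"basket_value": 120.0}, "10% discount")
```

In practice the log would go to durable, append-only storage, but the point is the shape of the record: given any historical decision, the organisation can say which model made it and on what data.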
All of the above are perfectly achievable and are standard practice in many organisations. It’s important not to lose sight of the fact that an analytical approach has been proven to significantly increase revenues; moving away from analytics because of GDPR would be a self-destructive overreaction. There are also significant benefits to a controlled model development process, including reducing the time to deploy models and increasing the productivity of analysts.
The highest risks to businesses in this new world come from self-learning algorithms that are opaque to those who use them, and from uncontrolled software development and deployment processes that may have inadvertent impacts on consumers.
As the penalties for non-compliance include fines of up to four per cent of turnover or €20M (whichever is the greater), plus the associated reputational damage, it’s clear that control and clarity in the analytical modelling process are of greater value than ever.
Iain Brown, senior data scientist and Dave Smith, data management specialist, SAS UK & Ireland