Data as property

Jeni Tennison – Exponential View

Azeem Azhar’s Exponential View is one of the very few weekly emails which earns regular attention, and it is no disrespect to him to say that the occasional guest authors he invites add further to the attraction. This edition is by Jeni Tennison, bringing her very particular eye to the question of data ownership.

Is owning data just like owning anything else? The simple answer to that is ‘no’. But if it isn’t, what does it mean to talk about data as property? To which the only simple answer is that there is no simple answer. This is not the place to look for detailed exposition and analysis, but it is very much the place to look for a set of links to a huge range of rich content, curated by somebody who is herself a real expert in the field.

Fading out the Echo of Consumer Protection: An empirical study at the intersection of data protection and trade secrets

Guido Noto La Diega

This is by way of a footnote to the previous post – a bit more detail on one small part of the enormous ecosystem described there.

If you buy an Amazon Echo then, partly depending on what you intend to do with it, you may be required to accept 17 different contracts, amounting to close to 50,000 words, not very far short of the length of a novel. You will also be deemed to be monitoring them all for any changes, and to have accepted any such changes by default.

That may be extreme in length and complexity, but the basic approach has become normal to the point of invisibility. That raises a question about the reasonableness of Amazon’s approach. But it raises a much more important question about our wider approach to merging new technologies into existing social, cultural and legal constructs. This suggests, to put it mildly, that there is room for improvement.

(note that the link is to a conference agenda page rather than directly to the presentation, as that is a 100MB download, but if needed this is the direct link)

Anatomy of an AI System

Kate Crawford and Vladan Joler

An Amazon Echo is a simple device. You ask it to do things, and it does them. Or at least it does something which quite a lot of the time bears some relation to the thing you ask it to do. But of course in order to be that simple, it has to be massively complicated. This essay, accompanied by an amazing diagram (or perhaps better to say this diagram, accompanied by an explanatory essay), is hard to describe and impossible to summarise. It’s a map of the context and antecedents which make the Echo possible, covering everything from rare earth geology to the ethics of gathering training data.

It’s a story told in a way which underlines how much seemingly inexorable technology in fact depends on social choices and assumptions, where invisibility should not be confused with inevitability. In some important ways, though, invisibility is central to the business model – one aspect of which is illustrated in the next post.

MIT taught a neural network how to show its work

Tristan Greene – The Next Web

One of the many concerns about automated decision making is its lack of transparency. Particularly (but by no means only) for government services, accountability requires not just that decisions are well based, but that they can be challenged and explained. AI black boxes may be efficient and accurate, but they are not accountable or transparent.

This is an interesting early indicator that those issues might be reconciled. It’s in the special – and much researched – area of image recognition, so a long way from a general solution, but it’s encouraging to see systematic thought being addressed to the problem.

Spoiler alert – there are now 9 tribes of digital

Catherine Howe – Curious?

The eight tribes of digital (which were once seven) have become nine.

The real value of the tribes – other than that they are the distillation of four years of observation, reflection and synthesis – is not so much in whether they are definitively right (which pretty self-evidently they aren’t, and can’t be) but as a prompt for understanding why individuals and groups might behave as they do. And of course, the very fact that there can be nine kinds of digital is another way of saying that there is no such thing as digital.

Robot says: Whatever

Margaret Boden – Aeon

The phrase ‘artificial intelligence’ is a brilliant piece of marketing. By starting with the artificial, it makes it easy to overlook the fact that there is no actual intelligence involved. And if there is no intelligence, still less are there emotions or psychological states.

The core of this essay is the argument that computers and robots do not, and indeed cannot, have needs or desires which have anything in common with those experienced by humans. In the short to medium term, that has both practical and philosophical implications for the use and usefulness of machines and the way they interact with humans. And in the long term (though this really isn’t what the essay is about), it means that we don’t have to worry unduly about a future in which humanity survives – at best – as pets of our robot overlords.

Q & A with Ellen Broad – Author of Made by Humans

Ellen Broad – Melbourne University Publishing

Ellen Broad’s new book is high on this summer’s reading list. Both provenance and subject matter mean that confidence in its quality can be high. But while waiting to read it, this short interview gives a sense of the themes and approach. Among many other virtues, Ellen recognises the power of language to illuminate the issues, but also to obscure them. As she says, what is meant by AI is constantly shifting, a reminder of one of the great definitions of technology, ‘everything which doesn’t work yet’ – because as soon as it does it gets called something else.

The book itself is available in the UK, though Amazon only has it in Kindle form (but perhaps a container load of hard copies is even now traversing the globe).

7 Steps to Data Transformation

Edwina Dunn – Starcount

Edwina Dunn is one of the pioneers of data science and this short paper is the distillation of more than twenty years’ experience of using meticulous data analysis to understand and respond to customers – most famously in the form of the Tesco Clubcard. It is worth reading both for some pithy insights – data is art as well as science – and, more unexpectedly, for what feels like a slightly dated approach. “Data is the new oil” may be true in the sense that it is a transformational opportunity, with Zuckerberg as the new Rockefeller, but data is not finite, it is not destroyed by use and it is not fungible. More tellingly she makes the point that ‘Owning the customer is not a junior or technical role; it’s one of the most important differentiators of future winners and losers.’ You can see what she means, but shopping at a supermarket is not supposed to be a form of slavery, still less so (if that were possible) is that a good way of thinking about the users of public services.

It doesn’t sound as though the Cluetrain Manifesto has been a major influence on this school of thought. Perhaps it should be.

Basic instincts

Matthew Hutson – Science

This article is an interesting complement to one from last week which argued that AI is harder than you think. It builds a related argument from a slightly different starting point: that big data driven approaches to artificial intelligence have been demonstrably powerful in the short term, but may never break through to produce general problem solving skills. That’s because there is no solution in sight to the problem of creating common sense – which turns out not to be very common at all. Humans possess some basic instincts which are hard coded into us and might need to be hard coded into AI as well – but to do so would be to cut across the self-learning approach to AI which now dominates. If there is reason to think that babies can make judgements and distinctions which elude current AI, perhaps AI has more to learn from babies than babies from AI.

Robot Future

XKCD

A pithy but important reminder that the autonomy of AI is not what we should most worry about. Computers are ultimately controlled by humans and do what humans want them to do. Understanding the motivation of the humans will be more important than attempting to infer the motivation of the robots for a good while to come.

A.I. Is Harder Than You Think

Gary Marcus and Ernest Davis – New York Times

Coverage of Google’s recent announcement of a conversational AI which can sort out your restaurant bookings for you has largely taken one of two lines. The first is about the mimicry of human speech patterns: is it ethical for computers to um and er in a way which can only be intended to deceive their interlocutors into thinking that they are dealing with a real human being, or should it always be made clear, by specific announcement or by robotic tones, that a computer is a computer? The second – which is where this article comes in – positions this as being on the verge of artificial general intelligence: today a conversation about organising a haircut, tomorrow one about the meaning of life. That is almost completely fanciful, and this article is really good at explaining why.

It does so in part by returning to a much older argument about computer intelligence. For a long time, the problem of AI was treated as a problem of finding the right set of rules which would generate a level of behaviour we would recognise as intelligent. More recently that has been overtaken by approaches based on extracting and replicating patterns from big data sets. That approach has been more visibly successful – but those successes don’t in themselves tell us whether they are steps towards a universal solution or a brief flourishing within what turns out to be a dead end. Most of us can only be observers of that debate – but we can guard against getting distracted by potential not yet realised.

We can’t have nice things (yet)

Alex Blandford – Medium

Data is a word which conjures up images of objectivity and clarity. It lives in computers and supports precise binary decisions.

Except, of course, none of that is true, or at least none of it is reliably true, especially the bit about supporting decisions. Decisions are framed by humans, and the data which supports them is as much social construct as it is an emergent property of reality. That means that the role of people in curating data and the decision making it supports is vital, not just in constructing the technology, but in managing the psychology, sociology and anthropology which frame them. Perhaps that’s not a surprising conclusion in a post written by an anthropologist, but that doesn’t make it any less right.

Understanding Algorithms

Tim Harford

Tim Harford recommends some books about algorithms. There’s not much more to be said than that – except perhaps to follow up on one of the implications of Prediction Machines, the book which is the main focus of the post.

One way of looking at artificial intelligence is as a tool for making predictions. Good predictions reduce uncertainty. Really good predictions may change the nature of a problem altogether. In a different sense, the purpose of strategy can also be seen as a way of reducing uncertainty: by making some choices (or bets), other choices drop out of the problem space. Putting those two thoughts together suggests that better AI may be a tool to support better strategies.
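One way to make "good predictions reduce uncertainty" concrete is to measure uncertainty as Shannon entropy, in bits. The sketch below is purely illustrative and is not taken from Prediction Machines or from Tim Harford's post: the four possible outcomes and both probability distributions are made-up numbers, chosen only to show the direction of the effect.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy of a discrete distribution, in bits."""
    # Outcomes with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical example: four possible outcomes and no prediction at all,
# so each outcome is treated as equally likely.
no_prediction = [0.25, 0.25, 0.25, 0.25]

# Hypothetical example: a good prediction concentrates belief on one outcome.
good_prediction = [0.85, 0.05, 0.05, 0.05]

uncertainty_before = entropy_bits(no_prediction)
uncertainty_after = entropy_bits(good_prediction)

print(f"without a prediction: {uncertainty_before:.2f} bits")
print(f"with a good prediction: {uncertainty_after:.2f} bits")
```

A strategic choice which rules options out can be read in the same way: removing two of the four outcomes from the problem space would also lower the entropy, which is the sense in which prediction and strategy are doing related work.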

AI in the UK: ready, willing and able?

House of Lords Select Committee on Artificial Intelligence

There is something slightly disconcerting about reading a robust and comprehensive account of public policy issues in relation to artificial intelligence in the stately prose style of a parliamentary report. But the slightly antique structure shouldn’t get in the way of seeing this as a very useful and systematic compendium.

The strength of this approach is that it covers the ground systematically and is very open about the sources of the opinions and evidence it uses. The drawback, oddly, is that the result is a curiously unpolitical document – mostly sensible recommendations are fired off in all directions, but there is little recognition, still less assessment, of the forces in play which might result in the recommendations being acted on. The question of what needs to be done is important, but the question of what it would take to get it done is in some ways even more important – and is one a House of Lords committee might be expected to be well placed to answer.

One of the more interesting chapters is a case study of the use of AI in the NHS. What comes through very clearly is that there is a fundamental misalignment between the current organisational structure of the NHS and any kind of sensible and coherent use – or even understanding – of the data it holds and of the range of uses, from helpful to dangerous, to which it could be put. That’s important not just in its own right, but as an illustration of a much wider issue of institutional design noted by Geoff Mulgan.

The Risk of Machine-Learning Bias (and How to Prevent It)

Chris DeBrusk – MIT Sloan Management Review

This article is a good complement to the previous post, providing some pragmatic rigour on the risk of bias in machine learning and ways of countering it. Perhaps the most important point is one of the simplest:

It is safe to assume that bias exists in all data. The question is how to identify it and remove it from the model.

There is some good practical advice on how to do just that. But there is an obvious corollary: if human bias is endemic in data, it risks being no less endemic in attempts to remove it. That’s not a counsel of despair: this is an area where good intentions really do count for something. But it does underline the importance of being alert to the opposite, that unless it is clear that bias has been thought about and countered, the probability is high that it still remains. And of course it will be hard to calibrate the residual risk, whatever its level might be, particularly for the individual on the receiving end of the computer saying ‘no’.
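As a rough illustration of what identifying bias can look like in practice, the sketch below compares how often each group gets a ‘yes’ and reports the ratio of the lowest rate to the highest. It is a minimal example under stated assumptions, not the method described in the article: the group labels and decisions are invented, and a single ratio of this kind is only one of many possible checks.

```python
from collections import defaultdict


def approval_rates(records):
    """Return the share of 'yes' outcomes for each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical, made-up decisions: (group label, did the computer say 'yes'?)
decisions = (
    [("a", True)] * 8 + [("a", False)] * 2
    + [("b", True)] * 5 + [("b", False)] * 5
)

rates = approval_rates(decisions)
ratio = disparity_ratio(rates)

for group, rate in sorted(rates.items()):
    print(f"group {group}: {rate:.0%} approved")
print(f"disparity ratio: {ratio:.3f}")
```

A ratio well below 1.0 is a prompt to look harder rather than proof of unfairness, and a ratio close to 1.0 does not show that bias is absent, which is exactly the residual risk described above.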

Computer Says No: Part 1 Algorithmic Bias and Part 2 Explainability

These two (of a planned three) posts take an interesting approach to the ethical problems of algorithmic decision making, resulting in a much more optimistic view than most who write on this. It’s very much worth reading even though the arguments don’t seem quite as strong as they are made to appear.

Part 1 essentially sidesteps the problem of bias in decision making by asserting that automated decision systems don’t actually make decisions (humans still mostly do that), but should instead be thought of as prediction systems – and the test of a prediction system is in the quality of its predictions, not in the operations of its black box. The human dimension is a bit of a red herring as it’s not hard to think of examples where in practice the prediction outputs are all the decision maker has to go on, even if in theory the system is advisory. More subtly, there is an assumption that prediction quality can easily be assessed and an assertion that machine predictions can be made independent of the biases of those who create them, both of which are harder problems than the post implies.

The second post goes on to address explainability, with the core argument being that it is a red herring (an argument Ed Felten has developed more systematically): we don’t really care whether a decision can be explained, we care whether it can be justified, and the source of justification is in its predictive power, not in the detail of its generation. There are two very different problems with that. One is that not all individual decisions are testable in that way: if I am turned down for a mortgage, it’s hard to falsify the prediction that I wouldn’t have kept up the payments. The second is that the thing in need of explanation may be different for AI decisions from that for human decisions. The recent killing of a pedestrian by an autonomous Uber car illustrates the point: it is alarming precisely because it is inexplicable (or at least so far unexplained), but whatever went wrong, it seems most unlikely that a generally low propensity to kill people will be thought sufficiently reassuring.

None of that should be taken as a reason for not reading these posts. Quite the opposite: the different perspective is a good challenge to the emerging conventional wisdom on this and is well worth reflecting on.

Data as photography

Ansel Adams, adapted by Wesley Goatley

“A visualisation is usually looked at – seldom looked into.” – Ansel Adams

“The sheer ease with which we can produce a superficial visualisation often leads to creative disaster.” – Ansel Adams

“There's nothing worse than a sharp visualisation of a fuzzy concept.” – Ansel Adams

“You don't collect a data set, you make it.” – Ansel Adams

“There are always two people in every data visualisation: the creator and the viewer.” – Ansel Adams

“To make art with data truthfully and effectively is to see beneath the surfaces.” – Ansel Adams

“A great data visualisation is a full expression of what one feels about what is being visualised in the deepest sense, and is, thereby, a true expression of what one feels about life in its entirety.” – Ansel Adams

“Data visualisation is more than a medium for factual communication of ideas. It is a creative art.” – Ansel Adams

“We must remember that a data set can hold just as much as we put into it, and no one has ever approached the full possibilities of the medium.” – Ansel Adams

“Data art, as a powerful medium...offers an infinite variety of perception, interpretation and execution.” – Ansel Adams

“Twelve significant data points in any one year is a good crop.” – Ansel Adams

The idea that the camera does not lie is as old as photography. It has been untrue for just as long.

The exposure of film or sensor to light may be an objective process, but everything which happens before and after that is malleable and uncertain. There are some interesting parallels with data in that: the same appearance – and assertion – of accurately representing the real world, the same issues of both deliberate and unwitting distortion.

This tweet simply takes some of the things Ansel Adams, the great photographer of American landscapes, has written about photography and adapts them to be about data. It’s neatly done and provides good food for thought.

Don’t believe the hype about AI in business

Vivek Wadhwa – VentureBeat

If you want to know why artificial intelligence is like teenage sex, this is the post to read. After opening with that arresting comparison, the article goes on to make a couple of simple but important points. Most real world activities are not games with pre-defined rules and spaces. And for businesses – and arguably still more so for governments – it is critically important to be able to explain and account for decisions and outcomes. More pragmatically, it also argues that competitive advantage in the deployment of AI goes to those who can integrate many sets of disparate data to form a coherent set to which AI can be applied. Most companies – and, again, perhaps even more so most governments – are not very good at that. That might be the biggest challenge of all.

YouTube, the Great Radicalizer

Zeynep Tufekci – New York Times

This article has been getting extensive and well-deserved coverage over the last few days. Essentially, it is demonstrating that the YouTube recommendation engine tends to lead to more extreme material, more or less whatever your starting point. In short, “YouTube leads viewers down a rabbit hole of extremism, while Google racks up the ad sales.”

The reason for including it here is not because of the specific algorithm or the specific behaviour it generates. It is because it’s a very clear example of a wider phenomenon. It’s a pretty safe assumption that the observed behaviour is not the result of a cabal of fringe conspirators deep in the secret basements of Google setting out a trail to recruit people into extremist groups or attitudes. The pretty obvious motivation is that what they are actually trying to do is to tempt people into spending as long as possible watching YouTube videos, because that’s the way they can put most advertising in front of most eyeballs.

In other words, algorithmic tools can have radically unintended consequences. That’s made worse in this case because the unintended consequences are not a sign of the intended goal not being achieved; on the contrary, they are the very means by which that intended goal is being achieved. So it is not just that YouTube has some strong incentives not to fix the problem; the problem may not be obvious to them in the first place.

This is a clear example. But we need to keep asking the same questions about other systems: what are the second order effects, will we recognise them when we see them, and will we be ready to – and able to – address them?
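A toy model can show how that kind of second order effect might arise without anybody intending it. The sketch below is emphatically not YouTube's algorithm: the catalogue, the `extremity` scale and the `predicted_minutes` figures are all invented, and the key assumption (that predicted watch time rises with extremity) is built in by hand. All the recommender does is pick, from videos close to the current one, whichever it expects to be watched longest.

```python
def recommend_next(current, catalogue, reach=1):
    """Pick the nearby video with the highest predicted watch time."""
    nearby = [
        video for video in catalogue
        if abs(video["extremity"] - current["extremity"]) <= reach
    ]
    return max(nearby, key=lambda video: video["predicted_minutes"])


# Hypothetical catalogue. Assumption for illustration only: the more extreme
# a video is (0 = mild, 5 = most extreme), the longer it is predicted to be watched.
catalogue = [
    {"extremity": level, "predicted_minutes": 4 + 2 * level}
    for level in range(6)
]


def session(start_level, steps):
    """Follow the recommendations and record the extremity of each video."""
    current = catalogue[start_level]
    path = [current["extremity"]]
    for _ in range(steps):
        current = recommend_next(current, catalogue)
        path.append(current["extremity"])
    return path


print(session(0, 5))
print(session(3, 4))
```

Nothing in `recommend_next` mentions extremity as a goal; the drift comes entirely from optimising for watch time under the stated assumption, which is why it would be easy to miss from the inside.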

A roadmap for AI: 10 ways governments will change (and what they risk getting wrong)

Geoff Mulgan – NESTA

This is a great summary of where AI stands in the hype cycle. Its focus is the application to government, but most of it is more generally relevant. It’s really helpful in drawing out what ought to be the obvious point that AI is not one thing and that it therefore doesn’t have a single state of development maturity.

The last of the list of ten is perhaps the most interesting. Using AI to apply more or less current rules in more or less current contexts and systems is one thing (and is a powerful driver of change in its own right). But the longer term opportunity is to change the nature of the game. That could be a black box dystopia, but it could instead be an opportunity to break away from incremental change and find more radical opportunities to change the system. But that depends, as this post rightly concludes, on not getting distracted by the technology as a goal in its own right, but focusing instead on what better government might look like.