This was catalyzed by this Facebook thread yesterday, which received an insane number of comments (consistent with my general experience that trolley problems really inspire people to push back against the problem) and sparked several offline conversations, including pointers to other people who were apparently asking similar questions. In general it was very valuable.
This post started life as a reply to a comment in that thread and metastasized. In particular, someone commented "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised," which got me thinking about the conflicts between what we say we want and what we actually want, and why we preserve that conflict, and what happens as our ability to affect our environment improves.
Predictable unintended consequences
A lot of human social systems depend on our saying we want X, while in fact optimizing for Y. Then, when we suffer the consequences of Y, we get to claim that we wanted X, which makes us feel better... we aren't culpable, we're either bystanders or victims.
Let me repeat the comment that catalyzed this: "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised."
We want to drive fast, for various reasons. But we know driving fast is dangerous. So we set speed limits and say we want to drive safely, but we build systems that both permit and depend on people violating those limits. That way we don't have to say out loud that our systems encourage accidents as a cost of getting what we want; instead we can blame traffic accidents entirely on the people who get into them. The speed limits are a way of signaling virtue to one another without actually slowing traffic down, which would be annoying.
There are of course other, more controversial, examples. Much of social injustice is this sort of thing... we say we want X, but we build societies that create Y. And it's not that we're lying... for the most part we aren't. We don't have to. Instead we build complex systems in which it takes a lot of work to understand how the system optimizes for Y. Then all we have to do is not do that work, and we can go on perfectly sincerely claiming that we're optimizing for X, or at least that we intend to. And then when Y (predictably) happens, we have a number of options.
For example, we can blame Y on someone else. Usually that's pretty easy, since there's usually someone or some group who is the proximal agent for each incidence of Y. What could be more natural, after all, than blaming crime on criminals (or in some cases victims), blaming traffic accidents on drivers who get into accidents, etc. etc. etc.? That's just common sense. Only an overeducated ivory-tower fool would challenge it.
Or we can blame Y on "the system." Importantly, this is not the specific system which is actually at fault, which might force us to acknowledge the need to improve or replace that system, or worse yet to acknowledge our complicity and complacency about Y. No, it's a much vaguer and more general "system", which has no particular relationship to us, or indeed to anything at all. It's "the Law of Unintended Consequences." "Murphy's Law." "Shit happens, y'know?" "God's will." Etc.
There are other strategies as well. What they all have in common is that when it comes to us, what matters is our intent. Since I don't mean for thousands of people to die in traffic accidents every year, it's not my fault that they do. And the same goes for everyone else.
And that, in turn, depends on the system being obscure enough that its outputs, while predictable, can plausibly be unintentional.
Explicit and implicit goals
Which, well, OK. That is what it is, and it's the same as it ever was.
But we're approaching a cusp-point, as we have at various times in our history before. We are getting better and better at building autonomous nonhuman agents that use general rules to make choices about how to optimize their environments toward certain goals.
For example, we can do a bunch of data analysis and come up with a bunch of rules which, when followed by a self-driving car, will (statistically speaking) cause it to minimize the chance of injury to humans. Or to maximize fuel economy. Or to minimize trip-time. Or to minimize wear-and-tear on the car. Or some weighted trade-off among all of these goals.
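For concreteness, a transparent version of that kind of weighted trade-off might look something like the sketch below. Everything in it is invented for illustration: the goal names, the scoring functions, and the weights are all hypothetical, and each score is assumed to be normalized to a 0-to-1 scale where higher is better.

```python
# Hypothetical per-goal scores for a candidate driving plan, each
# normalized so that 1.0 is best and 0.0 is worst.
def score_safety(plan):
    return 1.0 - plan["expected_injury_risk"]

def score_fuel_economy(plan):
    return min(plan["miles_per_gallon"] / 60.0, 1.0)

def score_trip_time(plan):
    return 1.0 / (1.0 + plan["trip_hours"])

def score_wear(plan):
    return 1.0 - plan["wear_fraction"]

# How much we value each goal, relative to the others.
WEIGHTS = {
    score_safety: 0.6,
    score_fuel_economy: 0.1,
    score_trip_time: 0.2,
    score_wear: 0.1,
}

def overall_score(plan):
    """Weighted sum of the per-goal scores for one candidate plan."""
    return sum(weight * goal(plan) for goal, weight in WEIGHTS.items())

def best_plan(candidate_plans):
    """Pick whichever candidate plan best serves the weighted goals."""
    return max(candidate_plans, key=overall_score)
```

The useful property of writing it this way is that the weights are right there on the page: anyone reading them can see exactly how much trip time we're willing to trade against injury risk, and argue about it.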
And then we can program a car with whichever rules are best-suited to optimize for whichever goals we value, weighted to reflect how much we value them. At which point we will have a choice to make. Not so much the choice of which goals we value... that comes later. First, we will have to choose how obscure we want that system to be.
C. A. R. Hoare once said:
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature."
I would add to this that it demands transparency. In order to make a design that obviously does what we intend, we have to own up to our intentions... we have to be willing to make them obvious. If we're unwilling to do that, we will reject simple designs in favor of a complexity that obscures our actual goals.
If we go the obscure route, we preserve the benefits of our current approach... if the car causes death and injury, for example, as part of the cost for achieving our goals, we can fail to notice the connection altogether and remain unimplicated. Again, I go back to the comment that catalyzed this: "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised"... because self-driving cars will actually follow speed limits, which isn't really what we want and never was... it's just what we want to say (and in many cases believe) we want.
The next step
So, OK. We're doing this already, and for the most part we err on the side of obscurity.
No surprise; transparency is genuinely hard, and few system developers put as much effort into interface design as into back-end design. We prefer systems that mostly give us what we want even if we don't quite understand why, over systems that we understand but that don't quite give us what we want... and in the short term that isn't a senseless preference, even if it does leave us trapped in local maxima.
Plus, as above, it's uncomfortable.
So that's where we are. What's next?
Well, we talked about the "do data analysis and come up with a bunch of rules" stage above. Humans are pretty good at this. But I see no reason to believe that we're unable to build systems that are better at it than we are.
And once we've built an automatic system that can take a set of goals and a set of data about the world as input, and create a bunch of rules as output that my self-driving car (or whatever) can follow, we have a different but similar choice to make.
We can give the system X as input, and get rules that optimize for X, which is not what we wanted. Again: "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised."
Or we can give the system Y as input, and get rules that optimize for Y, which is what we wanted... but to do that, we have to admit that's what we wanted.
Or we can build an obscure wrapper around the black box, which takes lots of input and does complicated things with it and at the end of the day produces a set of goals which it feeds to the black box, and we won't know what those goals actually are. (In practice, if we take this tack, we probably don't have a real abstraction barrier between the wrapper and the black box in the first place, nor between the black box and the executing agent... it's just one integrated system that makes observations and takes action, with very little transparency.)
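To make the shape of that choice concrete, here is a toy sketch. Nothing in it describes any real system; the names, signatures, and internals are all invented. The first version takes its goals as an explicit argument, so whoever calls it has to write them down; the second buries the goals in opaque internal state, so nothing forces anyone to look at them.

```python
import random

def transparent_planner(candidate_actions, goal_weights, score_fns):
    """The first two options: goals arrive as an explicit argument, so
    whoever calls this has to say what is being optimized for, and with
    what weight."""
    def total(action):
        return sum(weight * score_fns[goal](action)
                   for goal, weight in goal_weights.items())
    return max(candidate_actions, key=total)

class IntegratedAgent:
    """The third option: no abstraction barrier between goals, rules,
    and action. Observations go in, actions come out, and whatever this
    effectively optimizes for is buried in coefficients that nobody ever
    has to name or defend."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        # Opaque internal state -- a stand-in for whatever the system
        # learned from data we never inspected.
        self._coeffs = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
                        for _ in range(n_actions)]

    def act(self, observation):
        # Score each possible action against the observation and pick
        # the highest-scoring one. The "goals" are implicit in the numbers.
        scores = [sum(c * x for c, x in zip(row, observation))
                  for row in self._coeffs]
        return scores.index(max(scores))
```

The second version isn't more capable than the first; it's just much harder to argue with.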
We'll probably do that third thing... it's what we've done every single time we've been faced with this choice. Which means, as above, that we give up control over the output.
That said, much as building faster and faster cars means the window in which we can change course when we realize we're heading towards a cliff gets smaller and smaller, building more and more efficient automated environment optimizers means the window in which we can change our minds about what goals they're optimizing for gets smaller and smaller. We can't give up responsibility without losing the ability to respond.
Which, OK. When we build self-driving cars and suddenly realize that we don't actually mean our speed limits, we'll have plenty of time to change them.
But that's not always true. Sometimes, by the time we notice the error and get our act together enough to fix it, it's too late. And the more efficient our systems, the faster that point comes.
It's not the fall that gets you, after all.
It's the sudden stop at the end.