
...is also great, and would suffice.

but...
I've been thinking about the ethics of autonomous systems lately.

This was catalyzed by this Facebook thread yesterday, which received an insane number of comments (consistent with my general experience that trolley problems really inspire people to push back against the problem), sparked several offline conversations (including pointers to other people who were apparently asking similar questions), and was in general very valuable.

This post started life as a reply to a comment in that thread and metastasized. In particular, someone commented "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised," which got me thinking about the conflicts between what we say we want and what we actually want, and why we preserve that conflict, and what happens as our ability to affect our environment improves.

Predictable unintended consequences
A lot of human social systems depend on our saying we want X, while in fact optimizing for Y. Then, when we suffer the consequences of Y, we get to claim that we wanted X, which makes us feel better... we aren't culpable, we're either bystanders or victims.

Let me repeat the comment that catalyzed this:  "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised."

We want to drive fast, for various reasons. But we know driving fast is dangerous. So we set speed limits and say we want to drive safely, but we build systems that both permit and depend on people violating those limits. That way we don't have to say out loud that our systems encourage accidents as a cost of getting what we want; instead we can blame traffic accidents entirely on the people who get into them. The speed limits are a way of signaling virtue to one another without actually slowing traffic down, which would be annoying.

There are of course other, more controversial, examples. Much of social injustice is this sort of thing... we say we want X, but we build societies that create Y. And it's not that we're lying... for the most part we aren't. We don't have to. Instead we build complex systems in which it takes a lot of work to understand how the system optimizes for Y. Then all we have to do is not do that work, and we can go on perfectly sincerely claiming that we're optimizing for X, or at least that we intend to. And then when Y (predictably) happens, we have a number of options.

For example, we can blame Y on someone else. Usually that's pretty easy, since there's usually someone or some group who is the proximal agent for each incidence of Y. What could be more natural, after all, than blaming crime on criminals (or in some cases victims), blaming traffic accidents on drivers who get into accidents, etc. etc. etc.? That's just common sense. Only an overeducated ivory-tower fool would challenge it.

Or we can blame Y on "the system." Importantly, this is not the specific system which is actually at fault, which might force us to acknowledge the need to improve or replace that system, or worse yet to acknowledge our complicity and complacency about Y. No, it's a much vaguer and more general "system", which has no particular relationship to us, or indeed to anything at all. It's "the Law of Unintended Consequences." "Murphy's Law." "Shit happens, y'know?" "God's will." Etc.

There are other strategies as well. What they all have in common is that when it comes to us, what matters is our intent. Since I don't mean for thousands of people to die in traffic accidents every year, it's not my fault that they do. And the same goes for everyone else.

And that, in turn, depends on the system being obscure enough that its outputs, while predictable, can plausibly be unintentional.

Explicit and implicit goals
Which, well, OK. That is what it is, and it's the same as it ever was.

But we're approaching a cusp-point, as we have at various times in our history before. We are getting better and better at building autonomous nonhuman agents that use general rules to make choices about how to optimize their environments toward certain goals.

For example, we can do a bunch of data analysis and come up with a bunch of rules which, when followed by a self-driving car, will (statistically speaking) cause it to minimize the chance of injury to humans. Or to maximize fuel economy. Or to minimize trip-time. Or to minimize wear-and-tear on the car. Or some min-max optimum of all of these goals, with various weightings.
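
To make that concrete, here's a minimal sketch of what weighting several goals into a single score might look like; the policy names, metrics, and weights below are invented purely for illustration:

```python
# A toy sketch of combining several driving goals into one weighted score.
# The policy names, metrics, and weights are all invented for illustration;
# a real system would estimate the metrics from data.

def score_policy(metrics, weights):
    """Combine per-goal metrics (higher = better) into a single weighted score."""
    return sum(weights[goal] * metrics[goal] for goal in weights)

candidate_policies = {
    "cautious":   {"safety": 0.97, "fuel_economy": 0.80, "trip_speed": 0.60, "low_wear": 0.90},
    "aggressive": {"safety": 0.90, "fuel_economy": 0.65, "trip_speed": 0.95, "low_wear": 0.70},
}

# How much we value each goal -- this is where the real argument lives.
weights = {"safety": 0.6, "fuel_economy": 0.1, "trip_speed": 0.2, "low_wear": 0.1}

best = max(candidate_policies, key=lambda name: score_policy(candidate_policies[name], weights))
print(best)  # with these particular weights, "cautious" edges out "aggressive"
```

The arithmetic is trivial; the weights are the interesting part, because choosing them is choosing what we actually care about.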

And then we can program a car with whichever rules are best-suited to optimize for whichever goals we value, weighted to reflect how much we value them. At which point we will have a choice to make. Not so much the choice of which goals we value... that comes later. First, we will have to choose how obscure we want that system to be.

C. A. R. Hoare once said:

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature."

I would add to this that it demands transparency. In order to make a design that obviously does what we intend, we have to own up to our intentions... we have to be willing to make them obvious. If we're unwilling to do that, we will reject simple designs in favor of a complexity that obscures our actual goals.

If we go the obscure route, we preserve the benefits of our current approach... if the car causes death and injury, for example, as part of the cost of achieving our goals, we can fail to notice the connection altogether and remain unimplicated. Again, I go back to the comment that catalyzed this: "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised"... because self-driving cars will actually follow speed limits, which isn't really what we want and never was; it's just what we want to say (and in many cases believe) we want.

If we go the transparent route, we give up those signaling benefits. What we get in exchange is more control over the output of the systems we build. If we obscure the connections between the rules we explicitly endorse and the goals we implicitly endorse, it's more likely that we'll accidentally endorse rules that actually achieve something entirely different. If we clear up that obscurity, it's more likely that we'll actually get what we want.

The next step

So, OK. We're doing this already, and for the most part we err on the side of obscurity.

No surprise; transparency is genuinely hard, and few system developers put as much effort into interface design as into back-end design. We prefer systems that mostly give us what we want even if we don't quite understand why, to systems that we understand but that don't quite give us what we want; in the short term that isn't a senseless preference, even if it does lead to our getting trapped in local maxima.

Plus, as above, it's uncomfortable.

So that's where we are. What's next?

Well, we talked about the "do data analysis and come up with a bunch of rules" stage above. Humans are pretty good at this. But I see no reason to believe that we're unable to build systems that are better at it than we are.

And once we've built an automatic system that can take a set of goals and a set of data about the world as input, and create a bunch of rules as output that my self-driving car (or whatever) can follow, we have a different but similar choice to make.

We can give the system X as input, and get rules that optimize for X, which is not what we wanted. Again: "The fact that self-driving cars won't speed is about to lead to a lot of speed limits being raised."

Or we can give the system Y as input, and get rules that optimize for Y, which is what we wanted... but to do that, we have to admit that's what we wanted.

Or we can build an obscure wrapper around the black box, which takes lots of input and does complicated things with it and at the end of the day produces a set of goals which it feeds the black box, and we won't know what that goal actually is. (In practice, if we take this tack, we probably don't have a real abstraction barrier between the wrapper and the black box in the first place, nor between the black box and the executing agent... it's just one integrated system that makes observations and takes action, with very little transparency.)

We'll probably do that third thing... it's what we've done every single time we've been faced with this choice. Which means, as above, that we give up control over the output.
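
To make the structural difference concrete, here's a deliberately toy sketch of the transparent version versus the obscured-wrapper version; every name and number in it is made up:

```python
# Toy contrast between the transparent and obscured designs described above.
# Everything here is hypothetical; the point is structural, not realistic.

def rules_for(goals):
    """Stand-in for the 'black box' that turns explicit goals into driving rules."""
    return [f"optimize {goal} with weight {weight:.2f}" for goal, weight in goals.items()]

# Transparent route: the goals live in one obvious place we can argue about.
explicit_goals = {"minimize_injury": 0.7, "minimize_trip_time": 0.3}
transparent_rules = rules_for(explicit_goals)

# Obscured route: a wrapper ingests a pile of inputs and emits goals that
# nobody ever wrote down, signed off on, or has to defend.
def goal_wrapper(liability_exposure, pr_sentiment):
    injury_weight = 0.5 + 0.3 * liability_exposure + 0.1 * pr_sentiment  # who chose this formula? nobody, exactly
    return {"minimize_injury": injury_weight, "minimize_trip_time": 1 - injury_weight}

obscured_rules = rules_for(goal_wrapper(liability_exposure=0.3, pr_sentiment=0.8))
```

In both cases the same black box produces rules; the difference is whether the goals it's fed exist anywhere we can point to, argue about, and change.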

That said, much as building faster cars means the window in which we can change our minds when we realize we're heading towards a cliff gets smaller and smaller, building automated environment optimizers means the window in which we can change our minds about what goals they're optimizing for gets smaller and smaller. We can't give up responsibility without losing the ability to respond.

Which, OK. When we build self-driving cars and suddenly realize that we don't actually mean our speed limits, we'll have plenty of time to change them.

But that's not always true. Sometimes, by the time we notice the error and get our act together enough to fix it, it's too late. And the more efficient our systems, the faster that point comes.

It's not the fall that gets you, after all.
It's the sudden stop at the end.

Comments

navrins
Jun. 20th, 2014 09:00 pm (UTC)
So, the rules of human society optimize (inadvertently?) for obfuscating what it is you're optimizing for.

Or, a different way of thinking about it: If you and I are in a prisoner's dilemma situation - that is, each of us stands to do a little better by making a choice that results in the other doing a lot worse, regardless of the other's choice - I might rationally hope that you haven't noticed that. The theoretical problem is usually presented in a very simple form where it is irrational to imagine that the other player doesn't understand the implications of his or her choices. But if the rules of the scenario are sufficiently complicated that you don't know we're in a prisoner's dilemma situation, you may believe you do best by making a choice that leads to a better outcome for me. For that matter, if the rules of the scenario are sufficiently complicated, I might not be sure we're in a prisoner's dilemma situation, and so make the choice that leads to a better outcome for you. We might end up choosing the jointly-optimal outcome despite being rational enough and non-altruistic enough that if we could clearly see the dilemma we were in, we wouldn't.
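
Concretely, with the standard textbook-style payoffs (the numbers here are just the usual illustration, not anything specific to this scenario):

```python
# The usual illustrative prisoner's dilemma payoffs; higher is better.
payoffs = {                               # (my payoff, your payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
# Whatever you do, I do better by defecting (5 > 3, 1 > 0), and symmetrically
# for you -- yet mutual cooperation (3, 3) beats mutual defection (1, 1).
# If the rules are tangled enough that neither of us can see this structure,
# we may land on (3, 3) anyway.
```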

That's... interesting.
dpolicar
Jun. 20th, 2014 10:00 pm (UTC)
Except that in situations of uncertainty, we're less likely to try for highly contingent benefits (rather than lesser benefits that don't depend on collaboration), which usually makes us more likely to defect in PDs even if we're unaware that they're PDs (which we usually are... they are everywhere and we mostly don't notice).
navrins
Jun. 21st, 2014 03:35 am (UTC)
Sure. But if the rules are complicated enough, we might intend to defect and even think we had after the fact, even though we've in fact accidentally cooperated.
dpolicar
Jun. 21st, 2014 05:47 am (UTC)
Well, in the sense that if we don't know what's going on, anything might happen, yes, I agree.

Is there a reason you're calling attention to that particular low-probability possibility, as opposed to all the others?
jason237
Jun. 21st, 2014 12:42 pm (UTC)
Deputizing computers to make morally dubious decisions for us seems like the most plausible path to Skynet I've heard.
dpolicar
Jun. 21st, 2014 12:57 pm (UTC)
Ayup.
xthread
Jun. 26th, 2014 05:28 pm (UTC)
You're posing a question in the space 'what happens when we direct robotic systems to make moral choices?'
This is an important philosophical question, but there is a critical component of the discussion which you overlook.
My experience in software and operations, well borne out by decades of research by others, is that the complex control systems which we devise to make those choices will not work as intended. I don't mean some day the complex system will decide to kill a baby instead of four other pedestrians, and we, as people, will be appalled that it made that choice. I mean that the software will mistake a hedge for a crowd of humans, and kill the baby to preserve the hedge, the same way that a human misjudges obstacles and simply doesn't see things. When we discuss the philosophical choices, there is a strong tendency to assume that the software actually works. And works correctly. And doesn't do a bunch of other unexpected things while sort of working.

We know of no piece of complex software for which this is true. When we talk about the choices, we need to be keeping in mind that what we're talking about is what we want the software to try to do - we won't actually get the behavior we're asking for.
dpolicar
Jun. 26th, 2014 05:45 pm (UTC)
Sure.

Similarly, when we ask whether it's moral to kill one person to save twenty, we don't worry too much about the possibility that we're actually really confused and the "twenty people" we're saving are actually a hedge, although of course even in humans the possibility exists: that an algorithm is implemented in protoplasm and developed through natural selection is no guarantee of its reliability.

To put it mildly.

But, yeah, I agree that the decisions an agent (human or otherwise) ought to make given that it's operating under uncertainty are often different from the decisions an omniscient/infallible agent ought to make, so keeping uncertainty in mind is important.

After all, given a situation as we perceive it, sometimes the right decision is to doubt our perceptions rather than to act on the perceived situation. Which is one reason why I am not a prophet.

Still, even under uncertainty, an important part of our decision procedure usually involves deciding what we ought to do were we in the situation we perceive ourselves to be in. And the same will be true of the autonomous agents we develop.

Edited at 2014-06-26 05:46 pm (UTC)
xthread
Jun. 26th, 2014 06:22 pm (UTC)
You're conflating some uncertainty of perception with uncertainty of likely outcome. Not only is there the problem that we can misperceive the difference between a group of people and a hedge, but there is also the problem that 'steer away from the baby' may not be an actual choice that physics gives us. You also seem to be discounting the perception failure case in the real world - almost everyone who hits a bicycle or motorcycle says that they were surprised, they just didn't see the victim before they hit them.

In short, you're positing not only that the actor is an omniscient and infallible agent, but that they are omnipotent and infallible as well. That's a rather tall order.
dpolicar
Jun. 26th, 2014 06:35 pm (UTC)
I don't agree that I'm positing any such thing.

I agree that the uncertainty we operate under applies to our actions as well as our perceptions and that as a consequence there's a gulf between the state of the world we attempt to bring about and the state of the world we actually bring about.

And I agree that the same will be true of any automated agents we develop. (That said, you seem to be implicitly assuming that our agents will necessarily be worse at this than we are, which I agree will be true initially and doubt will be true for particularly long, but that seems tangential to the question at hand.)

Said differently: "What is likely to happen in the world when I attempt to make state-change X?" is an important part of making decisions in the real world, just as "what is the likely state of the world when I observe Y?" is.

What I'm positing is that "what ought I do, supposing that the world as I perceive it is the world as it is?" and "which of the state-changes I believe I can bring about ought I attempt?" are also important parts of making decisions in the real world even though I am neither omniscient, nor omnipotent, nor infallible.

And the same is true of non-human agents.
xthread
Jun. 26th, 2014 06:55 pm (UTC)
Yes - you nailed it here:

(That said, you seem to be implicitly assuming that our agents will necessarily be worse at this than we are, which I agree will be true initially and doubt will be true for particularly long, but that seems tangential to the question at hand.)

That is precisely the point I am attempting to make - our experience of the world is that we are absolute crap at making software that does what we intend, on those rare occasions when what we intend is simple and well-defined. In this case, what we intend is neither simple, nor well-defined, and is inherently about balancing explicitly conflicting goals. Our experience of the world is that *those* software systems fail to deliver our expected results most of the time. You expect that it will be a temporary condition, and therefore tangential; I expect that it will be a permanent condition, and therefore critical.

Non-human agents are fallible in, mechanically, the same ways that we humans are, but we tend to make moral decisions based upon a kind of fallibility that non-human agents are inherently more vulnerable to than humans are.
dpolicar
Jun. 26th, 2014 07:11 pm (UTC)
Ah, gotcha.

So, yes, there's a point of disagreement here, but it has nothing to do with being infallible or omniscient or omnipotent or the differences between those things, nor with the difficulty of achieving what we intend when it is complex, ill-defined, and involves trade-offs among conflicting goals.... I agree with you about all that stuff.

But, yes, what we've got in our skulls is a collection of good-enough-to-breed-with algorithms for certain kinds of decision-making, including what we call moral decision-making, and I expect that we're well on our way to improving on those algorithms.

But, sure, in the interim period while humans are much better at making such decisions than other agents, it's best to let humans make those decisions, at least in situations where the potential upside or downside is large.
xthread
Jun. 26th, 2014 07:30 pm (UTC)
"...in the interim period while humans are much better at making such decisions than other agents, it's best to let humans make those decisions, at least in situations where the potential upside or downside is large"

Oh, no - that's not the conclusion I come to at all.
When we think about moral decision-making, we're talking to other humans who have very broadly similar goals for themselves to the goals we have for ourselves. 'Don't get yourself killed,' for example, is a rule that I don't need to tell essentially any non-child human while discussing the execution of inherently dangerous actions. I might need to warn someone that a knife is much sharper and more powerful than it looks, but most of the time I don't need to worry about the possibility that you're okay with cutting off your fingers in the course of cooking. You don't want to do that, and will work to avoid it.

In the non-human machinery case, we don't have any of that implicit ruleset that we've spent human-years of effort teaching to humans before they are allowed near a wheel. The non-human machinery is only pursuing the goals that we've given it, and then it's only going to pursue them as well as we could communicate those goals and how to evaluate them.

Where that takes me to is that when we talk about setting rules for autonomous machines, we need to assume that those rules are being followed by someone who isn't very good at following rules (not because they're intentionally rule-breaking, but because they aren't very smart) and has no natural understanding of the natural world to fall back on. The rule we state for humans is 'Don't drive faster than is safe. Don't drive faster than 25 on this road whether or not it's safe.' We only add that second half of the rule because we don't want to argue with someone who is going to say 'I thought it was safe (because I wanted to go faster than 25),' and our enforcement strategy is generally 'you'll get pulled over for going faster than the officer who tickets you thinks is safe, unless they have other motivations to cite you.' Which is a much, much more complex ruleset than 'Don't drive faster than 25.'

That leads me to believe that the large scale real tradeoff will be 'how often do autonomous cars kill people by fucking up' vs 'how often do human drivers kill people by fucking up.' Which is a testable hypothesis, happily.

But there's a 100% chance that a self-drive car, somewhere, will occasionally mistake a group of people for a hedge, and plow into them in its attempt to avoid hitting the manhole cover which it has mistaken for a baby carriage.
dpolicar
Jun. 26th, 2014 07:35 pm (UTC)
I don't disagree with any of this.

In general, I don't understand what it is you think I'm saying such that you think most of what you're saying in response is a counterpoint. Mostly, I feel like I've wandered into the middle of an argument you're having with someone else.
xthread
Jun. 26th, 2014 08:05 pm (UTC)
In some sense, you have - you've started saying something to me that other people have also said to me, and we're recapitulating that argument, by and large.

When you think about the philosophical argument, 'What goals should the robot car have,' How would your conversation change if you were having the conversation with a five year old? How would you change the goals you set if you assume that they're going to be carried out by someone who has a very poor understanding of those goals?
dpolicar
Jun. 26th, 2014 08:41 pm (UTC)
I can see how you might find it tedious, in the course of this supposed "recapitulation," to wait for me to actually say what you perceive me as having "started to say" before you explain why that's wrong.

That said, I suspect I'd feel more like I was actually invited to this conversation, as opposed to a spectator for it, if you did so.

Regardless, to answer your question: when setting goals for a system whose ability to judge its environment is poor relative to my own - be it a five-year-old, an unsophisticated self-driving car, a person with poor understanding, a dog, or whatever - I would (as I suggested earlier) in general encourage that system to ask me (or some other competent supervisor) what to do in any situation where it isn't certain what to do, and in most situations where it is.

If for whatever reason that's not possible (e.g., I expect this incompetent agent to operate without supervision) I would do a bunch of up-front analysis of what sorts of situations are most likely and what responses are most likely to have good consequences, turn that analysis into a set of simple if-then rules that I think the system is capable of following, install those rules as best as I can, hope for the best, and make plans for what to do when it all goes to shit. ("Don't answer the door. Don't tell anyone you're home alone. Don't touch anything we haven't explicitly said is OK." Etc.)
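
In code, the shape of that ruleset would be something like the following minimal sketch, where every situation and response is invented for illustration:

```python
# A minimal sketch of a pre-analyzed if-then ruleset for an unsupervised,
# not-very-capable agent. The situations and responses are made up.

FALLBACK = "stop safely and signal for help"   # the plan for when it all goes to shit

rules = [
    ("obstacle ahead, unsure what it is", "slow down and stop"),
    ("sensor readings disagree",          "pull over and stop"),
    ("road matches no known map",         "pull over and stop"),
    ("everything nominal",                "continue at planned speed"),
]

def decide(situation):
    for condition, action in rules:
        if condition == situation:
            return action
    # Anything the up-front analysis didn't anticipate falls through here.
    return FALLBACK
```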

If (as with the dog) my ability to install even simple if-then rules in its control system is poor, or if (as with the humans) its ability to remember such rules is poor, I would further narrow down the list of rules, which would default (roughly) to "Sit quietly and don't do anything unless I've told you exactly what to do."
xthread
Jun. 26th, 2014 08:57 pm (UTC)
I can see how you might find it tedious, in the course of this supposed "recapitulation," to wait for me to actually say what you perceive me as having "started to say" before you explain why that's wrong.

Ah. Let me clarify. You have asserted 'Sure, the robots may start out being not very good at following rules, but I'm sure they'll get better quickly, so we can ignore that consideration in what rules we tell the robots to follow.' My experience of computing is that this is a wildly false belief.

"I would in general encourage that system to ask some competent supervisor what to do in any situation where it isn't certain what to do, and in most situations where it is."

I believe that essentially all of the relevant cases are ones where there isn't time for the competent supervisor to respond. 'Should I aim to nearly miss the baby carriage or the crowd on the street' usually only comes up after physics hates you, and you the human won't really have any chance to contribute judgment to the situation.

"...and make plans for what to do when it all goes to shit." This. This is the big idea. In the case we're speaking of, the device's ability to remember rules is good, but our ability to communicate rules to the device is constrained entirely by our ability to incorporate planning what to do when it all goes to shit into that rule system, and by our ability to communicate clearly what the rule system is.

On an orthogonal note, how many people simply ask 'why do we expect that the robot cars won't speed? That sounds like something that someone, somewhere, would make an aftermarket jailbreak for'?
dpolicar
Jun. 26th, 2014 09:19 pm (UTC)
"You have asserted 'Sure, the robots may start out being not very good at following rules, but I'm sure they'll get better quickly, so we can ignore that consideration in what rules we tell the robots to follow.'"

Can you point me to where I said that?

I can see where I said (more or less) "Sure, the robots may start out being not very good at following rules, but I'm sure they'll get better quickly," (though actually, I expect them to be great at following rules, but I expect the rules they start out following to be ill-adapted to actually doing what we want them to do, which I suspect is close enough for our purposes).

But I can't see where I said "we can ignore that consideration in what rules we tell the robots to follow."

I'd like to know, because that's a damnably foolish thing for me to have said and I'd like to apologize specifically for having wasted so much of our time by saying anything that stupid.

Conversely, if I didn't say it and you nevertheless inferred that I believe something that stupid, I'd like you to apologize, or at the very least stop arguing with stuff I haven't actually said.

"...all of the relevant cases are ones where there isn't time for the competent supervisor to respond"

I'm not sure what you consider relevant anymore. You asked me about five-year-olds, for example, and "ask a competent supervisor" is precisely the approach we adopt for five-year-olds, so it seemed pretty relevant to the question you asked.

But if that's not what we're talking about anymore, OK.

In any case, I agree that asking a supervisor isn't an option in there's-no-time scenarios.

"and make plans for what to do when it all goes to shit. This. This is the big idea."

Um... OK.

Regardless of whether it's "the big idea" or not, I certainly agree that creating a rule system that incorporates plans about what to do when it all goes to shit, and communicating those rules, is an important part of programming autonomous agents.

"On an orthogonal note, how many people simply ask 'why do we expect that the robot cars won't speed? That sounds like something that someone, somewhere, would make an aftermarket jailbreak for'?"

I don't know how many... more than a few, fewer than all.
xthread
Jun. 26th, 2014 09:41 pm (UTC)
"You have asserted 'Sure, the robots may start out being not very good at following rules, but I'm sure they'll get better quickly, so we can ignore that consideration in what rules we tell the robots to follow.'"

"Can you point me to where I said that?"

That's how I interpreted this paragraph -

"And I agree that the same will be true of any automated agents we develop. (That said, you seem to be implicitly assuming that our agents will necessarily be worse at this than we are, which I agree will be true initially and doubt will be true for particularly long, but that seems tangential to the question at hand.)"

Specifically, the clause "and doubt will be true for particularly long, but that seems tangential to the question at hand" was what led me down the path of "we can ignore that consideration in what rules we tell the robots to follow." I didn't think it was damnably foolish, probably because it's a hopelessly naive belief about technology that nonetheless many otherwise reasonable people seem to share. And I'm sorry if I've horribly misinterpreted what you intended to communicate.

I'm in vigorous agreement with your expansion of it ("though actually, I expect them to be great at following rules, but I expect the rules they start out following to be ill-adapted to actually doing what we want them to do, which I suspect is close enough for our purposes") - I believe the point about which we're disagreeing is over how much better they'll get, and if it will be fast enough that we can restrict ourselves to the philosophical cases of comparing first order desired outcomes. My contention is that the failure cases of 'the robot tried to follow this rule, which produced this result which is not that rule but is as close as the robot was able to get by trying to follow it,' will dominate the space of actual outcomes.
dpolicar
Jun. 26th, 2014 09:57 pm (UTC)
"I believe the point about which we're disagreeing is over how much better they'll get..."

Yes.

"...and if it will be fast enough that we can restrict ourselves to the philosophical cases of comparing first order desired outcomes."

No. I've never claimed, and do not believe, that we can (sensibly) restrict ourselves to thinking about first-order desired outcomes. (Tangentially, I really don't know what you mean to imply by calling it a "philosophical" case.)

"My contention is that the failure cases of 'the robot tried to follow this rule, which produced this result which is not that rule but is as close as the robot was able to get by trying to follow it,' will dominate the space of actual outcomes."

Sure, I've agreed with this a couple of times now.

This is true of humans as well.

I also agree that in the short term, it will be significantly more true of computers ("robots") than of humans.

If I've understood you, then our primary disagreement is that you don't expect there to be a longer term where the difference becomes less significant, let alone one where computers are more reliably able to achieve their target state-change than humans are... or at least, that you consider that far enough off that spending time thinking about what state-changes such systems ought to target is basically a waste of time.

Whereas I expect that to happen relatively quickly, soon enough that it's worth our while to start thinking about what state-changes such systems ought to target.

Would you agree with that characterization of our disagreement?
