Chances, counterfactuals and similarity

A happy-making feature of today is that Philosophy and Phenomenological Research has just accepted my paper “Chances, Counterfactuals and Similarity”, which has been hanging around for absolutely ages, in part because I got a “revise and resubmit” just as I was finishing my thesis and starting my new job, and in part because I got so much great feedback from a referee that there was lots to think about.

The way I think about it, it is a paper in furtherance of the Lewisian project of reducing counterfactual facts to similarity-facts between worlds, which feeds into a general interest in what kinds of modal structure (cross-world identities, metrics and measures, stronger-than-modal relations etc) you need to appeal to for metaphysical purposes. Lewis has a distinctive project of trying to reduce all this apparent structure to the economical basis of de dicto modality — what’s true at this world or that — and (local) similarity facts. Counterpart theory is one element of this project: showing how cross-world identities might be replaced by similarity relations and de dicto modality. Another element is the reduction of counterfactuals to closeness of worlds, and closeness of worlds is ultimately cashed out in terms of one world’s fitting another’s laws, and there being large areas where the local facts in each world match exactly. Again, we find de dicto modality of worlds and local similarity at the base.

Lewis’s main development of this view looks at a special case, where the actual world is presupposed to have deterministic laws. But to be general (and presumably, to be applicable to the actual world!) we want to have an account that holds for the situation where the laws of nature are objective-chance-laws. Lewis does suggest a way of extending his account to the chancy case. It’s attacked by Hawthorne in a recent paper—ultimately successfully, I think. In any case, Lewis’s ideas in this area always looked (to me) like a bit of a patch-up job, so I suggest a more principled Lewisian treatment, which then avoids the Hawthorne-style objections to the Lewis original.

The basic thought (which I found in Adam Elga’s work on Humean laws of nature) is that “fitting” chancy laws of nature is not just a matter of not violating those laws. Rather, to fit a chancy law is to be objectively typical relative to the probability function those laws determine. Given this understanding, we can give a single Lewisian account of what comparative similarity of worlds amounts to, phrased in terms of fit. The ambition is that when you understand “fit” in the way appropriate to deterministic laws, you get Lewis’s original (unextended) account. And when you understand “fit” in the way I argue is appropriate to chancy laws, you get my revised suggestion. All very satisfying, if you can get it to work!

10 responses to “Chances, counterfactuals and similarity”

  1. Wolfgang Schwarz

    Hi Robbie,

    great stuff, as always. A few comments:

    – Re the division problem: can’t one just say that something counts as remarkable only if it has low probability? That wouldn’t involve counterfactuals. And the reason why well-informed people *would* find the monkey dissertation not very remarkable seems to be precisely because they know that it is not very improbable.

    – Re abundant quasi-miracles: I don’t think it is intuitively right that (on the assumption that the world is slightly atypical/non-random) if you had moved your leg, the future of the world would have been more typical/random than it actually is. In fact, that sounds obviously false to me.

    – Re remarkable subpatterns and lucky runs: you argue that (K_i) should be rejected, but (L_i) accepted. But doesn’t your ‘lucky runs’ proposal render (K_i) true? For in (K_i), the region of the ith coin is especially salient, so local atypicalities here should have more weight.

    In fact, rejecting (K_i) doesn’t seem to go together with rejecting the error theory: it is extremely unlikely that if you flip N coins a million times each, the first one will land all-heads. So (by rejection of the error theory), if you were to flip N coins a million times each, the first one wouldn’t land all-heads. Likewise for the second, third, etc.

    It seems to me that agglomeration is what has to go here.
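The pull between the per-coin claims and their conjunction can be made concrete with a scaled-down calculation. This is only an illustrative sketch: the run length of 20 flips and the count of ten million coins are assumed numbers chosen to stay within floating-point range (the example in this thread uses a million flips per coin), the coins are assumed fair and independent, and the function names are hypothetical.

```python
import math


def p_all_heads(flips: int) -> float:
    """Chance that one fair coin lands heads on every one of `flips` tosses."""
    return 0.5 ** flips


def p_some_coin_all_heads(coins: int, flips: int) -> float:
    """Chance that at least one of `coins` independent fair coins lands all-heads.

    Equals 1 - (1 - p)^coins; log1p/expm1 keep the result accurate for tiny p.
    """
    p = p_all_heads(flips)
    return -math.expm1(coins * math.log1p(-p))


# Assumed, scaled-down numbers (not the ones from the discussion).
FLIPS = 20
COINS = 10_000_000

p_single = p_all_heads(FLIPS)
p_any = p_some_coin_all_heads(COINS, FLIPS)

print(f"P(a given coin lands all-heads)      = {p_single:.3e}")
print(f"P(at least one coin lands all-heads) = {p_any:.6f}")
```

On these assumed numbers, each individual claim "the ith coin would not land all-heads" denies an event with a chance of under one in a million, while the conjunction of all those claims has a chance of under one in ten thousand of being borne out. That is the sense in which accepting every per-coin counterfactual puts pressure on agglomeration.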

    Another somewhat obvious worry about your perspective proposal: it doesn’t help in cases where the relevant events all occur in the same region, like series of tosses of co-located ghost coins, or ‘events’ that do not happen at any particular region at all.

    — wo.

  2. Wolfgang Schwarz

    Another problem which your anonymous referee pointed out to me yesterday, when I proposed to him something similar to your proposal: we don’t want “if I tossed a coin four times, it would land heads twice” to be true, even though a world where the coin lands heads twice is arguably closer to actuality by our standards than a world where it lands heads any other number of times.

  3. can’t one just say that something counts as remarkable only if it has low probability?

    Two thoughts: (a) less serious: it doesn’t fit with what I was taking remarkableness to be; (b) more serious: I don’t think it in the end gets us out of the division problem.

    Suppose Lewis adopted your line. Then we can say that quasi-miracles are just the remarkable events (his “low-probability” clause now being redundant). So first, I’m worried that I lose my intuitive grip on what remarkableness is. I understand it if it’s just the notion of: being such as to generate surprise in ordinary folk (or somesuch). If now you pack in the low-probability clause, I’m wondering what notion we’re tracking. If it’s just the conjunction of the above with “low probability”, then we’ve just decided to use “remarkable event” in the way that Lewis uses “quasi-miraculous event”: and I just think the arguments against the proposal will need to be reworded.

    On that note, what about this resistant strain of the division problem? Forget about the usual description of the event (“being dealt all-spades” or “monkey typing Hamlet”); think just about each individual way of realizing that event, each of which is low-probability. Aren’t these themselves remarkable, since among other things, they realize an all-spades hand/monkey typing Hamlet? If so, each world with any event that would be remarkable except for the “low probability” clause will still count as quasi-miraculous in virtue of the remarkable realization of that event that the world contains. If you can make the case that the individual realizations shouldn’t count as remarkable, then that’s a way of resisting Hawthorne’s division problem independently of building in “low probability”.

    I don’t think it is intuitively right that (on the assumption that the world is slightly atypical/non-random) if you had moved your leg, the future of the world would have been more typical/random than it actually is.

    Noted! I don’t have the same reaction myself. Certainly I don’t think there’s an intuitive case *for* it. But I don’t find it intuitively repugnant. It’d be nice if I could explain others’ intuitions to the contrary: I say a couple of things in the paper about things going strange when (in effect) you live in a counterinductive world…

    Re remarkable subpatterns and lucky runs: you argue that (K_i) should be rejected, but (L_i) accepted. But doesn’t your ‘lucky runs’ proposal render (K_i) true? For in (K_i), the region of the ith coin is especially salient, so local atypicalities here should have more weight.

    I’m in trouble here because of the dread weasel-word “salient”. The intention was that only stuff in the antecedent should be taken into account. That’d leave the K_i with a common perspective. Moral: I need to say something more about what “salient” comes to in this context.

    rejecting (K_i) doesn’t seem to go together with rejecting the error theory

    I don’t quite follow this. I certainly don’t want to vindicate the move from (A>[B is highly probable]) to (A>B). In rejecting error-theory, all I want to do is to show that for ordinary counterfactuals, you don’t just get (A>[B is highly probable]), you get (A>B) as well.

    Another somewhat obvious worry about your perspective proposal: it doesn’t help in cases where the relevant events all occur in the same region

    That’s a really interesting thought. The obvious thought in the colocation case is to look around for something more fine-grained than regions to play the same role: perhaps even just the fusions of events themselves. I’ll think about it some more. Re events that aren’t locatable in any particular region: could you give an example or two to fix ideas?

  4. On your second comment. Yup, I’ve thought about that a bit. I guess the premise is that a two-heads/two-tails result to a 4-flip sequence is more typical than any other. And then an ordering of worlds built on typicality gives you the result you mention. (One thing I’m wondering is whether there’s a way of rejecting the premise: the formal explications of typicality/randomness I’m aware of (either Gaifman/Snir or the complexity theory approach) work well only in infinite or large-finite cases. Maybe there’s room for saying that what counts as most typical only gets determinate in longer cases? I’m far from sure this direction is a goer, but I think it’s interesting to think through. If I were wanting to be argumentative, I might try to put the burden of proof on my opponent at this point to argue *why* the 2 head/2 tail outcome is the most typical.)
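To put numbers on the premise, here is a minimal sketch of the chance distribution over head-counts for four tosses. It assumes a fair coin and independent tosses, and it uses "most probable head-count" as a crude stand-in for "most typical", which is an assumption of the illustration and not one of the formal explications of typicality mentioned above. The function name is hypothetical.

```python
from math import comb


def heads_distribution(flips: int, p_heads: float = 0.5) -> dict:
    """Binomial chance of each possible number of heads in `flips` tosses."""
    return {
        k: comb(flips, k) * p_heads ** k * (1 - p_heads) ** (flips - k)
        for k in range(flips + 1)
    }


dist = heads_distribution(4)

# The head-count with the highest chance, and the chance of missing it.
most_likely = max(dist, key=dist.get)
p_not_most_likely = 1 - dist[most_likely]

for k, p in dist.items():
    print(f"{k} heads: {p}")
print(f"most likely count: {most_likely}, chance of any other count: {p_not_most_likely}")
```

Exactly two heads is the single most probable count (6 of the 16 equiprobable sequences), yet the chance of some other count is 10/16. So even granting the premise, the counterfactual in question affirms an outcome that is more likely than not to fail.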

    Suppose we grant the premise. Then it looks like we get the counterfactual you mention. I find it unexpected, but not something obviously crazy (as e.g. “Were I to sneeze, I’d turn into an elephant and gallop away” would be). Again: if overall best theory of counterfactuals delivers the result that this counterfactual is true, I’m happy to go with it.

    That raises the question about the nature of the enterprise. If I were just aiming for a theory that said no more nor less than ordinary counterfactual judgements, it’d be pretty damning that in short finite cases, the theory delivers verdicts where intuitively we’re (at best) unopinionated about the counterfactuals.

    But I’m not sure that such extreme non-revisionism should be a constraint on a theory of counterfactuals. What I’d like to see, therefore, is some theoretical costs or tasks that you’d like counterfactuals to do, which accepting the counterfactual you mention would undermine.

    For the record, here’s another couple of very nice objections to the typicality account that Stephan Leuenberger raised, claiming (reasonably enough) that the following are counterintuitive:

    “If, in the next 30 billion years, all dropped plates were to fly off sideways, the universe would go on to exist for much much more than 30 billion years.”

    “If I were to toss this fair coin nine times and always got heads, and were to toss it a tenth time, I would get tails.”

    Arguably, the typicality account renders them both true. The case for the first is that, given the actual laws of nature, to accommodate the antecedent while maintaining overall typicality, we’d have to ensure the universe is “big enough” that the event would count as the sort of local atypicality that doesn’t undermine overall typicality. A similar rationale goes for the second.

    Stephan also suggested a fix to the account that might deal with this, involving distinguishing between how we treat “matching” and “non-matching” areas.

    My sense is that to get clear on all these cases, we need to first think about the methodology of constructing a theory of counterfactuals, the sort of theoretical roles we want it to discharge. Only then can we decide which putative counterexamples need to be accommodated and which we can just accept as “features” of the account.

  5. Wolfgang Schwarz

    On “remarkable” and the division problem: suppose there is a remarkable kind of event E with fairly high probability. It divides into remarkable sub-cases E1,E2,… with low probability. By Lewis’s standards, E1,E2,… do not occur in nearby worlds. So E itself doesn’t occur in nearby worlds. But this is intuitively wrong because E has fairly high probability. That was Hawthorne’s argument. Your response, if I understand it correctly, is that his example of an allegedly remarkable event with high probability isn’t really remarkable because well-informed people wouldn’t find it remarkable. But why not? Because it doesn’t have low probability, I suppose. Moreover, for the response to work in general, it would have to apply to all alleged cases of remarkable high-probability events. So there must not be any remarkable high-probability events.

    I agree that the most straightforward analysis of “remarkable” involves counterfactuals: what people would find surprising. But then Lewis’s analysis becomes obviously circular. So that’s not a very charitable reading. The alternative is to interpret “remarkable” as expressing something like the categorical basis of the disposition to be judged remarkable by well-informed people. A good candidate for that is: being unlikely and non-random.

    On the typicality intuitions, how about this: suppose the world is slightly atypical in that within the last two years, 53% of all (fair) coin tosses on some far-away planet P landed heads. (Otherwise it’s as typical as possible; we’re not considering counterinductive worlds.) Then is it true that if Frege had scratched his head on 4 January 1921, at most 52% of all coin tosses in the last two years would have landed heads on planet P? Even if there is no causal (forward) chain leading from this world in 1921 to the recent history of planet P? Seems clearly false to me.

    On lucky runs: I don’t quite understand your position here. You want to reject

    1) if infinitely many coins were being tossed a million times and I were to toss this coin a million times, then it would land all-heads,

    but not

    2) if infinitely many coins were being tossed a million times, the first one would land all-heads?

    Then what about

    1.5) if infinitely many coins were being tossed a million times and somebody were to toss another coin a million times, then that coin would land all-heads?

    I don’t see a relevant difference between (2) and (1.5), nor between (1.5) and (1).

    On Al’s case: I agree that it is not at all obvious that the 2 heads/2 tails outcome is determinately the most typical, and maybe there really are no convincing cases where the objection works.

    Stephan’s cases sound interesting, too. But I think one can avoid them by saying that what is actually a fair coin might not be fair in nearby worlds where it lands heads most of the time: maybe the laws are slightly different at that world? If we have to hold fixed the laws, Stephan’s cases work. But Lewis doesn’t tell us that we have to do that. We’re only meant to ensure that not much happens at the other worlds that contradicts the laws of our world. Contradicting a law that all Fs are Gs means having Fs that are not Gs. Having a different overall history that makes “all Fs are Gs” a contingent regularity rather than a law doesn’t count as contradicting the law, I think. Otherwise rough overall similarity would after all be quite important.

    (BTW, I always have to submit my comments twice. Is this intended?)

  6. That’s actually interesting: your proposal, unlike Lewis’s, assigns high weight to a certain kind of overall similarity between nearby worlds and the actual world. A world with a fairly probable, but very undermining future (in the ‘big bad bug’ sense of “undermining”) will count as far away by your standards, but not as quasi-miraculous by Lewis’s standards.

    Lewis’s standards seem to be better in this respect: it seems that your account is potentially open to Stephan’s objections as well as all kinds of Nixon-style objections. Intuitively, there are many ways the world could easily have gone that are undermining. Many undermining futures have after all rather high probability. Suppose at some time t, there was a 0.2 chance that the world would go F, and F would have been even more likely if some other event E had taken place at t. And suppose F is undermining. Then it seems that on your account, the world would not have gone F if E had taken place. But that sounds false.

  7. Wow: following all four or five lines of thought at once is a challenge!

    On the division problem. I see your tactic, of looking for the categorical basis for the disposition to be found surprising. That seems a nice line to pursue. It’s the details that I’m worried about. Your thought is that you’ll get remarkable as: being unlikely and being F. For this to avoid the force, as well as the letter, of the division problem, you’re going to have to argue that the subcases into which the overall event divides don’t have feature F. Otherwise (since they’re each unlikely) they’ll all be unlikely and F; and so count as quasi-miraculous; and so not feature in nearby worlds.

    On typicality intuitions. One thing to mention, I guess, is that exact match across a region is going to trump avoiding localized atypicalities in that region; so if you can have the coin tosses on the planet exactly matching in the counterfactual scenario, we won’t get the counterfactual you dislike (unless the idea is that what happens on the far-away planet is enough to make the world as a whole atypical?). If you can’t get exact match on the planet, then I *really* don’t think that you can expect a Lewisian account to sustain the kind of preservation of actual-world facts that you mention.

    More fundamentally, I guess I just don’t share the intuitions you mention here. I guess my intuitions about counterfactuals are pretty Jacksonite: before the time of the antecedent event, we should expect exact match of particular fact. Afterwards, we should expect fit with laws of nature. So I’m quite happy with the counterfactual you mention.

    On lucky runs etc: what I was thinking is that I will have to replace the talk of “salient” region with something more substantive. I didn’t say much about what that was to be in the previous comment. I still think it needs to be thought through properly.

    But here’s one thing to think about. Let the salient region be that region containing the event that the antecedent is “about”. In the intended sense of “about”, “1 billion coins are flipped and the ith of these coins is flipped” is about the same event as “1 billion coins are flipped”. (I’m hoping that proposition/event “aboutness” can be defined in ways parallel to proposition/object “aboutness”.)

  8. There are some things I’m not quite following. They sound really intriguing though: would welcome enlightenment!

    (1) I’m not sure I get the line on Stephan’s objection. I thought his objections worked exactly by looking at what was required to minimize lack of fit (atypicality) by the lights of *actual* laws about chances concerning balanced coins. Sure, balanced coins may be governed by other laws in counterfactual scenarios; perhaps then, the chances of them landing heads will be different. But I don’t see how that makes the scenarios we want to be closest, any more typical by the lights of the actual laws. Could you elaborate?

    (2) I’m not quite seeing the undermining case you mention. I guess I need it spelt out in a bit more detail. Can you find a toy example for us to think about? It’s complicated, because chances get updated over time. Also: is the problem meant to be independent of our being Humeans about laws/chance?

    (3) I’m not seeing how to generate Nixon-style cases for my account. I see the analogy: overall similarity is in play in the Nixon case; and overall similarity of *laws* is incorporated into mine. But the Nixon cases arise if we weight heavily *approximate* overall similarity of *matters of particular fact*. And my account only weights heavily overall similarity in respect of fitting-the-laws. So some of the active ingredients seem missing. Was there a specific worry you had in mind here?

  9. Wolfgang Schwarz

    Just on your last two comments (gotta think about the others later):

    Suppose the actual world fits the laws quite well, but its first 100 years didn’t (like the first few tosses of a fair coin sometimes all land heads). Now what if there had been a doomsday machine attached to a fair coin tossing 100 years after the Big Bang such that on tails, the world would have gotten destroyed? I think it’s definitely not true that this coin would have landed heads. It could just as well have landed tails. But the world with the doomsday future fits the laws far less than a world with a future closer to actuality. So if similarity is measured in terms of fit, we’d have to say that if there had been such a setup, the coin would definitely have landed heads.

    The general point is that if some events have a reasonably high probability of becoming realized under certain conditions, it always seems false to say that they wouldn’t have been realized had those conditions obtained. But events with reasonably high probability can be part of histories with very imperfect fit.

    Many Nixon cases seem to be of this kind: there could easily have happened something that would have radically changed the entire future of the world in matters of particular fact so that these facts would not fit the actual laws as well as they actually do.

    I don’t think any of this presupposes a Humean account of laws or chance.

    (I hope this comment doesn’t appear a dozen times. It looks like nothing happens so I keep resubmitting.)

  10. Hi Wo,

    Ok, I see it. I’m hoping the undermining case you mention will stand and fall with what Stephan calls the “counterfactual gambler’s fallacy” (If you toss a balanced coin only 10 times, and the first nine times it comes up heads, then the tenth time it’ll come up tails). As I mentioned, there are some tweaks to the definition of closeness that seem to get me out of that (Stephan suggested them).

    But I want to think the tweaks through, and the methodology of going in for them, before setting this out! I think there’s pretty important issues about the methodology here: what weight we give intuitions, relative to different theoretical aims we might have for constructing a similarity ordering.

    One basic thought: let’s define counterfactuals* via an ordering that’s stipulated to be the one I set out in the paper. Now, at the moment I still think that counterfactuals* agree with ordinary counterfactuals on a great range of the central cases (unlike Lewis’s quasi-miracles approach, if Hawthorne is right, which goes towards error-theory). It does seem that in some (at the moment, fairly recherche) cases they come apart: Stephan’s cases illustrate this, and yours give some more (of course, you’ve also been pressing some internal tensions in the account e.g. the lucky runs stuff). One approach would be to tweak definitions so that counterfactuals* agree more and more with intuitions about counterfactuals. That’s fine, and an interesting project (though I’m worried about it just turning into an exercise in monster-barring.)

    What I want to think about is which differences between counterfactuals and counterfactuals* would matter. I think they’d matter (a) if the differences generalize to create problems for everyday counterfactual judgements; (b) if the differences prevent us from giving the kind of constructive accounts of causation, freedom, dispositions etc in (pseudo)counterfactual terms, that we’d like.

    As an illustration of (a): the original Nixon case showed that “approximate overall similarity” just led to the wrong results all over the place, giving the wrong result for lots of common counterfactuals. As an illustration of (b): an error-theory of ordinary counterfactual judgements may well be able to explain away intuitions pragmatically. But presumably when we try to give counterfactual analyses of causality, dispositions etc, the problem will “bleed up” and give us horrible results about what causes what, what dispositions things have, and so on.

    Now, I don’t clearly see how to turn the kind of cases we’ve been talking about into things of this kind. But perhaps they can: one task is just to think this through carefully and see whether there’s anything useful to be said one way or the other.

    In sum: I think that it’s a really delicate matter, how ambitious we should be for the kind of Lewisian reduction of similarity of worlds. Think of this as the task of figuring out which putative counterfactual propositions we can leave as “spoils for the victor”, and which we really need to insist upon matching the intuitive judgement. Not clear to me at all that the literature has really addressed this question, which seems fundamental.

    p.s. sorry about the problems submitting comments. I’ve just moved to blogger beta: not clear that was a good move.
