Vagueness survey paper: II (vague metalanguages)

Ok, so here’s part II of the paper. One thing that has struck me, having taught this stuff over the last few years, is how much the “vague metalanguages” move seems ad hoc to people if brought in at a late stage: it feels that we were promised something (effectively, something that’d relate vague terms to a precise world) and that we’ve failed to deliver it. And so what I wanted to do is put that debate up front, so we can be aware right from the start that there’ll be an issue here.

When writing this, it seemed to me that effectively the Evans argument arises naturally when discussing this stuff. So it seemed quite nice to put in those references.

Sorry for the missing references, by the way—that’s going to be fixed once I look up what the style guide says about them. (And I’ll add some extras).

USING VAGUENESS TO HANDLE VAGUENESS

One textbook style of semantic theory assigns extensions (sets of objects) as the semantic values of vague terms (Heim & Kratzer xxxx). This might seem dubious. Sets of objects as traditionally conceived are definite totalities—each object is either definitely a member, or definitely not a member. Wouldn’t associating such a totality with a vague predicate force us, unjustifiably, to “draw sharp boundaries”?

On the other hand, it seems that we can easily say which set should be the extension of “is red”:

[[“red”]]={x: x is red}

There’s no need for this to be directly disquotational:

[[“rouge”]]={x: x is red}

The trick here is to use vague language (in the theorist’s metalanguage) to say what the extension should be (of the object-language predicate). If this is legitimate, there’s no obstacle to taking textbook semantics over wholesale.

Perhaps the above seems unsatisfactory: one is looking for illumination from one’s semantic clauses. So, for example, one might want to hear something in the semantics about dispositions to judge things red, to reflect the response-dependent character of redness. It’s highly controversial whether this is a reasonable demand. One might suspect that it mixes up the jobs of the semanticist (to characterize the representational properties of words) and the metaphysician (to say what redness is, in more fundamental terms). But even if one wants illumination wherever possible in one’s semantic theory, there’s not even a prima facie problem here, so long as one is still able to work within a vague metalanguage. Thus Lewis, in discussing his semantic treatment of counterfactuals in terms of the (admittedly vague) notion of similarity, says “I … seek to rest an unfixed distinction upon a swaying foundation, claiming that the two sway together rather than independently.” (Lewis, 1973, p.92). While Lewis recognizes that “similarity” is vague, he thinks this is exactly what we need to faithfully capture the vagueness of counterfactuals in an illuminating way. One might see trouble for the task of constructing a semantics, if one imposed the requirement that the metalanguage should (at least ideally?) be perfectly “precise”. But one would have to be very clear about why such a strong requirement was being imposed.

Let us leave this worry to those bold enough to impose such constraints on the theorist of language. Are there further problems with textbook semantics?

One worry might be that the resources it appeals to are ill-understood. Let us go back to the thought that sets (as traditionally conceived) are definite totalities. Then borderline-bald Harry (say) is either definitely in or definitely out of any given set. But Harry better be a borderline member of {x: x is bald}. Don’t we now have to go and provide a theory of these new and peculiar entities (“vague sets”) — who knows where that will lead us?

The implicit argument that we’re dealing with “new” entities here can be formulated as follows (where S is any set of which Harry is definitely a member):

(1) Harry is definitely a member of S

(2) It is not the case that Harry is definitely a member of {x: x is bald}.

(3) So: S is not identical to {x: x is bald}

(Parallel considerations can be given for the non-identity of {x: x is bald} with sets Harry is definitely not a member of). The argument seems compelling, appealing as it does only to the indiscernibility of identicals. If we suppose S and {x: x is bald} to be identical, then we should be able to swap one for the other in the following context without change of truth value:

Harry is definitely a member of ….

But (1) and (2) show us this doesn’t happen. (3) follows by reductio.
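For those who like it displayed, here is a minimal schematic rendering of the argument (the notation is mine: D for “definitely”, s for the supposedly precise set S, b for the vague abstract “{x: x is bald}”):

\[
\begin{array}{ll}
(1) & D(\mathrm{Harry} \in s)\\
(2) & \neg D(\mathrm{Harry} \in b)\\
(\mathrm{LL}) & s = b \rightarrow \big(D(\mathrm{Harry} \in s) \rightarrow D(\mathrm{Harry} \in b)\big)\\
(3) & \therefore\ s \neq b
\end{array}
\]

On the kind of diagnosis gestured at in the next paragraph, the suspect move is the Leibniz’s Law step (LL), applied within the scope of “definitely”, once “{x: x is bald}” is allowed to refer indeterminately.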

The issues this raises are discussed extensively in the literature on the “Evans-Salmon” argument (and in parallel debates on the indiscernibility of identicals in connection to modality and tense). One moral from that discussion is that the argument given above is probably not valid as it stands. Very roughly, “{x: x is bald}” can denote some precisely bounded set of entities, consistent with everything we’ve said, so long as it is indefinite which such totality it denotes. Interested readers are directed to (cite cite cite) for further discussion.

Vague metalanguages seem legitimate; and there’s no reason as yet to think that appeal to vaguely specified sets commits one to a novel “vague set theory”. But we still face the issue of how the distinctive puzzles of vagueness are to be explained.

Vagueness survey paper I (puzzles)

I’ve been asked to write a survey paper on vagueness. It can’t be longer than 6000 words. And that’s a pretty big topic.

I’ve been wondering how best to get some feedback on whether I’m covering the right issues, and, of course, whether what I’m doing is fair and accurate. So what I thought is that I’d post up chunks of my first draft of the paper here—maybe 1000 words at a time, and see what happens. So comments *very much* welcomed, whether substantive or picky.

Basically, the plan for the paper is that it be divided into five sections (it’s mostly organized around the model theory/semantics, for reasons to do with the venue). So first I have a general introduction to the puzzles of vagueness, of which I identify two: effectively, soriticality, and borderline cases. Then I go on to say something about giving a semantics for vagueness in a vague metalanguage. The next three sections give three representative approaches to “semantics”, and their take on the original puzzles. First, classical approaches (taking Williamsonian epistemicism as representative). Second, something based on Fine’s supervaluations. And third, many valued theories. I chose to focus on Field’s recent stuff, even though it’s perhaps not the most prominent, since it allows me to have some discussion of how things look when we have a translate-and-deflate philosophy of language rather than the sort of interpretational approach you often find. Finally, I have a wrap up that mentions some additional issues (e.g. contextualism), and alternative methodologies/approaches.

So that’s how I’ve chosen to cut the cake. Without more ado, here’s the first section.

Vagueness Survey Paper. Part I.

The puzzles of vagueness

Take away grains, one by one, from a heap of rice. At what point is there no longer a heap in front of you? It seems hard to believe that there’s a sharp boundary – a single grain of rice whose removal turns a heap into a non-heap. But if removing one grain can’t achieve this, how can removing a hundred do so? It seems small changes can’t make a difference to whether or not something is a heap; but big changes obviously do. How can this be, since big changes are nothing but small changes chained together?

Call this the “little by little” puzzle.

Pausing midway through removing grains from the original heap, ask yourself: “is what I have at this moment a heap?” At the initial stages, the answer will clearly be “yes”. At the late stages, the answer will clearly be “no”. But at intermediate stages, the question will generate perplexity: it’s not clearly right to say “yes”, nor is it clearly right to say “no”. A hedged response seems better: “it sorta is and sorta isn’t”, or “it’s a borderline case of a heap”. Those are fine things to say, but they’re not a direct answer to the original question: is this a heap? So what is the answer to that question when confronted with (what we can all agree to be) a borderline case of a heap?

Call this the “borderlineness” puzzle.

Versions of the “little by little” and “borderline case” puzzles are ubiquitous. As hairs fall from the head of a man as he ages, at what point is he bald? How could losing a single hair turn a non-bald man bald? What should one say of intermediate “borderline” cases? Likewise for colour terms: adjacent patches in a series of colour patches may be indiscriminable from one another (put them side by side and you couldn’t tell them apart). Yet if they vary in the wavelength of light they reflect, chaining enough such cases might take one from pure red to pure yellow.

As Peter Unger (cite) emphasized, we can extend the idea further. Imagine an angel annihilating the molecules that make up a table, one by one. Certainly at the start of the process annihilating one molecule would still leave us with a table; at the end of the process we have a single molecule—no table that! But how could annihilating a single molecule destroy a table? It’s hard to see what terms (outside mathematics) do not give rise to these phenomena.

The little by little puzzle leads to the sorites paradox (from “sorites” – the Greek word for “heap”). Take a line of 10001 adjacent men, the first with no hairs, the last with 10000 hairs, with each successive man differing from the previous by the addition of a single hair (we call this a sorites series for “bald”). Let “Man N” name the man with N hairs.

Obviously, Man 0 is bald. Man 10000 is not bald. Furthermore, something in the vicinity of the thought that “a single hair can’t make a difference to baldness” seems entirely reasonable.

Now consider the following collection of horrible-sounding claims:

(1):       Man 0 is bald, and man 1 is not bald.

(2):       Man 1 is bald, and man 2 is not bald.

…

(10000): Man 9999 is bald, and man 10000 is not bald.

It seems that each of these must be rejected, if anything in the vicinity of the “little differences can’t make a difference” principle is right. But if we reject the above, surely we must accept their negations:

(1*):     it is not the case that: Man 0 is bald, and man 1 is not bald.

(2*):     it is not the case that: Man 1 is bald, and man 2 is not bald.

…

(10000*): it is not the case that: Man 9999 is bald, and man 10000 is not bald.

But given the various (N*), and the two obvious truths about the extreme cases, a contradiction follows.

One way you can see this is by noting that each (N*) is (classically) equivalent to the material conditional reading of:

(N**) if Man N-1 is bald, then Man N is bald

Since Man 0 is bald, a series of Modus Ponens inferences allows us to derive that Man 10000 is bald, contrary to our assumptions.
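Schematically (writing B(n) for “Man N is bald”), the derivation is just a long chain:

\[
B(0),\quad B(0) \rightarrow B(1),\quad B(1) \rightarrow B(2),\quad \dots,\quad B(9999) \rightarrow B(10000) \ \vdash\ B(10000)
\]

where each conditional is the material reading of the corresponding (N*), and ten thousand applications of Modus Ponens take us from the uncontroversial B(0) to the unacceptable B(10000).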

Alternatively, one can reason with the (N*) directly. Suppose that Man 9999 is bald. As we already know that Man 10000 is not bald, this contradicts (10000*). So, by reductio, Man 9999 is not bald. Repeat, and one eventually derives that Man 0 is not bald, contrary to our assumptions.

Whichever way we go, a contradiction follows from our premises, so we must either find some way of rejecting seemingly compelling premises, or find a flaw in the seemingly obviously valid reasoning.

We turn next to the puzzle of borderlineness: given that Harry (say) is intermediate between clear cases and clear non-cases of baldness, “Is Harry bald?” seems to have no good, direct answer.

There are familiar cases where we cannot answer such questions: if I’ve never seen Jimmy I might be in no position to say whether he’s bald, simply because I don’t know one way or the other. And indeed, ignorance is one model one could appeal to in the case of borderlineness. If we simply don’t know whether or not Harry is bald, that’d explain why we can’t answer the question directly!

This simply moves the challenge one stage back, however. Why do we lack knowledge? After all, it seems we can know all the relevant underlying facts (the number and distribution of hairs on a man’s head). Nor does there seem to be any way of mounting a sensible inquiry into the question, to resolve the ignorance, unlike in the case of Jimmy. What kind of status is this, where the question of baldness is not only something we’re in no position to answer, but where we can’t even conceive of how to go about getting in a position to answer? What explains this seemingly inevitable absence of knowledge?

A final note on borderline cases. It’s often said that if Harry is a borderline case of baldness, then it’s indefinite or there’s no fact of the matter or it is indeterminate whether Harry is bald, and I’ll talk this way myself. Now, indeterminacy may be a more general phenomenon than vagueness (it’s appealed to in cases that seem to have little to do with sorites series: future contingents, conditionals, quantum physics, set theory); but needless to say, labeling borderline cases as “indeterminate” doesn’t explain what’s going on with them unless one has a general account of indeterminacy to appeal to.

Nominalizing statistical mechanics

Frank Arntzenius gave the departmental seminar here at Leeds the other day. Given I’ve been spending quite a bit of time just recently thinking about the Fieldian nominalist project, it was really interesting to hear about his updating and extension of the technical side of the nominalist programme (he’s working on extending it to differential geometry, gauge theories and the like).

One thing I’ve been wondering about is how theories like statistical mechanics fit into the nominalist programme. These were raised as a problem for Field in one of the early reviews (by Malament). There’s a couple of interesting papers recently out in Philosophia Mathematica on this topic, by Glen Meyer and Mark Colyvan/Aidan Lyon. Now, one of the assumptions in these discussions, as far as I can tell, is that even sticking with the classical, Newtonian framework, the Field programme is incomplete, because it fails to “nominalize” statistical mechanical reasoning (in particular, stuff naturally represented by measures over phase space).

Now one thing that I’ll mention just to set aside is that some of this discussion would look rather different if we increased our nominalistic ontology. Suppose that reality, Lewis-style, contains a plurality of concrete, nominalistic, space-times—at least one for each point in phase space (that’ll work as an interpretation of phase space, right?). Then the project of postulating qualitative, synthetic probability structure over such worlds, from which a representation theorem for the quantitative probabilities of statistical mechanics could be derived, looks far easier. Maybe it’s still technically or philosophically problematic. Just a couple of thoughts on this. From the technical side, it’s probably not enough to show that the probabilities can be represented nominalistically—we want to show how to capture the relevant laws. And it’s not clear to me what a nominalistic formulation of something like the past hypothesis looks like (BTW, I’m working with something like the David Albert picture of stat mechanics here). Philosophically, what I’ve described looks like a nominalistic version of primitive propensities, and there are various worries about treating probability in this primitive way (e.g. why should information about such facts constrain credences in the distinctive way information about chance seems to?). I doubt Field would want to go in for this sort of ontological inflation in any case, but it’d be worth working through it as a case study.

Another idea I won’t pursue is the following: Field in the 80’s was perfectly happy to take a (logical) modality as primitive. From this, and nominalistic formulations of Newtonian laws, presumably a nomic modality could be defined. Now, it’s one thing to have a modality, another thing to earn the right to talk of possible worlds (or physical relations between them). But given that phase space looks so much like we’re talking about the space of nomically possible worlds (or time-slices thereof) it would be odd not to look carefully at whether we can use nomic modalities to help us out.

But even setting these kind of resources aside, I wonder what the rules of the game are here. Field’s programme really has two aspects. The first is the idea that there’s some “core” nominalistic science, C. And the second claim is that mathematics, and standard mathematized science, is conservative over C. Now, if the core was null, the conservativeness claim would be trivial, but nobody would be impressed by the project! But Field emphasizes on a number of occasions that the conservativeness claim is not terribly hard to establish, for a powerful block of applied mathematics (things that can be modelled in ZFCU, essentially).

(Actually, things are more delicate than you’d think from Science without Numbers, as emerged in the JPhil exchange between Shapiro and Field. The upshot, I take it, is that if (a) we’re allowed second-order logic in the nominalistic core, or (b) we can argue that the best justified mathematized theories aren’t quite the usual versions, but systematically weakened versions, then the conservativeness results go through).

As far as I can tell, we can have the conservativeness result without a representation theorem. Indeed, for the case of arithmetic (as opposed to geometry and Newtonian gravitational theory) Field relies on conservativeness without giving anything like a representation theorem. I think, therefore, that there’s a heel-digging response to all this open to Field. He could say that phase-space theories are all very fine, but they’re just part of the mathematized superstructure—there’s nothing in the core which they “represent”, nor do we need there to be.

Now, maybe this is deeply misguided. But I’d like to figure out exactly why. I can think of two worries: one based on loss of explanatory power; the other on the constraint to explain applicability.

Explanations. One possibility is that nominalistic science without statistical mechanics is a worse theory than mathematized science including phase space formulations—in a sense relevant to the indispensability argument. But we have to treat this carefully. Clearly, there’s all sorts of ways in which mathematized science is more tractable than nominalized science—that’s Field’s explanation for why we indulge in the former in the first place. One objective of the Colyvan and Lyon article cited earlier is to give examples of the explanatory power of stat mechanical explanations, so that’s one place to start looking.

Here’s one thought about that. It’s not clear that the sort of explanations we get from statistical mechanics, cool though they may be, are of relevantly similar kind to the “explanations” given in classical mechanics. So one idea would be to try to pin down this difference (if there is one) and figure out how it relates to the “goodness” relevant to indispensability arguments.

Applicability. The second thought is that the “mere conservativeness” line is appropriate either where the applicability of the relevant area of mathematics is unproblematic (as perhaps in arithmetic) or where there aren’t any applications to explain (the higher reaches of pure set theory). In other cases—like geometry—there is a prima facie challenge to tell a story about how claims about abstracta can tell us stuff about the world we live in. And representation theorems scratch this itch, since they show in detail how particular this-worldly structures can exactly call for a representation in terms of abstracta (so in some sense the abstracta are “coding for” purely nominalistic processes—“intrinsic processes” in Field’s terminology). Lots of people unsympathetic to nominalism are sympathetic to representation theorems as an account of the application of mathematics—or so the folklore says.

But, on the one hand, statistical mechanics does appear to feature in explanations of macro-phenomena; and on the other, how talking about measures over some abstract “space” can be relevant to explaining facts about ripples on a pond is at least as unobvious as in the applications of geometry.

I don’t have a very incisive way to end this post. But here’s one thought I have if the real worry is one of accounting for applicability, rather than explanatory power. Why think in these cases that applicability should be explained via representation theorems? In the case of geometry, Newtonian mechanics etc, it’s intuitively appealing to think there are nominalistic relations that our mathematized theories are encoding. Even if one is a platonist, that seems like an attractive part of a story about the applicability of the relevant theories. But when one looks at statistical mechanics, is there any sense that its applicability would be explained if we found a way to “code” within Newtonian space-time all the various points of phase space (and then postulate relations between the codings)? It seems like this is the wrong sort of story to be giving here. That thought goes back, I guess, to the point raised earlier in the “modal realist” version: even if we had the resources, would primitive nominalistic structure over some reconstruction of phase space really give us an attractive story about the applicability of statistical mechanical probabilities?

But if representation theorems don’t look like the right kind of story, what is? Can the Lewis-style “best theory theory” of chance, applied to stat mechanical case (as Barry Loewer has suggested) be wheeled in here? Can the Fieldian nominalist just appeal to (i) conservativeness (ii) the Lewisian account of how the probability-invoking theory and laws gets fixed by the patterns of nominalistic facts in a single classical space? Questions, questions…

Error theories and Revolutions

I’ve been thinking about Hartry Field’s nominalist programme recently. In connection with this (and a draft of a paper I’ve been preparing for the Nottingham metaphysics conference) I’ve been thinking about parallels between the error theories that threaten if ontology is sparse (e.g. nominalistic, or van Inwagenian); and scientific revolutions.

One (Moorean) thought is that we are better justified in our commonsense beliefs (e.g. `I have hands’) than we could be in any philosophical premises incompatible with them. So we should always regard “arguments against the existence of hands” as reductios of the premises that entail that one has no hands. This thought, I take it, extends to commonsense claims about the number of hands I possess. Something similar might be formulated in terms of the comparative strength of justification for (mathematicized) science as against the philosophical premises that motivate its replacement.

So presented, Field (for one) has a response: he argues in several places that good justification for the existence of numbers is exactly what we lack. He simply rejects the premise of this argument.

A better presentation of the worry focuses, not on the relative justification for one’s beliefs, but on conditions under which it is rational to change one’s beliefs. I presently have a vast array of beliefs that, according to Field, are simply false.

Forget issues of relative justification. It’s simply that the belief state I would have to be in to consistently accept Field’s view is very distant from my own—it’s not clear whether I’m even psychologically capable of genuinely disbelieving that if there are exactly two things in front of me, then the number of things in front of me is two. (If you don’t feel the pressure in this particular case, consider the suggestion that no macroscopic objects exist—then pretty much all of your existing substantive beliefs are false). Given my starting set of beliefs, it’s hard to see how speculative philosophical considerations could make it rational to change my views so much.

Here’s one way of trying to put some flesh on this general worry. In order to assess an empirical theory, we need to measure it against relevant phenomena to establish the theory’s predictive and explanatory power. But what do we take these phenomena to be? A very natural thought is that they include platitudinous statements about the positions of pointers on measuring instruments, statements about how experiments were conducted, and whatever is described by records of careful observation. But Field’s theory says that the content of numerical records of experimental data will be false; as will be claims such as “the data points approximate an exponential function”. On a van Inwagenian ontology, there are no pointers, and experimental reports will be pretty much universally false (at least on an error-theoretic reading of his position). Sure, each theorist has a view on how to reinterpret what’s going on. But why should we allow them to skew the evidence to suit their theory? Surely, given what we reasonably take the evidence to be, we should count their theories as disastrously unsuccessful?

But this criticism is based on certain epistemological presuppositions, and these can be disputed. Indeed Field in the introduction to Realism, Mathematics and Modality (preemptively) argues that the specific worries just outlined are misguided. He points to cases he thinks analogous, where scientific evidence has forced a radical change in view. He argues that when a serious alternative to our existing system of beliefs (and rules for belief-formation) is suggested to us, it is rational to (a) bracket relevant existing beliefs and (b) consider the two rival theories on their individual merits, adopting whichever one regards as the better theory. The revolutionary theory is not necessarily measured against what we believe the data to be, but against what the revolutionary theory says the data is. Field thinks, for example, that in the grip of a geocentric model of the universe, we should treat `the sun moves in absolute upward motion in the morning' as data. However, even for those within the grip of that model, when the heliocentric model is proposed, it’s rational to measure its success against the heliocentric take on what the proper data is (which, of course, will not describe sunrises in terms of absolute upward motion). Notice that on this model, there is effectively no `conservative influence’ constraining belief-change—since when evaluating new theories, one’s prior opinions on relevant matters are bracketed.

If this is the right account of (one form of) belief change, then the version of the Moorean challenge sketched above falls flat (maybe others would do better). Note that for this strategy to work, it doesn’t matter that philosophical evidence is more shaky than scientific evidence which induces revolutionary changes in view—Field can agree that the cases are disanalogous in terms of the weight of evidence supporting revolution. The case of scientific revolutions is meant to motivate the adoption of a certain epistemology of belief revision. This general epistemology, in application to the philosophy of mathematics, tells us we need not worry about the massive conflicts with existing beliefs that so concerned the Mooreans.

On the other hand, the epistemology that Field sketches is contentious. It’s certainly not obvious that the responsible thing to do is to measure revisionary theory T against T’s take on the data, rather than against one’s best judgement about what the data is. Why bracket what one takes to be true, when assessing new theories? Even if we do want to make room for such bracketing, it is questionable whether it is responsible to pitch us into such a contest whenever someone suggests some prima facie coherent revolutionary alternative. A moderated form of the proposal would require there to be extant reasons for dissatisfaction with current theory (a “crisis in normal science”) in order to make the kind of radical reappraisal appropriate. If that’s right, it’s certainly not clear whether distinctively philosophical worries of the kind Field raises should count as creating crisis conditions in the relevant sense. Scientific revolutions and philosophical error theories might reasonably be thought to be epistemically disanalogous in a way unhelpful to Field.

Two final notes. It is important to note what kind of objection a Moorean would put forward. It doesn’t engage in any way with the first-order case that Field constructs for his error-theoretic conclusion. If substantiated, the result will be that it would not be rational for me (and people like me) to come to believe the error-theoretic position.

The second note is that we might save the Fieldian ontology without having to say contentious stuff in epistemology, by pursuing reconciliation strategies. Hermeneutic fictionalism—for example in Steve Yablo’s figuralist version—is one such. If we never really believed that the number of peeps was twelve, but only pretended this to be so, then there’s no prima facie barrier from “belief revision” considerations that prevents us from explicitly adopting a nominalist ontology. Another reconciliation strategy is to do some work in the philosophy of language to make the case that “there are numbers” can be literally true, even if Field is right about the constituents of reality. (There are a number of ways of cashing out that thought, from traditional Quinean strategies, to the sort of stuff on varying ontological commitments I’ve been working on recently).

In any case, I’d be really interested in people’s take on the initial tension here—and particularly on how to think about rational belief change when confronted with radically revisionary theories—pointers to the literature/state of the art on this stuff would be gratefully received!

Justifying scoring rules

In connection to this paper, I’ve been thinking some more about what grounds we might have for saying substantive things about how “scoring rules” should behave.

Quick background. Scoring rules rank either credences in a single proposition, or whole credence functions (depending on your choice of poison) against the actual truth values. For now, let’s concentrate on the single-proposition case. In the context we’re interested in, they’re meant to measure “how (in)accurate” the credences are. I’ll assume that scoring rules take the form s(x,y), where x is the credence, and y the truth value of the salient proposition (1 for truth, 0 for falsity). You’d naturally expect a minimal constraint to be:

(Minimal 1)  s(1,1)=s(0,0)=1; s(0,1)=s(1,0)=0.

(Minimal 2) s is a monotone increasing function in x when y=1. s is a monotone decreasing function in x when y=0.

Basically, this just says that a credence matching the truth value is maximally accurate, a credence as far from the truth value as possible is minimally accurate, and you never decrease in accuracy by moving your credence closer to the truth value.

But to make arguments from scoring rules for probabilism run, we need a lot more structure. Where do we get it from?

There’s a prior question: what’s the nature of a scoring rule in the first place? There’re a couple of thoughts to have here. One is that scoring rules are *preferences* of agents. Different agents can have different scoring rules, and the relevant preference-ordering aims to capture the subjective value the agent attaches to having *accurate* credences.

Now, various hedges are needed at this point. Maybe having certain credences makes you feel warm and fuzzy, and you prefer to have those feelings no matter what. We need to distill that stuff out. Moreover, maybe you value having particular credences in certain situations because of their instrumental value—e.g. enabling you indirectly to get lots of warm fuzzy stuff. One strong thesis about scoring rules is that they give the *intrinsic* value that the agent attaches to a certain credence/truth value state of affairs—her preferences given that alethic accuracy is all she cares about. However tricky the details of this are to spell out, the general story about what the scoring rule aims to describe is pretty clear—part of the preferences of individual agents.

A different kind of view would have it that the scoring rule describes a more objective beast: facts about which credences are better than which others (as far as accuracy goes). Presumably, if there are such betterness facts, this’ll provide a standard for assessing people’s alethic preferences in the first sense.

On either view, the trick will be to justify the claim that the scoring rule has certain formal features X. Then one appeals to a formal argument that shows that for every incoherent credence c, there’s a coherent credence d which is more accurate (by the lights of the scoring rule) than c no matter what the actual truth values are—supposing only that the scoring rule has feature X. Being “accuracy dominated” in this way is supposed to be an epistemic flaw (at least a pro tanto one). [I’m going to leave discussion of how *that* goes for another time]
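To see the shape such dominance results take, here is a toy numerical illustration of my own (a sketch of the phenomenon, not Joyce’s general theorem). It uses the Brier score, discussed further below, as the example rule, and a two-cell partition {p, not-p}:

```python
# Toy illustration of accuracy-dominance: an incoherent credence pair over
# p and not-p is beaten by a coherent pair at every possible world, where
# inaccuracy is measured by the Brier score (sum of squared errors).

def brier_inaccuracy(credences, truth_values):
    """Sum of squared distances between credences and the 0/1 truth values."""
    return sum((c - v) ** 2 for c, v in zip(credences, truth_values))

worlds = [(1, 0), (0, 1)]            # p true / p false

incoherent = (0.8, 0.8)              # violates c(p) + c(not-p) = 1
# Project the incoherent point onto the coherent line c(p) + c(not-p) = 1:
excess = (sum(incoherent) - 1) / 2
coherent = tuple(c - excess for c in incoherent)   # (0.5, 0.5)

for w in worlds:
    print(w, brier_inaccuracy(incoherent, w), brier_inaccuracy(coherent, w))
# At both worlds the coherent pair scores 0.5 and the incoherent pair about 0.68,
# so the incoherent credences are accuracy-dominated whatever the truth values are.
```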

Ok. But how are we going to justify features of scoring, other than the minimal constraints above? Well, Joyce (1998) proceeds by drawing out what he regards as unpleasant consequences of denying a series of formal constraints on the scoring rule. Though it’s not *immediately obvious* that to be a “measure of accuracy” scoring rules need to do more than satisfy *minimal*, you may be convinced by the cases that Joyce makes. But what *kind* of case does he make? One thought is that it’s a kind of conceptual analysis. We have the notion of accuracy, and when we think carefully through what can happen if a measure doesn’t have feature X, we see that whatever its other merits, it wouldn’t be a decent way to measure anything deserving the name *accuracy*.

Whether or not Joyce’s considerations are meant to be taken this way (I rather suspect not), it’s at least a very clean project to engage in. Take scoring rules to be preferences. Then a set of preferences that didn’t have the formal features just wouldn’t be preferences solely about accuracy—as was the original intention. Or take an objective betterness ordering. If it’s evaluating credence/world pairs on grounds of accuracy, again (if the conceptual analysis of accuracy was successful) it better have the features X, otherwise it’s just not going to deserve the name.

But maybe we can’t get all the features we need through something like conceptual analysis. One of Joyce’s features—convexity—seems to be something like a principle of epistemic conservatism (that’s the way he has recently presented it). It doesn’t seem that people would be conceptually confused if their alethic preferences violated this principle. Where would this leave us?

If we’re thinking of the scoring rule as an objective betterness relation, then there seems plenty of room for thinking that the *real facts* about accuracy encode convexity, even if one can coherently doubt that this is so (ok, so I’m setting aside open-question arguments here, but I was never terribly impressed by them). And conceptual analysis is not the only route to justifying claims that the one true scoring rule has such a feature. Here’s one alternative. It turns out that a certain scoring rule—the Brier score—meets all Joyce’s conditions and more besides. And it’s a very simple, very well behaved scoring rule, that generalizes very nicely in all sorts of ways (Joyce (2009) talks about quite a few nice features of it in the section “homage to the Brier score”). It’s not crazy to think that, among parties agreed that there is some “objective accuracy” scoring rule out there to be described, considerations of simplicity, unity, integration and other holistic merits might support the view that the One True measure of (in)accuracy is given by the Brier score.

But this won’t sound terribly good if you think that scoring rules describe individual preferences, rather than an objective feature that norms those preferences. Why should theoretical unification and whatnot give us information about the idiosyncrasies of what people happen to prefer? If we give up on the line that it’s just conceptually impossible for there to be “alethic preferences” that fail to satisfy conditions X, then why can’t someone—call him Tommy—just happen to have X-violating alethic preferences? Tommy’s “scoring rule” then just can’t be used in a vindication of probabilism. I don’t see how the kind of holistic considerations just mentioned can be made relevant.

But maybe we could do something with this (inspired by some discussion in Gibbard (2008), though in a very different setting). Perhaps alethic preferences only need to satisfy the minimal constraints above, to deserve the name. But even if it’s *possible* to have alethic preferences with all sorts of formal properties, it might be unwise to do so. Maybe things go epistemically badly, e.g. if they’re not appropriately conservative because of their scoring rule (for an illustration, perhaps the scoring rule is just the linear one: s(x,y) is the absolute difference of x and y. This scoring rule motivates extremism in credences: when c(p)>0.5, you minimize expected inaccuracy by moving your credence to 1. But someone who does that doesn’t seem to be functioning very well, epistemically speaking). Maybe things go prudentially badly, unless their alethic values have a certain form. So, without arguing that it’s analytic of “alethic preference”, we provide arguments that the wise will have alethic preferences that meet conditions X.
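A quick toy computation (mine, not anything in Joyce or Gibbard) makes the parenthetical example vivid: by her own lights, an agent scoring herself with the linear rule minimizes expected inaccuracy by jumping to an extreme credence, while with the Brier score she minimizes it by reporting her actual credence.

```python
# Expected inaccuracy of reporting credence x, computed by the lights of the
# agent's own credence p that the proposition is true, for the linear and
# Brier rules. A toy illustration of why the linear rule rewards extremism.

def expected_inaccuracy(score, p, x):
    return p * score(x, 1) + (1 - p) * score(x, 0)

linear = lambda x, y: abs(x - y)        # absolute-difference rule
brier = lambda x, y: (x - y) ** 2       # Brier rule

p = 0.7
grid = [i / 100 for i in range(101)]
best_linear = min(grid, key=lambda x: expected_inaccuracy(linear, p, x))
best_brier = min(grid, key=lambda x: expected_inaccuracy(brier, p, x))

print(best_linear)   # 1.0 -- the linear rule tells the 0.7-confident agent to go to certainty
print(best_brier)    # 0.7 -- the Brier rule is minimized by sticking with credence 0.7
```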

If so, it looks to me like we’ve got an indirect route to probabilism. People with sensible alethic preferences will be subject to the Joycean argument—they’ll be epistemically irrational if they don’t conform to the axioms of probability. And while people with unwise alethic preferences aren’t irrational in failing to be probabilists, they’re in a bad situation anyway, and (prudentially or epistemically) you don’t want to be one of them. It’s not that we have a prudential justification of probabilism. It’s that there are (perhaps prudential) reasons to be the kind of person such that it’s then epistemically irrational to fail to be a probabilist.

Though on this strategy, prudential/pragmatic considerations are coming into play, they’re not obviously as problematic as in e.g. traditional formulations of Dutch book arguments. For there, the thought was that if you fail to be a probabilist, you’re guaranteed to lose money. So, if you like money, be a probabilist! Here the justification is of the form: your view about the value of truth and accuracy is such-and-such. But you’d be failing to live up to your own preferences unless you are a probabilist. And it’s at a “second order” level, where we explain why it’s sensible to value truth and accuracy in the kind of way that enables the argument to go through, that we appeal to prudential considerations.

Having said all that, I still feel that the case is cleanest for someone thinking of the scoring argument as based on objective betterness. Moreover, there’s a final kind of consideration that can be put forward there, which I can’t see how to replicate on the preference-based version. It turns on what we’re trying to provide in giving a “justification of probabilism”. Is the audience one of sympathetic folk, already willing to grant that violations of probability axioms are pro tanto bad, and simply wanting it explained why this is the case? (NB: the pragmatic nature of the Dutch Book argument makes it as unsatisfying for such folk as it is for anyone else.) Or is the audience one of hostile people, with their own favoured non-probabilistic norms (maybe people who believe in the Dempster-Shafer theory of evidence)? Or is the audience people who are suitably agnostic, initially?

This makes quite a big difference. For suppose the task was to explain to the sympathetic folk what grounds the normativity of the probability axioms. Then we can take as a starting point that one (pro tanto) ought not to violate the probability axioms. We can show how objective betterness, if it has the right form, would explain this. We can show that an elegant scoring rule like the Brier score would have the right form, and so provide the explanation. And absent competitors, it looks like we’ve got all the ingredients for a decent inference-to-the-best-explanation for the Brier score, seen as the best candidate for measuring objective (in)accuracy.

Of course, this would cut very little ice with the hostile crowd, who’d be more inclined to tollens away from the Brier score. But even they should appreciate the virtues of being presented with a package deal, with probabilism plus an accuracy/Brier based explanation of what kind of normative force the probability axioms have. If this genuinely enhances the theoretical appeal of probabilism (which I think it does) then the hostile crowd should feel a certain pressure to try to replicate the success—if only to try to win over the neutral.

Of course, the sense in which we have a “justification” of probabilism is very much less than if we could do all the work of underpinning a dominance argument by conceptual analysis, or even pointing to holistic virtues of the needed features. It’s more on the lines of explaining the probabilist point of view, than persuading others to adopt it. But that’s far from nothing.

And even if we only get this, we’ve got all we need for other projects in which I, at least, am interested. For if, studying the classical case, we can justify Brier as a measure of objective accuracy, then when we turn to generalizations of classicism—non-classical semantics of the kind I’m talking about in the paper—we can run dominance arguments that presuppose the Brier measure of inaccuracy, to argue for analogues of probabilism in the non-classical setting. And I’d be happy if the net result of that paper was the conditional: to the extent that we should be probabilists in the classical setting, we should be analogue-probabilists (in the sense I spell out in the paper) in the non-classical setting. So the modest project isn’t mere self-congratulation on the part of probabilists—it arguably commits them to a range of non-obvious generalizations of probabilism in which plenty of people should be interested.

Of course, if a stronger, more suasive case for the features X can be made, so much the better!

Subject-relative safety and nested safety.

The paper I was going to post took off from very interesting recent work by John Hawthorne and Maria Lasonen that creates trouble from the interaction of safety constraints and a plausible looking principle about chance and close possibilities. The major moving part is a principle that tells you (roughly) that whenever a proposition is high-chance at w,t, then some world compatible with the proposition is  a member of the “safety set” relevant to any subject’s knowledge at w,t (the HCCP principle).

It’s definitely worth checking out Williamson’s reply to H&L. There’s lots of good stuff in it. Two relevant considerations: he formulates a version of safety in the paper that is subject-relativized (one of the “outs” in the argument that H&L identify), and defends this against the criticisms they offer. And he rejects the HCCP principle. The basic idea is this: take some high-but-not-1-chance proposition that’s intuitively known, e.g. that the ball is about to hit the floor. And consider a world in which this scenario is duplicated many times—enough so that the generalization “some ball fails to hit the floor” is high-chance (though false). Each individual conjunct seems epistemically on a par with the original. But by HCCP, there’s some failure-to-hit world in the safety set, which means at least one of the conjuncts is unsafe and so not known.
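The arithmetic behind “enough” duplicates is straightforward (my illustration, assuming the drops are independent and each ball hits with chance 0.99):

\[
\Pr(\text{some ball fails to hit}) = 1 - 0.99^{\,n},
\]

which is already about 0.63 for n = 100, and can be pushed as close to 1 as you like by adding duplicates. So the false generalization is high-chance even though each conjunct is individually overwhelmingly likely.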

Rejecting HCCP is certainly sufficient to get around the argument as stated. But H&L explicitly mention subject-relativization of safety sets as a different kind of response, *compatible* with retaining HCCP. The idea I take it is that if safety sets (at a given time) can vary,  *different* “some ball hitting floor” possibilities could be added to the different safety sets, satisfying HCCP but not necessarily destroying any of the distributed knowledge claims.

I see the formal idea, which is kind of neat. The trouble I have with this is that I’ve got very little grip at all as to *how* subject-relativization would get us out of the H&L trouble. How can particular facts about subjects change what’s in the safety set?

I’m going to assume the safety set (for a subject, at a given time and place) is always a Lewisian similarity sphere—that is, for some formal similarity ordering of worlds, the safety sphere is closed downwards under “similarity to actuality”. I’ll also assume that *similarity* isn’t subject-relative, though for all I’ll say it could vary e.g. with time. The assumptions are met by Lewis’s account of counterfactual similarity—in fact, for him similarity isn’t time-relative either—but many other theories can also agree with this.

The assumption that the safety set is always a similarity sphere (in the minimal sense) seems a pretty reasonable requirement, if we’re to justify the gloss of a safety set as a set of the “sufficiently close worlds”.

But just given the minimal gloss, we can get some strong results: in particular, that safety sets for different subjects at a single time will be nested in one another (think of them as “spheres around actuality”: given the minimal formal constraints Lewis articulates, the “spheres” are nested, as the name suggests).
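Here is a minimal reconstruction of why nesting falls out (mine, and it leans on the assumption that the closeness ordering makes any two worlds comparable). Suppose S1 and S2 are both safety sets, each closed downwards under the one closeness ordering, and suppose for reductio that neither contains the other:

\[
w \in S_1 \setminus S_2, \qquad v \in S_2 \setminus S_1.
\]

By comparability, either w is at least as close to actuality as v, or vice versa. In the first case, downward closure of S2 (which contains v) puts w in S2; in the second, downward closure of S1 puts v in S1. Either way we have a contradiction, so one of the two sets contains the other. Drop comparability (as mentioned at the end of the post) and this little argument, and with it nesting, goes away.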

Suppose we have n subjects in an H&L putative “distributed knowledge” case as described earlier. Now take the minimal safety set M among those n subjects. This exists and is a subset of the safety sets of all the others, by nesting. And by HCCP, it has to include a failure-to-hit possibility within it. Say the possibility that’s included in M features ball k failing to hit. But this means that that possibility is also in the safety set relevant to the kth person’s belief that their ball *will* hit the ground, and so their actual belief is unsafe and can’t count as knowledge—exactly the situation that relativizing to subjects was supposed to save us from!

The trouble is, the sort of rescue of distributed knowledge sketched earlier relies on the thought that safety sets for subjects at a time might be “petal shaped”—overlapping, but not nested in one another. But thinking of them as “similarity spheres”, where similarity is not subject relative, simply doesn’t allow this.

Now, this doesn’t close off this line of inquiry. Perhaps we *should* make similarity itself relative to subjects or locations (if so, then we definitely can’t use Lewis’s “Time’s arrow” sense of similarity). Or maybe we could relax the formal restrictions on similarity that allow us to derive nesting (if worlds can be incomparable in terms of closeness to actuality, we get failures of nesting—weakening Lewis’s formal assumptions in this way weakens the associated logic of counterfactuals to Pollock’s SS). But I do think that it’s interesting that the kind of subject-relativity of closeness that might be motivated by e.g. interest-relative invariantism about knowledge (the idea that how “close” worlds need to be to fall within the safety set depends on the interests etc. of the knower) simply doesn’t do enough to get us out of the H&L worries. We need a much more thorough-going relativization if we’re going to make progress here.

Utility of posting papers in public

I was about to post up a draft of a new paper. And then I picked up on a rather nasty flaw in the argument. So that paper’s now under reconstruction again—until I find a way to patch the gap.

It’s a bit cold-shivery to have almost posted things in a very public way with a major quantifier-shift fallacy right in the centre of them. But I take at least this out of the experience: posting things on blogs is a *very* good way of disciplining yourself on content. At least for me, I’m shifted from a mode where I’m wanting things to work out/patch errors etc—in effect, working on the content of the paper itself—to a mode where I’m looking at it with an eye to potential readers. And it’s in the second mode that I find I get enough critical distance to reliably spot things that need fixing (be they typos or real errors I’ve missed).

And of course, this is even before the very clever people out there pitch in to helpfully point out all the ways in which things need tightening up or amending. So hooray for academic blogs. But boo to quantifier shift fallacies.

Gradational accuracy; Degree supervaluational logic

In lieu of new blogposts, I thought I’d post up drafts of two papers I’m working on. They’re both in fairly early stages (in particular, the structure of each needs quite a bit of sorting out). But as they’re fairly techy, I think I’d really benefit from any trouble-shooting people were willing to do!

The first is “Degree supervaluational logic“. This is the kind of treatment of indeterminacy that Edgington has long argued for, and it also features in work from the 70’s by Lewis and Kamp. Weirdly, it isn’t that common, though I think there’s a lot going for it. But it’s arguably implicit in a lot of people’s thinking about supervaluationism. Plenty of people like the idea that the “proportion of sharpenings on which a sentence is true” tells us something pretty important about that sentence—maybe even serving to fix what degree of belief we should have in it. If proportions of sharpenings play this kind of “expert function” role for you, then you’re already a degree-supervaluationist in the sense I’m concerned with, whether or not you want to talk explicitly about “degrees of truth”.

One thing I haven’t seen done is to look systematically at its logic. Now, if we look at a determinacy-operator free object language, the headline news is that everything is classical—and that’s pretty robust under a number of ways of defining “validity”. But it’s familiar from standard supervaluationism that things can become tricky when we throw in determinacy operators. So I look at what happens when we add things like “it is determinate to degree 0.5 that…” to our object-language. What happens now depends *very much* on how validity is defined. I think there’s a lot to be said for “degree of truth preservation” validity—i.e. the conclusion has to be at least as true as the premises. This is classical in the determinacy-free language. And it’s “supraclassical” even when those operators are present—every classically valid argument is still valid. But in terms of metarules, all hell breaks loose. We get failures of conjunction introduction, for example; and of structural rules such as Cut. Despite this, I think there’s a good deal to be said for the package.
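For a concrete toy instance of the conjunction failure (my own illustration): take a borderline patch with two admissible sharpenings, one counting it red and one counting it orange. Each of “the patch is red” and “the patch is orange” then has degree 0.5, but their conjunction has degree 0, so the conclusion of conjunction introduction is less true than either premise.

```python
# Toy model of degree-supervaluational degrees of truth: the degree of a
# sentence is the proportion of admissible sharpenings on which it is true.

sharpenings = ["red", "orange"]          # two admissible sharpenings of the patch's colour

def degree(sentence):
    return sum(sentence(s) for s in sharpenings) / len(sharpenings)

A = lambda s: s == "red"                 # "the patch is red"
B = lambda s: s == "orange"              # "the patch is orange"
A_and_B = lambda s: A(s) and B(s)

print(degree(A), degree(B), degree(A_and_B))   # 0.5 0.5 0.0
# Degree-of-truth-preservation validity therefore blocks the step from A, B to A-and-B.
```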

The second paper “Gradational accuracy and non-classical semantics”  is on Joyce’s work on scoring functions. I look at what happens to his 1998 argument for probabilism, when we’ve got non-classical truth-value assignments in play. From what I can see, his argument generalizes very nicely. For each kind of truth-value assignment, we can characterize a set of “coherent” credences, and show that for any incoherent credence there is a single coherent credence which is more accurate than it, no matter what the truth-values turn out to be.

In certain cases, we can relate this to kinds of “belief functions” that are familiar. For example, the class of supervaluationally coherent credences I think can be shown to be Dempster-Shafer belief functions—at least if you define supervaluational “truth values” as I do in the paper.

As I mentioned, there are certainly some loose ends in this work—be really grateful for any thoughts! I’m going to be presenting something from the degree supervaluational paper at the AAP in July, and also on the agenda is to write up some ideas about the metaphysics of radical interpretation (as a kind of fictionalism about semantics) for the Fictionalism conference in Manchester this September.

[Update: I’ve added an extra section to the gradational accuracy paper, just showing that “coherent credences” for the various kinds of truth-value assignments I discuss satisfy the generalizations of classical probability theory suggested in Brian Weatherson’s 2003 NDJFL paper. The one exception is supervaluationism, where only a weakened version of the final axiom is satisfied—but in that case, we can show that the coherent credences must be Dempster-Shafer functions. So I think that gives us a pretty good handle on the behaviour of non-accuracy-dominated credences for the non-classical case.]

[Update 2: I’ve tightened up some of the initial material on non-classical semantics, and added something on intuitionism, which the generalization seems to cover quite nicely. I’m still thinking that kicking off the whole thing with lists of non-classical semantics ain’t the most digestible/helpful way of presenting the material, but at the moment I just want to make sure that the formal material works.]

Safety and lawbreaking

One upshot of taking the line on the scattered match case I discussed below is the following: if @ is deterministic, then legal worlds (aside from @) are really far away, on grounds of utterly flunking the “perfect match” criterion. If perfect match, as I suggested, means “perfect match over a temporal segment of the world”, then legal worlds just never score on this ground at all.

Here’s one implication of this. Take a probability distribution compatible with determinism—like the chances of statistical mechanics. I’m thinking of this as a measure over some kind of configuration space—the space of nomically possible worlds. So subsets of this space correspond to propositions that (if we choose them right) have high probability, given the macro-state of the world at the present time. And we can equally consider the conditional probability of those on x pushing the nuclear button. For many choices of P which have high probability conditionally on button-pressing, “button-pressing>~P” will be true. The closest worlds where the button-pressing happens are going to be law-breaking worlds, not legal worlds. So any proposition only true at legal worlds will not obtain, given the counterfactual. But sets of such worlds can of course get high conditional probability.

There’s an analogue of this result that connects to recent work on safety by Hawthorne and Lasonen-Aarnio. First, presume that the safety set at w,t (roughly, the set of worlds in which we mustn’t believe falsely that p, if we are to have knowledge that p) is a similarity sphere in Lewis’s sense. That is: any world counterfactually as close as a world in the set must be in the set. If any legal world is in the set, all worlds with at least some perfect match will also be in that set, by the conditions for closeness previously mentioned. But that would be crazy—e.g. there are worlds where I falsely believe that I’m sitting in front of my computer, on the same basis as I do now, which have *some* perfect match with actuality in the far distant past (we can set up mad scientists etc to achieve this with only a small departure from actuality a few hundred years ago). So if the safety set is a similarity sphere, and the perfect match constraint is taken as I urged, then there better not be any legal worlds in the safety set.

What this means is that a fairly plausible principle has to go: that if, at w and t, P is high probability, then there must be at least one P-world in the safety set at w and t. For as noted earlier, law-entailing propositions can be high-probability. But massive scepticism results if legal worlds are included in the safety set. (I should note that Hawthorne and Lasonen don’t endorse this principle, but only the analogous one where the “probabilities” are fundamental objective chances in an indeterministic world—but it’s hard to see what could motivate acceptance of that and non-acceptance of the above).

What to give up? Lewis’s lawbreaking account of closeness? The safety set as a similarity sphere? The probability-safety connection? The safety constraint on knowledge? Or some kind of reformulation of one of the above to make them all play nicely together. I’m presently undecided….

Counterfactuals and the scattered match case

One version of Lewis’s worlds-semantics for counterfactuals can be put like this: “If it were that A, then B” is true at @ iff all the most similar A-worlds to @ are B-worlds. But what notion of similarity is in play? Not all-in overall approximate similarity, otherwise (as Fine pointed out) a world in which Nixon pressed the button, but it was quickly covered up, and things at the macro-level approximately resembled actuality from then on, would count as more similar to @ than worlds where he pressed the button and events took their expected course: international crisis, bombings, etc. Feed that into the clause for conditionals and you get false counterfactuals coming out true: e.g. “If Nixon had pressed the button, everything would be pretty much the way it actually is”.

In “Time’s arrow”, Lewis proposed a system of weightings for the “standard ordering” of counterfactual closeness. They’re intended to apply only in cases where the laws of nature of @ are deterministic. Roughly stated, worlds are ordered around @ by the following principles (a schematic rendering follows the list):

  1. It is of the first importance to avoid big, widespread violations of @’s laws
  2. It is of the second importance to maximize the region of exact intrinsic match to @ in matters of particular fact
  3. It is of the third importance to avoid even small violations of @’s laws
  4. It is of little or no importance to maximize approximate similarity to @
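
One way to regiment this (a toy lexicographic rendering, mine rather than Lewis’s, and it glosses over the vagueness he wants in the weights):

  score(w) = (no big widespread violation of @’s laws?, size of the region of perfect match with @, no small violation of @’s laws?, degree of approximate similarity to @)
  w is at least as close to @ as u iff score(w) is at least as good as score(u), comparing coordinates left to right, with an earlier coordinate trumping all later ones.

How to compare “size of the region of perfect match” is exactly what the scattered match worry below turns on.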

These, he argued, gave the right verdict on the Nixon-counterfactuals. For the cover-up worlds secure only approximate future match to @, which counts for little or nothing. The most similar button-pushing worlds by the above lights, said Lewis, would be worlds that perfectly matched @ up to a time shortly before the button-pressing, diverged by a small law-violation, and then events ran on wherever the laws of nature took them—presumably to international crisis, nuclear war, or whatever. Such worlds are optimal as regards (1), ok as regards (2) (because of the past match), and ok as regards (3) (only one violation of law needed). (Let’s suppose that approximate similarity has no weight—it’ll make life easier). Pick one such world and call it NIX.

If this is to work, it better be that no “approximate future convergence” world does better by this system of weights than NIX. It’d be pretty easy to beat NIX on grounds (3)—just choose any nomically possible world and you get this. But the key issue is (2), which trumps such considerations. Are there approximate future convergence worlds that match or beat NIX on this front?

Lewis thought there wouldn’t be. NIX already secures perfect match up until the 70’s. So what we’d need is perfect convergence in the future (after the button pressing). But Lewis thought that to achieve this, we’d have to invoke many, many violations of law, to wipe out the traces of the button-pushing (set the lightwaves back on their original course, as it were). We’d need a big and diverse miracle to get perfect future match. But such worlds are worse than NIX by point (1), which is of overriding importance.

Now *some* miracle would be needed if we’re to get perfect match over some future time-segment. Here’s the intuitive thought. Suppose A is a button-pushing world that perfectly matches @ at some future time T. Run the laws of nature backwards from T. If the laws are deterministic, you’ll get exact match at all times prior to T, until you hit some violation of law. But the button-pushing happens in A and not in @, so they can’t be duplicates at that time. So there must be some miracle in A between the button-pressing and T.
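Spelled out a little more carefully (my regimentation, assuming the laws are deterministic in both temporal directions):

  1. Suppose A obeys @’s laws throughout the interval from the button-pressing time t* up to T, and exactly matches @ on the complete time-slice at T.
  2. Two-way determinism: lawful worlds that exactly match on a complete slice exactly match at every time at which both remain lawful.
  3. So A and @ exactly match at every time from t* to T, including t* itself.
  4. But A and @ differ at t* (button pressed in one, not the other). Contradiction.
  5. So A must violate @’s laws somewhere between t* and T.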

First thought. This doesn’t yet make the case that for the reconvergence to happen, we need lots of violations all over the place. Why couldn’t there be worlds where a tiny miracle at a suitable “pressure point” effects global reconvergence?

Rejoinder: one trouble with this idea is that presumably (as Lewis notes) the knock-on effects of the first divergence spread quickly. In the few moments it takes to get Nixon to press the button, the divergences from actuality are presumably covering a good distance (consider those light-waves!). So how could a single *local* miracle possibly undo this effect? If a beam of light that wouldn’t otherwise be there is racing away from the first event, then changes resulting from the second (small, local) miracle aren’t going to catch it up. There are probably some major assumptions about locality of causation etc. packed in here. But it does seem like Lewis is pretty well-justified in the claim that it’d take a big, widespread miracle to reconverge.

Second thought. Consider a world that, like NIX, diverges from actuality just at the button-pressing moment. Let it never perfectly match @ again, and let it contain no more miracles. In that case, it looks like (so far as we’ve said) it *exactly ties* with NIX for closeness. But now: couldn’t one such world have approximate match to @ in the future? That would require some *deterministic* progress from button-pushing to (somehow) the nuclear launch not happening, and a lot of (deterministic) coverup. A big ask. But to say that there is just no world meeting this description seems an equally big commitment.

Rejoinder. I’m not sure how Lewis should respond to this one. He mentions very plausible cases where slight differences would add up: slight changes of tone in the biography, influencing readers differently, changing their lives, etc. It’s very very plausible that such stuff happens. But is it *nomically impossible* that approximate similarity be maintained? I just don’t see the case here.

(A note on what’s at stake here. Unlike perfect reconvergence, if Lewis allowed such approximate reconvergence worlds, you wouldn’t get “If Nixon had pressed the button, things would be approximately the same” coming out true. For the most we’d get is that these approximate coverup worlds are as close as NIX, and NIX ensures that counterfactuals like the above are false—approximate similarity wouldn’t ensue at all the most similar button-pushing worlds. But the approximate convergence world would equally ensure the falsity of ordinary counterfactuals, e.g. “If Nixon had pressed the button, things would be very different”. More generally, the presence of such approximate reconvergence worlds would make lots of ordinary counterfactuals false.)

Third thought. Lewis raises the possibility of entirely legal worlds that resemble @ in the 1970’s, but feature Nixon pressing the button. As Lewis emphasizes, such worlds can’t perfectly match any temporal slice of @, given that they involve no violation of deterministic law. Lewis really has two things to say about such worlds. First, he says there’s “no guarantee” that any such world will even approximately resemble @ in the far distant future or past. He says: “it is hard to imagine how two deterministic worlds anything like ours could possibly remain only a little bit different for very long. There are altogether too many opportunities for little differences to give rise to big differences”. But second, given the four-part analysis above, such worlds aren’t going to be good contenders for closeness, since e.g. they’ll never perfectly match @ at any time.

Let’s suppose Lewis is wrong on the first point: that there are nomic possibilities approximately like ours throughout history, except for the button-pushing. I’m not sure exactly what the case against these worlds being close is, on the four-part analysis. Sure, NIX has perfect match throughout the whole of history up till the 1970’s, and the worlds just discussed don’t have that. But condition (2) just says that we have to maximize the region of perfect match—and maybe there are other ways to do that.

One idea is that worlds like these could earn credit by the lights of (2), by having large but scattered match with @. Suppose there’s a button-pushing world W, with perfect match before the button-pushing, and such that post-pressing, there are infinitely many centimetre-cubed by 1 second regions of space-time, at which the trajectories and properties of particles *within that region* exactly match those in the corresponding region of @. You might well think that in a putative case of approximate match (including approximate match of futures) there’d be lots of opportunities for this kind of short-lived, spatially limited coincidence.

So how does (2) handle these cases? It’s just not clear—it depends on what “maximizing the region of perfect match” means. Maybe we’re supposed to look at the sheer volume of the regions where there is perfect fit. But that’ll do no good if the volumes are each infinite. In a world with infinite past and infinite future, exact match from the 1970’s back “all the way” doesn’t have a greater volume than the sum of infinitely many scattered regions: both volumes are infinite. And in a world with finite past but infinite future, continued sparse scattered future match could have *infinite* volume, as opposed to the finite volume of perfect match secured for NIX.
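To put toy numbers on it (purely illustrative, using the regions from the example above):

  volume of NIX’s region of perfect match (infinite past case) = volume of all space-time up to the 1970’s = infinite
  volume of W’s scattered match = 1 cm^3 x 1 s + 1 cm^3 x 1 s + … (infinitely many terms) = infinite

A bare “which has the greater volume?” test can’t distinguish these, and with a finite past the comparison can even favour the scattered-match world.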

This causes problems even without the reconvergence. We want button-pressing worlds not to diverge too early. Divergence in the 1950’s, with things being very different from then on, ultimately ending with a Soviet stooge Nixon pressing the button, is not the kind of most-similar world we want. (2) is naturally thought to help us out—maximizing perfect match is supposed to pressure us to put the divergence event as late as possible. But if we look only at the relative volumes of perfect match, then in cases of an infinite past, the volumes of perfect match will be the same. This suggests we look, not at volumes, but at subregionhood: w will be closer to @ than u (all else equal) if the region through which w perfectly matches @ is a proper superregion of that through which u perfectly matches @. But this won’t promote NIX over scattered perfect match worlds—since neither world’s region of perfect match contains the other’s.
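In the obvious regimentation (mine): let R(w) be the space-time region throughout which w perfectly matches @, and say

  w beats u on (2) if R(u) is a proper subregion of R(w).

This does promote late-divergence worlds over early-divergence ones, since the former’s match-region strictly contains the latter’s. But R(NIX) and the scattered-match world’s region are incomparable (neither contains the other), so the subregionhood reading of (2) is silent on the case we actually care about.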

Perhaps there are more options. One thought is to look at something like the ratio, at each time, of the volume of regions of perfect match to the volume of regions of non-perfect match. Scattered match clearly goes with a low density of perfect match at times, in this sense—whereas in NIX the density at any pre-divergence time will be 1. How to work this into a proposal for understanding the imperative “maximize perfect match!”, I don’t know.

Unless we say *something* to rule out scattered perfect match worlds, prima facie they could match the extent of match in NIX. But then, because they never violate the laws, while NIX does (albeit once), they beat NIX on (3). So in this case (unlike the approximate future match case above) we’re back to a situation where there’s a danger of declaring the “future similarity” counterfactual true, as well as the ordinary counterfactuals false.

Let’s review the three cases. First, there was the possibility of getting exact reconvergence to @ at future time T, via a single miracle. Second, there was the possibility of approximate future similarity without any perfect similarity. Third, there was the possibility of approximate overall match throughout time, with local, scattered, perfect match.

In effect, Lewis in Time’s Arrow doubts whether there are possibilities matching any of these descriptions. I’ve suggested that we can give some prima facie substance to that doubt in the first case. In the other two, I can’t see what the principled position is, as yet, other than agnosticism. Lewis says, for example, about the third kind of case, that it’s “hard to imagine” how two worlds could approximately resemble each other in this way, and that there’s “no guarantee” that they’ll be like this. But is this good enough? Lots of things about nomic space are hard to imagine. Have we any positive reason to doubt that possibilities of type 3 exist? Personally, in the absence of evidence, I’ll go 50/50 on whether they exist. But that’s to go 50/50 on whether Lewis’s favoured account makes most ordinary counterfactuals false. Not a good result.

I do have one positive suggestion, which will fix up the third case. Again, it comes down to what we’re trying to maximize in maximizing regions of perfect fit. The proposal is that we insist on complete temporal slices perfectly matching @ before we count them towards closeness as outlined in (2). That is, (2) should be understood as saying: maximize the *temporal segment* throughout which you have perfect fit. Now we can appeal to determinism to show that legal worlds will *never* perfectly match @ on a complete time-slice—and so *automatically* flunk (2) to the highest possible degree.
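Semi-formally (my gloss on the proposal):

  (2’) It is of the second importance to maximize the set of times t such that w’s complete time-slice at t exactly duplicates @’s time-slice at t.

Given two-way determinism, if a legal world agreed with @ on even one complete slice, it would agree with @ at every time (the argument of the first thought, run in both directions), contradicting the button-pressing difference. So legal worlds score the empty set on (2’), scattered match and all, and NIX wins.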

So the state of play seems to me to be this. There are plausible grounds for having low credence in the first worry with the account. And precisifying “perfect match” in the way just suggested deals with the third. That only leaves the second worry—perfect past match + small violation + approximate future match.

I do want to emphasize one thing here. It is significant that the remaining problem, unlike the others, doesn’t make the offending “future similarity” counterfactual *true*. Those objections, had they been successful, would have promised the result that *all* the most similar button-pressing worlds have futures approximately like the actual one, rather than like NIX’s. But all we get from the residual objection, if it’s successful, is that *some* of the most similar worlds are of the offending type—for all we’ve said, *most* of the most similar worlds could be like NIX.

This brings into play other tweaks to the setting. Some (like Bennett) want, for independent reasons, to change Lewis’s truth-conditions from “B is true at all the closest A-worlds” to “B is true at most/the vast majority of the closest A-worlds”. One could make this move against the current worry, but not against the other two.

I’m not a particular fan of the revisions to the logic of counterfactuals this suggestion would induce. There’s another thought I’m more sympathetic to. That’s to go Stalnakerian on the truth conditions, viewing what Lewis thinks of as “ties for closeness” as cases of indeterminacy in a total ordering. If so, what we’d get from the above is at most that counterfactuals like “If Nixon had pressed the button, things would have been very different” are indeterminate (because false on at least one precisification of the ordering).
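In supervaluation-style dress (my rendering of the Stalnakerian move):

  A > B is determinately true iff on every admissible precisification of the closeness ordering, the closest A-world is a B-world;
  it is indeterminate whether A > B iff it comes out true on some admissible precisifications and false on others.

Precisifications that rank an approximate-coverup world strictly closest make “If Nixon had pressed the button, things would have been very different” false; those that rank NIX strictly closest make it true; hence indeterminacy rather than outright falsity.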

It’s not clear to me that this is a bad result. It depends very much on the “cognitive role of indeterminacy” that I’ve talked about ad nauseam before on this blog. If one can perfectly rationally be arbitrarily highly confident of indeterminate propositions, then no revision to our ordinary credences in ordinary counterfactuals need be induced by admitting them to be indeterminate. If, on the other hand, you take a “rejectionist” view of indeterminacy, where it acts a bit like presupposition failure, this option is no more comfortable than admitting that most counterfactuals are false.

Anyway, just to emphasize: if these options are even going to be runners, we’re going to have to do something about the scattered match case.