Category Archives: Probability

Justifying scoring rules

In connection with this paper, I’ve been thinking some more about what grounds we might have for saying substantive things about how “scoring rules” should behave.

Quick background. Scoring rules rank either credences in a single proposition, or whole credence functions (depending on your choice of poison), against the actual truth values. For now, let’s concentrate on the single-proposition case. In the context we’re interested in, they’re meant to measure “how (in)accurate” the credences are. I’ll assume that scoring rules take the form s(x,y), where x is the credence, and y the truth value of the salient proposition (1 for truth, 0 for falsity). You’d naturally expect minimal constraints along the following lines:

(Minimal 1)  s(1,1)=s(0,0)=1; s(0,1)=s(1,0)=0.

(Minimal 2) s is a monotone increasing function in x when y=1. s is a monotone decreasing function in x when y=0.

Basically, this just says that matching the truth value exactly (credence 1 in a truth, credence 0 in a falsehood) is maximally accurate, missing it entirely is minimally accurate, and you never decrease in accuracy by moving your credence closer to the truth value.
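
Purely by way of illustration, here’s a minimal sketch (in Python) of what the two constraints demand of a candidate rule. The particular rule tested, s(x,y) = 1 - |x - y|, is just one simple accuracy measure I’ve picked for the example; nothing here argues that it’s the right one.

```python
# A toy candidate accuracy rule: s(x, y), with x a credence in [0, 1]
# and y the truth value (1 or 0).
def linear_accuracy(x, y):
    return 1 - abs(x - y)

def satisfies_minimal_1(s):
    # Matching the truth value exactly scores 1; missing it entirely scores 0.
    return s(1, 1) == 1 and s(0, 0) == 1 and s(0, 1) == 0 and s(1, 0) == 0

def satisfies_minimal_2(s):
    # Monotone increasing in x when y = 1; monotone decreasing when y = 0.
    grid = [i / 100 for i in range(101)]
    increasing = all(s(a, 1) <= s(b, 1) for a, b in zip(grid, grid[1:]))
    decreasing = all(s(a, 0) >= s(b, 0) for a, b in zip(grid, grid[1:]))
    return increasing and decreasing

print(satisfies_minimal_1(linear_accuracy), satisfies_minimal_2(linear_accuracy))  # True True
```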

But to make arguments from scoring rules for probabilism run, we need a lot more structure. Where do we get it from?

There’s a prior question: what’s the nature of a scoring rule in the first place? There’re a couple of thoughts to have here. One is that scoring rules are *preferences* of agents. Different agents can have different scoring rules, and the relevant preference-ordering aims to capture the subjective value the agent attaches to having *accurate* credences.

Now, various hedges are needed at this point. Maybe having certain credences makes you feel warm and fuzzy, and you prefer to have those feelings no matter what. We need to distill that stuff out. Moreover, maybe you value having particular credences in certain situations because of their instrumental value—e.g. enabling you indirectly to get lots of warm fuzzy stuff. One strong thesis about scoring rules is that they give the *intrinsic* value that the agent attaches to a certain credence/truth value state of affairs—her preferences given that alethic accuracy is all she cares about. However tricky the details of this are to spell out, the general story about what the scoring rule aims to describe is pretty clear—part of the preferences of individual agents.

A different kind of view would have it that the scoring rule describes a more objective beast: facts about which credences are better than which others (as far as accuracy goes). Presumably, if there are such betterness facts, this’ll provide a standard for assessing people’s alethic preferences in the first sense.

On either view, the trick will be to justify the claim that the scoring rule has certain formal features X. Then one appeals to a formal argument that shows that for every incoherent credence c, there’s a coherent credence d which is more accurate (by the lights of the scoring rule) than c no matter what the actual truth values are—supposing only that the scoring rule has feature X. Being “accuracy dominated” in this way is supposed to be an epistemic flaw (at least a pro tanto one). [I’m going to leave discussion of how *that* goes for another time.]

Ok. But how are we going to justify features of scoring rules, other than the minimal constraints above? Well, Joyce (1998) proceeds by drawing out what he regards as unpleasant consequences of denying a series of formal constraints on the scoring rule. Though it’s not *immediately obvious* that to be a “measure of accuracy” scoring rules need to do more than satisfy the *minimal* constraints, you may be convinced by the cases that Joyce makes. But what *kind* of case does he make? One thought is that it’s a kind of conceptual analysis. We have the notion of accuracy, and when we think carefully through what can happen if a measure doesn’t have feature X, we see that whatever its other merits, it wouldn’t be a decent way to measure anything deserving the name *accuracy*.

Whether or not Joyce’s considerations are meant to be taken this way (I rather suspect not), it’s at least a very clean project to engage in. Take scoring rules to be preferences. Then a set of preferences that didn’t have the formal features just wouldn’t be preferences solely about accuracy—as was the original intention. Or take an objective betterness ordering. If it’s evaluating credence/world pairs on grounds of accuracy, again (if the conceptual analysis of accuracy was successful) it better have the features X, otherwise it’s just not going to deserve the name.

But maybe we can’t get all the features we need through something like conceptual analysis. One of Joyce’s features—convexity—seems to be something like a principle of epistemic conservatism (that’s the way he has recently presented it). It doesn’t seem that people would be conceptually confused if their alethic preferences violated this principle. Where would this leave us?

If we’re thinking of the scoring rule as an objective betterness relation, then there seems plenty of room for thinking that the *real facts* about accuracy encode convexity, even if one can coherently doubt that this is so (ok, so I’m setting aside open-question arguments here, but I was never terribly impressed by them). And conceptual analysis is not the only route to justifying claims that the one true scoring rule has such a feature. Here’s one alternative. It turns out that a certain scoring rule—the Brier score—meets all Joyce’s conditions and more besides. And it’s a very simple, very well-behaved scoring rule that generalizes very nicely in all sorts of ways (Joyce (2009) talks about quite a few nice features of it in the section “homage to the Brier score”). It’s not crazy to think that, among parties who agree that there is some “objective accuracy” scoring rule out there to be described, considerations of simplicity, unity, integration and other holistic merits might support the view that the One True measure of (in)accuracy is given by the Brier score.
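
To make the dominance argument mentioned above a bit more vivid, here’s a toy calculation with the Brier score (read here as an inaccuracy measure: summed squared distance from the truth values; the particular credences are mine, purely for illustration). An incoherent pair of credences over p and not-p gets beaten by a coherent pair whichever way the truth values fall:

```python
# Brier inaccuracy of a pair of credences in p and not-p, given the truth values.
def brier_inaccuracy(credences, truths):
    return sum((c - t) ** 2 for c, t in zip(credences, truths))

incoherent = (0.2, 0.2)   # credences in p and in not-p; they sum to 0.4, so not a probability
coherent   = (0.5, 0.5)   # a coherent (probabilistic) alternative

for truths in [(1, 0), (0, 1)]:   # p true / p false
    print(truths, brier_inaccuracy(incoherent, truths), brier_inaccuracy(coherent, truths))
# In both cases the coherent pair is strictly less inaccurate (roughly 0.68 vs 0.5).
```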

But that line of thought won’t sound terribly good if you think that scoring rules describe individual preferences, rather than an objective feature that norms those preferences. Why should theoretical unification and whatnot give us information about the idiosyncrasies of what people happen to prefer? If we give up on the line that it’s just conceptually impossible for there to be “alethic preferences” that fail to satisfy conditions X, then why can’t someone—call him Tommy—just happen to have X-violating alethic preferences? Tommy’s “scoring rule” then just can’t be used in a vindication of probabilism. I don’t see how the kind of holistic considerations just mentioned can be made relevant.

But maybe we could do something with this (inspired by some discussion in Gibbard (2008), though in a very different setting). Perhaps alethic preferences only need to satisfy the minimal constraints above to deserve the name. But even if it’s *possible* to have alethic preferences with all sorts of formal properties, it might be unwise to do so. Maybe things go epistemically badly, e.g. if they’re not appropriately conservative because of their scoring rule (for an illustration, perhaps the scoring rule is just the linear one: s(x,y) is the absolute difference of x and y. This scoring rule motivates extremism in credences: when c(p)>0.5, you minimize expected inaccuracy by moving your credence to 1. But someone who does that doesn’t seem to be functioning very well, epistemically speaking). Maybe things go prudentially badly, unless their alethic values have a certain form. So, without arguing that it’s analytic of “alethic preference”, we provide arguments that the wise will have alethic preferences that meet conditions X.
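
(A quick worked check of that extremism claim, for the record. With the linear rule, if your credence in p is c and you consider adopting credence x, your expected inaccuracy is

c(1-x) + (1-c)x = c + x(1-2c)

which is linear in x, and so is minimized at x=1 whenever c>0.5 (and at x=0 whenever c<0.5). By contrast, the quadratic Brier rule gives expected inaccuracy c(1-x)^2 + (1-c)x^2, which is minimized at x=c—so it never rewards this kind of jumping to extremes.)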

If so, it looks to me like we’ve got an indirect route to probabilism. People with sensible alethic preferences will be subject to the Joycean argument—they’ll be epistemically irrational if they don’t conform to the axioms of probability. And while people with unwise alethic preferences aren’t irrational in failing to be probabilists, they’re in a bad situation anyway, and (prudentially or epistemically) you don’t want to be one of them. It’s not that we have a prudential justification of probabilism. It’s that there are (perhaps prudential) reasons to be the kind of person such that it’s then epistemically irrational to fail to be a probabilist.

Though on this strategy, prudential/pragmatic considerations are coming into play, they’re not obviously as problematic as in e.g. traditional formulations of Dutch book arguments. For there, the thought was that if you fail to be a probabilist, you’re guaranteed to lose money. So, if you like money, be a probabilist! Here the justification is of the form: your view about the value of truth and accuracy is such-and-such. But you’d be failing to live up to your own preferences unless you are a probabilist. And it’s at a “second order” level, where we explain why it’s sensible to value truth and accuracy in the kind of way that enables the argument to go through, that we appeal to prudential considerations.

Having said all that, I still feel that the case is cleanest for someone thinking of the scoring argument as based on objective betterness. Moreover, there’s a final kind of consideration that can be put forward there, which I can’t see how to replicate on the preference-based version. It turns on what we’re trying to provide in giving a “justification of probabilism”. Is the audience one of sympathetic folk, already willing to grant that violations of probability axioms are pro tanto bad, and simply wanting it explained why this is the case (NB: the pragmatic nature of the Dutch Book argument makes it as unsatisfying for such folk as it is for anyone else)? Or is the audience one of hostile people, with their own favoured non-probabilistic norms (maybe people who believe in the Dempster-Shafer theory of evidence)? Or is the audience people who are suitably agnostic, initially?

This makes quite a big difference. For suppose the task was to explain to the sympathetic folk what grounds the normativity of the probability axioms. Then we can take as a starting point that one (pro tanto) ought not to violate the probability axioms. We can show how objective betterness, if it has the right form, would explain this. We can show that an elegant scoring rule like the Brier score would have the right form, and so provide the explanation. And absent competitors, it looks like we’ve got all the ingredients for a decent inference to the best explanation, with the Brier score seen as the best candidate for measuring objective (in)accuracy.

Of course, this would cut very little ice with the hostile crowd, who’d be more inclined to tollens away from the Brier score. But even they should appreciate the virtues of being presented with a package deal, with probabilism plus an accuracy/Brier based explanation of what kind of normative force the probability axioms have. If this genuinely enhances the theoretical appeal of probabilism (which I think it does) then the hostile crowd should feel a certain pressure to try to replicate the success—if only to try to win over the neutral.

Of course, the sense in which we have a “justification” of probabilism is very much less than if we could do all the work of underpinning a dominance argument by conceptual analysis, or even by pointing to holistic virtues of the needed features. It’s more along the lines of explaining the probabilist point of view than persuading others to adopt it. But that’s far from nothing.

And even if we only get this, we’ve got all we need for other projects in which I, at least, am interested. For if, studying the classical case, we can justify Brier as a measure of objective accuracy, then when we turn to generalizations of classicism—non-classical semantics of the kind I’m talking about in the paper—we can run dominance arguments that presuppose the Brier measure of inaccuracy, to argue for analogues of probabilism in the non-classical setting. And I’d be happy if the net result of that paper was the conditional: to the extent that we should be probabilists in the classical setting, we should be analogue-probabilists (in the sense I spell out in the paper) in the non-classical setting. So the modest project isn’t mere self-congratulation on the part of probabilists—it arguably commits them to a range of non-obvious generalizations of probabilism in which plenty of people should be interested.

Of course, if a stronger, more suasive case for the features X can be made, so much the better!

Gradational accuracy; Degree supervaluational logic

In lieu of new blogposts, I thought I’d post up drafts of two papers I’m working on. They’re both in fairly early stages (in particular, the structure of each needs quite a bit of sorting out). But as they’re fairly techy, I think I’d really benefit from any trouble-shooting people were willing to do!

The first is “Degree supervaluational logic“. This is the kind of treatment of indeterminacy that Edgington has long argued for, and it also features in work from the 1970s by Lewis and Kamp. Weirdly, it isn’t that common, though I think there’s a lot going for it. But it’s arguably implicit in a lot of people’s thinking about supervaluationism. Plenty of people like the idea that the “proportion of sharpenings on which a sentence is true” tells us something pretty important about that sentence—maybe even serving to fix what degree of belief we should have in it. If proportions of sharpenings play this kind of “expert function” role for you, then you’re already a degree-supervaluationist in the sense I’m concerned with, whether or not you want to talk explicitly about “degrees of truth”.

One thing I haven’t seen done is to look systematically at its logic. Now, if we look at a determinacy-operator-free object language, the headline news is that everything is classical—and that’s pretty robust under a number of ways of defining “validity”. But it’s familiar from standard supervaluationism that things can become tricky when we throw in determinacy operators. So I look at what happens when we add things like “it is determinate to degree 0.5 that…” into our object-language. What happens now depends *very much* on how validity is defined. I think there’s a lot to be said for “degree of truth preservation” validity—i.e. the conclusion has to be at least as true as the premises. This is classical in the determinacy-free language. And it’s “supraclassical” even when those operators are present—every classically valid argument is still valid. But in terms of metarules, all hell breaks loose. We get failures of conjunction introduction, for example; and of structural rules such as Cut. Despite this, I think there’s a good deal to be said for the package.

The second paper “Gradational accuracy and non-classical semantics”  is on Joyce’s work on scoring functions. I look at what happens to his 1998 argument for probabilism, when we’ve got non-classical truth-value assignments in play. From what I can see, his argument generalizes very nicely. For each kind of truth-value assignment, we can characterize a set of “coherent” credences, and show that for any incoherent credence there is a single coherent credence which is more accurate than it, no matter what the truth-values turn out to be.

In certain cases, we can relate this to kinds of “belief functions” that are familiar. For example, I think the class of supervaluationally coherent credences can be shown to be Dempster-Shafer belief functions—at least if you define supervaluational “truth values” as I do in the paper.

As I mentioned, there are certainly some loose ends in this work—I’d be really grateful for any thoughts! I’m going to be presenting something from the degree supervaluational paper at the AAP in July, and also on the agenda is to write up some ideas about the metaphysics of radical interpretation (as a kind of fictionalism about semantics) for the Fictionalism conference in Manchester this September.

[Update: I’ve added an extra section to the gradational accuracy paper, just showing that “coherent credences” for the various kinds of truth-value assignments I discuss satisfy the generalizations of classical probability theory suggested in Brian Weatherson’s 2003 NDJFL paper. The one exception is supervaluationism, where only a weakened version of the final axiom is satisfied—but in that case, we can show that the coherent credences must be Dempster-Shafer functions. So I think that gives us a pretty good handle on the behaviour of non-accuracy-dominated credences for the non-classical case.]

[Update 2: I’ve tightened up some of the initial material on non-classical semantics, and added something on intuitionism, which the generalization seems to cover quite nicely. I’m still thinking that kicking off the whole thing with lists of non-classical semantics ain’t the most digestible/helpful way of presenting the material, but at the moment I just want to make sure that the formal material works.]

Counting delineations

I presented my paper on indeterminacy and conditionals in Konstanz a few days ago. The basic question that paper poses is: if we are highly confident that a conditional is indeterminate, what sorts of confidence in the conditional itself are open to us?

Now, one treatment I’ve been interested in for a while is “degree supervaluationism”. The idea, from the point of view of the semantics, is to replace appeal to a single intended interpretation (with truth=truth at that interpretation) or set of “intended interpretations” (with truth=truth at all of them) with a measure over the set of interpretations (with truth to degree d = being true at exactly measure d of the interpretations). A natural suggestion, given that setting, is that if you know (/are certain) S is true to measure d, then your confidence in S should be d.

I’d been thinking of degree-supervaluationism in this sense, and the more standard set-of-intended-interpretations supervaluationism, as distinct options. But (thanks to Tim Williamson) I realize now that there may be an intermediate option.

Suppose that S = “the number 6 is bleh”. And we know that linguistic conventions settle that numbers <5 are bleh, and numbers >7 are not bleh. The available delineations of “bleh”, among the integers, are ones where the first non-bleh number is 5, 6, 7 or 8. These will count as the “intended interpretations” for a standard supervaluational treatment, so “6 is bleh” will be indeterminate—in this context, neither true nor false.

I’ve discussed in the past several things we could say about rational confidence in this supervaluational setting. But one (descriptive) option I haven’t thought much about is to say that you should proportion your confidence to the proportion of delineations on which “6 is bleh” comes out true. In the present case, our confidence that 6 is bleh should be 0.5, our confidence that 5 is bleh should come out 0.75, and our confidence that 7 is bleh should come out 0.25.
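
Here’s that counting recipe for the toy case spelled out (just a sketch; encoding a delineation by its first non-bleh integer is my own way of setting it up):

```python
# Count-the-delineations credences for the toy "bleh" predicate.
# A delineation is fixed by the first non-bleh integer n; conventions
# leave n = 5, 6, 7 or 8 open, and k counts as bleh iff k < n.
delineations = [5, 6, 7, 8]

def credence_bleh(k):
    # Proportion of delineations on which "k is bleh" comes out true.
    return sum(1 for n in delineations if k < n) / len(delineations)

for k in (5, 6, 7):
    print(k, credence_bleh(k))   # 5 -> 0.75, 6 -> 0.5, 7 -> 0.25
```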

Notice that this *isn’t* the same as degree-supervaluationism. For that just required some measure or other over the space of interpretations. And even if that measure was zero everywhere apart from interpretations which place the first non-bleh number in 5-8, there are many options available. E.g. we might have a measure that assigns 0.9 to the interpretation which makes 5 the first non-bleh number, and distributes the remaining 0.1 among the others. In other words, the degree-supervaluationist needn’t think that the measure is a measure *of the number of delineations*. I usually think of it (in the finite case), intuitively, as a measure of the “degree of intendedness” of each interpretation. In a sense, the degree-supervaluationists I was thinking of conceive of the measure as telling us to what extent usage and eligibility and other subvening facts favour one interpretation or another. But the kind of supervaluationists we’re now considering won’t buy into that at all.

I should mention that even if, descriptively, it’s clear what the proposal here is, it’s less clear how the count-the-delineations supervaluationists would go about justifying the rule for assigning credences that I’m suggesting for them. Maybe the idea is that we should seek some kind of compromise between the credences that would be rational if we took D to be the unique intended interpretation, for each D in our set of “intended interpretations” (see this really interesting discussion of compromise for a model of what we might say—the bits at the end on mushy credence are particularly relevant). And there’ll be some oddities that this kind of theorist will have to adopt—e.g. for a range of cases, they’ll be assigning significant credence to sentences of the form “S and S isn’t true”. I find that odd, but I don’t think it blows the proposal out of the water.

Where might this be useful? Well, suppose you believe in B-theoretic branching time, and are going to “supervaluate” over the various future-branches (so “there will be a sea-battle” will have a truth-value gap, since it is true on some but not all). (This approach originates with Thomason, and is still present, with tweaks, in recent relativistic semantics for branching time). “Branches” play the role of “interpretations”, in this setting. I’ve argued in previous work that this kind of indeterminacy about branching futures leads to trouble on certain natural “rejectionist” readings of what our attitudes to known indeterminate p should be. But a count-the-branches proposal seems pretty promising here. The idea is that we should proportion our credences in p to the *number* of branches on which p is true.

Of course, there are complicated issues here. Maybe there are just two qualitative possibilities for the future, R and S. We know R has a 2/3 chance of obtaining, and S a 1/3 chance of obtaining. In the B-theoretic branching setting, an R-branch will exist, and an S-branch will exist. Now, one model of the metaphysics at this point is that we don’t allow qualitatively duplicate future branches: so there are just two future-branches in existence, the R one and the S one. On a count-the-branches recipe, we’ll get the result that we should have 1/2 credence that R will obtain. But that conflicts with what the instruction to proportion our credences to the known chances would give us. Maybe R is primitively attached to a “weight” of 2/3—but our count-the-branches recipe didn’t say anything about that.

An alternative is that we multiply indiscernible futures. Maybe there are two indiscernible R futures, and only one S future. Then apportioning the credences in the way mentioned won’t get us into trouble. And in general, if we think that whenever the chance (at moment m) of p is k, the proportion of p-futures among all the futures is k, then we’ll have a recipe that coheres nicely with the Principal Principle.

Let me be clear that I’m not suggesting that we identify chances with numbers-of-branches. Nor am I suggesting that we’ve got some easy route here for justifying the principal principle. The only thing I want to say is that *if* we’ve got a certain match between chances and numbers of future branches, then two recipes for assigning credences won’t conflict.

(I emphasized earlier that count-the-precisifications supervaluationism had less flexibility than degree-supervaluationism, where the relevant measure was unconstrained by counting considerations. In a sense, what the above little discussion highlights is that when we move from “interpretations” to “branches” as the locus of supervaluational indeterminacy, this difference in flexibility evaporates. For in the case where that role is played by actually existing futures, there’s at least the possibility of multiplying qualitatively indiscernible futures. That sort of maneuver has little place in the original, intended-interpretations setting, since presumably we’ve got an independent fix on what the interpretations are, and we can’t simply postulate that the world gives us intended interpretations in proportions that exactly match the credences we independently want to assign to the cases.)

Indeterminate survival: in draft

So, finally, I’ve got another draft prepared. This is a paper focussing on Bernard Williams’ concerns about how to think and feel about indeterminacy in questions of one’s own survival.

Suppose that you know there’s an individual in the future who’s going to get harmed. Should you invest a small amount of money to alleviate the harm? Should you feel anxious about the harm?

Well, obviously if you care about the guy (or just have a modicum of humanity) you probably should. But if it was *you* that was going to suffer the harm, there’d be a particularly distinctive frisson. From a prudential point of view, you’d be compelled to invest minor funds for great benefit. And you really should have that distinctive first-personal phenomenology associated with anxiety on one’s own behalf. Both of these de se attitudes seem important features of our mental life and evaluations.

The puzzle I take from Williams is: are the distinctively first-personal feelings and expectations appropriate in a case where you know that it’s indeterminate whether you survive as the individual who’s going to suffer?

Williams thought that by reflecting on such questions, we could get an argument against accounts of personal identity that land us with indeterminate cases of survival. I’d like to play the case in a different direction. It seems to me pretty unavoidable that we’ll end up favouring accounts of personal identity that allow for indeterminate cases. So if, when you combine such cases with this or that theory of indeterminacy, you end up saying silly things, I want to take that as a blow to that account of indeterminacy.

It’s not knock-down (what is in philosophy?) but I do think that we can get leverage in this way against rejectionist treatments of indeterminacy, at least as applied to these kinds of cases. Rejectionist treatments include those of folks who think that the characteristic attitude to borderline cases is primarily a rejection of the law of excluded middle; and (probably) those of folks who think that in such cases we should reject bivalence, even if LEM itself is retained.

In any case, this is definitely something I’m looking for feedback/comments on (particularly on the material on how to think about rational constraints on emotions, which is rather new territory for me). So thoughts very welcome!

Primitivism about indeterminacy: a worry

I’m quite tempted by the view that “it is indeterminate that” might be one of those fundamental, brute bits of machinery that go into constructing the world. Imagine, for example, you’re tempted by the thought that in a strong sense the future is “open”, or “unfixed”. Now, maybe one could parlay that into something epistemic (lack of knowledge of what the future is to be), or semantic (indecision over which of the existing branching futures is “the future”), or maybe mere non-existence of the future would capture some of this unfixity thought. But I doubt it. (For discussion of what the openness of the future looks like from this perspective, see Ross and Elizabeth’s forthcoming Phil Studies piece).

The open future is far from the only case you might consider—I go through a range of possible arenas in which one might be friendly to a distinctively metaphysical kind of indeterminacy in this paper—and I think treating “indeterminacy” as a perfectly natural bit of kit is an attractive way to develop that. And, if you’re interested in some further elaboration and defence of this primitivist conception see this piece by Elizabeth and myself—and see also Dave Barnett’s rather different take on a similar idea in a forthcoming piece in AJP (watch out for the terminological clashes–Barnett wants to contrast his view with that of “indeterminists”. I think this is just a different way of deploying the terminology.)

I think everyone should pay more attention to primitivism. It’s a kind of “null” response to the request for an account of indeterminacy—and it’s always interesting to see why the null response is unavailable. I think we’ll learn a lot about the compulsory questions a theory of indeterminacy must answer from seeing what goes wrong when the theory of indeterminacy is as minimal as you can get.

But here I want to try to formulate a certain kind of objection to primitivism about indeterminacy. Something like this has been floating around in the literature—and in conversations!—for a while (Williamson and Field, in particular, are obvious sources for it). I also think the objection if properly formulated would get at something important that lies behind the reaction of people who claim *just not to understand* what a metaphysical conception of indeterminacy would be. (If people know of references where this kind of idea is dealt with explicitly, then I’d be really glad to know about them).

The starting assumption is: saying “it’s an indeterminate case” is a legitimate answer to the query “is that thing red?”. Contrast the following. If someone asks “is that thing red?” and I say “it’s contingent whether it’s red”, then I haven’t made a legitimate conversational move. The information I’ve given is simply irrelevant to its actual redness.

So it’s a datum that indeterminacy-answers are in some way relevant to redness (or whatever) questions. And it’s not just that “it is indeterminate whether it is red” has “it is red” buried within it – so does the contingency “answer”, but it is patently irrelevant.

So what sort of relevance does it have? Here’s a brief survey of some answers:

(1) Epistemicist. “It’s indeterminate whether p” has the sort of relevance that answering “I don’t know whether p” has. Obviously it’s not directly relevant to the question of whether p, but at least expresses the inability to give a definitive answer.

(2) Rejectionist (like truth-value gap-ers, inc. certain supervaluationists, and LEM-deniers like Field, intuitionists). Answering “it’s indeterminate” communicates information which, if accepted, should lead you to reject both p, and not-p. So it’s clearly relevant, since it tells the inquirer what their attitudes to p itself should be.

(3) Degree theorist (whether degree-supervaluationist like Lewis, Edgington, or degree-functional person like Smith, Machina, etc). Answering “it’s indeterminate” communicates something like the information that p is half-true. And, at least on suitable elaborations of degree theory, we’ll then know how to shape our credences in p itself: we should have credence 0.5 in p if we have credence 1 that p is half true.

(4) Clarification request. (maybe some contextualists?) “it’s indeterminate that p” conveys that somehow the question is ill-posed, or inappropriate. It’s a way of responding whereby we refuse to answer the question as posed, but invite a reformulation. So we’re asking the person who asked “is it red?” to refine their question to something like “is it scarlet?” or “is it reddish?” or “is it at least not blue?” or “does it have wavelength less than such-and-such?”.

(For a while, I think, it was assumed that every serious account of indeterminacy would say that if p was indeterminate, one couldn’t know p (think of parallel discussion of “minimal” conceptions of vagueness—see Patrick Greenough’s Mind paper). If that was right then (1) would be available to everybody. But I don’t think that that’s at all obvious—and in particular, I don’t think it’s obvious the primitivist would endorse it, and if they did, what grounds they would have for saying so).

There are two readings of the challenge we should pull apart. One is purely descriptive. What kind of relevance does indeterminacy have, on the primitivist view? The second is justificatory: why does it have that relevance? Both are relevant here, but the first is the most important. Consider the parallel case of chance. There we know what, descriptively, we want the relevance of “there’s a 20% chance that p” to be: someone learning this information should, ceteris paribus, fix their credence in p to 0.2. And there’s a real question about whether a metaphysical primitive account of chance can justify that story (that’s Lewis’s objection to a putative primitivist treatment of chance facts).

The justification challenge is important, and how exactly to formulate a reasonable challenge here will be a controversial matter. E.g. maybe route (4), above, might appeal to the primitivist. Fine—but why is that response the thing that indeterminacy-information should prompt? I can see the outlines of a story if e.g. we were contextualists. But I don’t see what the primitivist should say.

But the more pressing concern right now is that for the primitivist about indeterminacy, we don’t as yet have a helpful answer to the descriptive question. So we’re not even yet in a position to start engaging with the justificatory project. This is what I see as the source of some dissatisfaction with primitivism—the sense that as an account it somehow leaves something important unexplained. Until the theorist has told me something more, I’m at a loss about what to do with the information that p is indeterminate.

Furthermore, at least in certain applications, one’s options on the descriptive question are constrained. Suppose, for example, that you want to say that the future is indeterminate. But you want to allow that one can rationally have different credences for different future events. So I can be 50/50 on whether the sea battle is going to happen tomorrow, and almost certain I’m not about to quantum tunnel through the floor. Clearly, then, nothing like (2) or (3) is going on, where one can read off strong constraints on strength of belief in p from the information that p is indeterminate. (1) doesn’t look like a terribly good model either—especially if you think we can sometimes have knowledge of future facts.

So if you think that the future is primitively unfixed, indeterminate, etc—and friends of mine do—I think (a) you owe a response to the descriptive challenge; (b) then we can start asking about possible justifications for what you say; (c) your choices for (a) are very constrained.

I want to finish up by addressing one response to the kind of questions I’ve been pressing. I ask: what is the relevance of answering “it’s indeterminate” to first-order questions? How should I alter my beliefs on receipt of the information? What does it tell me about the world or the epistemic state of my informant?

You might be tempted to say that your informant communicates, minimally, that it’s at best indeterminate whether she knows that p. Or you might try claiming that in such circumstances it’s indeterminate whether you *should* believe p (i.e. there’s no fact of the matter as to how you should shape your credences on the question of whether p). Arguably, you can derive these from the determinate truth of certain principles (determinacy, truth as the norm of belief, etc) plus a bit of logic. Now, that sort of thing sounds like progress at first glance—even if it doesn’t lay down a recipe for shaping my beliefs, it does sound like it says something relevant to the question of what to do with the information. But I’m not sure that it really helps. After all, we could say exactly parallel things with the “contingency answer” to the redness question with which we began. Saying “it’s contingent that p” does entail that it’s contingent at best whether one knows that p, and contingent at best whether one should believe p. But that obviously doesn’t help vindicate contingency-answers to questions of whether p. So it seems that the kind of indeterminacy-involving elaborations just given, while they may be *true*, don’t really say all that much.

Chancy counterfactuals—three options

I was chatting to Rich Woodward earlier today about Jonathan Bennett‘s attitude to counterfactuals about chancy events. I thought I’d put down some of the thoughts I had arising from that conversation.

The basic thought is this. Suppose that if A were to happen, it would be overwhelmingly likely that B—but not probability 1 that B would occur. Take some cup I’m holding—if I were to drop it out the window, it’s overwhelmingly likely that it would fall to the floor and break, rather than shoot off sideways or quantum tunnel through the ground. But (we can suppose) there’s a non-zero—albeit minuscule—chance that the latter things would happen. (You don’t need to go all quantum to get this result—as Adam Elga and Barry Loewer have emphasized recently, if we have counterfactuals about macroevents, the probabilities involved in statistical mechanics also attribute tiny but nonzero probability to similarly odd things happening).

The question is, how should we evaluate the counterfactual “Drop>Break” taking into account the fact that given that Drop, there’d be a non-zero but tiny chance that ~Break?

Let’s take as our starting point a Lewisian account of the counterfactual—“A>B” is to be true (at w) iff B is true at all the closest A-worlds to w. Then the worry many people have is that though the vast majority of closest possible Drop-worlds will be Break worlds, there’ll be a residual tiny minority of worlds where it won’t break—where quantum tunnelling or freaky statistical mechanical possibilities are realized. But since Lewis’s truth-conditions require that Break be true at *all* the closest Drop-worlds, even that tiny minority suffices to make the counterfactual “Drop>Break” false.

As goes “Drop>Break”, so goes almost every ordinary counterfactual you can think of. Almost every counterfactual would be false, if the sketch just given is right. Some people think that’s the right result. We’ll come back to it below.

Lewis’s own response is to deny that the freaky worlds are among the closest worlds. His idea is that freakiness (or as he calls it, the presence of “quasi-miracles”) itself is one of the factors that pushes worlds further away from actuality. That’s been recently criticised by John Hawthorne among others. I’m about to be in print defending a generally Lewisian line on these matters—though the details are different from Lewis’s and (I hope) less susceptible to counterexample.

But if you didn’t take that line, what should you say about the case? A tempting line of thought is to alter Lewis’s clause—requiring not truth at all the closest worlds but truth at most of them, or at the overwhelming majority of them. (Of course, this idea presumes it makes sense to talk of relative proportions of worlds—let’s spot ourselves that).

This has a marked effect on the logic of counterfactuals—in particular, the agglomeration rule (A>B, A>C, therefore A>B&C) would have to go (Hawthorne points this out in his discussion, IIRC). To see how this could happen, suppose that there are 3 closest A-worlds, and X needs to be true at 2 of them in order for “A>X” to be true. Then let the worlds respectively be B&C, ~B&C, ~C&B-worlds. This produces a countermodel to agglomeration.
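
Here’s that countermodel spelled out (a minimal sketch; the “true at at least 2 of the 3 closest worlds” threshold is just for illustration):

```python
# Three closest A-worlds; on the amended clause, "A > X" counts as true
# iff X holds at at least 2 of the 3.
worlds = [
    {"B": True,  "C": True},    # B & C
    {"B": False, "C": True},    # ~B & C
    {"B": True,  "C": False},   # B & ~C
]

def near_miss_true(consequent):
    # consequent: a function from a world to True/False
    return sum(1 for w in worlds if consequent(w)) >= 2

print(near_miss_true(lambda w: w["B"]))              # True:  A > B
print(near_miss_true(lambda w: w["C"]))              # True:  A > C
print(near_miss_true(lambda w: w["B"] and w["C"]))   # False: A > (B & C)
```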

Agglomeration strikes me as a bad thing to give up. I’m not sure I have hugely compelling reasons for this, but it seems to me that a big part of the utility of counterfactuals lies in our being able to reason under a counterfactual supposition. Given agglomeration you can start by listing a bunch of counterfactual consequences (X, Y, Z), reason in standard ways (e.g. perhaps X, Y, Z entail Q) and then conclude that, under that counterfactual supposition, Q. This is essentially an inference of the following form:

  1. A>X
  2. A>Y
  3. A>Z
  4. X,Y,Z\models Q

Therefore: A>Q.

And in general I think this should be generalized to arbitrarily many premises. If we have that, counterfactual reasoning seems secure.

But agglomeration is just a special case of this, where Q=X&Y&Z (more generally, the conjunction of the various consequents). So if you want to vindicate counterfactual reasoning of the style just mentioned, it seems agglomeration is going to be at the heart of it. I think giving some vindication of this pattern is non-negotiable. To be honest though, it’s not absolutely clear that making it logically valid is obviously required. You might instead try to break this apart into a fairly reliable but ampliative inference from A>X, A>Y, A>Z to A>X&Y&Z, and then appeal to this and the premise X&Y&Z\models Q to reason logically to A>Q. So it’s far from a knock-down argument, but I still reckon it’s on to something. For example, anyone who wants to base a fictionalism on counterfactuals (were the fiction to be true then…) better take an interest in this sort of thing, since on it turns whether we can rely on multi-premise reasoning to preserve truth-according-to-the-fiction.

Jonathan Bennett is one who considers altering the truth clauses in the way just sketched (he calls it the “near miss” proposal—and points out a few tweaks that are needed to ensure e.g. that we don’t get failures of modus ponens). But he advances a second non-Lewisian way of dealing with the above cases.

The idea is to abandon evaluations of counterfactuals being true or false, and simply assign them degrees of goodness. The degree of goodness of a counterfactual “A>B” is equal to the proportion of the closest A worlds that are B worlds.

There are at least two readings of this. One is that we ditch the idea of truth-evaluation of counterfactual conditionals altogether, much as some have suggested we ditch truth-evaluation of indicatives. I take it that Edgington favours something like this, but it’s unclear whether that’s Bennett’s idea. The alternative is that we allow “strict truth” talk for counterfactuals, defined by a strict clause—truth at all the closest worlds—but then think that this strict requirement is never met, and so it’d be pointless to actually evaluate counterfactual utterances by reference to this strict requirement. Rather, we should evaluate them on the sliding scale given by the proportions. Really, this is a kind of error theory—but one supplemented by a substantive and interesting-looking account of the assertibility conditions.

Both seem problematic to me. The main issue I have with the idea that we drop truth-talk altogether is the same issue I have with indicative conditionals—I don’t see how to deal with the great variety of embedded contexts in which we find the conditionals—conjunctions, other conditionals, attitude contexts, etc etc. That’s not going to impress someone who already believes in a probabilistic account of indicative conditionals, I guess, since they’ll have ready to hand a bunch of excuses, paraphrases, and tendencies to bite selected bullets. Really, I just don’t think this will wash—but, anyway, we know this debate.

The other thought is to stick with an unaltered Lewisian account, and accept an error theory. At first, that looks like an advance over the previous proposal, since there’s no problem in generalizing the truth-conditional story about embedded contexts—we just take over the Lewis account wholesale. Now this is something of an advance over a brute error theory, since we’ve got some positive guidance about the assertibility conditions for simple counterfactuals—they’re good to the extent that B is true in a high proportion of the closest A-worlds. And that will make paradigmatic ordinary counterfactuals like “Drop>Break” overwhelmingly good.

But really I’m not sure this is much of an advance over the Edgington-style picture. Because even though we’ve got a compositional story about truth-conditions, we don’t as yet have an idea about how to plausibly extend the idea of “degrees of goodness” beyond simple counterfactuals.

As an illustration, consider “If I were to own a china cup, then if I were to drop it out the window, it’d break”. Following simple-mindedly the original recipe in the context of this embedded conditional, we’d look for the proportion of closest owning worlds where the counterfactual “Drop>Break” is true. But because of the error-theoretic nature of the current proposal, at none (or incredibly few) of those worlds would the counterfactual be true. But that’s the wrong result—the conditional is highly assertible. So the simple-minded application of the original account goes wrong in this case.

Of course, what you might try to do is to identify the assertibility conditions of “Own>(Drop>Break)” with e.g. those of “(Own&Drop)>Break”—so reducing the problem of assertibility for this kind of embedding, by way of paraphrase, to one where the recipe gives plausible results. But that’s to adopt the same kind of paraphrase-to-easy-cases strategy that Edgington likes, and if we’re going to have to do that all the time (including in hard cases, like attitude contexts and quantifiers) then I don’t see that a great deal of advance is made by allowing the truth-talk—and I’m just as sceptical as in the Edgington-style case that we’ll actually be able to get enough paraphrases to cover all the data.

There are other, systematic and speculative, approaches you might try. Maybe we should think of non-conditionals as having “degrees of goodness” of 1 or 0, and then quite generally think of the degree of goodness of “A>B” as the expected degree of goodness of B among the closest A-worlds—that is, we look at the closest A-worlds and the degree of goodness of B at each of these, and “average out” to get a single number we can associate with “A>B”. That’d help in the “Own>(Drop>Break)” case—in a sense, instead of looking at the expected truth value of “Drop>Break” among closest Own-worlds, we’d be looking at the expected goodness-value of “Drop>Break” among Own-worlds. (We’d also need to think about how degrees of goodness combine in the case of truth functional compounds of conditionals—and that’s not totally obvious. Jeffrey and Stalnaker have a paper on “Conditionals as Random Variables” which incorporates a proposal something like the above. IIRC, they develop it primarily in connection with indicatives to preserve the equation of conditional probability with the probability of the conditional. That last bit is no part of the ambition here, but in a sense, there’s a similar methodology in play. We’ve got an independent fix for associating degrees with simple conditionals—not the conditional subjective probability as in the indicative case—rather, the degree is fixed by the proportion of closest antecedent worlds where the (non-conditional) consequent holds. In any case, that’s where I’d start looking if I wanted to pursue this line).
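
For what it’s worth, here’s a minimal sketch of how that “average out” recipe would handle the nested case. The worlds, the closeness facts and the single world where Break fails are all made up for illustration; nothing here is meant to be *the* account, just a toy model of the recursion:

```python
# Degrees of goodness: non-conditionals get 1 or 0 at a world; "A > B" gets,
# at w, the average goodness of B across the closest A-worlds to w.

# closest[(w, "A")] lists the closest A-worlds to w (a toy stipulation).
closest = {
    ("w0", "Own"):  ["u1", "u2"],
    ("u1", "Drop"): ["v1", "v2", "v3", "v4"],
    ("u2", "Drop"): ["v5", "v6", "v7", "v8"],
}
breaks = {"v1", "v2", "v3", "v4", "v5", "v6", "v7"}   # Break fails only at v8

def goodness_break(w):
    return 1.0 if w in breaks else 0.0

def goodness_conditional(w, antecedent, consequent_goodness):
    ws = closest[(w, antecedent)]
    return sum(consequent_goodness(u) for u in ws) / len(ws)

# "Drop > Break", as evaluated at each closest Own-world:
drop_break = lambda u: goodness_conditional(u, "Drop", goodness_break)

# "Own > (Drop > Break)", evaluated at the actual world w0:
print(goodness_conditional("w0", "Own", drop_break))   # 0.875
```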

Is this sort of idea best combined with the Edgington style “drop truth” line or the error-theoretic evaluation of conditionals? Neither, it seems to me. Just as previously, the compositional semantics based on “truth” seems to do no work at all—the truth value of compounds of conditionals will be simply irrelevant to their degrees of goodness. So it seems like a wheel spinning idly to postulate truth-values as well as these “Degrees of goodness”. But also, it doesn’t seem to me that the proposal fits very well with the spirit of Edgington’s “drop truth” line. For while we’re not running a compositional semantics on truth and falsity, we are running something that looks for all the world like a compositional semantics on degrees of goodness. Indeed, it’s pretty tempting to think of these “degrees of goodness” as degrees of truth—and think that what we’ve really done is replace binary truth-evaluation of counterfactuals with a certain style of degree-theoretic evaluation of them.

So I reckon that there are three reasonably stable approaches. (1) The Lewis-style approach where freaky worlds are further away than they’d otherwise be on account of their freakiness—where the Lewis logic is maintained and ordinary counterfactuals are true in the familiar sense. (2) The “near miss” approach where the logic is revised, and ordinary counterfactuals are true in the familiar sense. (3) Then there’s the “degree of goodness” approach—which people might be tempted to think of in the guise of an error theory, or as an extension of the Adams/Edgington-style “no truth value” treatment of indicatives—but which I think will have to end up being something like a degree-theoretic semantics for conditionals, albeit of a somewhat unfamiliar sort.

I suggested earlier that an advantage of the Lewis approach over the “near miss” approach was that agglomeration formed a central part of inferential practice with conditionals. I think this is also an advantage that the Lewis account has over the degree-theoretic approach. How exactly to make this case isn’t clear, since it isn’t altogether obvious what the *logic* of the degree-theoretic setting should be—but the crucial point is that “A>X1”, …, “A>Xn” can all be good to a very high degree, while “A>(X1&…&Xn)” is good to a very low degree. Unless we restrict ourselves to starting points which are good to degree 1, we’ll have to be wary of degradation of degree of goodness while reasoning under counterfactual suppositions, just as on the near miss proposal we’d have to be wary of degradation from truth to falsity. So the Lewisian approach I favour is, I think, the only one of the approaches currently on the table which makes classical reasoning under counterfactual suppositions fully secure.

Probabilities and indeterminacy

I’ve just learned that my paper “Vagueness, Conditionals and Probability” has been accepted for the first formal epistemology festival in Konstanz this summer. It looks like the perfect place for me to get feedback on, and generally learn more about, the issues raised in the paper. So I’m really looking forward to it.

I’m presently presenting some of this work as part of a series of talks at Arche in St Andrews. I’m learning lots here too! One thing that I’ve been thinking about today relates directly to the paper above.

One of the main things I’ve been thinking about is how credences, evidential probability and the like should dovetail with supervaluationism. I’ve written about this a couple of times in the past, so I’ll briefly set out one sort of approach that I’ve been interested in, and then sketch something that just occurred to me today.

The basic question is: what attitude should we take to p, if we are certain that p is indeterminate? Here’s one attractive line of thought. First of all, it’s a familiar thought that logic should impose some rationality constraints on belief. Let’s formulate this minimally as the constraint that, for the rational agent, probability (credence or evidential probability) can never decrease across a valid argument:

A\models B \Rightarrow p(A)\leq p(B)

Now take one of the things that supervaluational logics are often taken to imply, where ‘D‘ is read as ‘it is determinate that’:

A\models DA

Then we note that this and the logicality constraint on probabilities entail that

p(A)\leq p(DA)

So in particular, if we fully reject A’s being determinate (e.g. if we fully accept that it’s indeterminate) then the probability of the right-hand side, p(DA), will be zero, and so by the inequality, the probability of the left-hand side, p(A), will be zero too. (The particular supervaluational consequence I’m appealing to is controversial, since it follows only in settings which seem inappropriate for modelling higher-order indeterminacy, but by adding a couple of extra assumptions we can argue for the same result in other ways. This’ll do us for now though).

The result is that if we’re fully confident that A is indeterminate, we should have probability zero in both A and in not-A. That’s interesting, since we’re clearly not in Kansas anymore—this result is incompatible with classical probability theory. Hartry Field has argued in the past for the virtues of this result as giving a fix on what indeterminacy is, and I’m inclined to think that it captures something at the heart of at least one way of conceiving of indeterminacy.

Rather than thinking about indeterminate propositions as having point-valued probabilities, one might instead favour a view whereby they get interval values. One version of this can be defined in this setting. For any A, let u(A) be defined to be 1-p(\neg A). This quantity—how little one accepts the negation of a proposition—might be thought of as the upper bound of an interval whose lower bound is the probability of A itself. So rather than describe one’s doxastic attitudes to known indeterminate A as being “zero credence” in A, one might prefer the description of them as themselves indeterminate—in a range between zero and 1.

There’s a different way of thinking about supervaluational probabilities, though, which is in direct tension with the above. Start with the thought that at least for supervaluationism conceived as a theory of semantic indecision, there should be no problem with the idea of perfectly sharp classical probabilities defined over a space of possible worlds. The ways the world can be, for this supervaluationist, are each perfectly determinate, so there’s no grounds as yet for departing from orthodoxy.

But we also want to talk about the probabilities of what is expressed by sentences such as “that man is bald” where the terms involved are vague (pick your favourite example if this one won’t do). The supervaluationist thought is that this sentence picks out a sharp proposition only relative to a precisification. What shall we say of the probability of what this sentence expresses? Well, there’s no fact of the matter about what it expresses, but relative to each precisification, it expresses this or that sharp proposition—and in each case our underlying probability measure assigns it a probability.

Just as before, it looks like we have grounds for assigning to sentences, not point-like probability values, but range-like values. The range in question will be a subset of [0,1], and will consist of all the probability-values which some precisification of the claim acquires. Again, we might gloss this as saying that when A is indeterminate, it’s indeterminate what degree of belief we should have in A.

But the two recipes deliver utterly different results. Suppose, for example, I introduce a predicate into English, “Teads”, which has two precisifications: on one it applies to all and only coins which land Heads, on the other it applies to all and only coins that land Tails (or not Heads). Consider the claim that the fair coin I’ve just flipped will land Teads. Notice that we can be certain that this sentence will be indeterminate—whichever way the coin lands, Heads or Tails, the claim will be true on one precisification and false on the other.

What would the logic-based argument give us? Since we assign probability 1 to indeterminacy, it’ll say that we should assign probability 0, or a [0,1] interval, to the coin landing Teads.

What would the precisification-based argument give us? Think of the two propositions the claim might express: that the coin will land heads, or that the coin will land tails. Either way, it expresses a proposition that is probability 1/2. So the set of probability values associated with the sentence will be point-like, having value 1/2.

Of course, one might think that the point-like value stands in an interesting relationship to the [0,1] range—namely being its midpoint. But now consider cases where the coin is biased in one way. For example, if the coin is biased to degree 0.8 towards heads, then the story for the logic-based argument will remain the same. But for the precisification-based person the values will change to {0.8,0.2}. So we can’t just read off the values the precisificationist arrives at from what we get from the logic-based argument. Moral: in cases of indeterminacy, thinking of probabilities in the logic-based way wipes out all information other than that the claim in question is indeterminate.
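
To put the two coin cases side by side, here’s a toy computation (the encoding of the two precisifications is mine, purely for illustration):

```python
# Precisification-based probability values for "the coin will land Teads",
# with two worlds (Heads, Tails) and two precisifications of "Teads"
# (Teads = Heads, Teads = Tails).
def precisification_values(p_heads):
    on_heads_reading = p_heads                 # "Teads" applies to Heads-landers
    on_tails_reading = round(1 - p_heads, 10)  # "Teads" applies to Tails-landers
    return {on_heads_reading, on_tails_reading}

print(precisification_values(0.5))   # {0.5}      -- fair coin: a point-like set
print(precisification_values(0.8))   # {0.8, 0.2} -- biased coin: two values

# The logic-based recipe ignores p_heads altogether: since "the coin will land
# Teads" is certainly indeterminate, it assigns 0 (or the [0,1] interval) either way.
```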

This information-wiping feature can form the basis for criticism of supervaluationism in a range of circumstances in which we want to discriminate between attitudes towards equally indeterminate sentences. And *as an argument* I take it seriously. I do think there should be logical constraints on rational credence, and if the logic for supervaluationism is as it’s standardly taken to be, that enforces the result. If we don’t want the result, we need to argue for some other logic. Doing so isn’t cost free, I think—working within the supervaluational setting, bumps tend to arise elsewhere when one tries to do this. So the moral I’d like to draw from the above discussion is that there must be two very different ways of thinking about indeterminacy that both fall under the semantic indecision model. These two conceptions are manifest in the different attitudes towards indeterminacy described above. (This has convinced me, against my own previous prejudices, that there’s something more-than-merely terminological to the question of “whether truth is supertruth”).

But let’s set that aside for now. What I want to do is just note that *within* the supervaluational setting that goes with the logic-based argument and thinks that all indeterminate claims should be rejected, there shouldn’t be any objection to the underlying probability measure mentioned above, and given this, one shouldn’t object to introducing various object-language operators. In particular, let’s consider the following definition:

“P(S)=n” is true on i, w iff the measure of {u: “S” is true on u,i}=n

But it’s pretty clear to see that the (super)truths about this operator will reflect the precisification-based probabilities described earlier. So even if the logic-based argument means that our degree of belief in indeterminate A should be zero, still there will be object-language claims we could read as “P(the coin will land Teads)=1/2” that will be supertrue. (The appropriate moral from the perspective of the theorist in question would be that whatever this operator expresses, it isn’t a notion that can be identified with degree of belief).

If this is right, then arguments that I’m interested in using against certain applications of the “certainty of indeterminacy entails credence zero” position have to be handled with extreme care. So, for example, in the paper mentioned right at the beginning of this post, I appeal to empirical data about folk judgements about the probabilities of conditionals. I was assuming that I could take this data as information on what the folk view about credences of conditionals is.

But if, compatibly with taking the “indeterminacy entails zero credence” view of conditionals, one could have within a language a P-operator which behaves in the ways described above, this isn’t so clear anymore. Explicit probability reports might be reporting on the P-operator, rather than subjective credence. So everything becomes rather delicate and very confusing.