Iteration vs. Entrenchment

I’m going to have one more run at a form of the Lewisian derivation that justifies the strong conclusions (e.g. that the reason for believing A would be a reason for believing each of the iterated B-claims).

I’ll be using strong-indication again, though since this is the only indication relation I’ll use in this discussion, I’ll drop the superscript disambiguation:

  • p\Rightarrow_x q =_{def} \exists rR_x(r, p)\rightarrow \forall r(R_x(r,p)\supset R_x(r,q))

Remember that R is the relation of something being sufficient reason to believe, *relative to background beliefs and epistemic standards*. Let’s introduce a new operator E_x, which will say that the embedded proposition is a background belief or epistemic standard for x—or as I’ll say for short, is entrenched for x.

We have the first three premises on a strong reading of indication again. But I’ll now change the fourth premise from an indication principle to one about E:

  1. A \supset B_u(A)
  2. A\Rightarrow_u \forall yB_y(A)
  3. A \Rightarrow_u q
  4. E_u \forall y [u\sim y]

A linked change is that we abandon ITERATION for a principle that says that propositions about what indicates what to a person are part of their epistemic standards/background beliefs:

  • ENTRENCHMENT \forall c \forall x ([A \Rightarrow_x c]\supset E_x[A\Rightarrow_x c])

The core derivation I have in mind goes like this:

  1. A\Rightarrow_u \forall y B_y A. Premise 2.
  2. E_u(A\Rightarrow_u \forall yB_y A). From 1 via ENTRENCHMENT.
  3. E_u \forall y [u\sim y]. Premise 4.
  4. E_u \forall z(A\Rightarrow_z \forall yB_y A). From 2,3 by NEWSYMMETRY+.
  5. A\Rightarrow_u\forall z B_z \forall yB_y A. From 1,4 by NEWCLOSURE+.

What then are these new principles of NEWSYMMETRY+ and NEWCLOSURE+ and how should we think about them? NEWSYMMETRY+ is another perspectival form based on the validity of strong symmetry:

  • SYMMETRY-S \forall c \forall x ([A \Rightarrow_x c]\wedge \forall y [x\sim y]\supset \forall y[A\Rightarrow_y c])

NEWSYMMETRY+ is then an instance of a principle that propositions that are entrenched for an individual are closed under valid arguments, with SYMMETRY-S providing the relevant valid argument:

  • NEWSYMMETRY+ \forall c \forall x\forall z([E_z[A \Rightarrow_x c]]\wedge [E_z\forall y[x\sim y]]\supset [E_z \forall y[A\Rightarrow_y c]])

NEWCLOSURE+ is based, again, on the validity of closure for the B-operator under strong indication, which once more really just reduces to modus ponens for the counterfactual conditional hidden inside the indication relation:

  • CLOSURE-S \forall a,c (\forall x B_x (a)\wedge \forall x[a \Rightarrow_x c]\supset \forall x B_x(c))

But the principle we use isn’t just the idea that some operator or other is closed under valid inference. The thought is instead a principle about reason-transmission, which goes as follows. Suppose two propositions entail a third, and r is sufficient reason (given one’s background beliefs and standards) to believe the first proposition. Then, if the second proposition is entrenched (part of those background beliefs and standards), r is also sufficient reason (given one’s background beliefs and standards) to believe the third proposition. The underlying valid argument relevant here is CLOSURE-S, which gives us, in symbols:

  • NEWCLOSURE+ \forall a,b,c\forall x ([a \Rightarrow_x \forall y B_y(b)]\wedge [E_x(\forall y[b \Rightarrow_y c])]\supset [a\Rightarrow_x \forall yB_y(c)])

NEWCLOSURE+ seems to me pretty well motivated. NEWSYMMETRY+ is just as good as anything we’ve worked with so far. ENTRENCHMENT now replaces ITERATION. Unlike ITERATION, there’s no chance of deriving it from principles about counterfactuals and the transparency of whatever B stands for. Instead, it simply represents its own transparency assumption: that true propositions about the epistemic standards and background beliefs of an agent are themselves part of that agent’s epistemic background. It is weaker than the transparency assumption about beliefs or reasons to believe that was used in motivating ITERATION, since it has a more restricted domain of application. It is stronger than earlier transparency assumptions insofar as it requires that the propositions to which it applies are not merely believed (or things we have reason to believe) but have the stronger status of being entrenched.

NEWCLOSURE+ is quite close in form to Cubitt and Sugden’s A6, except that their principle used (what I notate as) the B operator throughout, where at a crucial point I have an instance of the E operator. An advantage this gives me is that the E-operator doesn’t feature in the conclusion of the argument, so we are free to reinterpret it however we like to get the premises to come out true; trying to reinterpret B would change the meaning of the conclusions we are deriving. So, for example, I complained against their version that crucial principles seemed bad because some of your beliefs or reasons for belief might not be resilient under learning new information. But we are free simply to build into E that it applies only to propositions that are resiliently part of one’s background beliefs/standards (or maybe being resilient in that way is part of what it is for something to be treated as a standard or to be background).

Having walked through this, let me illustrate the fuller form of the derivation, using all the premises.

  1. A\Rightarrow_u \forall y B_y A. Premise 2.
  2. A\Rightarrow_u q. Premise 3.
  3. E_u(A\Rightarrow_u q). From line 2 via ENTRENCHMENT.
  4. E_u \forall y [u\sim y]. Premise 4.
  5. E_u \forall z(A\Rightarrow_z q). From lines 3,4 by NEWSYMMETRY+.
  6. A\Rightarrow_u\forall z B_z q. From 1,5 by NEWCLOSURE+.
  7. E_u(A\Rightarrow_u\forall z B_z q). From line 6 via ENTRENCHMENT.
  8. E_u \forall y(A\Rightarrow_y \forall z B_z q). From lines 4,7 by NEWSYMMETRY+.
  9. A\Rightarrow_u\forall y B_y \forall z B_z q. From 1,8 by NEWCLOSURE+.
  10. ….

The pattern of the last few lines loops to get that A indicates each of the iterations of B-operator applied to q. And we can then appeal to Premise 1, A and CLOSURE to “detach” the consequents of lines 6,9, etc.
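To make the looping pattern explicit, here is a hypothetical Python sketch that treats ENTRENCHMENT, NEWSYMMETRY+ and NEWCLOSURE+ as purely syntactic rewriting steps over ASCII renderings of the formulas (`=>_u` for indication, `E_u` for entrenchment). It does not check that the rules are valid. It only regenerates lines 1 to 9 of the derivation above and shows that each pass through the loop adds one B-operator. The function names and the indexed bound variables z1, z2, ... are my own choices for illustration.

```python
# Hypothetical sketch: ENTRENCHMENT, NEWSYMMETRY+ and NEWCLOSURE+ applied
# as syntactic rewriting steps. Nothing here checks that the rules are valid.

def ind(agent: str, c: str) -> str:
    """'A =>_agent c': A (strongly) indicates c to agent."""
    return f"A =>_{agent} {c}"

def ent(f: str) -> str:
    """'E_u(f)': f is entrenched for u."""
    return f"E_u({f})"

def all_believe(var: str, c: str) -> str:
    """'forall var B_var(c)': everyone has reason to believe c."""
    return f"forall {var} B_{var}({c})"

def derive(n_loops: int) -> list:
    """Return (formula, justification) pairs for the looped derivation."""
    lines = [
        (ind("u", all_believe("y", "A")), "Premise 2."),
        (ind("u", "q"), "Premise 3."),
    ]
    content = "q"               # current object of the iterated B-operators
    last = ind("u", content)    # most recent line of the form A =>_u c
    for k in range(1, n_loops + 1):
        lines.append((ent(last), "ENTRENCHMENT"))
        if k == 1:
            lines.append((ent("forall y [u~y]"), "Premise 4."))
        var = f"z{k}"           # fresh bound variable for this round
        lines.append((ent(f"forall {var}({ind(var, content)})"),
                      "NEWSYMMETRY+ (with premise 4)"))
        content = all_believe(var, content)
        last = ind("u", content)
        lines.append((last, "NEWCLOSURE+ (with premise 2)"))
    return lines

for number, (formula, why) in enumerate(derive(2), start=1):
    print(f"{number}. {formula}. {why}")
```

Line 6 comes out as `A =>_u forall z1 B_z1(q)` and line 9 as `A =>_u forall z2 B_z2(forall z1 B_z1(q))`. Calling `derive(3)` gives the next round of the loop.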

But for our purposes here and now, the more significant thing is lines 6 and 9 (and 12, 15 etc) prior to detachment. For these tell us that a sufficient reason for believing A is itself a sufficient reason for believing each of these iterated B propositions.

So to sum up: if we are content to work with weak indication relations, we can get away with the premises I used in other posts, including ITERATION and previous versions of SYMMETRY+ and CLOSURE+. If we want to work with strong indication, and get information about what is a reason for what, then we need to make changes, and the above is my best shot (especially in the light of the utter mess we got into in the last post!). Interestingly, while NEWSYMMETRY+ and NEWCLOSURE+ seem to me more or less as plausible as their older analogues, the replacement for ITERATION (the principle I’m here calling ENTRENCHMENT) isn’t directly comparable to the earlier principle, though it’s still broadly a principle of transparency.

There is a delicate dialectical interplay between ENTRENCHMENT and the analysis of the indication relation. The stronger and more demanding indication is, the more plausible ENTRENCHMENT becomes, since fewer instances fall under it. If we read indication as weak indication throughout, then ENTRENCHMENT would say that every counterfactual relating reasons for belief to reasons for other beliefs is part of the background beliefs/epistemic standards. That’s wildly strong! It’s pretty strong in the strong indication version too. It would become much more plausible if it were restricted to, for example, epistemic connections between propositions that are obvious to the agent.

In the settings I have considered in the previous posts, the counterfactual analysis earned its keep in part because ITERATION (which is here replaced by ENTRENCHMENT) could be treated as an iterated counterfactual. That’s no longer a consideration. The other advantage of having the counterfactual analysis is that it made CLOSURE an instance of modus ponens. But that’s not a reason for accepting the analysis of indication as a counterfactual—it’s just a reason for accepting that indication entails the counterfactual. The final reason for offering the counterfactual analysis is simply that it allows a reduction in the number of primitive notions around: in the original setting, it allows a reduction to just the B operator. That’s a consideration, but in the current context we’re having to work with E’s as well as B’s, so ideological purity is lost.

Once we need ENTRENCHMENT, it seems to me that it would be easier to defend the package presented here if we abandoned the counterfactual analysis of indication, and used it as a primitive notion, while adding as a premise the validity of the following principle which links a now-primitive indication relation to what we were previously calling strong indication:

  • p\Rightarrow_x q \supset (\exists rR_x(r, p)\rightarrow \forall r(R_x(r,p)\supset R_x(r,q)))

The soundness of the overall argument now turns on whether there exists a triple (a reason relation, an indication relation and an entrenchment relation) that makes all the premises true.

As a final note: the link between the counterfactual and primitive indication has two roles. One is simply a matter of reading off the significance of the final results. The other is to make CLOSURE valid. But it only makes CLOSURE valid if the B-operator is defined in the Lewisian way as having-reason-to-believe. As per that earlier post, a different counterfactual–concerning commitments to believe–matters for CLOSURE in that setting. So one would add that entailment as an extra premise about the now-primitive indication relation.

Strong and weak indication relations

[warning: it’s proving hard to avoid typos in the formulas here. I’ve caught as many as I can, but please exercise charity in reading the various subscripts].

In the Lewisian setting I’ve been examining in the last series of posts, I’ve been using the following definition of indicates-to-x (I use the same notation as in previous posts, but add a w-subscript to distinguish it from an alternative I will shortly introduce):

  • p\Rightarrow^w_x q =_{def} B_x p\rightarrow B_x q

The arrow on the right is the counterfactual conditional, and the intended interpretation of the B-operator is “has a reason to believe”. This fitted Lewis’s informal gloss “if x had reason to believe p, then x would thereby have reason to believe q”, except for one thing: the word thereby. Let’s call the reading above weak indication. Weak indication, I submit, gives an interesting version of the Lewisian derivation of iterated reason-to-believe from premises that are at least plausibly true in many paradigmatic situations of common belief.

But there is a cost. Lewis’s original gloss, combined with the results he derives, entails that each group member’s reasons for believing that A obtains (say: the perceptual experience they undergo) are at the same time reasons for them to believe all the higher-order iterations of reason-to-believe. That is a pretty explanatory and informative epistemology: we can point to the very things that (given the premises) justify us in all these apparently recherché commitments. If we derive the same formal results on a weak reading of indication, we leave this open. We might suspect that the reasons for believing A are the reasons for believing this other stuff. But we haven’t yet pinned down anything that tells us this is the case.

I want to revisit this issue of the proper understanding of indication. I use R_x(r, p) to formalize the claim that r is a sufficient reason for x to believe that p (relative to x’s epistemic standards and background beliefs). With this understood, B_x(p) can be defined as \exists r R_x(r,p). Here is an alternative notion of indication, my best attempt to capture Lewis’s original gloss:

  • p\Rightarrow^s_x q =_{def} \exists rR_x(r, p)\rightarrow \forall r(R_x(r,p)\supset R_x(r,q))

In words: p strongly indicates q to x iff were x to have a sufficient reason for believing p, then all the sufficient reasons x has for believing p are sufficient reasons for x to believe q. (My thinking: in Lewis’s original the “thereby” introduces a kind of anaphoric dependence in the consequent of the conditional on the reason that is introduced by existential quantification in the antecedent. Since this sort of scoping isn’t possible given standard formation rules, what I’ve given is a fudged version of this).

Notice that the antecedent of the counterfactual here is identical to that used in the weak reading of indication. So we’re talking about the same “closest worlds where we have reason to believe p”. The differences only arise in what the consequent tells us. And it’s easy to see that, at the relevant closest worlds, the consequent of weak indication is entailed by the consequent of strong indication. So overall, strong indication entails weak indication.
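The entailment argument can also be checked mechanically on small finite models. The Python sketch below is a hypothetical illustration under simplifying assumptions of my own:

- Worlds are integers, and each world w carries distances to the others, with w itself at distance 0.
- A counterfactual is true at w iff every closest antecedent-world satisfies the consequent. It is vacuously true if there are no antecedent-worlds.
- R is a finite relation between a few named reasons (r1, r2, r3) and the propositions p and q.

A random search finds no world where strong indication holds and weak indication fails. A one-world model shows that the converse entailment fails. The search is only a sanity check; the proof is the argument just given.

```python
import random
from itertools import product

REASONS = ("r1", "r2", "r3")   # hypothetical reasons
PROPS = ("p", "q")

def counterfactual(dist, w, antecedent, consequent):
    """True at w iff no world satisfies the antecedent, or every
    antecedent-world at minimal distance from w satisfies the consequent."""
    a_worlds = [v for v in range(len(dist)) if antecedent(v)]
    if not a_worlds:
        return True
    best = min(dist[w][v] for v in a_worlds)
    return all(consequent(v) for v in a_worlds if dist[w][v] == best)

def has_reason(R, v, p):
    """B_x(p) at v: some r is a sufficient reason for x to believe p."""
    return any((r, p) in R[v] for r in REASONS)

def weak(dist, R, w, p, q):
    """p =>^w_x q: B_x p []-> B_x q."""
    return counterfactual(dist, w, lambda v: has_reason(R, v, p),
                          lambda v: has_reason(R, v, q))

def strong(dist, R, w, p, q):
    """p =>^s_x q: (exists r R_x(r,p)) []-> forall r (R_x(r,p) -> R_x(r,q))."""
    return counterfactual(
        dist, w, lambda v: has_reason(R, v, p),
        lambda v: all((r, q) in R[v] for r in REASONS if (r, p) in R[v]))

def random_model(n, rng):
    """Random distances (0 only for w itself) and a random reason relation."""
    dist = [[0 if v == w else rng.randint(1, n) for v in range(n)]
            for w in range(n)]
    R = [{pair for pair in product(REASONS, PROPS) if rng.random() < 0.4}
         for _ in range(n)]
    return dist, R

def strong_entails_weak(trials=2000, n=4, seed=0):
    """Search random models for a world where strong holds but weak fails."""
    rng = random.Random(seed)
    for _ in range(trials):
        dist, R = random_model(n, rng)
        for w in range(n):
            if strong(dist, R, w, "p", "q") and not weak(dist, R, w, "p", "q"):
                return False
    return True

print(strong_entails_weak())  # True: no counterexample found
# One world where r1 supports p and r2 supports q, but r1 does not support q:
one_dist, one_R = [[0]], [{("r1", "p"), ("r2", "q")}]
print(weak(one_dist, one_R, 0, "p", "q"), strong(one_dist, one_R, 0, "p", "q"))  # True False
```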

If all the premises of my Lewis-style derivation were true under the strong reading, then the strong reading of the conclusion would follow. But some of the tweaks that I introduced in fixing up the argument seem to me implausible on the strong reading—more carefully, it is implausible that they are true on this reading in all the paradigms of common knowledge. Consider, for example, the premise:

  • A\Rightarrow_x \forall y (x\sim y)

In some cases the reason one has for believing A would be reason for believing that x and y are relevantly similar (as the conclusion states). I gave an example, I think, of a situation where the relevant manifest event A reveals to us both that we are members of the same conspiratorial sect. But this is not the general case. In the general case, we have independent reasons for thinking we are similar, and all that we need to secure is that learning A, or coming to have reason to believe A, wouldn’t undercut these reasons. (It was the possibility of undercutting in this way that was the source of my worry about the Cubitt-Sugden official reconstruction of Lewis, which doesn’t have the above premise, but rather the premise that x has reason to believe that x is similar to all the others.)

So now we are in a delicate situation, if we want to derive the conclusions of Lewis’s argument on a strong reading of indication. We will need to run the argument with a mix of weak and strong indication, and hope that the mixed principles that are required will turn out to be true.

Here’s how I think it goes. First, the first three premises are true on the strong reading, and the final premise on the weak reading.

  1. A \supset B_u(A)
  2. A\Rightarrow^s_u \forall yB_y(A)
  3. A \Rightarrow^s_u q
  4. A\Rightarrow^w_u \forall y [u\sim y]

Of the additional principles, we appeal to strong forms of symmetry and closure:

  • SYMMETRY-S \forall c \forall x ([A \Rightarrow^s_x c]\wedge \forall y [x\sim y]\supset \forall y[A\Rightarrow^s_y c])
  • CLOSURE-S \forall a,c (\forall x B_x (a)\wedge \forall x[a \Rightarrow^s_x c]\supset \forall x B_x(c))

In the case of closure, strong indication features only in the antecedent of the material conditional, so this is in fact weaker than closure on the original version I presented. These are no less plausible than the originals. As with those, the assumption is really not just that these are true: it is that they are valid (and so correspond to valid inference patterns). That validity is used in motivating the truth of further principles that piggyback upon them and that are also used in the argument.

The “perspectival” closure principle can be used in a strong form:

  • CLOSURE+-S \forall a,b,c\forall x ([a \Rightarrow^s_x \forall y B_y(b)]\wedge [a \Rightarrow^s_x(\forall y[b \Rightarrow^s_y c])]\supset [a\Rightarrow^s_x \forall yB_y(c)])

The action, in my view, comes with the remaining principles, and in particular with the “perspectival” symmetry principle. Here it is in mixed form:

  • SYMMETRY+-M \forall a \forall c \forall x\forall z([a\Rightarrow^s_z[A \Rightarrow^s_x c]]\wedge [a \Rightarrow^w_z\forall y[x\sim y]]\supset [a\Rightarrow^s_z \forall y[A\Rightarrow^s_y c]])

The underlying thought behind these perspectival principles (as with closure) is that when you have a valid argument, if you have reason to believe the premises (in a given counterfactual situation), then you have reason to believe the conclusion. That’s sufficient for the weak reading we used in the previous posts. In a version where all the outer indication relations are strong, as with the strong CLOSURE+ above, it relies more specifically on the assumption that where r is a sufficient reason to believe each of the premises of a valid argument, it is a sufficient reason to believe the conclusion.

We need a mixed version of symmetry because we only have a weak version of premise (4) to work with, and yet we want to get out a strong version of the conclusion. Justifying a mixed version of symmetry is more delicate than justifying either a purely strong or purely weak version. Abstractly, the mixed version says that if r is sufficient reason to believe one of the premises of a certain valid argument, and there is some reason or other to believe the second premise of that valid argument, then r is sufficient reason to believe the conclusion. This can’t be a correct general principle about all valid arguments. Suppose the reason to believe the second premise is s. Then why think that r alone is sufficient reason to believe the conclusion? Isn’t the most we get that r and s together are sufficient for the conclusion?

So we shouldn’t defend the mixed principle here on general grounds. Instead, the idea will have to be that with the specific valid argument in question (an instance of symmetry), assumptions about who I’m epistemically similar to (in epistemic standards and background beliefs) themselves count as “background beliefs”. If that is the case, then we can argue that the reason for believing the first premise of the valid argument (in a counterfactual situation) is indeed a sufficient reason, relative to the background beliefs, for believing the conclusion. One of the prerequisites of this understanding will be either that other agents believe propositions about who they’re epistemically similar to in counterfactual situations where they have reason to believe those propositions, or else that talk of “background beliefs” is loose talk for background propositions that we have reason to believe. I think we could go either way.

In order to complete this, we will need iteration, in the following strong version:

  • ITERATION-S \forall c \forall x ([A \Rightarrow^s_x c]\supset [A \Rightarrow^s_x [A\Rightarrow^s_x c]])

I’ll come back to this.

Let me exhibit how the utmost core of a Lewisian argument looks in this version. I’ll compress some steps for readability:

  1. A\Rightarrow_u^s \forall y B_y A. Premise 2.
  2. A\Rightarrow^s_u(A\Rightarrow_u^s \forall yB_y A). From 1 via ITERATION-S.
  3. A\Rightarrow^w_u \forall y [u\sim y]. Premise 4.
  4. A\Rightarrow^s_u \forall z(A\Rightarrow_z^s \forall yB_y A). From 2,3 by SYMMETRY+-M.
  5. A\Rightarrow^s_u\forall z B_z \forall yB_y A. From 1,4 by CLOSURE+-S.

This style of argument—which can then be looped—is the basic core of a Lewis-style derivation. You can add in premise 3 and use CLOSURE+, and get something similar with q as the object of iterated B-operators, to get the original. And of course you can appeal to premise 1 and CLOSURE to “discharge” the antecedents of interim conclusions like 5 (this works with strong indication relations because it works for weak indication, and strong indication entails weak).

There’s an alternative way of mixing strong and weak indication relations. On this version we use a mixed form of ITERATION, the original weak SYMMETRY+, and then a mixed form of CLOSURE+:

  • ITERATION-M \forall c \forall x ([A \Rightarrow^s_x c]\supset [A \Rightarrow^w_x [A\Rightarrow^s_x c]])
  • SYMMETRY+-W \forall a \forall c \forall x\forall z([a\Rightarrow^w_z[A \Rightarrow^s_x c]]\wedge [a \Rightarrow^w_z\forall y[x\sim y]]\supset [a\Rightarrow^w_z \forall y[A\Rightarrow^s_y c]])
  • CLOSURE+-M \forall a,b,c\forall x ([a \Rightarrow^s_x \forall y B_y(b)]\wedge [a \Rightarrow^w_x(\forall y[b \Rightarrow^s_y c])]\supset [a\Rightarrow^s_x \forall yB_y(c)])

The core argument then runs:

  1. A\Rightarrow_u^s \forall y B_y A. Premise 2.
  2. A\Rightarrow^w_u(A\Rightarrow_u^s \forall yB_y A). From 1 via ITERATION-M.
  3. A\Rightarrow^w_u \forall y [u\sim y]. Premise 4.
  4. A\Rightarrow^w_u \forall z(A\Rightarrow_z^s \forall yB_y A). From 2,3 by SYMMETRY+-W.
  5. A\Rightarrow^s_u\forall z B_z \forall yB_y A. From 1,4 by CLOSURE+-M.

The main advantage of this version of the argument would be that the version of ITERATION it requires is weaker. Otherwise, we are simply moving the bump in the rug from mixed SYMMETRY+ to mixed CLOSURE+. And that seems to me a damaging shift. We use mixed SYMMETRY+ many times, but the only belief we ever have to assume is “background” to justify the principle is the belief that all are similar to me. In the revised form, to run the same style of defence, we would have to assume that beliefs about indication relations with more and more complex contents are backgrounded. And that simply seems less plausible. So I think we should stick with the original if we can. (On the other hand, the principle we would need here is close to the sort of “mixed” principle that Cubitt and Sugden use, and they are officially reading “indication” in a strong way. So maybe this should be acceptable.)

So what about ITERATION-S, the principle that the argument now turns on? As a warm-up, let me revisit the motivation for the original, ITERATION-W, which, fully spelled out, would be:

  • [\exists r R_u(r, A)\rightarrow \exists r R_u(r,c)]
    \supset[\exists s R_u(s,A)\rightarrow
    \exists s R_u(s,[\exists r R_u(r, A)\rightarrow \exists r R_u(r,c)])]

Assume the first line is the case. Then we know that at the worlds relevant for evaluating the second and third lines, we have both \exists r R_u(r,c) and \exists r R_u(r, A). By an iteration principle for reason-to-believe, \exists s_1R_u(s_1,\exists r R_u(r,c)) and \exists s_2 R_u(s_2,\exists r R_u(r, A)). And by a principle of conjoining reasons (which implicitly makes a rather strong consistency assumption about reasons for belief) \exists s R_u(s,\exists r R_u(r,A)\wedge \exists r R_u(r, c)). But a conjunction entails the corresponding counterfactual in counterfactual logics for strong centering, and so plausibly the reason to believe the conjunction is a reason to believe the counterfactual: \exists s R_u(s,\exists r R_u(r,A)\rightarrow \exists r R_u(r, c)). That is the rationale for the original iteration principle.
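The step from conjunction to counterfactual depends on strong centering, and a small model makes the difference visible. The Python sketch below is my own illustration with hypothetical names. It compares random finite models in which w is the unique closest world to itself against models that only guarantee w is among its own closest worlds (weak centering). In each case it searches for a world where a and c are both true but the counterfactual from a to c is false.

```python
import random

def counterfactual(dist, w, ant, cons):
    """True at w iff no world satisfies ant, or every ant-world at
    minimal distance from w satisfies cons."""
    ant_worlds = [v for v in range(len(dist)) if ant(v)]
    if not ant_worlds:
        return True
    best = min(dist[w][v] for v in ant_worlds)
    return all(cons(v) for v in ant_worlds if dist[w][v] == best)

def random_model(n, rng, strong_centering):
    dist = []
    for w in range(n):
        row = [rng.randint(1, 3) for _ in range(n)]
        row[w] = 0  # w is always among its own closest worlds (centering)
        if not strong_centering:
            # weak centering only: other worlds may tie with w at distance 0
            row = [0 if v == w or rng.random() < 0.3 else d
                   for v, d in enumerate(row)]
        dist.append(row)
    a = [rng.random() < 0.5 for _ in range(n)]
    c = [rng.random() < 0.5 for _ in range(n)]
    return dist, a, c

def conjunction_entails_cf(strong_centering, trials=2000, n=4, seed=1):
    """Return False as soon as a world with a, c true but a []-> c false turns up."""
    rng = random.Random(seed)
    for _ in range(trials):
        dist, a, c = random_model(n, rng, strong_centering)
        for w in range(n):
            if a[w] and c[w] and not counterfactual(
                    dist, w, lambda v: a[v], lambda v: c[v]):
                return False
    return True

print(conjunction_entails_cf(strong_centering=True))   # no counterexample found
print(conjunction_entails_cf(strong_centering=False))  # counterexample found
```

Under strong centering, the only closest a-world to w is w itself, so the truth of the conjunction at w settles the counterfactual. With ties at distance 0 allowed, that guarantee disappears.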

Unfortunately, I don’t think there’s a similar rationale for the strong iteration principle. The main obstacle is the following: one particular sufficient reason for believing that A is the case (call it s) is unlikely to be one’s reason for believing a counterfactual generalization that covers all reasons to believe that A is the case. In the original version of iteration, this wasn’t at issue at all. The rationale I offered there found a reason to believe a counterfactual by exhibiting a reason to believe the corresponding conjunction, which entails the counterfactual. But when you write down what strong iteration means in detail, you see (in the third line below) that what now has to be argued for is that s itself is a sufficient reason to believe such a counterfactual generalization. I can’t see a strategy for arguing for this, and the principle itself, as stated, seems to me likely to be false.

  • [\exists r R_u(r, A)\rightarrow \forall r (R_u(r, A)\supset  R_u(r,c))]
    \supset[\exists s R_u(s,A)\rightarrow
    \forall s(R_u(s,A)\supset R_u(s,[\exists r R_u(r, A)\rightarrow \forall r (R_u(r, A)\supset R_u(r,c))]))]

That’s bad news. Without this principle, the first mixed version of the argument I presented above doesn’t go through. I think there’s a much better chance of mixed iteration being argued for, which is what was needed for the second version of the argument. But that was the version of the argument that required the dodgy mixed closure principle. Perhaps we should revisit that version?

I’m closing this out with one last thought. The universal quantifier in the consequent of the indication counterfactual is the source of the trouble for strong ITERATION. But that was introduced as a kind of fudge for the anaphor in the informal description of the indication relation. One alternative is to use a definite description in the consequent of the conditional, which on Russell’s theory introduces the assumption that there is only one sufficient reason (given background knowledge and standards) for believing the propositions in question. This would give us:

  • p\Rightarrow^d_x q =_{def}
    \exists rR_x(r, p)\rightarrow \exists r(R_x(r,p)\wedge \forall s (R_x(s, p)\supset r=s) \wedge R_x(r,q))

Much of the discussion above can be rerun with this in place of strong indication. And I think the analogue of strong ITERATION has a good chance of being argued for here, provided that we have a suitable iteration principle for reason-to-believe. For weak iteration, we needed only to assume that when there is reason to believe p, there is reason to believe that there is reason to believe p. The rationale I have in mind for a new, stronger version of ITERATION will need the assumption that when s is a reason to believe that p, then s is a reason to believe that s is a reason to believe that p. Whether this will fly, however, turns both on being able to justify that strong iteration principle and on whether indication in the d-version, with its uniqueness assumption, finds application in the paradigmatic cases.

For now, my conclusion is that the complexities involved here justify the decision to run the argument in the first instance with weak indication throughout. We should only dip our toes into these murky waters if we have very good reason to do so.

Identifying the subjects of common knowledge

Suppose that it’s public information/common belief/common ground among a group G that the government has fallen (p). What does this require about what members of G know about each other?

Here are three possible situations:

  1. Each knows who each of the other group members is, attributing (de re) to each whatever beliefs (etc.) are required for it to be public information that p.
  2. Each has a conception corresponding to each member of the group, and attributes, under that conception, whatever beliefs (etc.) are required for it to be public information that p.
  3. Each has a concept of the group as a whole. Each generalizes about the members of the group, to the effect that every one of them has the beliefs (etc) required for it to be public information that p.

Standard formal models of common belief suggest a type 1 situation (though, as with all formal models, they can be reinterpreted in many ways). The models index accessibility relations by group members. One advantage of this is that once we fix which world is actual, we’re in a position to read off the model unambiguously what the beliefs of any given group member are: one looks at the set of worlds accessible according to their accessibility relation. What it takes in these models for A to believe that B believes that p is for all the A-accessible worlds to be such that all worlds B-accessible from them are ones where p is true. So also: once we as theorists have picked our person (A), it’s determined what B believes about A’s beliefs; there’s no further room in the model for qualifications or caveats about the “mode of presentation” under which B thinks of A.
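The reading-off procedure just described can be sketched in a few lines of Python. This is a minimal toy model of my own, not code from any formal-epistemology library. It has three worlds w1 to w3 and two agents "a" and "b" (standing in for A and B). Accessibility relations are stored as dictionaries from worlds to sets of worlds. An agent believes a proposition (a set of worlds) at w iff every world accessible from w for that agent is in the set. Nesting the function gives A’s beliefs about B’s beliefs.

```python
# Toy Kripke model (hypothetical): accessibility indexed by agent.
ACCESS = {
    "a": {"w1": {"w1", "w2"}, "w2": {"w2"}, "w3": {"w3"}},
    "b": {"w1": {"w1"}, "w2": {"w2", "w3"}, "w3": {"w3"}},
}
P = {"w1", "w2"}  # the worlds where p is true

def believes(agent: str, prop: set) -> set:
    """Worlds at which agent believes prop: every accessible world is in prop."""
    return {w for w, accessible in ACCESS[agent].items() if accessible <= prop}

a_believes_p = believes("a", P)                        # {'w1', 'w2'}
b_believes_p = believes("b", P)                        # {'w1'}
a_believes_b_believes_p = believes("a", b_believes_p)  # set()
print(sorted(a_believes_p), sorted(b_believes_p), sorted(a_believes_b_believes_p))
```

At w1 both agents believe p, but a does not believe that b believes p. Nothing in `believes` takes a mode of presentation as input: fixing the agent and the world fixes the answer.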

Stalnaker argues persuasively that this is not general enough, pointing to cases of type 2 in our classification. There are all sorts of situations in which the mode of presentation under which a group member attributes belief to other group members is central. For example (drawing on Richard’s phone booth case), I might be talking by phone to one and the same individual that I also see out the window, without realizing they are the same person. I might attribute one set of beliefs to that person qua person-seen, and a different set of beliefs to them qua person-heard. That’s tricky in the standard formal models, since there will be just one accessibility relation associated with the person, where we need at least two. Stalnaker proposes to handle this by indexing the accessibility relations not to an individual but to an individual concept (a function from worlds to individuals), which will draw the relevant distinctions. This comes at a cost. Fix a world as actual, and in principle one and the same individual might fall under many individual concepts at that world, and those individual concepts will determine different belief sets. So this change needs to be handled with care, and more assumptions brought in. Indeed, Stalnaker adapts the formal model in various ways (e.g. he ultimately ends up working primarily with centred worlds). These details needn’t delay us, since my concern here isn’t with the formal model directly. Rather, I want to point to the desideratum it answers to: that we make our theory of common belief sensitive to the ways in which we think about other individual group members. It illustrates that the move to type 2 cases is a formally (and philosophically) significant step.
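Here is a correspondingly simplified Python sketch of indexing accessibility to individual concepts rather than to individuals. The worlds, the names "dana" and "eli", and the two concepts are hypothetical stand-ins for the phone booth case. The sketch ignores the centred-worlds machinery Stalnaker eventually adopts. At w1 both concepts pick out the same individual, yet they determine different belief sets.

```python
# Hypothetical sketch: accessibility indexed by individual concepts.
# An individual concept is modelled as a dict from worlds to individuals.
CONCEPTS = {
    "person seen":  {"w1": "dana", "w2": "dana", "w3": "eli"},
    "person heard": {"w1": "dana", "w2": "eli", "w3": "eli"},
}
# Beliefs attributed to whoever falls under the concept, world by world.
ACCESS = {
    "person seen":  {"w1": {"w1", "w2"}, "w2": {"w2"}, "w3": {"w3"}},
    "person heard": {"w1": {"w1", "w3"}, "w2": {"w2"}, "w3": {"w3"}},
}
P = {"w1", "w2"}  # the worlds where p is true

def believes(concept: str, prop: set) -> set:
    """Worlds at which the bearer of the concept believes prop."""
    return {w for w, accessible in ACCESS[concept].items() if accessible <= prop}

def concepts_picking_out(individual: str, world: str) -> list:
    """All concepts under which the individual falls at the world."""
    return [c for c, f in CONCEPTS.items() if f[world] == individual]

for concept in concepts_picking_out("dana", "w1"):
    print(concept, "believes p at w1:", "w1" in believes(concept, P))
```

This prints True for the person seen and False for the person heard, although both concepts pick out dana at w1. This is the sense in which fixing the individual no longer fixes the belief set.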

The same goes for common belief of type 3, where the subjects sharing in the common belief are characterized not individually but as members of a certain group. Here is an example of a type-3 case (loosely adapted from a situation Margaret Gilbert discusses in Political Obligation). We are standing in the public square, and the candidate to be emperor appears on the dais. A roar of acclaim goes up from the crowd, including you and me. It is public information among the crowd that the emperor has been elected by acclamation. But the crowd is vast: I don’t have any de re method of identifying each crowd member, nor do I have an individualized conception of each one. This situation is challenging to model in either the standard or the Stalnakerian way. But it seems (to me) a paradigm of common belief.

Though it is challenging to model in the multi-modal logic formal setting, other parts of the standard toolkit for analyzing common belief cover it smoothly. Analyses of common belief/knowledge like Lewis’s approach from Convention (and related proposals, such as Gilbert’s) can take it in their stride. Let me present it using the assumptions that I’ve been exploring in the last few posts. I’ll make a couple of tweaks: I’ll consider instances of the assumptions as they pertain to a specific member of the crowd (you, labelled u), and I’ll make explicit the restriction to members of the crowd, C. The first four premises are then:

  1. A \supset B_u(A)
  2. A\Rightarrow_u [\forall y: Cy] B_y(A)
  3. A \Rightarrow_u q
  4. A\Rightarrow_u [\forall y: Cy](u\sim y)

For “A”, we input a neutral description of the state of affairs of the emperor receiving acclaim on the dais in full view of everyone in the crowd. q is the proposition that the emperor has been elected by acclamation. The first premise says that it’s not the case that the following holds: the emperor has received acclaim on the dais in full view of the crowd (which includes you) but you have no reason to believe this to be the case. In situations where you are moderately attentive this will be true. The second assumption says that you would also have reason to believe that everyone in the crowd has reason to believe that the emperor has received acclaim on the dais in full view of the crowd, if you had reason to believe that the emperor had received such acclaim in the first place. That also seems correct. The third says that if you had reason to believe this situation had occurred, you would have reason to believe that the emperor had been elected by acclamation. Given modest background knowledge of the political customs of your society (and modest anti-sceptical assumptions) this will be true too. And the final assumption says that you’d have reason to believe that everyone in the crowd had relevantly similar epistemic standards and background knowledge (e.g. anti-sceptical, modestly attentive to what their ears and eyes tell them, aware of the relevant political customs), if/even if you had reason to believe that this state of affairs obtained.

All of these seem very reasonable; and notice, they are perfectly consistent with utter anonymity of the crowd. There are a couple of caveats here, about the assumption that all members of the crowd are knowledgeable or attentive in the way that the premises presuppose. I come back to that later.

Together with five other principles I set out previously (which I won’t go through here: the modifications are obvious and don’t raise new issues) these deliver the following results (adapted to the notation above):

  • A \Rightarrow_u q
  • A\Rightarrow_u [\forall y: Cy] B_y(q)
  • A\Rightarrow_u [\forall z : Cz] B_z([\forall y: Cy] B_y(q))
  • \ldots

And each of these with a couple more of the premises entails:

  • B_u q
  • B_u [\forall y : Cy] B_y(q)
  • B_u [\forall z : Cz] B_z([\forall y: Cy] B_y(q))
  • \ldots

It’s only at this last stage that we then need to generalize on the “u” position, reading the premises as holding not just for you, but schematically for all members of the crowd. We then get:

  • [\forall x : Cx] B_x q
  • [\forall x : Cx] B_x [\forall y :Cy] B_y(q)
  • [\forall x : Cx] B_x [\forall z : Cz] B_z([\forall y : Cy] B_y(q))
  • \ldots
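The list has a simple recursive shape. Each item wraps the proposition embedded in the previous one inside another restricted "every crowd member has reason to believe". Here is a minimal Python sketch that prints the n-th item. The function name and the fresh-variable scheme y1, y2, ... are illustrative choices, not notation from the post.

```python
def iterated_claim(n: int, prop: str = "q", group: str = "C") -> str:
    """Return the n-th item (n >= 1) of the list of iterated
    crowd-reason-to-believe claims, as a formula string."""
    if n < 1:
        raise ValueError("levels start at 1")
    inner = prop
    # Build inside-out: levels 2..n each add one restricted quantifier,
    # using fresh bound variables y1, y2, ... to avoid clashes.
    for i in range(1, n):
        var = f"y{i}"
        inner = f"[∀{var} : {group}{var}] B_{var}({inner})"
    # The outermost layer quantifies over x, matching the list above.
    return f"[∀x : {group}x] B_x({inner})"

for level in range(1, 4):
    print(iterated_claim(level))
```

Nothing in the recursion refers to any individual crowd member. It only ever mentions the restriction C, which is the formal counterpart of the point about anonymity.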

If this last infinite list of iterated crowd-reasons-to-believe is taken to characterize common crowd-belief, then we’ve just derived this from the Lewisian assumptions. And nowhere in here is any assumption about identifying crowd members one by one. It is perfectly appropriate for situations of anonymity.

(A side point: one might explore ways of using rather odd and artificial individual concepts to apply Stalnaker’s modelling to this case. Suppose, for example, there is some arbitrary total ordering of people, R. Then there are the following individual concepts: the R-least member of the crowd, the next-to-R-least member of the crowd, etc. And if one knows that all crowd members are F, then in particular one knows that the R-least crowd member is F. So perhaps one can extend the Stalnakerian treatment to the case of anonymity through these means. However: a crucial question will be how to handle cases where we are ignorant of the size of the crowd, so ignorant about whether “the n-th crowd member in the crowd” fails to refer. I don’t have thoughts to offer on this puzzle right now, and it’s worth remembering that nobody’s under any obligation to extend this style of formal modelling to the case of anonymous common belief.)

Type-3 cases allow for anonymity among the subjects of common belief. But remember that it needs to be assumed that all members of the crowd are knowledgeable and attentive. In small group settings, where we can monitor the activities of each other group member, each can be sensitive to whether others have the relevant properties. But this seems in principle impossible in situations of anonymity. On general grounds, we might expect most of the crowd members to have various characteristics, but as the numbers mount up, the idea that the characteristics are universally possessed would be absurd. We would be epistemically irresponsible not to believe, in a large crowd, that some will be distracted (picking up the coins they just dropped and unsure what the sudden commotion was about) and some will lack the relevant knowledge (the tourist in the wrong place at the wrong time). The Lewisian conditions for common belief will fail. Likewise, the first item on the infinite list characterizing common belief itself will fail: the belief that q will not be unanimous.
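A little arithmetic makes the point about large numbers vivid. Suppose, purely hypothetically, that each crowd member is attentive and knowledgeable with probability 0.95, independently of the others. Independence is a simplifying assumption; real crowds are not like that. Then the chance that all n of them are attentive and knowledgeable is 0.95^n. A minimal sketch:

```python
def prob_all_attentive(p: float, n: int) -> float:
    """Probability that all n crowd members are attentive and knowledgeable,
    assuming (hypothetically) each is so independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 0:
        raise ValueError("crowd size must be non-negative")
    return p ** n

for size in (10, 100, 1000):
    print(size, prob_all_attentive(0.95, size))
```

With these made-up numbers, the probability is roughly 0.60 for ten people and about 0.006 for a hundred. For a thousand it is of the order of 10^-23.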

So we can add to the earlier list a fourth kind of situation. In a type-4 situation, the crowd is not just anonymous, but also contains the distracted and ignorant. More generally: it contains unbelievers.

A first thought about accommodating type 4 situations is to replace the quantifiers, replacing the universal quantifier “all” with “most” (or: a certain specific fraction). We would then require that the state of affairs indicates to most crowd members that the emperor was elected by acclamation; that it indicates to most that most have reason to believe that the emperor was elected by acclamation; and so on. (This is analogous to the kind of hedges that Lewis imposes on the initially unrestricted clauses characterizing convention in his book.) But the analogue of the Lewis derivation won’t go through. Here’s one crucial breaking point. One of the background principles that is needed in getting from Lewis’s premises to the infinite lists was the following: if all have reason to believe that A, and for all, A indicates that q, then all have reason to believe that q. Under the intended understanding of “indication”, this is underwritten by modus ponens, applied to an arbitrary member of the group in question, and then universal generalization. But if we replace the “all” by “most”, we have something invalid: if most have reason to believe that A, and for most, A indicates that q, then most have reason to believe that q. The point is that if you pool together those who don’t have reason to believe that A and those for whom A doesn’t indicate that q, you can find enough unbelievers that it’s not true that most have reason to believe that q.
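A small countermodel makes the failure explicit. The Python sketch below uses a hypothetical five-member crowd. It assumes that nobody has an independent route to reason to believe q, so only those to whom modus ponens applies end up with it. Three members have reason to believe A, and A indicates q to three members, but the two majorities overlap in only one person:

```python
# Hypothetical five-member crowd, members labelled 0..4.
crowd = set(range(5))
reason_A = {0, 1, 2}        # members who have reason to believe A
A_indicates_q = {2, 3, 4}   # members for whom A indicates q

def most(subset: set, group: set) -> bool:
    """'Most' read as 'more than half' of the group."""
    return len(subset) > len(group) / 2

# Modus ponens applied member by member: only those in both sets
# are guaranteed reason to believe q (assuming no other route to q).
reason_q = reason_A & A_indicates_q

print(most(reason_A, crowd), most(A_indicates_q, crowd), most(reason_q, crowd))
# -> True True False
```

With "all" in place of "most", both sets would be the whole crowd and so would their intersection. That is why the universal version of the principle is valid and the "most" version is not.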

A better strategy is the analogue of one that Gilbert suggests in similar contexts (in her book Political Obligation). We run the original unrestricted analysis not for the crowd but for some subgroup of the crowd: the attentive and knowledgeable. Let’s call this the core crowd. You are a member of the core crowd, and the Lewisian premises seem correct when restricted to the core crowd (for example: the public acclaim indicates to you that all attentive and knowledgeable members of the crowd have reason to believe that the public acclaim occurred). So the derivation can run on as before, and establish the infinite list of iterated reason-to-believe among members of the core crowd.

(Aside: Suppose we stuck with the original restriction to members of the crowd, but replaced the quantifiers for “all” not with some “most” or fractional quantifier, but with a generic quantifier. The premises become something like: given A, crowd members believe A; A indicates to crowd members that crowd members believe A; A indicates to crowd members that q; crowd members have reason to believe that crowd members are epistemically similar to themselves, if/even if they have reason to believe A. These will be true if, generically, crowd members are attentive and knowledgeable in the relevant respects. Now, if the generic quantifier is aptly represented as a restricted quantifier (say, restricted to “typical” group members), then we can derive an infinite list of iterated reason-to-believe principles by the same mechanism as with any other restricted quantifier that makes the premises true. And the generic presentation makes the principles seem cognitively familiar in ways in which explicit restrictions do not. I like this version of the strategy, but whether it works turns on issues about the representation of generics that I can’t explore here.)

Once we allow arbitrary restrictions into the characterization of common belief, common belief becomes potentially pretty cheap. (I think this is a point Gilbert makes; she certainly emphasizes the group-description-sensitivity of “common knowledge” on her understanding of it.) For an example of cheap common belief, consider the group: those in England who have reason to believe sprouts are tasty (the English sprout fanciers). All English sprout fanciers have reason to believe that sprouts are tasty. That is analytically true! All English sprout fanciers have reason to believe that all English sprout fanciers have reason to believe that sprouts are tasty, since they have reason to believe things that are true by definition. And all English sprout fanciers have reason to believe this last iterated belief claim, since they have reason to believe things that follow from definitions and platitudes of epistemology. And so on, all the way up the hierarchy.

So there seems to be here a cheap common belief among the English sprout fanciers that sprouts are tasty. It’s cheap, but useless, given that I, as an English sprout fancier, am not in a position to coordinate with another English sprout fancier. We can meet one in any ordinary context and not have a clue that they are one of the subjects among whom this common belief is shared. (Contrast the case where the information that sprouts are tasty is public among a group of friends going out to dinner.) It seems very odd to call the information that sprouts are tasty public among the English sprout fanciers, since all that’s required on my part to acquire all the relevant beliefs in this case is one idiosyncratic belief and a priori reflection. Publicity of the identification of the subjects among whom public information is possessed seems to be part of what’s required for information to be public in the first place. Type 1 and type 2 common beliefs build this in. Type 3 common beliefs, if applied to groups membership of which is easy to determine on independent grounds, don’t raise many concerns about this. But once we start using artificial, unnatural restrictions under pressure from type 4 situations, the lack of any publicity constraint on identification becomes manifest, dramatized by the cases of cheap common belief.

Minimally, we need to pay attention to whether the restrictions that we put into the quantifiers that characterize type 3 or 4 common belief undermine the utility of attributing common belief among the group so conceived. But it’s hard to think of general rules here. For example, in the case characterized above of the emperor-by-acclamation, the restriction to the core crowd (the attentive and knowledgeable crowd members) seems to me harmless, illuminating and useful. On the other hand, the same restriction in the case in the next paragraph gives us common belief that, while not as cheap as in the sprout case earlier, is prima facie just as useless.

Suppose that we’re in a crowd milling in the public square, and someone stands up and shouts a complex piece of academic jargon that implies (to those of us with the relevant background) that the government has fallen. This event indicates to me that the government has fallen, because I happened to be paying attention and speak academese. I know that the vast majority of the crowd either weren’t paying attention to this speech or haven’t wasted their lives obtaining the esoteric background knowledge needed to know what it means. Still, I could artificially restrict attention to the “core” crowd, again defined as those who are attentive and knowledgeable in the right ways. But now this “core” crowd is utterly anonymous to me, lost among the rest of the crowd in the way that English sprout fanciers are lost among the English more generally. The core crowd might be just me, or it could consist of me and one or two others. I don’t have a clue. Again: it is hardly public among all the core crowd (say, three people) that they share this belief, if for all each of them knows, they might be the only one with the relevant belief. And again: this case illustrates that the same restriction that provides useful common belief in one situation gives useless common belief in another.

The way I suggest tackling this is to start with the straightforward analysis of common belief that allows for cheap common belief, and then to build in suitable context-specific anti-anonymity requirements as part of an account of the conditions under which common belief is useful. In the original crowd situation, for example, it’s not just that the manifest event of loud acclaim indicated to all core crowd members that all core crowd members have reason to believe that the emperor was elected by acclaim. It’s also that it indicated to all core crowd members that most of the crowd are core crowd. That means that in the circumstances, it is public among the core crowd that they are the majority among the (easily identifiable) crowd. Even though there’s an element of anonymity, all else equal each of us can be pretty confident that a given arbitrary crowd member is a member of the core crowd, and so is a subject of the common belief. In the second scenario given in the paragraph above, where the core crowd is a vanishingly small proportion of the crowd, it will be commonly believed among the core that they are a small minority. So, all else equal, they have no ability to rationally ascribe these beliefs to arbitrary individuals they encounter in the crowd.

We can say: a face-to-face useful common belief is one where there is a face-to-face method of categorizing the people we encounter within a certain context as G*s (independently of their attitudes to the propositions in question), where we know that most G*s are members of the group among which common belief prevails.
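The statistical part of this definition can be sketched as a simple proportion test. In this hypothetical Python sketch, "most" is read as "more than half", though any threshold could be used. The crowd plays the role of G*, and the numbers for the two scenarios above are made up. The sketch only checks whether most G*s are in fact in the core group. The definition also requires that we know this, and the sketch does not model that.

```python
def face_to_face_useful(core: set, g_star: set, threshold: float = 0.5) -> bool:
    """Hypothetical check: is more than `threshold` of the recognizable
    group G* made up of members of the core group among which common
    belief prevails?"""
    if not g_star:
        return False  # nobody recognizable to attribute anything to
    return len(core & g_star) / len(g_star) > threshold

# Hypothetical numbers: a crowd of 1000, recognizable face-to-face as crowd members.
crowd = set(range(1000))
acclaim_core = set(range(900))   # attentive and knowledgeable in the acclamation case
academese_core = {0, 1, 2}       # the handful who followed the jargon

print(face_to_face_useful(acclaim_core, crowd))    # -> True
print(face_to_face_useful(academese_core, crowd))  # -> False
```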

(To tie this back to the observation about generics I made earlier: if generic quantifiers allow the original derivation to go through, then there may be independent interest in generic common belief among G*s, where this only requires the generic truth that G* members believe p, believe that G* members believe p, etc. The truth of the generic then (arguably!) licenses default reasoning attributing these attitudes to an arbitrary G*. So generic common belief among a group G*, where G*-membership is face-to-face recognizable, may well be a common source of face-to-face useful common belief.)

Perhaps only face-to-face useful common beliefs are decent candidates to count as information that is “public” among a group. But face-to-face usefulness isn’t the only kind of usefulness. The last example I discuss brings out a situation in which the characterization we have of a group is purely descriptive and detached from any ability to recognize individuals within the group as such, but is still paradigmatically a case in which common beliefs should be attributed.

Suppose that I wield one of seven rings of power, but don’t know who the other bearers are (the rings are invisible so there’s no possibility of visual detection–and anyway, they are scattered through the general population). If I twist the ring in a particular way, then in the case that all other ring bearers do likewise, then the dark lord will be destroyed, if he has just been reborn. If he has not just been reborn, or if not all of us twist the ring in the right way, everyone will suffer needlessly. Luckily, there will be signs in the sky and in the pit of our stomachs that indicate to a ring bearer when the dark lord has been reborn. All of us want to destroy the dark lord, but avoid suffering. All of us know these rules. When the distinctive feelings and signs arise, it will be commonly believed among the ring bearers that the dark lord has been reborn. And this then sets us up for the necessary collective action: we twist each ring together, and destroy him. This is common belief/knowledge among an anonymous group where there’s no possibility of face-to-face identification. But it’s useful common belief/knowledge, exactly because it sets us up for some possible coordinated action among the group so-characterized.

I don’t know whether I want to say that the common knowledge among the ring-bearers is public among them (if we did, then clearly face-to-face usefulness can’t be a criterion for publicity…). But the case illustrates that we should be interested in common beliefs in situations of extreme anonymity. After all, there’s no sense in which I have de re knowledge, even potentially, of the other ring-bearers. Nor do I have any way of getting an informative characterization of larger subpopulations to which they belong, or even of raising my credence in the answer to such questions. But despite all this, it seems to be a paradigmatic case of common belief subserving coordinated action, one that any account of common belief should provide for. Often, cooperative activity among a group of people requires that they identify each other face-to-face, but not always, and the case of the ring bearers reminds us of this.

Stepping back, the upshot of this discussion I take to be the following:

  • We shouldn’t get too caught up in the apparent anti-anonymity restrictions in standard formal models of common belief, but we should recognize that they directly handle only a limited range of cases.
  • Standard iterated characterizations generalize to anonymous groups directly, as do Lewisian ways of deriving these iterations from manifest events.
  • We can handle worries about inattentive and unknowledgeable group members by the method of restriction (which might include as a special case: generic common belief).
  • Some common belief will be very cheap on this approach. And cheap common belief is a very poor candidate to be “public information” in any ordinary sense.
  • We can remedy this by analyzing the usefulness of common belief (under a certain description) directly. Cheap common belief is just a “don’t care”.
  • Face-to-face usefulness is one common way in which common belief among a restricted group can be useful. This requires that it be public among the restricted group that they are a large part (e.g. a supermajority, or all typical members) of some broader easily recognizable group.
  • Face-to-face usefulness is not the only form of usefulness, as illustrated by the extreme anonymity of cases like the ringbearers.





Reinterpreting the Lewis-Cubitt-Sugden results

In the last couple of posts, I’ve been discussing Lewis’s derivation of iterated “reason to believe” q from the existence of a special kind of state of affairs A. I summarize my version of this derivation as follows, with the tilde standing for “x and y are similar in epistemic standards and background beliefs”.

We start from four premises:

  1. \forall x (A \supset B_x(A))
  2. \forall x (A\Rightarrow_x \forall yB_y(A))
  3. \forall x (A \Rightarrow_x q)
  4. \forall x (A\Rightarrow_x \forall y [x\sim y])

Five additional principles are either used, or are implicit in the motivation for principles that are used:

  • ITERATION \forall c \forall x ([A \Rightarrow_x c]\supset [A \Rightarrow_x [A\Rightarrow_x c]])
  • SYMMETRY \forall c \forall x (([A \Rightarrow_x c]\wedge \forall y[x\sim y])\supset \forall y[A\Rightarrow_y c])
  • CLOSURE \forall a,c ((\forall x B_x(a)\wedge \forall x[a \Rightarrow_x c])\supset \forall x B_x(c))
  • SYMMETRY+ \forall a \forall c \forall x\forall z (([a\Rightarrow_z[A \Rightarrow_x c]]\wedge [a \Rightarrow_z\forall y[x\sim y]])\supset [a\Rightarrow_z \forall y[A\Rightarrow_y c]])
  • CLOSURE+ \forall a,b,c\forall x ([a \Rightarrow_x \forall y B_y(b)]\wedge [a \Rightarrow_x(\forall y[b \Rightarrow_y c])]\supset [a\Rightarrow_x \forall yB_y(c)])

In the last post, I gave a Lewis-Cubitt-Sugden style derivation of the following infinite series of propositions, using (2-4), SYMMETRY+, CLOSURE+, ITERATION:

  • A \Rightarrow_x q
  • A\Rightarrow_x \forall y B_y(q)
  • A\Rightarrow_x (\forall z B_z(\forall y B_y(q)))
  • \ldots
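Here is one way to reconstruct the step from each line of this series to the next from (2)-(4), ITERATION, SYMMETRY+ and CLOSURE+. It is a sketch under the stated instantiations, not a transcription of the earlier post's proof. The shorthand c_n, introduced here for illustration, stands for the proposition embedded at level n, so c_1 = q and c_{n+1} = \forall y B_y(c_n), with bound variables renamed as needed (as the post does with z).

```latex
% Inductive step, for arbitrary x: from line n (A \Rightarrow_x c_n) to line n+1.
\begin{align*}
&(a)\quad A \Rightarrow_x c_n && \text{line $n$ (line 1 is premise 3)}\\
&(b)\quad A \Rightarrow_x [A \Rightarrow_x c_n] && \text{(a), ITERATION}\\
&(c)\quad A \Rightarrow_x \forall y[x\sim y] && \text{premise 4}\\
&(d)\quad A \Rightarrow_x \forall y[A \Rightarrow_y c_n] && \text{(b), (c), SYMMETRY+ with } a := A,\ z := x\\
&(e)\quad A \Rightarrow_x \forall y B_y(A) && \text{premise 2}\\
&(f)\quad A \Rightarrow_x \forall y B_y(c_n) && \text{(e), (d), CLOSURE+ with } a := b := A,\ c := c_n
\end{align*}
```

Line (f) is A \Rightarrow_x c_{n+1}, so the same six steps can be repeated indefinitely.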

A straightforward extension of this assumes (1) and CLOSURE, obtaining the following results in situations where A is the case:

  • \forall x B_x(q)
  • \forall x B_x(\forall y B_y(q))
  • \forall x B_x(\forall z B_z(\forall y B_y(q)))
  • \ldots
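The extension to this second list is a one-step application of CLOSURE at each level. A sketch, again using the illustrative shorthand c_n for the proposition embedded at level n of the first list (c_1 = q, c_{n+1} = \forall y B_y(c_n)):

```latex
\begin{align*}
&(a)\quad A && \text{assumed to obtain}\\
&(b)\quad \forall x\, B_x(A) && \text{(a), premise 1}\\
&(c)\quad \forall x\,[A \Rightarrow_x c_n] && \text{line $n$ of the first list, generalized on } x\\
&(d)\quad \forall x\, B_x(c_n) && \text{(b), (c), CLOSURE with } a := A,\ c := c_n
\end{align*}
```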

The proofs are valid, so each line in these two infinite sequences holds no matter how one reinterprets the primitive symbols, so long as the premises are true under that reinterpretation.

As we’ve seen in the last couple of posts, for Lewis, “indication” was a kind of shorthand. He defined it as follows:

  • p\Rightarrow_x q := B_x(p)\rightarrow B_x(q)

where \rightarrow is the counterfactual conditional.

Now, this definition is powerful. It means that CLOSURE needn’t be assumed as a separate premise—it follows from the logic of counterfactuals. And if “reason to believe” is closed under entailment, then we also get CLOSURE+ for free. As noted in edits to the last post, it means that we can get ITERATION from the logic of counterfactuals and a transparency assumption, viz. B_x(p)\supset B_x(B_x(p)).
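The claim that CLOSURE falls out of the logic of counterfactuals rests on modus ponens for \rightarrow. In Lewis's similarity semantics, modus ponens holds because of centering. The brute-force Python check below rests on stated assumptions: a toy model of three worlds, propositions as sets of worlds, a hypothetical distance function standing in for similarity, and arbitrary propositions standing in for B_x(a) and B_x(c). It checks the single-agent step; the universally quantified CLOSURE then follows by applying that step to each agent. The contrasting uncentered metric shows that centering is doing the work.

```python
from itertools import chain, combinations

WORLDS = (0, 1, 2)  # a toy model with three possible worlds

def centered_dist(w, v):
    # Hypothetical similarity metric satisfying strong centering:
    # each world is strictly closer to itself (distance 0) than to any other.
    return abs(w - v)

def uncentered_dist(w, v):
    # Contrast case violating centering: world 0 counts as closest to every world.
    return 0 if v == 0 else 1

def propositions(worlds):
    """All propositions over the toy model, represented as sets of worlds."""
    return [frozenset(s) for s in chain.from_iterable(
        combinations(worlds, k) for k in range(len(worlds) + 1))]

def would(p, q, w, dist):
    """Lewis-style counterfactual 'if p were the case, q would be', evaluated at w."""
    if not p:
        return True  # vacuously true when p holds at no world
    nearest = min(dist(w, v) for v in p)
    return all(v in q for v in p if dist(w, v) == nearest)

def modus_ponens_holds(dist):
    """Check that p true at w and (p would-q) true at w imply q true at w,
    for every pair of propositions and every world in the toy model."""
    props = propositions(WORLDS)
    for p in props:          # p stands in for B_x(a)
        for q in props:      # q stands in for B_x(c)
            for w in WORLDS:
                if w in p and would(p, q, w, dist) and w not in q:
                    return False
    return True

print(modus_ponens_holds(centered_dist))    # -> True
print(modus_ponens_holds(uncentered_dist))  # -> False
```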

The counterfactual gloss was also helpful in interpreting what (4) is saying. The word “indication” might suggest that when A indicates p, A must be something that itself gives the reason to believe p. That would be a problem for (4), but the counterfactual gloss on indication removes that implication.

Where Lewis’s interpretation of the primitives is thoroughly normative, we might try running the argument in a thoroughly descriptive vein (see the Stanford Encyclopedia for discussion of an approach to Lewis’s results like this).

To read the current argument descriptively, we might start by reinterpreting B_x(p) as saying: x believes that p, with indication defined out of this notion counterfactually just as before. The trouble with this is that some of the premises look false, read this way. For example, CLOSURE+ asks us to consider scenarios where x’s beliefs are thus-and-such, where the propositions x believes in that scenario entail (by CLOSURE) the proposition that the conclusion tells us x believes. Unless the agent actually believes all the consequences of things she believes, it’s not clear why we should assume the condition in the consequent of CLOSURE+ holds. Similar issues arise for SYMMETRY+ and ITERATION.

One reaction at this point is to argue for a “coarse grained” conception of belief that makes it closed under entailment. That’s a standard modelling assumption in the formal literature on this topic, and something that Lewis and Stalnaker both (to a first approximation) accept. It’s extremely controversial, however.

If we don’t like that way of going, then we need to revisit our descriptive reinterpretation of the primitives. We could define them so as to make closure under such principles automatic. So, rather than have B_x(p) say that x believes p, we might read it as saying that x is committed to believe p, where x is committed to believe something when it follows from the things they believe (in a fuller version, I’d refine this characterization to allow for circumstances in which a person’s beliefs are inconsistent, without her commitments being trivial, but for now, let’s idealize away that possibility and work with the simpler version). Indication becomes: were x to be committed to believe that p, then they would be committed to believe that q.
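Here is a toy Python model of the commitment reading. For illustration only, it assumes that propositions are sets of worlds and that a belief state is a consistent list of such propositions. Someone who believes the ball is red, and believes that if it is red it is round, is committed to its being round without believing it. Commitment is closed under entailment by construction. The names `committed`, `RED` and so on are hypothetical.

```python
from itertools import product

# Worlds are (ball_is_red, ball_is_round) pairs.
WORLDS = frozenset(product([False, True], repeat=2))

def prop(pred):
    """The proposition (set of worlds) at which pred holds."""
    return frozenset(w for w in WORLDS if pred(w))

RED = prop(lambda w: w[0])
ROUND = prop(lambda w: w[1])
RED_IMPLIES_ROUND = prop(lambda w: (not w[0]) or w[1])

def committed(beliefs, p):
    """x is committed to p iff p is true at every world compatible with
    everything x believes. Assumes, as the text idealizes, consistent beliefs."""
    compatible = WORLDS
    for b in beliefs:
        compatible &= b
    if not compatible:
        raise ValueError("inconsistent beliefs: idealized away in this sketch")
    return compatible <= p

beliefs = [RED, RED_IMPLIES_ROUND]
print(ROUND in beliefs)           # -> False: not explicitly believed
print(committed(beliefs, ROUND))  # -> True: but it follows from what is believed
```

Because commitment is defined by entailment from the belief set, anything entailed by a proposition one is committed to is also something one is committed to. That is exactly the closure property the bare belief reading lacked.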

If you read through the premises under this descriptive reinterpretation, then I contend that you’ll find they’ve got as good a claim to be true as the analogous premises on the original normative interpretation.

These interpretations need not compete. Lewis’s normative interpretation of the argument may be sound, and the commitment-theoretic reinterpretation may also be sound. In paradigmatic cases where there is a basis for common knowledge in Lewis’s sense, we may have an infinite stack of commitments-to-believe, and a parallel infinite stack of reasons-to-believe.

But notice! What the first Lewis argument gives us is reason to believe that others have reason to believe such-and-such. It doesn’t tell us that we have reason to believe that others are committed to believe so-and-so. So, for all these two results tell us, some of the commitments that people take on in such situations (commitments about what others are committed to believe) might be unreasonable. This will be my focus in the rest of this post, since I am particularly interested in the derivation of infinite commitment-to-believe. I think that the normative question (are these commitments epistemically reasonable?) is a central one for a commitment-theoretic way of understanding what “public information” or “common belief” consists in.

Let me first explore and expose a blind alley. When Lewis himself extracts descriptive predictions about belief from his account of iterated reasons for belief in situations of common knowledge, he adds assumptions about all people being rational, i.e. believing what they have reason to believe. He further adds assumptions about us having reason to believe each other to be rational in this sense, and so on. Lewis thinks such principles of iterated rationality are true only for the first few iterations. They generate, for a few iterations, that we believe that q, believe that we believe q, believe that we believe that we believe q, etc. And in parallel, we can show that we have reason to believe each of these propositions about iterated belief, so all the beliefs we in fact have will be justified.

But while (per Lewis) these predictions are by design supposed to run out after a few iterations, we need to show that we have reason to believe everything we are committed to believing. One might try to parallel Lewis’s strategy here, adding the premise that people are committed to believing what they have reason to believe. One might hope that such bridge principles will be true “all the way up”, and so allow us to derive the analogue of Lewis’s result for all levels of iteration. But this is where we hit the end of this particular road. If someone (perhaps irrationally) fails to believe that the ball is red despite having reason to believe that the ball is red, the ball being red need not follow from what they believe. So we do not have the principles we’d need to convert Lewis’s purely normative result into one that speaks to the epistemic puzzle about commitment to believe.

Now for a positive proposal. To address the epistemic puzzle, I propose a final reinterpretation of the primitives of Lewis’s account. This time, we split the interpretation of indication and of the B-operator. The B-operator will express commitment-to-believe, just as above. But the indicates-for-x relation does not simply express counterfactual commitment, but has in addition a normative aspect. p will indicate q, for x, iff (i) were x to be committed to believing p, then x would be committed to believing q; and (ii) if x had reason to believe p, then x would have reason to believe q.

Before we turn to evaluating the soundness of the argument, consider the significance of its conclusions under the new mixed-split reinterpretation. First, we would have infinite iterated commitment-to-believe, just as in the pure descriptive interpretation (that’s fixed by our interpretation of B). But second, for each level of iteration of mutual commitment-to-believe, we can derive that A indicates (for each x) that proposition. And indication on this reading, unlike on the pure descriptive reading, has normative implications. It says that when the group members have reason to believe that A, they will have reason to believe that all are committed to believe that all are committed… that all are committed to believe q. So on the split reading of the argument, we derive both infinite iterated commitment to believe, and also that group members have reason to believe the propositions that they are committed to believe.

An instance of indication, on this reading, is a conjunction of what indication signified on the two earlier readings of the argument. This means that, for example, ITERATION on the new split reading is good just in case ITERATION on those two earlier readings was good. It can be argued for from the logic of counterfactual conditionals so long as we have transparency both of reasons for belief and of commitment to believe. It requires no separate discussion, therefore. SYMMETRY has the same status, and CLOSURE on the split reading follows from CLOSURE on the descriptive reading alone. I contend that SYMMETRY+ and CLOSURE+ on the split reading are also perfectly acceptable. You might think there is a special issue with CLOSURE+, since it features the B-operator within an indication operator. But what really drives CLOSURE+ is not the details of the particular B-operator; it is the general principle that indication is closed under valid arguments. And that motivates CLOSURE+ on the new reading as well as on both of the old ones. Of all the premises, it’s only (2-4) that make claims that are stronger than those on the earlier readings. For example, (2) now says, in part, that were one to have reason to believe that A, then one would have reason to believe that everyone is committed to believe A. This is genuinely new, compared to the assumptions made in the earlier readings. But these are assumptions that are pretty plausibly met by paradigms of bases for common knowledge.

What I’ve argued is that if the pure-descriptive version of Lewis’s argument is sound, and the pure-normative version of Lewis’s argument is sound, then the mixed-split-interpretation version of Lewis’s argument is sound. The conclusion of the argument under this mixed reading scratches an epistemological itch that neither the pure descriptive reading nor the pure epistemological reading (even supplemented with assumptions of iterated rationality) could help with.

That matters to me, in particular, because I’m interested in iterated commitment-to-believe as an analysis of public information/common belief, and I take the epistemological challenge as a serious one. At first, I thought that I could wheel in the Lewis-Cubitt-Sugden proof to address my concerns. But I had two worries. One was about the soundness of that proof, given its reliance on the dubious premise (A6). That worry was expressed two posts ago, and addressed in the last post. The other was the worry raised in the current post: that on the intended reading, the Lewis-Cubitt-Sugden proof really doesn’t show that we have reason to believe all those propositions we are committed to, if we have common belief in the commitment-theoretic sense. But, I hope, all is now well, since the split reinterpretation of the fixed-up proof delivers everything I need: both infinite iterated commitment to believe, and the reasonability of believing each of those propositions we are committed to believing.



An alternative derivation of common knowledge

In the last post I set out a puzzling passage from Lewis. That was the first part of his account of “common knowledge”. If we could get over the sticking point I highlighted, we’d find the rest of the argument would show us how individuals confronted with a special kind of state of affairs A—a “basis for common knowledge that Z”—would end up having reason to believe that Z, reason to believe that all others have reason to believe Z, reason to believe that all others have reason to believe that all others have reason to believe Z, and so on for ever.

My worry about Lewis in the last post was also a worry about the plausibility of a principle that Cubitt and Sugden appeal to in reconstructing his argument. What I want to do now is give a slight tweak to their premises and argument, in a way that avoids the problem I had.

Recall the idea was that we had some kind of “manifest event” A—in Lewis’s original example, a conversation where one of us promises the other they will return (Z).

The explicit premises Lewis cited are:

  1. You and I have reason to believe that A holds.
  2. A indicates to both of us that you and I have reason to believe that A holds.
  3. A indicates to both of us that you will return.

I will use the following additional premise:

  • A indicates to me that we have similar standards and background beliefs.

On Lewis’s understanding of indication, this says that if I had reason to believe that A obtained, I’d have reason to believe we are similar in the way described. It is compatible with my not having any reason to believe, antecedent to encountering A, that we are similar in this way. On the other hand, if I have antecedent and resilient reason to believe that we are similar in the relevant respects, the counterfactual will be true.

That the reason to believe needs to be resilient is an important caveat. It’s only when the reasons to believe we’re similar are not undercut by coming to have reason to believe that A that my version of the premise will be true. So Lewis’s premise can be true in some cases mine is not.

But mine is also true in some cases where his is not, and that seems to me a particularly welcome feature, since these include cases that are paradigms of common knowledge. Assume there is a secret handshake known only to members of our secret society. The handshake indicates membership of the society, and allegiance to its defining goal: promotion of the growing of large marrows. But the secret handshake is secret, so this indication obtains only for members of the society. Once we share the handshake, we intuitively establish common knowledge that each of us intends to promote the growing of large marrows. But we lacked reason to believe that we were similar in the right way independent of the handshake itself.

Covering these extra paradigmatic cases is an attractive feature. And I’ve explained that we can also cite it in the other paradigmatic cases, the cases where our belief in similarity is independent of A, so this looks to me strictly preferable to Lewis’s premise.

(I should note one general worry, however. Lewis’s official definition of indication wasn’t just that when one had reason to believe the antecedent, one would have reason to believe the consequent. It is that one would thereby have reason to believe the consequent. You might read into that a requirement that the reason one has to believe the antecedent has to be a reason one has for believing the consequent. That would mean that in cases where coming to have reason to believe that A was irrelevant to one’s reason to believe that we were similar, we would not have an indication relation. I’m proposing simply to strike out the “thereby” in Lewis’s definition to avoid this complication; if that leads to trouble, at least we’ll be able to understand better why he stuck it in.)

I claim that my premise allows us to argue for the following, for various relevant p:

  • If A indicates to me that p then A indicates to me that (A indicates to you that p).

The case for this is as follows. We start by appealing to the inference pattern that I labelled I in the previous post, and that Lewis officially declares as his starting point:

  1. A indicates to x that p
  2. x and y share similar standards and background beliefs.
  3. Conclusion: A indicates to y that p.

I claim this supports the following derived pattern:

  1. A indicates to x that A indicates to x that p
  2. A indicates to x that x and y share similar standards and background beliefs
  3. Conclusion: A indicates to x that A indicates to y that p.

This seems good to me, in light of the transparent goodness of I.

A bit of rearrangement gives the following version:

  1. A indicates to x that x and y share similar standards and background beliefs
  2. Conclusion: if A indicates to x that A indicates to x that p, then A indicates to x that A indicates to y that p.

The premise here is my first bullet point. Given Lewis’s counterfactual gloss on indication, the conclusion is equivalent to my second bullet point, as required. To elaborate on the equivalence: “If x had reason to believe that A, then if x had reason to believe A, then…” is equivalent to “If x had reason to believe that A, then…”, just because in standard logics of counterfactuals “if were p, then if were p, then…” is generally equivalent to “if were p, then…”. In the present context, that means that “A indicates to x that A indicates to x that…” is equivalent to “A indicates to x that”.

[edit: wait… that last move doesn’t quite work, does it? “A indicates that (A indicates B)” translates to: “If x had reason to believe A, then x would have reason to believe (if x had reason to believe A, then x would have reason to believe B)”. It’s not just the counterfactual move, because there’s an extra operator running interference. Still, it’s what I need for the proof….

But still, the counterfactual gloss may allow the transition I need. For consider the closest worlds where x has reason to believe that A. And let’s stick in a transparency assumption: that in any situation where x has reason to believe p, x has reason to believe that x has reason to believe p. Given transparency, at these closest worlds x has reason to believe that she has reason to believe A, i.e. reason to believe that the closest world where she has reason to believe A is the world in which she stands. But, given the single indication, in the world in which she stands she has reason to believe B, and transparency entails she has reason to believe she has reason to believe B. So she has reason to believe the relevant counterfactual is true, in those worlds. And that means we have derived the double iteration of indication from the single iteration. Essentially, suitable instances of transparency for “reason to believe” get us analogous instances of transparency for “indication”.  ]

The final thing I want to put on the table is the good inference pattern VI from the previous post. That is:

  1. A indicates that [y has reason to believe that A holds] to x.
  2. A indicates that [A indicates Z to y] to x.
  3. Conclusion: A indicates that [y has reason to believe that Z] to x.

This looked good, recall, because the embedded contents are just an instance of modus ponens when you unpack them, and it’s pretty plausible that in worlds where x has reason to believe the premises of modus ponens, x has reason to believe the conclusion, which is what the above ends up saying. (As you’ll see, I’ll actually use a form of this in which the embedded clauses are generalized, but I think that doesn’t make a difference.)

This is enough to run a variant of the Lewis argument. Let me give it to you in a formalized version. I use \Rightarrow_x for the “indicates-to-x” relation, and B_x for “x has reason to believe”.  I’ll state it not just for the two-person case, but more generally, with quantifiers x and y ranging over members of some group, and a,b,c ranging over propositions. Then we have:

  1. \forall x (A\Rightarrow_x \forall yB_y(A)) (the analogue of Lewis’s second premise, above).
  2. \forall x (A \Rightarrow_x Z) (the analogue of Lewis’s third premise, above)
  3. \forall x ([A \Rightarrow_x Z]\supset [A \Rightarrow_x(\forall y[A\Rightarrow_y Z])]) (an instance of the formalization of the bullet point I argued for above).
  4. \forall x [A \Rightarrow_x(\forall y[A\Rightarrow_y Z])] (by logic, from 2,3).
  5. \forall x [A \Rightarrow_x(\forall yB_y (Z))] (by inference pattern VI, from 1,4).

Line 5 tells us that not only does A indicate to each of us that Z (as premise 2, the analogue of Lewis’s third premise, assures us) but that A indicates to each of us that each has reason to believe Z. The argument now loops, by further instances of the bullet assumption and inference pattern VI, showing that A indicates to each of us that each has reason to believe that each has reason to believe that Z, and so on for arbitrary iterations of reason-to-believe.
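The looping can be made vivid with a purely syntactic sketch. It is a hypothetical helper of my own, not anything in Lewis or in Cubitt and Sugden: formulas are nested tuples, with ("ind", c) standing for \forall x (A\Rightarrow_x c), ("all_B", c) for \forall y B_y(c), and ("all_ind", c) for \forall y[A\Rightarrow_y c]. Each round applies the bullet principle to every indication fact, and applies inference pattern VI, with premise 1 as its first premise, wherever it can.

```python
# Hypothetical encoding, assumed only for this sketch:
#   ("ind", c)      stands for  forall x (A =>_x c)
#   ("all_B", c)    stands for  forall y B_y(c)
#   ("all_ind", c)  stands for  forall y [A =>_y c]
PREMISE_1 = ("ind", ("all_B", "A"))  # A indicates to each x that everyone has reason to believe A
PREMISE_2 = ("ind", "Z")             # A indicates Z to each x


def derive(premises, rounds):
    """Close the premises under the bullet principle and pattern VI for a fixed number of rounds."""
    facts = set(premises)
    for _ in range(rounds):
        new = set()
        for fact in facts:
            if fact[0] != "ind":
                continue
            content = fact[1]
            # Bullet principle: from A =>_x c, infer A =>_x (forall y [A =>_y c]).
            new.add(("ind", ("all_ind", content)))
            # Pattern VI (first premise = premise 1): from A =>_x (forall y [A =>_y c]),
            # infer A =>_x (forall y B_y(c)).
            if PREMISE_1 in facts and isinstance(content, tuple) and content[0] == "all_ind":
                new.add(("ind", ("all_B", content[1])))
        facts |= new
    return facts


def reason_to_believe(c, depth):
    """Wrap c in `depth` layers of 'forall y B_y(...)'."""
    for _ in range(depth):
        c = ("all_B", c)
    return c


facts = derive({PREMISE_1, PREMISE_2}, rounds=4)
print(("ind", reason_to_believe("Z", 2)) in facts)  # True
```

Each further layer of reason to believe costs two more rounds, one for the bullet principle and one for VI, which mirrors the way the argument loops. Without premise 1, VI never fires and no layer of reason to believe is derived.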

As in Lewis’s original presentation, the analogue of premise 1 allows us to detach the consequent of each of these indication relations, so that in situations where we all have reason to believe that A holds, we have arbitrary iterations of reason to believe Z.

(To quickly report the process by which I was led to the above: I was playing around with versions of Cubitt and Sugden’s formalization of Lewis, which as mentioned used the inference pattern that I objected to in the last post. Inference pattern VI is what looked to me like the good inference pattern in the vicinity of the one they label A6, and the bullet-pointed principle is essentially the adjustment you have to make to another premise they attribute to Lewis, one they label C4, in order to make their proof go through with VI rather than the problematic A6. From that point, it’s simply a matter of figuring out whether the needed change is a motivated or defensible one.)

So I commend the above as a decent way of fixing up an obscure corner of Lewis’s argument. To loop around to the beginning, the passage I found obscure in Lewis had him endorsing the following argument (II):

  1. A indicates that [y has reason to believe that A holds] to x.
  2. A indicates that Z to x.
  3. x has reason to believe that x and y share standards/background information.
  4. Conclusion: A indicates that [y has reason to believe that Z] to x.

The key change is to replace II.3 with the cousin of it introduced above: that A indicates to x that x and y share standards/background information. Once we’ve done this, I think the inference form is indeed good. Part of the case for this is indeed the argument that Lewis cites, labelled I above. But as we’ve seen, there seems to be quite a lot more going on under the hood.

Lewis on common knowledge

The reading for today is chapter II, section 1 of Convention.

In it, Lewis discusses a state of affairs, A, “you and I have met, we have been talking together, you must leave before our business is done; so you say you will return to the same place tomorrow.” Lewis notes that this generates expectations and higher order expectations: “I expect you to return. You will expect me to expect you to return. I will expect you to expect me to expect you to return. Perhaps there will be one or two orders more”. His task is to explain how these expectations are generated.

We’ll just be looking at the first few steps of his famous proposal, which are framed in terms of reasons to believe. It has three premises:

  1. You and I have reason to believe that A holds.
  2. A indicates to both of us that you and I have reason to believe that A holds.
  3. A indicates to both of us that you will return.

“Indication” is defined counterfactually: A indicates to someone x that Z iff if x had reason to believe that A held, x would thereby have reason to believe that Z. Lewis notes that indication depends on “background information and inductive standards” of the agent in question. The appeal to inductive standards might suggest a somewhat subjective take on epistemic reasons is in play here, but even if you think epistemic reasons are pretty objective, the presence or absence of a belief in defeaters to inductive generalizations, for example, will matter to whether counterfactuals of this form are true.

(I’m not sure about the significance of the “thereby” in this statement. Maybe Lewis is saying that the reason for believing that A held would also be the reason for believing that Z is the case. I’m also not sure whether or not this matters).

There follows a passage that I have difficulty following. Here it is in full.

“Consider that if A indicates something to x, and if y shares x’s inductive standard and background information, then A must indicate the same thing to y. Therefore, if A indicates to x that y has reason to believe that A holds, and if A indicates to x that Z, and if x has reason to believe that y shares x’s information, then A indicates to x that y has reason to believe that Z (this reason being y’s reason to believe that A holds)”.

In this passage, we first get the following inference pattern (I):

  1. A indicates p to x.
  2. x and y share standard/background information.
  3. Conclusion: A indicates p to y.

That seems fair enough.

Following the “therefore”, we get the following inference (II):

  1. A indicates that [y has reason to believe that A holds] to x.
  2. A indicates that Z to x.
  3. x has reason to believe that x and y share standards/background information.
  4. Conclusion: A indicates that [y has reason to believe that Z] to x.

This is a complex piece of reasoning, and its relation to the earlier inference pattern is not at all clear. For example, in the first inference pattern, facts about shared standards are mentioned. In the second, what we have to work with is x having reason to believe that there are shared standards. This prevents us from directly applying argument I to derive argument II. Some work needs to be done to connect these two.

Given the validity of the first pattern, you can plausibly argue for the good standing of the following derived pattern (III):

  1. x has reason to believe that A indicates p to x.
  2. x has reason to believe that x and y share standard/background information.
  3. Conclusion: x has reason to believe that A indicates p to y.

Now III.2 is also II.3, so we now hope to connect the arguments. But the other two premises are not facts about what x has reason to believe, as they would have to be in order to apply III directly. Rather, they are facts about what A indicates to x.

We need to start attributing enthymematic premises. Perhaps there is a transparency assumption, namely that IV is valid:

  1. A indicates p to x.
  2. Conclusion: x has reason to believe that A indicates p to x.

IV allows us to get from II.2 to the claim that x has reason to believe that A indicates Z to x. And you can then use II.3 to supply the remaining premise of inference pattern III. What we get is the following: x has reason to believe that A indicates Z to y. And so we could argue that argument II was a good one, if the following inference pattern was good (V):

  1. A indicates that [y has reason to believe that A holds] to x.
  2. x has reason to believe that [A indicates Z to y].
  3. Conclusion: A indicates that [y has reason to believe that Z] to x.

The conclusion of V is the same as that of II. V.1 is simply II.1, and we have seen that III and IV get us from II.2 and II.3 to V.2. So the validity of V suffices for the validity of II. So what do we think about inference pattern V?

V is, in fact, an inference pattern that Cubitt and Sugden, in their very nice analysis of Lewis’s argument, take as one of the basic assumptions (they give it as a material conditional, and label it A6). It seems really dubious to me however.

The reason that it looks superficially promising is that the three embedded claims constitute a valid argument, and the embedding context makes it look as if we’re reporting the validity of this argument “from x’s perspective”. The embedded argument is simply the following: If y has reason to believe that A holds, and A indicates Z to y, then y will have reason to believe that Z holds. Given the way Lewis defined indication in terms of the counterfactual condition, this is just a modus ponens inference.

Now this would be exactly the right diagnosis if we were working not with V but with VI:

  1. A indicates that [y has reason to believe that A holds] to x.
  2. A indicates that [A indicates Z to y] to x.
  3. Conclusion: A indicates that [y has reason to believe that Z] to x.

VI really does look good, because each premise tells us that the respective embedded clause is true in all the closest worlds where x has reason to believe that A holds. And since the final embedded clause follows logically from the first two, it must hold in all the closest worlds where x has reason to believe A. And that’s what the conclusion of VI tells us is the case.
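The closest-worlds reasoning just given can be checked by brute force in a small toy model. This is a minimal sketch under simplifying assumptions of my own, not Lewis’s official semantics: propositions are sets of worlds drawn from three worlds; selected stands for the closest worlds at which x has reason to believe A; reason[w] is the set of worlds x’s reasons leave open at w, so that x has reason to believe a proposition at w just when reason[w] is a subset of it. The embedded modus ponens is built in by taking the proposition that y has reason to believe Z to be, at its weakest, the intersection of the other two embedded propositions.

```python
from itertools import chain, combinations, product

WORLDS = (0, 1, 2)


def subsets(xs):
    """All subsets of xs, as frozensets (these play the role of propositions)."""
    return [frozenset(c) for c in chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))]


def indicates(selected, reason, prop):
    """A indicates prop to x: at each closest world where x has reason to believe A,
    every world left open by x's reasons lies inside prop."""
    return all(reason[w] <= prop for w in selected)


def vi_counterexamples():
    """Search every toy model for a case where VI's premises hold and its conclusion fails."""
    props = subsets(WORLDS)
    nonempty = [p for p in props if p]  # x's reasons are assumed to leave some world open
    found = []
    for selected in props:
        for choice in product(nonempty, repeat=len(WORLDS)):
            reason = dict(zip(WORLDS, choice))
            for p1, p2 in product(props, repeat=2):
                # p1: y has reason to believe A; p2: A indicates Z to y.
                # p3 = p1 & p2 is the smallest proposition the embedded modus ponens allows.
                p3 = p1 & p2
                if (indicates(selected, reason, p1) and indicates(selected, reason, p2)
                        and not indicates(selected, reason, p3)):
                    found.append((selected, reason, p1, p2))
    return found


COUNTEREXAMPLES = vi_counterexamples()
print(len(COUNTEREXAMPLES))  # 0
```

The search comes up empty, as it must: if everything x’s reasons leave open lies inside each of two propositions, it lies inside their intersection.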

But this is irrelevant to V. V.2 doesn’t concern what x has reason to believe in some counterfactual worlds, but what they have reason to believe in the actual world. And for all we are told, in the closest worlds where x has reason to believe that A is the case, they may not have reason to believe some of the things they actually have reason to believe. That is: A might be the sort of thing that defeats x’s reason to believe that A indicates Z to y. So this way of explaining what’s going on fails.
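To see how V can fail, here is a hand-built instance of the same kind of toy model; every world, proposition and name in it is hypothetical. x actually has reason to believe that A indicates Z to y, but at the closest world where x has reason to believe A, x’s reasons no longer support that claim, so A acts as a defeater. V’s two premises come out true and its conclusion false, while VI’s second premise fails.

```python
def indicates(selected, reason, prop):
    """A indicates prop to x: at each closest A-reason world, x's reasons settle prop."""
    return all(reason[w] <= prop for w in selected)


def has_reason(reason, world, prop):
    """x has reason to believe prop at the given world."""
    return reason[world] <= prop


ACTUAL = 0
selected = {1}                                   # closest world where x has reason to believe A
reason = {0: frozenset({0}), 1: frozenset({2})}  # worlds x's reasons leave open at each world
p1 = frozenset({1, 2})                           # y has reason to believe A
p2 = frozenset({0, 1})                           # A indicates Z to y
p3 = p1 & p2                                     # y has reason to believe Z (embedded modus ponens)

v1 = indicates(selected, reason, p1)             # premise V.1
v2 = has_reason(reason, ACTUAL, p2)              # premise V.2: a fact about the actual world
conclusion = indicates(selected, reason, p3)     # the conclusion shared by V and VI
vi2 = indicates(selected, reason, p2)            # premise VI.2, by contrast

print(v1, v2, conclusion, vi2)  # True True False False
```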

So I’m not sure how best to think about Lewis’s move here. The transition he endorses between I and II really isn’t transparently good. A natural line of thought leads us to think of him resting on Cubitt and Sugden’s reconstructed premise A6, V above. But that really doesn’t look like something we should be relying on.

Is there some other way to understand what he has in mind here?

Conventions vs. ideal theory

I’m often in the market for a metaphysics of semantic properties of language. And what I’m shopping for is the best instance I can find of “top down” or “interpretationist” accounts. These have a two-step pattern. First: you identify a target set of pairings of sentences with some kind of semantic property. Second: you give a story about how those target pairings select the correct interpretation of all linguistic expressions.

One instance of this is a form of Lewis’s conventionalism. On this story, there is a collection of sentences-in-use, X, and for each of these, there are conventions of truthfulness and trust that link each S in X to a specific proposition p (construed as a set of possible worlds). That gives you the target pairings that we need for Fit. Truthfulness is a conventionally-entrenched regularity of uttering S only if one believes p, and trust is a similarly entrenched regularity of forming the belief p if one hears S uttered.

In the selection step, the correct interpretation of the whole language—not just sentences but individual words, and sentences that are never used—is fixed as the *simplest* interpretation that “fits with” the pairings.

Clearly, there’ll be many difficulties and nuances in spelling this out, but for my purposes here, I’ll assume that to fit with a set of sentence-proposition pairs, the selected interpretation needs to map the first element of each such pair to the second element, so that the sentence, according to the interpretation, expresses the proposition it is conventionally associated with. I’ll also assume that considerations of fit take lexical priority over considerations of simplicity, so simplicity’s role is just to select among fitting interpretations.

Here’s another instance of the two-step pattern. On this story, from language-use we find a privileged set of sentences, an “ideal theory”. The pairing of sentences with semantic properties this induces is just the following: every sentence in the ideal theory is paired with The True. The second step is as before: the correct overall interpretation is the simplest theory that fits with these pairings.

The latter is the kind of story that sometimes goes under the label “global descriptivism” and is associated with Lewis’s response to Putnam. There’s controversy about whether it ever was a view that Lewis endorsed. In that context, the appeal to simplicity in the selection story is replaced by an appeal to naturalness or eligibility. I think that these amount to the same thing, given Lewis’s understanding of simplicity. But I won’t argue or further explain that here.

Are these stories compatible? Might they amount to the same thing? (I’m grateful to discussions here with Fraser MacBride that prompted these questions.) This all depends on what ideal theory amounts to. Consider the set of sentences-in-use, X. At every world w, the set of sentence-proposition pairs induces a map from X to truth values. Let D_w be a complete world-description of w in some privileged world-making sublanguage L. Let I_w be the set of sentences of the form “Necessarily, if D_w, then S”, for each S which is mapped to the True at w. Let I be the union of the I_w for all w.

Consider an interpretation that fits with the sentence-proposition pairs. This will make a sentence S in X true at w iff “Necessarily, if D_w, then S” is in I; that’s guaranteed by the way we constructed I. Does that interpretation make the sentences of I true, as truth-maximization would demand? Yes it does, on the condition that the interpretation is correct for the world-making sublanguage, for “necessarily”, and for the connectives “if… then…” and “not”. In those circumstances the antecedent of the conditional is true only at world w, and since S is true at w, the strict conditional is true. Conversely, suppose that we have an interpretation that makes true all of I, and which is again faithful to the world-making language, “necessarily” and “if… then…”. Since “Necessarily, if D_w, then S” is in I, it is made true, and that requires that the sentence S is true at w. In sum: making this particular “ideal theory” true is equivalent to fitting with the sentence-proposition pairs from which it is built, though only among a restricted range of interpretations that are already guaranteed to be faithful to the world-making language, etc.
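Here is a minimal finite sketch of the construction, under assumptions of my own: there are three worlds, X contains two hypothetical sentences S1 and S2, the strict conditional “Necessarily, if D_w, then S” is encoded as the triple (w, S, True), and interpretations vary only over X, with the world-making vocabulary, “necessarily” and the connectives held fixed so that D_w is true at w alone. The check bears out that fitting guarantees the truth of I. It also brings out a detail of the converse: with only the positive conditionals, an interpretation on which every sentence of X is true at every world also makes I true without fitting. Adding “Necessarily, if D_w, then not-S” for each world where S is false (the with_negations flag, in the spirit of the X-completeness condition used for the reverse direction) restores the equivalence.

```python
from itertools import product

WORLDS = (0, 1, 2)
# Hypothetical sentence-proposition pairs fixed by convention (propositions as sets of worlds).
CONVENTION = {"S1": frozenset({0, 1}), "S2": frozenset({2})}


def ideal_theory(conv, with_negations=False):
    """Encode 'Necessarily, if D_w, then S' as (w, S, True) for each world where S is
    conventionally true; optionally add 'Necessarily, if D_w, then not-S' as (w, S, False)."""
    theory = {(w, s, True) for s, prop in conv.items() for w in prop}
    if with_negations:
        theory |= {(w, s, False) for s, prop in conv.items() for w in WORLDS if w not in prop}
    return theory


def makes_true(interp, theory):
    """With the world-making vocabulary, 'necessarily' and the connectives held fixed, D_w is
    true at w alone, so each strict conditional is true iff S (or not-S) is true at w."""
    return all((w in interp[s]) == positive for w, s, positive in theory)


def fits(interp, conv):
    """The interpretation maps each sentence to its conventionally associated proposition."""
    return all(interp[s] == prop for s, prop in conv.items())


def all_interpretations(sentences):
    """Every assignment of a set of worlds to each sentence (interpretations vary only on X)."""
    props = [frozenset(w for w, keep in zip(WORLDS, bits) if keep)
             for bits in product((False, True), repeat=len(WORLDS))]
    for choice in product(props, repeat=len(sentences)):
        yield dict(zip(sentences, choice))


interps = list(all_interpretations(list(CONVENTION)))
positive_only = ideal_theory(CONVENTION)
full = ideal_theory(CONVENTION, with_negations=True)
print(all(makes_true(i, positive_only) for i in interps if fits(i, CONVENTION)))        # True
print(all(makes_true(i, full) == fits(i, CONVENTION) for i in interps))                 # True
print(any(makes_true(i, positive_only) and not fits(i, CONVENTION) for i in interps))  # True
```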

I’m not sure that the need to be faithful to the world-making language is too big a deal, on this repackaging. One way of thinking about this is that we start with a set of expressions to interpret: a certain signature \sigma. The sentences-in-use X are included within this set. Then we as theorists consider an expanded signature \sigma^+, which we get by adjoining a new set of terms (the necessity operator, the conditional, the world-making vocabulary) for which we explicitly stipulate an interpretation. Using the expanded signature, we build the ideal theory, and then indirectly get a fix on the correct interpretation of the original signature by requiring that the ideal theory in the expanded signature be true. Since we have stipulated the interpretation of the added vocabulary, we introduce no new parameters.

In the above, I started with sentence-proposition pairs fixed by convention and extracted an “ideal theory”. One could reverse the process, if our ideal theory already consisted of a bunch of strict conditionals whose antecedents are world-descriptions and whose consequents are sentences in a set X. It’ll help if we assume the ideal theory is X-complete in the sense that for each world-description D_w, and sentence S in X, either “Necessarily, if D_w, then S” or “Necessarily, if D_w, then not-S” is in the set. Each S in X can now be paired with the proposition consisting of all the worlds w such that “Necessarily, if D_w, then S” occurs in the ideal theory. The same reasoning as before will show that maximizing the truth of the ideal theory is equivalent to fitting the sentence-proposition pairs.
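The reverse direction can be sketched in a similar hypothetical encoding, with the triples (w, S, True) and (w, S, False) standing for the positive and negated strict conditionals; the theory, sentences and worlds are made up for illustration. The helper checks X-completeness, reading it as “exactly one of the two conditionals” (which also builds in consistency, an extra assumption), and then pairs each S with the worlds whose positive conditional occurs in the theory.

```python
WORLDS = (0, 1, 2)


def is_x_complete(theory, sentences):
    """For each D_w and each S, exactly one of 'Necessarily, if D_w, then S' (w, S, True) and
    'Necessarily, if D_w, then not-S' (w, S, False) is in the theory. Requiring exactly one,
    rather than at least one, also builds in consistency: an assumption of this sketch."""
    return all(((w, s, True) in theory) != ((w, s, False) in theory)
               for w in WORLDS for s in sentences)


def extract_pairs(theory, sentences):
    """Pair each S with the worlds w such that 'Necessarily, if D_w, then S' is in the theory."""
    return {s: frozenset(w for w in WORLDS if (w, s, True) in theory) for s in sentences}


# Hypothetical X-complete ideal theory for X = {S1, S2}.
THEORY = {(0, "S1", True), (1, "S1", True), (2, "S1", False),
          (0, "S2", False), (1, "S2", False), (2, "S2", True)}
SENTENCES = ["S1", "S2"]

print(is_x_complete(THEORY, SENTENCES))  # True
print(extract_pairs(THEORY, SENTENCES) == {"S1": frozenset({0, 1}), "S2": frozenset({2})})  # True
```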

If we wanted to give a metasemantics that incorporated this second direction of explanation, then we would have some additional challenges. We need an independent fix on what it takes for a conditional “Necessarily, if D_w, then S” to be included in ideal theory. Answers could be given: for example, we could say that such a conditional is included in the ideal theory relevant to interpreting agent x iff that agent is disposed to endorse sentence S, conditional on believing the world to satisfy D_w. I think of this as a Chalmers-style approach to these questions, though I haven’t yet done the work of going back to pin down how it relates to the manifold distinctions and projects included in his book “Constructing the World”. Here again, the actual language to be interpreted might not include the world-making vocabulary; that could be reserved to the theorist. But in this case, in giving the story about constructing the ideal theory, the theorist needs to use that vocabulary in specifying part of a psychological state of an individual, namely a possible belief state. So to apply this, even in principle, we would need some independent fix on what it takes for an individual to have propositional attitudes with contents corresponding to elements of the world-making language.

In Chalmers, we find the suggestion that there is a privileged set of concepts whose meaning is fixed by acquaintance, in a thick, Russellian sense. So one option would be to run the above acquaintance-based story as the metasemantics for a basic chunk of the language of thought, and then run (scare-quotes) “global” descriptivism for the rest of that same language.

There is a more Lewisian way to run the story though. Here we will firmly distinguish between interpreting public language and ascribing mental content. The convention-based story notoriously leans heavily on this, anyway. Our starting point for the metasemantics of public language will be a fix on the psychological states of individuals sufficiently rich to make sense of psychological states whose content we theorists describe using the world-making language. Dispositions of subjects to endorse public-language sentences under those conditions then look like legitimate resources for us to use. And using them, we can give a principled characterization of an ideal theory which (as argued previously) will be equivalent to fitting with certain sentence-proposition pairs.

So truth-maximization (of certain strict conditional sentences) and proposition-fit maximization do seem compatible if the targets are related in the right way–even equivalent. And it may even be that what we get from looking to the sentence-proposition pairs fixed by Lewisian conventions is the same as sentence-proposition pairs extracted from an ideal theory constructed by the above method, and vice versa—at least for subjects who were within a community where conventions of truthfulness and trust prevailed in the first place. That would, however, take further argument.

An interpretation of Lewis that I’ve favoured elsewhere was that he really believed in the convention-based metasemantics, and the stuff about global descriptivism and truth-maximization was just something adopted for dialectical purposes in the context of a discussion with Putnam (this is something that Wo Schwarz has pushed). A lot of the time in the literature one finds the global descriptivist/truth-maximizing theory being worked with, but with “ideal theory” being handled fairly loosely (when I do this myself, for example, I think of it as something like a global folk theory of the world). But given the above, I guess one interpretative option here is that Lewis had in mind the sort of equivalences described above, and so was happy to discuss the account in either formulation.

And here’s a final thought about this. Though dispositions-to-endorse might line up with conventions where such conventions exist, it’s pretty clear that subjects can have dispositions to endorse sentences even where the conditions for conventionality are violated. So one way of presenting this is that the ideal theory characterization, grounded in dispositions-to-endorse, is a general metasemantics for language that coincides *in the limit where there are conventions* with Lewis’s convention-based story, but which has far wider application. The prospect of that kind of generalization seems to me a good reason to look closer at ways to characterize this kind of ideal theory metasemantics and study its relation to convention.

What’s functionalism anyway?

In reading up for my new project on Group Thinking, I’ve found people attaching a certain label to a view of the metaphysics of group belief and desire that I find quite attractive. That label is “functionalism”. I’ve found myself very confused about what that common label means, so what follows is where I’ve got to in sorting that out.

Now, at a really rough level, I expect anything deserving the name “functionalism” to have at least two theoretical categories: roles and realizers. For example, if you’re going to be a functionalist about the property being in pain, you’ll be committed to (i) the idea that there is a functional role associated with pain; (ii) the idea that if anything is to be in pain, then it needs to have a realizer property, i.e. to instantiate a property that plays the functional role.

That allows us a lot of flexibility on how we flesh out the details beyond this. We might have various accounts of what sort of theories of functional roles to give. We might have various accounts of what the realization relation is, and whether we need to allow for multiple realizers, imperfect realizers, etc. We might differ in whether we identify the original property of being in pain with the role, the realizer, or something else. But unless we have an account that has the two-part structure, it isn’t functionalism as I was taught it or as I teach it.

Okay, with that as the setup, let me say something about the kind of functionalism that I understand best. This starts with Lewis’s story about how to find explicit definitions of theoretical terms. We start with a theory that neologizes, that is, one that introduces a set of terms for the first time. That theory will also reuse some old vocabulary. Lewis assumed that the theory is regimented so that all the new terms are names. The old vocabulary will include predicates like “…has the property…” or “…stands in relation … to …”, if necessary, so that we can do the work of new predicates by means of new names for the relevant properties. If we start with a theory T(t_1,...,t_n), where the t_i are the new terms, then the following is the unique-realization sentence for T:

\exists y_1\ldots \exists y_n \forall x_1\ldots x_n(T(x_1,...,x_n)\leftrightarrow (x_1=y_1\wedge \ldots \wedge x_n=y_n))

The following one-place predicate is then what we’ll mean by “the theoretical role of t_1“, or the “t_1“-role:

\exists y_2\ldots \exists y_n \forall x_1\ldots x_n(T(x_1,...,x_n)\leftrightarrow (x_1=y_1\wedge \ldots \wedge x_n=y_n))

The explicit definition of each new term in old vocabulary that Lewis offered identifies it with the property that plays the relevant theoretical role. Using an iota for the definite description operator, for t_1 the definition is:

t_1:=\iota y_1\exists y_2 \ldots \exists y_n \forall x_1\ldots x_n(T(x_1,...,x_n)\leftrightarrow (x_1=y_1\wedge \ldots \wedge x_n=y_n))

Informally, the definition says that t_1 is the property that plays the t_1-role.
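A finite toy version of the definition may help. This is a sketch under assumptions of my own, not Lewis’s formal apparatus: the domain is a short list of hypothetical property names, the two-term theory T(x_1, x_2) says that pinpricks cause x_1 and x_1 causes the behaviour x_2, slots are indexed from 0 (so t_1 is slot 0), and an improper description simply returns None rather than receiving Lewis’s own treatment. Something plays the t_1-role just when T is uniquely realized with that thing in the first slot.

```python
from itertools import product


def realizers(theory, domain, n):
    """All n-tuples from the domain that satisfy T(x_1, ..., x_n)."""
    return [xs for xs in product(domain, repeat=n) if theory(*xs)]


def plays_role(theory, domain, n, i, candidate):
    """The role for slot i: there are y_j (j != i) such that T holds of exactly the one tuple
    with candidate in slot i and y_j elsewhere, i.e. T is uniquely realized with candidate at i."""
    found = realizers(theory, domain, n)
    return len(found) == 1 and found[0][i] == candidate


def define(theory, domain, n, i):
    """iota y (y plays the role for slot i). Returning None for an improper description is an
    assumption of this sketch, not Lewis's own treatment of descriptions."""
    players = [c for c in domain if plays_role(theory, domain, n, i, c)]
    return players[0] if len(players) == 1 else None


def make_theory(causes, behaviours):
    """Hypothetical two-term theory T(x_1, x_2): pinpricks cause x_1, and x_1 causes behaviour x_2."""
    return lambda x1, x2: ("pinprick", x1) in causes and (x1, x2) in causes and x2 in behaviours


DOMAIN = ["pinprick", "c_fibres", "d_fibres", "grimace"]
T = make_theory({("pinprick", "c_fibres"), ("c_fibres", "grimace")}, {"grimace"})
print(define(T, DOMAIN, 2, 0))  # c_fibres  (slot 0 is t_1)
print(define(T, DOMAIN, 2, 1))  # grimace   (slot 1 is t_2)
```

If pinpricks also caused d_fibres, which also caused grimacing, T would have two realizations, the unique-realization sentence would be false, and define would return None.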

Now, Lewis proves several nice results about these definitions and their relation to the original theory T, using a certain understanding of the definite description operator. I won’t get into that here.

One last thing that will be important: the definite description on the right-hand side of the definition sentences is, in general, a non-rigid designator. Since T may be uniquely realized by different tuples of properties in different worlds, the definite description will in general pick out different properties at different worlds. And sometimes, with empirical investigation, we will be able to say something informative about the property that happens to be picked out at the actual world. For some name N in our old vocabulary, rigidly designating a property, we may discover:

\exists y_2 \ldots \exists y_n \forall x_1\ldots x_n(T(x_1,...,x_n)\leftrightarrow (x_1=N\wedge x_2=y_2\wedge \ldots \wedge x_n=y_n))

From this and the definition sentence, it will follow that:

t_1=N

So here we have a model for how the identification of new theoretical terms with old, familiar terms could go. In these circumstances we would call N the realizer of the t_1-role at the actual world. In general, N_w will be the realizer of this role at world w iff the following holds at w:

\exists y_2 \ldots \exists y_n \forall x_1\ldots x_n(T(x_1,...,x_n)\leftrightarrow (x_1=N_w\wedge x_2=y_2\wedge \ldots \wedge x_n=y_n))

It’s up for debate whether t_1 is a rigid or non-rigid designator. If it’s a rigid designator, then t_1=N will be necessary if true, but the definition sentence will be contingent (presumably, an example of the contingent a priori). t_1 could equally be taken to be non-rigid, allowing the definition sentence to be necessarily true (as well as a priori). In that case, t_1=N will be contingent (as well as a posteriori). It seems we could go either way on this, consistent with the rest of the framework.
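The two options can be compared in a two-world toy model; the worlds, property names and the theory are all hypothetical. The definite description picks out c_fibres at the actual world and d_fibres at w2. On the rigid reading, t_1=N holds at every world while the definition sentence fails at w2; on the non-rigid reading, it is the other way round. “Holds at every world” here is only a crude stand-in for necessity.

```python
from itertools import product

DOMAIN = ["c_fibres", "d_fibres", "grimace"]
# Hypothetical worlds, differing only in which state pinpricks cause.
WORLDS = {
    "actual": {("pinprick", "c_fibres"), ("c_fibres", "grimace")},
    "w2": {("pinprick", "d_fibres"), ("d_fibres", "grimace")},
}
N = "c_fibres"  # an old name rigidly designating the actual realizer


def realizer(causes):
    """First component of the unique realization of T(x_1, x_2) at a world, else None.
    Hypothetical T: pinpricks cause x_1, and x_1 causes x_2, where x_2 is grimacing."""
    found = [(x1, x2) for x1, x2 in product(DOMAIN, repeat=2)
             if ("pinprick", x1) in causes and (x1, x2) in causes and x2 == "grimace"]
    return found[0][0] if len(found) == 1 else None


def t1(world, rigid):
    """Rigid reading: t_1 denotes the actual realizer at every world.
    Non-rigid reading: t_1 denotes whatever realizes the role at the world in question."""
    return realizer(WORLDS["actual"] if rigid else WORLDS[world])


def holds_everywhere(claim):
    """A crude stand-in for necessity: truth at every world in the toy model."""
    return all(claim(w) for w in WORLDS)


for rigid in (True, False):
    identity = holds_everywhere(lambda w: t1(w, rigid) == N)                      # "t_1 = N"
    definition = holds_everywhere(lambda w: t1(w, rigid) == realizer(WORLDS[w]))  # definition sentence
    print("rigid" if rigid else "non-rigid", identity, definition)
# prints "rigid True False", then "non-rigid False True"
```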

I’ve introduced both role and realizer terminology in connection to the Lewis account of the definitions of theoretical terms. It is the model for how I understand role and realizer terminology in the context of functionalism. However, discussion of theoretical neologisms is one thing, and discussion of “functional” vocabulary is another. Lewis’s topic in “How to Define Theoretical Terms” is the former, and that gives us a particular take on the way that theory and definition sentences relate. For Lewis, the definitions are “implicitly asserted” when we put forward T as a term-introducing theory; presumably we’re doing something that’s equivalent to stipulating that they are to be (a priori) true. This is not an account that can be directly applied to terms, theoretical or otherwise, that are already in common currency. It is not an account, for example, of “pain”. In the case of pain, if “definitions” are to be offered, they have to be offered as a product of analysis, not as the product of stipulation.

Let’s turn, therefore, to a context where we are working only with terms that are already in common currency. And let’s suppose that we have found a theory T such that, for a suitable set of target vocabulary t_1,\ldots,t_n, both T(t_1,\ldots,t_n) and the unique realization sentence are true. The following will be true:

t_1=\iota y_1\exists y_2 \ldots \exists y_n \forall x_1\ldots \forall x_n(T(x_1,\ldots,x_n)\leftrightarrow (x_1=y_1\wedge \ldots \wedge x_n=y_n))

We shouldn’t call these “definition sentences”, since it’s not clear in what sense, if any, they are “definitions”. To highlight this, note that as a limiting case, our “theory” could simply consist in saying “Red is Arnold’s favourite colour”, with Red as the target vocabulary. The unique realization sentence is then that there is a y such that for all x, x is Arnold’s favourite colour iff x=y, which is true enough. And the putative “definition sentence” would say: Red is the y such that for all x, x is Arnold’s favourite colour iff y=x. But though this is a true identity, it is quite clearly not a “definition” of the term Red, and it is obviously contingent and a posteriori.

Not any old uniquely-realized theory of old target vocabulary will do, therefore. I take it that the step to an “analytic” functionalism of a Lewisian sort imposes the following constraint: we take an analytic/a priori T(t_1,\ldots,t_n). Now if, in addition, the unique realization sentence for this vocabulary is analytic/a priori, then the “definition sentences” will be analytic/a priori. Even if the unique realization sentence is not analytic/a priori, the conditional whose antecedent is the unique realization sentence and whose consequent is a definition sentence will be analytic/a priori. So we could plausibly claim the definition sentences as “an analysis” of the relevant target vocabulary, perhaps an analysis modulo the assumption of unique realization. The conjecture, for the special case of analytic functionalism about pain etc., will be that we could pull off this trick by letting T be a systematization of a set of a priori “platitudes” that uniquely characterize the typical causal role of the property of being in pain: causing distinctive kinds of behaviour, being caused by distinctive kinds of stimuli, and interacting with other (targeted) mental states in typical kinds of ways.
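Written out in the notation used for the formula above, the conditional in question takes the following form, with U(T) as my own shorthand (not standard notation) for the unique realization sentence:

U(T) =_{def} \exists y_1 \ldots \exists y_n \forall x_1\ldots \forall x_n(T(x_1,\ldots,x_n)\leftrightarrow (x_1=y_1\wedge \ldots \wedge x_n=y_n))

U(T) \supset t_1=\iota y_1\exists y_2 \ldots \exists y_n \forall x_1\ldots \forall x_n(T(x_1,\ldots,x_n)\leftrightarrow (x_1=y_1\wedge \ldots \wedge x_n=y_n))

If T is analytic/a priori, this conditional inherits that status, whatever the status of U(T) itself.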

The assumption that we can find an (a priori) theory T that does the job just described is a major one. But if we can do it, then we can import all the distinctions and terminology from the theoretical terms case. We will have a one-place predicate that is a “theoretical role” for the target term “pain”—which given the nature of the T we’re envisaging we could aptly call a causal-functional role of “pain”. We would be up for discovering that the role is satisfied by a property rigidly designated by some N—say, C-fibres firing. And we could reason, in the fashion Lewis and Armstrong taught us, from the “definition sentence” for pain, plus the putative empirical fact that C-fibres play the pain role, to an identification of the property of being in pain with having one’s C-fibres fire.

So that’s the way I understand analytic functionalism. And I can understand other forms of functionalism as variations on the theme. For example, we could start with a metaphysically necessary (but not analytic or a priori) theory which necessarily uniquely characterizes a set of target vocabulary, and extract definition sentences from it, obtaining necessarily true (but not analytic or a priori) “definition sentences” that we might go on to present as “metaphysical analyses”. We could take a scientific theory (a theory which uniquely characterizes a set of target terms with nomic necessity) and then extract “nomic analyses”, and so forth. In each case, the distinctive functionalist structure of role and realizer, and the relation between them, will be well understood. If functionalism is to be amended (e.g. to allow for imperfect realization, or non-unique realization), then I will want to figure out how to adjust the above theory to make the necessary changes.

It’s one thing to say that functionalisms can be represented as instances of the how-to-define-theoretical-terms model of extracting definitions from theories. It’s quite another to say that every successful application of that model to common currency terms would be a functionalism. That further claim seems false to me.

For example, suppose we applied this kind of account to a term for which we already have an analysis ready to hand: the property of being a bachelor. An a priori uniquely characterizing theory says (let’s suppose): bachelorhood is the property of being male and being unmarried. So the “definition sentence” here is: bachelorhood is the y such that for all x, x is the intersection of being male and being unmarried iff y=x. What of the role and realizer properties here? The role property is being a y such that for all x, x is the intersection of being male and being unmarried iff y=x. What’s the realizer property?

Well, here’s a way of specifying a property that realizes the role in the minimal sense in which I introduced the terminology earlier: being a bachelor. Here’s another: the property that is the intersection of being unmarried and being male. But this seems dreadfully fishy. It doesn’t seem illuminating in the way typical identifications of realizers of functional roles would and should be. It might be true to say that pain realizes the pain role, and that the property of actually playing the pain role realizes the pain role. But in that paradigm case of functionalism, what we are really interested in, and trust to be available, is some more illuminating characterization: e.g. that C-fibres firing plays the pain role. And what we see from the bachelorhood case, I think, is that it’s entirely possible to apply all this analysis and for there to be no such illuminating identification of the realizer at the end of the day.

To sum up: in the paradigm cases of functionalism, we expect a two-step methodology. First, there is the step of identifying a relevant uniquely characterizing theory, from which, by turning a crank, we can extract “functional roles”. Then we expect a second stage, where we or others do further non-trivial work (in the paradigm cases, empirical work) that gives us an illuminating way of identifying the realizers of those roles, using a vocabulary that differs from that used in characterizing the role itself. The realizers will be some relatively natural “kind” or natural-enough property, relative to a somehow-privileged vocabulary. In the paradigm functionalisms, there’s also a suitable distance between the vocabulary used to specify the role and the vocabulary used in the illuminating identification of the realizer.

Here’s a way of thinking about all this. There’s a genus-level notion of role and realizer here, which we find in functionalism, in understanding theoretical neologisms, and so forth. But in order to have a functionalism worthy of the name, we need more than such minimal roles and realizers: we need roles that are genuinely “functional” and which contrast sufficiently with their natural-enough “realizers”. That vague characterization is probably enough for us to get on with the hard work of finding examples that fit this bill.

But if this is the right way to think of things, then we should resist the thought that whenever we extract definitions from a theory in the Lewis style, we’re engaged in functionalist analysis. And I definitely want to resist the thought that in undertaking that first kind of project, we are committed to there being “realizers” of the theoretical roles used in those definitions in a more-than-minimal sense. Sometimes, perhaps, it will follow from the content of the characterizing theory that realizers of the roles will be more-than-minimal: e.g. perhaps the role is a causal one, and we are independently committed to thinking that only sufficiently natural properties can stand in causal relations. Perhaps part of the characterizing theory itself is the claim that the relevant property is natural enough. That might guarantee that, if successful, the analysis will turn out to be a functionalist one. But this needs to be argued out on a case-by-case basis.

To go back to the beginning: when people talk about functionalist analyses of believing that p and desiring that q, whether in application to groups or individuals, I think that often what they’re picking out are definitions of belief and desire that are extracted from an overall theory of belief and desire in the “theoretical role” way. But it’s a huge step from that to assume that one is committed to full-blown functionalism about belief and desire, with its more-than-minimal realizers of the roles so characterized. I think it’s misleading to label accounts that aren’t committed to more-than-minimal realizers as kinds of functionalism, and I think that’s one reason I got myself puzzled by the way the terminology is (sometimes) used in this area.



Nature of Representation book draft

… is now fully in being. This is a much reworked version of the themes of the series of blog posts below, themselves a distillation of work over the last five years.

NoR 4.5: the base–words, population, convention.

This is one of a series of posts setting out my work on the Nature of Representation.

Previous posts in this subsequence have taken convention as basic, and worked forward from that to an account of languages in use, correct compositional interpretation, attitude expressed, and the like. In this post, I’m going to outline the account of base facts, to hook this account of layer-3 representational facts back into the facts about mental representation established at layer 2. I’ll also sketch how (in joint work with Gail Leckie) we have proposed extending this account to give a treatment of some other elements of the “base” for selecting the correct linguistic interpretation: the words that are interpreted, and the language-using population.

Lewis’s account of convention was as follows. A regularity R is a convention in a population P iff within P, the following hold, with at most a few exceptions:

  1. Everyone in P conforms to R.
  2. Everyone in P believes that everyone in P conforms to R.
  3. This belief gives everyone in P a good reason to conform to R himself.
  4. There is a general preference in P for general conformity to R rather than slightly-less-than-general conformity to R.
  5. There is an alternative possible regularity R’ such that if it met (1) and (2), it would also meet (3) and (4).
  6. All of (1-5) are common knowledge.

The relevant regularities, generalized to allow for states of acceptance of enriched content, are the following:

  • (Truthfulness) Members of P utter s only if they accept p, where L(s)=p.
  • (Trust) If a member of P hears another member of P utter s, she tends to come to accept p, where L(s)=p.

And so what naturally suggests itself is the following account of linguistic “source intentionality”, the language-in-use appealed to in our previous discussions:

  • (Lewis) Given an exogenously fixed specification of a population P_1 and a typing of sentences T_1, L is the language of P_1 for T_1 iff there are conventions of (Truthfulness) and (Trust) in L in P_1 for T_1.
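The two regularities can be given a toy operational reading. This sketch assumes, purely for illustration, that each utterance record already lists the contents its speaker accepts and the contents its hearer comes to accept, and that population and sentence-typing are fixed exogenously, as (Lewis) requires. It checks only the conformity part of convention (clause 1, with a tolerance standing in for "at most a few exceptions"); the attitude clauses 2-6 are not modelled. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str
    hearer: str
    sentence: str               # sentence-type under the fixed typing T_1
    speaker_accepts: frozenset  # contents the speaker accepts
    hearer_accepts: frozenset   # contents the hearer comes to accept


def conforms(utterances, population, language, tolerance=0.05):
    """True if Truthfulness and Trust in `language` hold within `population`,
    allowing at most a `tolerance` fraction of exceptions."""
    relevant = [u for u in utterances
                if u.speaker in population and u.hearer in population]
    if not relevant:
        return False
    truthful = sum(language.get(u.sentence) in u.speaker_accepts for u in relevant)
    trusting = sum(language.get(u.sentence) in u.hearer_accepts for u in relevant)
    threshold = (1 - tolerance) * len(relevant)
    return truthful >= threshold and trusting >= threshold


DATA = [
    Utterance("ann", "bob", "it is raining", frozenset({"RAIN"}), frozenset({"RAIN"})),
    Utterance("bob", "ann", "snow is white",
              frozenset({"SNOW_WHITE"}), frozenset({"SNOW_WHITE"})),
]
L1 = {"it is raining": "RAIN", "snow is white": "SNOW_WHITE"}
L2 = {"it is raining": "SNOW_WHITE", "snow is white": "RAIN"}
print(conforms(DATA, {"ann", "bob"}, L1), conforms(DATA, {"ann", "bob"}, L2))  # True False
```

Treating Trust as a frequency is a simplification: the regularity speaks of a tendency, which a fuller model would represent counterfactually.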

Let me note some features of this. First, the characterization of convention is full of appeals to attitudes of members of the population: their beliefs and preferences, together with normative facts about reasons for conforming to a regularity. Add to this the way that (Truthfulness) and (Trust) appeal to psychological facts about agents, and it is clear that the work done to ground belief, desire and other facts about mental representation is being drawn on heavily at this point.

I am not going to engage in detail with the various worries one might have about this account of convention, or the modifications one might introduce to evade them. It doesn’t really matter to me whether this is a good account of convention in general, so long as it’s a good characterization of the features of regularities in language use that feed into linguistic source intentionality. And any other characterization of convention that appealed to intentional resources and delivered the same results on our target cases would do just as well, at least to this point. But just as in my previous handling of mental source intentionality, my interest will be in extending the scope of the appeals to convention.

The need for extension is prompted by the appeal, in the account as currently formulated, to exogenous typing of sentences and identification of language-using populations. But we don’t get facts about sentence-types or language-using populations for free. What grounds facts concerning when two blasts of sound are of the same sentence type, or when two people belong to a single language-using population? As is familiar in the specialist literature on this, it’s extremely implausible that we have any way to identify sentence-types by types of shapes or sounds (for an excellent review of problem cases and the relevant literature, Nick Tasker’s PhD thesis and papers should be a first port of call). The worry is that there’s no way to pick out sentence-types independently of semantic facts. What, other than semantic facts, makes the ambiguous homophones/homographs “bank” and “bank” distinct words? It is no easier to imagine an independent way of picking out a population that uses a single language, except by the fact that they are all users of that very language. But of course, the latter is a semantic fact that could not feature in an exogenous characterization of populations (I’m grateful to Leeds’ Roger White for alerting me to this several years ago).

Leckie and I suggest a different model:

  • (Endogenous) Given an utterance u, <P, T, L> is a language in use in utterance u iff P is a population and T a typing relation relative to which there are conventions of (Truthfulness) and (Trust) in L, and the speaker/hearer of u is a member of the population P; and u is a member of some equivalence class of the typing relation T.

Instead of determining L after fixing a particular population and typing relation, (Endogenous) treats the population and typing relation as variables whose values are fixed in whatever way is necessary to produce conventions of (Truthfulness) and (Trust). The correct word typing for English is as described by the T-role in a language-in-use in the utterance I am presently making. The membership of the language-using population of which I am a part is as described by the P-role in that same language-in-use. And finally, the content-sentence pairings that constitute linguistic source intentionality for English can be read off that same triple.
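Here is a toy rendering of the shift, as a minimal sketch under loud assumptions: all names are hypothetical, the population is held fixed, and only a Truthfulness-style regularity is checked. Rather than being given the typing, we try candidate typings of the same tokens and ask which of them support a regularity at all. Tokens of the sound "bank" are used with two different contents; typing by sound alone then supports no truthfulness regularity, whereas a typing that splits those tokens does. Which tokens get split is simply stipulated in the toy; the point is only structural, namely that the typing is a variable settled by whether conventions result.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    uid: int
    speaker: str
    sound: str
    speaker_accepts: frozenset  # contents the speaker accepts in uttering it


def induced_language(tokens, typing):
    """Pair each type with the content most often accepted by its users;
    this choice maximizes the number of tokens conforming to Truthfulness."""
    counts = {}
    for tok in tokens:
        counts.setdefault(typing(tok), Counter()).update(tok.speaker_accepts)
    return {ty: (c.most_common(1)[0][0] if c else None) for ty, c in counts.items()}


def supports_truthfulness(tokens, typing, tolerance=0.1):
    """Does some language make Truthfulness hold (up to tolerance) under this typing?"""
    lang = induced_language(tokens, typing)
    ok = sum(lang[typing(tok)] in tok.speaker_accepts for tok in tokens)
    return ok >= (1 - tolerance) * len(tokens)


TOKENS = [
    Token(1, "ann", "bank", frozenset({"RIVER_BANK"})),
    Token(2, "ann", "bank", frozenset({"MONEY_BANK"})),
    Token(3, "bob", "bank", frozenset({"RIVER_BANK"})),
    Token(4, "bob", "bank", frozenset({"MONEY_BANK"})),
    Token(5, "ann", "rain", frozenset({"RAIN"})),
]
RIVER_TOKENS = {1, 3}  # hypothetical: the 'bank' tokens grouped as one word

TYPINGS = {
    "by_sound": lambda tok: tok.sound,
    "split_bank": lambda tok: (tok.sound, tok.uid in RIVER_TOKENS),
}
for name, typing in TYPINGS.items():
    print(name, supports_truthfulness(TOKENS, typing))
# by_sound False
# split_bank True
```

Candidate populations could be varied in the same way, by filtering tokens on speaker before the check.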

It is important to understand that endorsing this account does not foreclose saying other, more immediately illuminating things about words and populations. If you thought you had an exogenous way of specifying a language-using population and a typing relation that feature in linguistic conventions, then all the better for (Endogenous): that typing relation and population will be an illuminating independent specification of a typing relation and population that feature in a language-in-use, according to our formulation. But of course, pessimism on that front motivated the shift to this one. What is much more plausible is that one could, via appeal to semantic facts, give a more illuminating characterization of the language-using population and a typing relation. For example, in Nick Tasker’s PhD dissertation an intriguing account of the nature of words is offered, building on work in the metaphysics of artefacts by Amie Thomasson. An account of word-individuation (or at least, various necessary and sufficient conditions) is offered as part of the package, built on the more general model of individuation of artefact-kinds. But Tasker is clear from the start that among the determinants of word-individuation for him are facts about the semantic properties of the individual word tokens, their recognizability to a certain audience, and so forth. Tasker’s account might be exactly what we need to understand how words work, and yet entirely unsuitable to be slotted in as an “exogenous” account of word-individuation as per the original model. But so long as word-types as he characterizes them figure in linguistic conventions, his account is consistent with (Endogenous).

In sum: since the reductive characterization of words and populations is given by (Endogenous) and not by an exogenous characterization, the project of saying interesting things about types and populations that figure in languages in use doesn’t have to be burdened by any reductive constraint. Metaphysically speaking, the bounds of the population, the relevant types, and the contents conventionally associated with sentences, are all jointly and simultaneously grounded in facts about patterns of linguistic usage and attitudes of speakers and hearers.

The worry about this kind of account is not that it’ll fail to count genuine sentences and language-using populations as sentences and populations. The worry to have is that it will overgenerate. After all, by choosing crazy typing relations and gerrymandered populations, we may be able to find all sorts of dubious regularities connecting uses of sentences (so typed) to attitudes. In the Leckie/Williams paper, we consider a number of different ways this might happen: for example, by subdividing genuine populations and types (typing utterances by brown-eyed people separately from those by blue-eyed people); by merging separate types together; or by tailoring the population or typing so as to bias the resulting regularity (e.g. by restricting it to a population who apply “red” to more orangey things than is the norm). Our strategy in response is to work through such examples, and argue that none of them produces a genuine example of overgeneration. They are gerrymandered regularities of truthfulness and trust, sure; but, we argue, each violates one or more clauses of Lewis’s characterization of convention.
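The overgeneration worry shows up immediately in a toy model. In this minimal sketch (hypothetical names; each token is simplified to carry exactly one accepted content; only a Truthfulness-style regularity is checked), a gerrymandered typing that further subdivides each word by the speaker's eye colour supports a truthfulness regularity exactly as well as the sensible typing does. So conformity facts of this kind cannot by themselves rule the gerrymander out; on the Leckie/Williams strategy, that work falls to the remaining clauses of Lewis's characterization of convention, which the sketch does not model.

```python
from collections import Counter

# Hypothetical tokens: (speaker, word under the sensible typing, content accepted).
TOKENS = [
    ("ann", "bank_river", "RIVER_BANK"), ("bob", "bank_river", "RIVER_BANK"),
    ("ann", "rain", "RAIN"), ("bob", "rain", "RAIN"), ("cat", "rain", "RAIN"),
]
EYE_COLOUR = {"ann": "brown", "bob": "blue", "cat": "brown"}


def truthfulness_score(tokens, typing):
    """Best achievable fraction of tokens conforming to Truthfulness, pairing
    each type with the content most often accepted by its users."""
    by_type = {}
    for tok in tokens:
        by_type.setdefault(typing(tok), Counter())[tok[2]] += 1
    best = sum(c.most_common(1)[0][1] for c in by_type.values())
    return best / len(tokens)


def sensible(tok):
    return tok[1]


def gerrymandered(tok):
    # Subdivide every word by the speaker's eye colour.
    return (tok[1], EYE_COLOUR[tok[0]])


print(truthfulness_score(TOKENS, sensible), truthfulness_score(TOKENS, gerrymandered))
# 1.0 1.0
```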

Suppose the Leckie/Williams project succeeds. Then the revised characterization of “language in use” removes the need to list, in addition, the typing of sentences and the identification of populations among the base facts of the metaphysics of linguistic representation. And with that, the last tie between the layers of representation has been put in place.