Comparative conventionality

The TL;DR summary of what follows is that we should quantify the conventionality of a regularity (David-Lewis-style) as follows:

A regularity R in the behaviour of population P in a recurring situation S, is a convention of depth x, breadth y and degree z when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R.

where x (depth) is the fraction of S-situations which are T, y (breadth) is the fraction of all Ps involved who are Ks in this instance, and z is the degree to which (A-C) obtaining resembles a coordination equilibrium that solves a coordination problem among the Ks.

From grades of conventionality so defined, we can characterize in the obvious way a partial ordering of regularities by whether one is more of a convention than another. What I have set out differs in several respects from what Lewis himself proposed along these lines. The rest of the post spells out why.

The first thing to note is that in Convention Lewis revises and re-revises what it takes to be a convention. The above partial version is a generalization of his early formulations in the book. Here’s a version of his original:

A regularity R in the behaviour of a population P in a recurring situation S is a convention if and only if it is true that, and common knowledge in P that:

(A) BEHAVIOUR CONDITION: everyone conforms to R
(B) EXPECTATION CONDITION: everyone expects everyone else to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone prefers that they conform to R conditionally on everyone else conforming to R.

where (C) holds because S is a coordination problem and uniform conformity to R is a coordination equilibrium in S.

A clarificatory note: in some conventions (e.g. a group of friends meeting in the same place week after week) the members of the population in question are all present in instances of the recurring situation. But in others (languages, road driving conventions) the recurring situation involves more or less arbitrary selection of pairs, triples, etc. of individuals from a far larger population. When we read the clauses, the intended reading is that the quantifiers “everyone” be restricted just to those members of the population who are present in the relevant instance of the recurring situation. The condition is then that it’s common knowledge instance-by-instance *between conversational participants* or *between a pair of drivers* what they’ll do, what they expect, what they prefer, and so on. That matters! For example, it might be that strictly there is no common knowledge at all among *everyone on the road* about what side of the road to drive on. I may be completely confident that there’s at least one person within the next 200 miles not following the relevant regularity. Still, I may share common knowledge with each individual I encounter, that in this local situation we are going to conform, that we have the psychological states backing that up, etc. (For Lewis’s treatment of this, see his discussion of generality “in sensu diviso” over instances.)

Let me now tell the story about how Lewis’s own proposal arose. First, we need to see his penultimate characterization of a convention:

A regularity R in the behaviour of P in a recurring state S, is a perfect convention when it’s common knowledge among P in any instance of S that:

(A) BEHAVIOUR CONDITION: everyone conforms to R
(B) EXPECTATION CONDITION: everyone expects everyone else to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone prefers that they conform to R conditionally on everyone else conforming to R.
(D) GENERAL PREFERENCE CONDITION: everyone prefers that anyone conform to R conditionally on all but one conforming to R.
(E) COOPERATION CONDITION: everyone has approximately the same preferences regarding all possible combinations of actions
(F) There exists an alternative regularity R* incompatible with R, which also meets the analogue of (C) and (D).

The explicit appeal to coordination problems and their solution by coordination equilibria has disappeared. Replacing them are the three clauses (D-F). In (D) and (E) Lewis ensures that the scenario resembles recurring games of pure cooperation in two specific, independent respects. Games of pure cooperation have exact match of preferences over all possible combinations of outcomes (cf. (E)’s approximate match). And because of this perfect match, if any one person prefers to conform conditionally on others conforming, all others share that preference too (cf. (D)). So by requiring (D) we preserve a structural feature of coordination problems, and by requiring (E) we require some kind of approximation to a coordination problem. (F) on the other hand is a generalization of the condition that these games have more than one “solution” in the technical sense, and so are coordination *problems*.

It’s striking that, as far as I can see, Lewis says nothing about what further explanatory significance (beyond being analytic of David Lewis’s concept of convention) these three features enjoy. That contrasts with the explanatory power of (A-C) being true and common knowledge, which is at the heart of the idea of a rationally self-sustaining regularity in behaviour. I think it’s well worth keeping (A-C) and (D-F) separate in one’s mind when thinking through these matters, if only for this reason.

Here’s the Lewisian proposal to measure degree of conventionality:

A regularity R in the behaviour of P in a recurring situation S, is a convention to at least degree <z,a,b,c,d,e,f> when it’s common knowledge among P in at least fraction z of instances of S that:

(A*) BEHAVIOUR CONDITION: everyone in some fraction a of P conforms to R
(B*) EXPECTATION CONDITION: everyone in some fraction b of P expects at least a fraction a of P to conform to R
(C*) SPECIAL PREFERENCE CONDITION: everyone in some fraction c of P prefers that they conform to R conditionally on everyone in fraction a of P conforming to R.
(D*) GENERAL PREFERENCE CONDITION: everyone in some fraction d of P prefers that anyone conform to R conditionally on everyone in fraction a of P conforming to R.
(E*) COOPERATION CONDITION: everyone in some fraction e of P has approximately the same preferences regarding all possible combinations of actions
(F*) there exists an alternative regularity R* incompatible with R in fraction f of cases, which also meets the analogue of (C) and (D).

The degree of conventionality of R is then defined to be the set of tuples such that R is a convention to degree at least that tuple. A partial order of comparative conventionality can then be defined in the obvious way.

While measuring the degree to which the clauses of the characterization of perfect conventionality are met is a natural idea, there’s just no guarantee that it tracks anything we might want from a notion of partial conventionality, e.g. “resemblance to a perfect convention”. I’ll divide my remarks into two clusters: first on (A-C), and then on (D-F).

On the original conception, the (A-C) clauses work together in order to explain what a convention explains. That’s why, after all, Lewis makes sure that in clause (C*) the conditional preference is conditional on the obtaining of the very fraction mentioned in clauses (A*) and (B*). But more than this is required.

On that original conception, the rationality of conformity to (A) is to be explained by (common knowledge of) the expectations and preferences in (B) and (C). Where everyone has the expectations and preferences, the rationalization story rolls along nicely. But once we allow exceptions, things break down.

Consider, first, the limit case where nobody at all has the expectation or preference (so (B,C) are met to degree zero). Conformity to the regularity can then be entirely accidental, obtaining independently of the attitudes prevailing among those conforming. Such situations lack the defining characteristics of a convention. But (holding other factors equal) Lewis’s definition orders them by how many people in the situation conform to the regularity. So Lewis finds an ordering where there is really none to be had. That’s bad.

Consider, second, a case where the population divides evenly into two parts: those who have the preference but no expectation, and those who have the expectation but no preference. No person in any instance will have both the expectation and preference that in the paradigm cases work together to rationally support the regularity. To build a counterexample to Lewis’s analysis of comparative conventionality out of this, consider a situation where the expectation and preference clauses are met to degree 0.4, but by the same group, which rationalizes 0.4 conformity. Now we have a situation where expectations and preferences do sustain the level of conformity, and so (all else equal) it deserves to be called a partial convention. But on Lewis’s characterization it is less of a convention than a situation where 50% of people have the preference, a non-overlapping 50% have the expectation, and 40% irrationally conform to the regularity. The correct view is that the former regularity is more conventional than the latter. Lewis says the opposite. I conclude Lewis characterized the notion of degree of convention in the wrong way.
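To make the overlap point concrete, here is a minimal Python sketch of the two scenarios just described. It assumes a population of 100 and treats each person as three yes/no flags, which simplifies away the common knowledge requirement and the conditional content of the attitudes. The names `Agent`, `clause_fractions`, `kernel_fraction` and `dominates` are illustrative assumptions of mine, not anything found in Lewis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    conforms: bool  # stands in for clause (A)
    expects: bool   # stands in for clause (B)
    prefers: bool   # stands in for clause (C)


def fraction(agents, predicate):
    """Share of agents for whom predicate holds."""
    return sum(1 for a in agents if predicate(a)) / len(agents)


def clause_fractions(agents):
    """Clause-by-clause fractions (a, b, c), each measured independently."""
    return (
        fraction(agents, lambda a: a.conforms),
        fraction(agents, lambda a: a.expects),
        fraction(agents, lambda a: a.prefers),
    )


def kernel_fraction(agents):
    """Share of agents who meet all three conditions at once."""
    return fraction(agents, lambda a: a.conforms and a.expects and a.prefers)


def dominates(t1, t2):
    """True if tuple t1 is at least as large as t2 on every coordinate."""
    return all(x >= y for x, y in zip(t1, t2))


# Scenario 1: the same 40 people conform, expect and prefer.
overlapping = [Agent(True, True, True)] * 40 + [Agent(False, False, False)] * 60

# Scenario 2: 50 have only the preference, a disjoint 50 only the
# expectation, and 40 people (20 from each half) conform anyway.
disjoint = (
    [Agent(True, False, True)] * 20 + [Agent(False, False, True)] * 30
    + [Agent(True, True, False)] * 20 + [Agent(False, True, False)] * 30
)

for name, population in (("overlapping", overlapping), ("disjoint", disjoint)):
    print(name, clause_fractions(population), kernel_fraction(population))
```

On this toy model the disjoint scenario is at least as high on every clause-by-clause coordinate, yet nobody in it meets all three conditions together, whereas 40% do in the overlapping scenario.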

Let me turn to the way he handles (D-F). What’s going on here, I think, is that he’s picking up three specific ways in which what’s going on can resemble a solution to a coordination problem. But there are again multiple problems. For a start, there is the kind of lack-of-overlap problem we just saw above. A situation where 40% of the people conform, and meet the relevant expectations and preference clause, and perfectly match in preferences over all relevant situations, is ranked *below* situations where 40% of people conform, meet the relevant expectations and preference clause, and are completely diverse in their preferences *but* the remaining 60% of the population has perfectly matched preferences against conformity to R. That’s no good at all!

But as well as the considerations about overlap, the details of the respects of similarity seem to me suspect. For example, consider a scenario where (A-C) are fully met, and everybody has preferences that diverge just too much to count as approximately the same, so (E) is met to degree zero. And compare that to a situation where two people have approximately the same preferences, and 98 others have completely divergent preferences. Then (E) is met to degree 0.02. The first is much more similar to perfect match of preferences than the second, but Lewis’s ranking gives the opposite verdict. (This reflects the weird feature that he loosens the clause from exact match to approximate match, and then on *top* of that loosening, imposes a measure of degree of satisfaction. I really think that the right thing here is to stick with a measure of similarity of preference among a relevant group of people, rather than counting pairwise exact match).
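The contrast between counting approximate matches and measuring similarity can be sketched in Python. Everything here is an assumption made for illustration: preferences are modelled as utility vectors over 100 hypothetical outcomes, two people count as having “approximately the same” preferences when no utility differs by more than `TOL`, clause (E*) is approximated by the fraction of people who have at least one approximately matching partner, and mean pairwise distance stands in for a graded similarity measure. None of these choices comes from Lewis.

```python
def distance(u, v):
    """Largest utility gap between two people over any outcome."""
    return max(abs(x - y) for x, y in zip(u, v))


def fraction_with_match(prefs, tol):
    """Share of people with at least one other person within tol of them."""
    n = len(prefs)
    matched = sum(
        1
        for i in range(n)
        if any(i != j and distance(prefs[i], prefs[j]) <= tol for j in range(n))
    )
    return matched / n


def mean_pairwise_distance(prefs):
    """Average distance over all unordered pairs of people."""
    n = len(prefs)
    total = sum(
        distance(prefs[i], prefs[j]) for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def spike(index, height, size):
    """Utility vector that is zero everywhere except at one outcome."""
    return [height if k == index else 0.0 for k in range(size)]


TOL = 1.0  # assumed cut-off for "approximately the same"
N = 100

# Everybody differs from everybody else by just over the cut-off.
near_miss = [spike(i, 1.01 * TOL, N) for i in range(N)]

# Two people agree exactly; the other 98 diverge wildly from everyone.
two_match = [spike(0, 0.0, N), spike(0, 0.0, N)] + [
    spike(i, 100.0 * TOL, N) for i in range(2, N)
]

for name, prefs in (("near_miss", near_miss), ("two_match", two_match)):
    print(name, fraction_with_match(prefs, TOL), mean_pairwise_distance(prefs))
```

Under these assumptions the counting measure ranks the second population higher, while the distance measure reports the first population as far closer to a perfect match.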

I’d fold in clause F into the discussion at this point, but my main concerns about it would really turn into concerns about whether Lewis’s model of conventions as equilibria is right, and that’d take me too far afield. So I’ll pass over it in silence.

To summarize: Lewis’s characterization of degrees of conventionality looks like it misfires a lot. The most important thing wrong with it is that it doesn’t impose any sort of requirement that its clauses be simultaneously satisfied. And that leaves it open to the kind of problems described above.

My own proposal, which I listed at the start of this post, seems to me to be the natural way to fix this problem. I say: what we need to do is look for “kernels” of self-sustaining subpopulations, where we insist that each member of the kernel meets the conformity, expectation and preference conditions perfectly. The size of this kernel, as a fraction of those in the population involved in the situation, then measures how closely we approximate the original case. That fraction I called the “depth” of the convention, where a convention with depth 1 involves everyone involved in any instance of the situation pulling their weight, and a convention with depth 0.5 is one where only half are involved, but where that is still just as rationally self-sustaining as a case of perfect convention. We might introduce the neologism “depth of subconvention” to articulate this:

A regularity R in the behaviour of P in a recurring situation S, is a sub-convention of depth x when in every instance of S there is a kernel K of the members of P such that it’s true and common knowledge among K in this instance of S that:

(A**) BEHAVIOUR CONDITION: everyone in K conforms to R
(B**) EXPECTATION CONDITION: everyone in K expects everyone in K to conform to R
(C**) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone in K conforming to R.

and x is the fraction of P in the instance of S who are in K.

(These clauses contain a free variable K embedded in specifications of preference and expectation. So what is the content of the preferences and the expectations we’re here requiring? Do the people in the kernel satisfying the conditions need to conceive of the others in K who they expect to conform as being large enough in number (size k)? Or is it enough that they form preferences and expectations about a subgroup of those involved in the present instance, where that subgroup happens to be of size k? I go with the latter, more liberal understanding. In cases where participants’ interests are grounded in the kind of success that requires k people to cooperate, then (C**) will likely not be met unless all participants have the belief that there are at least k of them. But that isn’t written into the clauses, and I don’t think it should be. Size might matter, but there’s no reason to think it always matters.)

To see why “breadth” as well as “depth” matters, consider the following setup. Suppose that our overall population P divides into conformers C (90%) and the defectors D (10%). The conformers are such that in any instance of S they will satisfy (A-C), whereas the defectors never do (for simplicity, suppose they violate all three conditions). So, if you’re a conformer, you always conform to R whenever you’re in S, because you prefer to do so if 90% of the others in that situation do, and you expect at least 90% of them to do so.

If everyone in P is present in each instance of S, this will be a straightforward instance of a partial subconvention, to degree 0.9. The biggest kernel witnessing the truth of the above clauses is simply the set of conformers, who are all present in every case.

But now consider a variation where not all members of P are present in every case. Stipulate that the members of P present in a given instance of S are drawn randomly from the population as a whole. This will not be a partial convention to degree 0.9. That is because there will be instances of S where, by chance, too many defectors are present, and the set of conformers is less than the fraction 0.9 of the total involved in that situation. So the set of conformers present in a given instance is sometimes but not always a “kernel” that meets the conditions laid down. Indeed, it is not a convention to any positive degree, because it could randomly be that only defectors are selected for an instance of S, and in that instance there is no kernel of size >0 satisfying the clauses. So by the above definition it won’t be a partial convention to any positive degree, even if such instances are exceptionally rare.
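How rare such instances are can be sketched with a little arithmetic. The Python below assumes that each of the m people present in an instance is drawn independently, with a 0.9 chance of being a conformer (roughly what random selection from a very large population would give). The function names and the choice of m are illustrative assumptions only.

```python
from fractions import Fraction
from math import comb


def prob_all_defectors(m, defect_rate=0.1):
    """Chance that every one of m independently drawn participants defects."""
    return defect_rate ** m


def prob_conformers_below(m, target=Fraction(9, 10), conform_rate=0.9):
    """Chance that conformers make up less than `target` of m draws."""
    total = 0.0
    for k in range(m + 1):  # k = number of conformers among those present
        if Fraction(k, m) < target:
            # Binomial probability of exactly k conformers out of m.
            total += comb(m, k) * conform_rate**k * (1 - conform_rate) ** (m - k)
    return total


for m in (2, 5, 10):
    print(m, prob_all_defectors(m), prob_conformers_below(m))
```

On these assumptions the chance of an all-defector instance shrinks quickly as m grows but never reaches zero, which is all the argument needs. The chance of the conformers falling short of 0.9 of those present, by contrast, stays substantial.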

What we need to avoid this is to provide for exceptions to the “breadth” of the convention, i.e. the instances of S where the clauses are met, as Lewis does:

A regularity R in the behaviour of population P in a recurring situation S, is a convention of depth x, breadth y when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A**) BEHAVIOUR CONDITION: everyone in K conforms to R
(B**) EXPECTATION CONDITION: everyone in K expects everyone in K to conform to R
(C**) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone in K conforming to R

and x is the fraction of S situations that are T situations and y is the fraction of P in the instance of T that are in K.

(I’ve written this in terms of a new recurring state T, rather than (per Lewis) talking about a fraction of the original recurring state type, to bring out the following feature. In the special case I’ve been discussing, where the largest kernel witnessing the truth of these clauses is simply the set of conformers present in the instance, then when the clauses are met with depth x and breadth y with respect to S and P, they will be met with depth 1 and breadth 1 with respect to T and C. That is: in this special case, the clauses in effect require there be a perfect subconvention with respect to some subpopulation and sub-situation of the population and situation we start from. Depth and breadth of subconventionality are then measuring the fraction of the overall population and state that these “occupy”.)
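The way depth and breadth trade off in the definition just given can be sketched in Python. The sketch assumes we are handed, for each instance of S, the number of people present and the size of the largest kernel among them. It reads y as a lower bound that the kernel fraction must meet in every T-instance, and takes T to be the set of all instances meeting that bound. The data and the function name `depth_at_breadth` are hypothetical.

```python
from fractions import Fraction


def depth_at_breadth(instances, breadth):
    """Fraction of instances whose largest kernel covers at least `breadth`
    of the people present, i.e. the depth x available at breadth y."""
    qualifying = sum(1 for n, k in instances if Fraction(k, n) >= breadth)
    return Fraction(qualifying, len(instances))


# Hypothetical record of five instances of S:
# (people present, size of the largest kernel among them)
instances = [(10, 9), (10, 10), (10, 8), (10, 0), (10, 9)]

for breadth in (Fraction(1), Fraction(9, 10), Fraction(8, 10), Fraction(1, 10)):
    print(f"breadth {breadth}: depth {depth_at_breadth(instances, breadth)}")
```

Lowering the breadth demanded lets more instances count as T-instances, so depth rises. The single instance with an empty kernel caps depth at 4/5 for any positive breadth, but no longer drags the whole regularity down to degree zero.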

What do we now think about the remaining clauses of Lewis’s definition? I think there’s no obvious motive for extending the strategy I’ve pursued to this point, of requiring these clauses be satisfied perfectly by the kernel K. After all, (common knowledge of) the satisfaction of (A-C) already provides for the rational stability of the pattern of conformity. But equally (as we saw in one of my earlier objections to Lewis) we don’t want to measure the fraction of all those involved in the recurring situation who satisfy the clauses, else we’ll be back to problems of lack of overlap. What we want to do is take the kernel we have secured from the subconvention conditions already set down, and look at the characteristics of the regularity that prevails among them. To what extent is that rationally stable regularity a convention? And that brings us right up to my official proposal, repeated here:

A regularity R in the behaviour of population P in a recurring situation S, is a convention of depth x, breadth y and degree z when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R.

where x (depth) is the fraction of S-situations which are T, y (breadth) is the fraction of all Ps involved who are Ks in this instance, and z is the degree to which (A-C) obtaining resembles a coordination equilibrium that solves a coordination problem among the Ks.

The key thing to note here, compared to the previous version, is that I’ve declined to unpack the notion of “resembling a coordination equilibrium that solves a coordination problem”. For all that’s been said here, you could look at the implicit analysis that Lewis’s (D*-E*) gives of this notion (now restricted to the members of the kernel), and plug that in. But earlier I objected to that characterization: it doesn’t seem to me that the fraction of people with approximately matching preferences is a good measure of similarity to the original. In the absence of a plausible analysis, better to keep the notion as a working primitive (and if it doesn’t do much explanatory work, as is my current working hypothesis, analyzing that working primitive will be low down the list of priorities).

A closing remark. Lewis’s official position is neither the unrestricted (A-F) nor the quantitative (A*-F*) above. Rather, he gives a version of (A-F) in which quantifiers throughout are replaced by ones that allow for exceptions (“almost everyone…”). But as far as I can see, the same kinds of worries arise for this case. For example, given any threshold for how many count as “almost everyone”, almost everyone can have the relevant conditional preference, and almost everyone can have the relevant expectation, without it being the case that almost everyone has both the preference and the expectation. So if almost everyone conforms to a regularity, at least some of that conformity is not rationalized by the attitudes guaranteed by the other clauses. To fix this, we can extract a “threshold” variant from the quantitative proposal I have made, which would look like this:

A regularity R in the behaviour of population P in a recurring situation S, is a convention when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R.

where almost all S-situations are T, almost all P involved in the instance of T are in K, and (A-C) obtaining is almost a coordination equilibrium that solves a coordination problem among the Ks.

Here “almost a coordination equilibrium” is to be read as “having a high enough degree of similarity to a coordination equilibrium”.