Monthly Archives: October 2020

Proximal desires.

How might we measure the proximity or similarity of two belief states? Suppose they are represented in each case as a function from propositions to real numbers between 0 and 1, representing their respective degrees of belief. Is it possible to find a sensible and formally tractable measure of how similar these two states are?

How might we measure the proximity or similarity of two desire states? Suppose they are represented in each case as a function from propositions to real numbers, representing how desirable the agent finds each proposition being true. Is it possible to find a sensible and formally tractable measure of how similar these two states are?

The TL;DR of what follows is: I think we can find a measure of both (or better, a measure of the proximity of pairs of combined belief-desire states). And this idea of proximity between belief-desire psychologies is key to explaining the force of theoretical rationality constraints (probabilism) and means-end practical rationality constraints (causal expected utility theory). Furthermore, it’s the notion we need to articulate the role of “principles of charity” in metasemantics.

The first question above is one that has arisen prominently in accuracy-first formal epistemology. As the name suggests, the starting point of that project is a measure of the accuracy of belief states. Richard Pettigrew glosses the accuracy of a credence function at a world as its “proximity to the ideal credence at that world” (Accuracy and the Laws of Credence, p.47). If you buy Pettigrew’s main arguments for features of belief-proximity in chapter 4 of that book, then it’s a mathematical consequence that belief-proximity is what’s known as an “additive Bregman divergence”. If you think in addition that the distance from belief b to belief b* is always the same as the distance from belief b* to belief b (i.e. that proximity is symmetric), then one can prove, essentially, that the right way to measure the proximity of belief states Alpha and Beta is by “squared Euclidean distance”: for each proposition, take the difference between the real number representing Alpha’s credence in it and the one representing Beta’s credence in it, square this difference, and sum the results over all propositions.
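To fix the formula in mind, here’s a toy sketch in Python (the propositions, numbers and function names are all invented for illustration):

```python
# Minimal sketch: squared Euclidean distance between two credence
# functions, each represented as a dict from proposition labels to reals.

def sq_euclidean(cred_a, cred_b):
    """Sum, over propositions, of the squared difference in credence."""
    assert cred_a.keys() == cred_b.keys()
    return sum((cred_a[p] - cred_b[p]) ** 2 for p in cred_a)

alpha = {"rain": 0.75, "no-rain": 0.25}   # toy credence function Alpha
beta  = {"rain": 0.5,  "no-rain": 0.5}    # toy credence function Beta

print(sq_euclidean(alpha, beta))  # 0.25**2 + 0.25**2 = 0.125
```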

Now, once you have this measure of proximity to play with, accuracy-firsters like Pettigrew can put it to work in their arguments for the rational constraints on belief. Accuracy of a belief state at w is proximity to the ideal belief state at w; if the ideal belief state for an agent x at w is one that matches the truth values of each proposition (“veritism”) then one can extract from the measure of proximity a measure of accuracy, and go on to prove, for example, that a non-probabilistic belief state b will be “accuracy dominated”, i.e. there will be some alternative belief state b* which is *necessarily* more accurate than it.
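Here’s a toy illustration of accuracy domination with a two-proposition algebra, on the veritist assumption just mentioned (the numbers are mine, chosen only for vividness):

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Credences in (p, not-p). b is non-probabilistic: its values sum to 0.5.
b      = (0.25, 0.25)
b_star = (0.5, 0.5)    # b's projection onto the probability simplex

# Veritist ideal credences at the two worlds match the truth values there.
ideals = {"w_p": (1.0, 0.0), "w_not_p": (0.0, 1.0)}

for w, ideal in ideals.items():
    print(w, sq_dist(b, ideal), sq_dist(b_star, ideal))
# b* is strictly closer to the ideal at *both* worlds, so b is
# accuracy-dominated by b*.
```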

So far, so familiar. I like this way of relating theoretical rational constraints like probabilism to what’s ultimately valuable in belief: truth. But I’m also interested in the notion of proximity for other reasons. In particular, when working in metasemantics, I want to think about principles of interpretation that take the following shape:

(I) On the basis of the interpreter’s knowledge of some primary data, and given constraints that tie possible belief states to features of that primary data, the interpreter is in a position to know that the target of interpretation has a belief state within a set C.

(II) The interpreter attributes to the target of interpretation that belief state within C which is closest to belief state m.

To fix ideas: the set C in (I) might arise out of a process of finding a probability-utility pair which rationalizes the target’s choice behaviour (i.e. always makes the option the target chooses the one which maximizes expected utility, by their lights, among the options they choose between). The magnetic belief state “m” in (II) might be the ideal belief state to have, by the interpreter’s lights, given what they know about the target’s evidential setting. Or it might be the belief state the interpreter would have in the target’s evidential setting.
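Here’s a minimal sketch of how step (II) might be implemented, assuming squared Euclidean distance and inventing a toy candidate set and magnetic state:

```python
def sq_euclidean(a, b):
    return sum((a[p] - b[p]) ** 2 for p in a)

def attribute(candidates, magnetic):
    """Step (II): attribute the member of C closest to the magnetic state m."""
    return min(candidates, key=lambda state: sq_euclidean(state, magnetic))

# Toy set C of rationalizing belief states, and a toy magnetic state m.
C = [{"p": 0.9, "q": 0.1}, {"p": 0.6, "q": 0.4}]
m = {"p": 0.7, "q": 0.3}

print(attribute(C, m))  # {'p': 0.6, 'q': 0.4} -- the closer candidate
```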

There are lots of refinements we might want to add (allowing m to be non-unique, catering for situations in which there are several elements in C that are tied for closeness to m). We might want to clarify whether (I) and (II) are principles of practical interpretation, somehow mapping the processes or proper outputs of a real-life flesh and blood interpreter, or whether this is intended as a bit of theory of ideal interpretation, carried out on the basis of total “knowledge” of primary facts about the target. But I’ll set all that aside.

The thing I want to highlight is that step (II) of the process above makes essential use of a proximity measure. And it’s pretty plausible that we’re here shopping in the same aisle as the accuracy-first theorists. After all, a truth-maximizing conception of principles of interpretation would naturally construe (II) as attributing to the subject the most accurate belief state within the set C, and we’ll get that if we set the “ideal” credence (in a given world) to be the credal state that matches the truth values at that world, in line with Pettigrew’s veritism, and understand proximity in the way Pettigrew encourages us to. Pettigrew in fact defends his characterization of proximity independently of any particular identification of what the ideal credences are. So if you were convinced by Pettigrew’s discussion, then even if the “ideal credence” m for the purposes of interpretation is different from the “ideal credence” for the purposes of the most fundamental doxastic evaluation, you’ll still think that the same measure of proximity (an additive Bregman divergence, specifically squared Euclidean distance) is relevant in both cases.

That’s the end of the (present) discussion as far as belief goes. I want to turn to an extension to this picture that becomes pressing when we think of this in the context of principles of interpretation. For in the implementations that I am most interested in, what we get out of step (I) is not a set of belief states alone, but a set of belief-desire psychologies—a pairing of credence and utility functions, for example. Now, it’s possible that the second step of interpretation, (II), cares only about what goes on with belief—picking the belief-desire psychologies whose belief component is closest to the truth, to the evidence, or to the belief state component of some other relevantly magnetic psychological state. But the more natural version of this picture wouldn’t simply forget about the desires that are also being attributed. And if it is proximity between belief-desire psychologies in C and magnetic belief-desire psychology m that is at issue, we are appealing to a proximity not between belief states alone, but proximity between pairs of belief-desire states.

If desire states are represented by functions from propositions to real numbers (degrees of desirability), then there’s clearly a straight formal extension of the above method available to us: if we used squared Euclidean distance as a measure of the proximity or similarity of a pair of belief states, use exactly the same formula for measuring the proximity or similarity of desire states! But Pettigrew’s arguments for the characteristics which select that measure do not all carry over. In particular, Pettigrew’s most extended discussion is in defence of a “decomposition” assumption that makes essential use of notions (e.g. the “well-calibrated counterpart” of a belief state) that have no obvious analogue for belief-desire psychologies.
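The straight formal extension, in the same toy setting as before (the representations and numbers are again invented):

```python
def psych_distance(psych_a, psych_b):
    """Squared Euclidean distance between <belief, desire> psychologies:
    the same formula as for beliefs, applied to the concatenated vector."""
    (bel_a, des_a), (bel_b, des_b) = psych_a, psych_b
    belief_part = sum((bel_a[p] - bel_b[p]) ** 2 for p in bel_a)
    desire_part = sum((des_a[p] - des_b[p]) ** 2 for p in des_a)
    return belief_part + desire_part

# Toy psychologies: a credence and a desirability for proposition p.
one = ({"p": 0.75}, {"p": 5.0})
two = ({"p": 0.5},  {"p": 3.0})
print(psych_distance(one, two))  # 0.0625 + 4.0 = 4.0625
```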

Is there anything to be said for the squared Euclidean distance measure of proximity between belief-desire psychologies, in the absence of an analogue of what Pettigrew says in the special case of proximity between belief states? Well, one thing we can note: since it extends Pettigrew’s measure of the proximity of belief states, it’s consistent with it. A straight generalization is the natural first hypothesis to try for belief-desire proximity, relative to the Pettigrew starting point. What I want to discuss now is a way of getting indirect support for it. What I’ll argue is that it can do work for us analogous to the work that it does for the probabilist in the accuracy framework.

To get the inaccuracy framework off the ground, recall, Pettigrew commits to the identification of an ideal belief state at each world. The ideal belief state at w is the belief state whose level of confidence in each proposition matches the truth value of that proposition at w (1 if the proposition is true, 0 if it is false). To add something similar for desire: instead of truth values, let’s start from a fundamental value function V, assigning a real number to each world. You can think of the fundamental valuation relation as fixed objectively (the objective goodness of the world in question), fixed objectively relative to an agent (the objective goodness of the world for that agent), as a projection of values embraced by the particular agent, or some kind of mix of the above. Pick your favourite and we’ll move on.

I say: the ideal degree of desirability for our agent to attach to the proposition that w is the case is V(w). But what is the ideal degree of desirability for other, less committal propositions? Here’s a very natural thought (I’ll come back to alternatives later): look at which world would come about were p the case, and let the ideal desirability of p be V(w) for that w which p counterfactually implies. (This, by the way, is a proposal that relies heavily on the Stalnakerian idea that for each world and proposition p there is a unique closest world where p holds.) So we extend V, defined over worlds, to V*, defined over propositions, via counterfactual connections of this kind.
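In a toy model, with both V and the selection function simply stipulated, the extension looks like this:

```python
# Fundamental value function over worlds (stipulated numbers).
V = {"w1": 10.0, "w2": -2.0, "w3": 4.0}

# Stalnakerian selection: from each world, the unique closest p-world.
closest = {("w1", "p"): "w1",   # p already holds at w1
           ("w2", "p"): "w1",
           ("w3", "p"): "w1"}

def v_star(p, w):
    """Ideal desirability of p at w: the V-value of the world that p
    counterfactually implies, i.e. the closest p-world to w."""
    return V[closest[(w, p)]]

print(v_star("p", "w2"))  # 10.0
```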

If we have this conception of ideal desires, and also the squared Euclidean measure of proximity between desire states, then a notion of “distance from the ideal desire state” drops out. Call this measure the misalignment of a desire state. If we have the squared Euclidean measure of proximity between combined belief-desire states, then what drops out is a notion of “distance from the ideal belief-desire state”, which is simply the sum of the inaccuracy of its belief component and the misalignment of its desire component.

The fundamental result that accuracy-firsters point to as a vindication of probabilism is this: unless a belief state b is a convex combination of truth values (i.e. a probability function), there will be a b* which is necessarily more accurate than b. In this setting, the same underlying result (as far as I can see; there are a few nice details about finitude and boundedness to sort out) delivers this: unless the belief-desire state <b,d> is a convex combination over w of vectors of the form <truth-value at w, V-value at w>, there will be some alternative psychology <b*,d*> which is necessarily closer to the ideal psychology (more accurate-and-aligned) than <b,d> is.

What must undominated belief-desire psychologies be like? We know they must be convex combinations of <truth value at w, V-value at w> pairs for varying w. The b component will then be a convex combination of truth values with weights k(w), i.e. a probability function that invests credence k(w) in w. More generally, both the b and d components are expectations of random variables with weights k(w): b(p) is the expectation of the indicator random variable for proposition p, and d(p) the expectation of the value-of-p random variable. The expectation of the value-of-p random variable turns out to equal the sum, over all possible values k, of k multiplied by the agent’s degree of belief in the counterfactual conditional: if p had been the case, then the value of the world would have been k. And that, in effect, is Gibbard and Harper’s version of causal decision theory.
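Here’s that regrouping checked numerically, with stipulated weights, values and selection function:

```python
from collections import defaultdict

k = {"w1": 0.5, "w2": 0.3, "w3": 0.2}    # credences in worlds
V = {"w1": 10.0, "w2": -2.0, "w3": 4.0}  # values of worlds
closest = {("w1", "p"): "w1", ("w2", "p"): "w3", ("w3", "p"): "w3"}

# d(p) as the expectation of the value-of-p random variable.
d_p = sum(k[w] * V[closest[(w, "p")]] for w in k)

# Regrouped: for each possible value v, v times the credence invested in
# the counterfactual "if p had been the case, the world's value would be v".
cred = defaultdict(float)
for w in k:
    cred[V[closest[(w, "p")]]] += k[w]
d_p_regrouped = sum(v * c for v, c in cred.items())

print(d_p, d_p_regrouped)  # both 7.0: the Gibbard-Harper expected value
```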

If the sketch above is correct, then measuring proximity of whole psychologies by squared Euclidean distance (or, more generally, an additive Bregman divergence) will afford a combined accuracy-domination argument for probabilism and value-domination argument for causal decision theory. That’s nice!

Notice that there’s some obvious modularity in the argument. I already noted that we could treat V(w) as objective, relativized or subjective value. Further, we get the particular Gibbard-Harper form of causal decision theory because we extended the ideal V over worlds to the ideal V* over propositions via counterfactual conditionals concerning which world would obtain if p were the case. If instead we defined the ideal V* in p as the weighted average of the values of worlds, weighted by the conditional chance of each world obtaining given p, then we’d end up with an expected-chance formulation of causal decision theory. If we defined the ideal V* in p via combinations of counterfactuals about chance, we would derive a Lewisian formulation of causal decision theory. And if we reinterpret the conditional in the original formulation as an indicative conditional, we get a variant of evidential decision theory, coinciding with Jeffrey’s evidential decision theory only if the probability of the relevant conditional is always equal to the corresponding conditional probability (a thesis that is famously problematic).
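To make the modularity vivid, here’s a sketch treating the extension of V to V* as a plug-in module, with two of the flavours above as toy functions (the chance matrix is simply stipulated as already conditioned on p):

```python
V = {"w1": 10.0, "w2": -2.0}

# Module 1, Gibbard-Harper flavour: a Stalnakerian selection function.
closest = {("w1", "p"): "w1", ("w2", "p"): "w1"}
def v_star_counterfactual(p, w):
    return V[closest[(w, p)]]

# Module 2, expected-chance flavour: conditional chances of worlds given p.
chance_given_p = {"w1": {"w1": 0.9, "w2": 0.1},
                  "w2": {"w1": 0.6, "w2": 0.4}}
def v_star_chance(p, w):
    return sum(ch * V[u] for u, ch in chance_given_p[w].items())

print(v_star_counterfactual("p", "w2"))  # 10.0
print(v_star_chance("p", "w2"))          # 0.6*10 + 0.4*(-2) = 5.2
```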

Okay, so let’s sum up. What has happened here is that, for the purposes of formulating a particular kind of metasemantics of belief and desire, we need a notion of proximity of whole belief-desire psychologies to one another. Now, Pettigrew has explicitly argued for a specific way of measuring proximity for the belief side of the psychology. The natural thing to do is to extend his arguments to psychological states that include desires as well as beliefs. But unfortunately, his arguments for that specific way of handling proximity look too closely tied to belief. However, we can provide a more indirect, abductive argument for the straight generalization of this way of measuring proximity over belief-desire psychologies by (a) endorsing Pettigrew’s arguments for the special case of belief; and (b) noting that the straight generalization would provide a uniform vindication of both probabilism and standard rational requirements on desire as well as belief.

This, at least, makes me feel that I should be pretty comfortable appealing to a notion of “proximity to the magnetic belief-desire state m” in formulating metasemantics in the style above, and measuring this by squared Euclidean distance—at least insofar as I buy into the conception of accuracy that Pettigrew sketches.

Let me make one final note. I’ve been talking throughout as if we all understood what real-valued “degrees of desire” are. And the truth is, I believe I do understand this. I think that I have neutral desire for some worlds/propositions, positive desire for others, and negative desire for a third group. I think that we can measure and compare the gap between the desirability of two propositions: the difference between the desirability of eating cake and eating mud is much greater than the difference between the desirability of overnight oats and porridge. I think there are facts of the matter about whether you and I desire the same proposition equally, or whether I desire it more than you, or you desire it more than me.

But famously, some are baffled by interpersonal comparisons of utility, or by the features of the utility scale that I instinctively like. If you think of attributing utility as all about finding representations that vindicate choice behaviour, interpersonal comparisons will be as weird as the idea of an interpersonal choice. The whole project of measuring proximity between desirability states via functions on their representations as real values might then look like a weird starting point. If you google the literature on similarity measures for utility, you’ll find a lot of work on similarity of preference orderings, e.g. measures that count how many reversals it takes to turn one ordering into another. You might think this is a much less controversial starting point than mine, and that I need to do a whole heap more work to earn the right to my starting point.

I think the boot is on the other foot. The mental metasemantics in which I aim to deploy this notion of proximity denies that all there is to attributing utility is finding a representation that vindicates the agent’s choice behaviour. That’s step (I), but step (II) goes beyond this to play favourites among the set of vindicatory psychological states. By the same token, the mental metasemantics sketched grounds interpersonal comparisons of desirability between agents, by way of facts about the proximity of each agent’s psychology to the magnetic psychological state m.

There’s a kind of dialectical stalemate here. If interpersonal comparisons are a busted flush, the prospects look dim for any kind of proximity measure of the sort I’m after here (i.e. one that extends the proximity implicit in the accuracy-first framework). If, however, the kind of proximity measures I’ve been discussing make sense, then we can use them to ground the real-value representations of agents’ psychological states that make interpersonal comparisons possible. I don’t think either I or my more traditional operationalizing opponent should be throwing shade at the other at this stage of development. Rather, each should be allowed to develop their overall account of rational psychology, and at the end of the process we can come back and compare notes about whose starting assumptions were ultimately more fruitful.

Comparative conventionality

The TL;DR summary of what follows is that we should quantify the conventionality of a regularity (David-Lewis-style) as follows:

A regularity R in the behaviour of population P in a recurring situation S is a convention of depth x, breadth y and degree z when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R. 

where x (depth) is the fraction of S-situations which are T, y (breadth) is the fraction of all Ps involved who are Ks in this instance, and z is the degree to which (A-C) obtaining resembles a coordination equilibrium that solves a coordination problem among the Ks.

From grades of conventionality so defined, we can characterize in the obvious way a partial ordering of regularities by whether one is more of a convention than another. What I have set out differs in several respects from what Lewis himself proposed along these lines. The rest of the post spells out why.

The first thing to note is that in Convention Lewis revises and re-revises his account of what it takes to be a convention. The graded version above is a generalization of his early formulations in the book. Here’s a version of his original:

A regularity R in the behaviour of a population P in a recurring situation S is a convention if and only if it is true that, and common knowledge in P that:

(A) BEHAVIOUR CONDITION: everyone conforms to R
(B) EXPECTATION CONDITION: everyone expects everyone else to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone prefers that they conform to R conditionally on everyone else conforming to R. 

where (C) holds because S is a coordination problem and uniform conformity to R is a coordination equilibrium in S.

A clarificatory note: in some conventions (e.g. a group of friends meeting in the same place week after week) the population in question are all present in instances of the recurring situation. But in others—languages, road driving conventions—the recurring situation involves more or less arbitrary selection of pairs, triples, etc of individuals from a far larger population. When we read the clauses, the intended reading is that the quantifiers “everyone” be restricted just to those members of the population who are present in the relevant instance of the recurring situation. The condition is then that it’s common knowledge instance-by-instance *between conversational participants* or *between a pair of drivers* what they’ll do, what they expect, what they prefer, and so on. That matters! For example, it might be that strictly there is no common knowledge at all among *everyone on the road* about what side of the road to drive on. I may be completely confident that there’s at least one person within the next 200 miles not following the relevant regularity. Still, I may share common knowledge with each individual I encounter, that in this local situation we are going to conform, that we have the psychological states backing that up, etc. (For Lewis’s treatment of this, see his discussion of generality “in sensu diviso” over instances.)

Let me now tell the story about how Lewis’s own proposal arose. First, we need to see his penultimate characterization of a convention:

A regularity R in the behaviour of P in a recurring situation S is a perfect convention when it’s common knowledge among P in any instance of S that:

(A) BEHAVIOUR CONDITION: everyone conforms to R
(B) EXPECTATION CONDITION: everyone expects everyone else to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone prefers that they conform to R conditionally on everyone else conforming to R. 
(D) GENERAL PREFERENCE CONDITION: everyone prefers that anyone conform to R conditionally on all but that one conforming to R.
(E) COOPERATION CONDITION: everyone has approximately the same preferences regarding all possible combinations of actions.
(F) There exists an alternative regularity R* incompatible with R, which also meets the analogues of (C) and (D).

The explicit appeal to coordination problems and their solution by coordination equilibria has disappeared. Replacing it are the three clauses (D-F). In (D) and (E) Lewis ensures that the scenario resembles recurring games of pure cooperation in two specific, independent respects. Games of pure cooperation have an exact match of preferences over all possible combinations of outcomes (cf. (E)’s approximate match). And because of this perfect match, if any one person prefers to conform conditionally on others conforming, all others share that preference too (cf. (D)). So by requiring (D) we preserve a structural feature of coordination problems, and by requiring (E) we require some kind of approximation to a coordination problem. (F), on the other hand, is a generalization of the condition that these games have more than one “solution” in the technical sense, and so are coordination *problems*.

It’s striking that, as far as I can see, Lewis says nothing about what further explanatory significance (beyond being analytic of David Lewis’s concept of convention) these three features enjoy. That contrasts with the explanatory power of (A-C) being true and common knowledge, which is at the heart of the idea of a rationally self-sustaining regularity in behaviour. I think it’s well worth keeping (A-C) and (D-F) separate in one’s mind when thinking through these matters, if only for this reason.

Here’s the Lewisian proposal to measure degree of conventionality:


A regularity R in the behaviour of P in a recurring situation S is a convention to at least degree <z,a,b,c,d,e,f> when it’s common knowledge among P in at least fraction z of instances of S that:

(A*) BEHAVIOUR CONDITION: everyone in some fraction a of P conforms to R
(B*) EXPECTATION CONDITION: everyone in some fraction b of P expects at least fraction a of P to conform to R
(C*) SPECIAL PREFERENCE CONDITION: everyone in some fraction c of P prefers that they conform to R conditionally on everyone in fraction a of P conforming to R.
(D*) GENERAL PREFERENCE CONDITION: everyone in some fraction d of P prefers that anyone conform to R conditionally on everyone in fraction a of P conforming to R.
(E*) COOPERATION CONDITION: everyone in some fraction e of P has approximately the same preferences regarding all possible combinations of actions.
(F*) there exists an alternative regularity R* incompatible with R in fraction f of cases, which also meets the analogues of (C*) and (D*).


The degree of conventionality of R is then defined to be the set of tuples such that R is a convention to degree at least that tuple. A partial order of comparative conventionality can then be defined in the obvious way.

While measuring the degree to which the clauses of the characterization of perfect conventionality are met is a natural idea, there’s just no guarantee that it tracks anything we might want from a notion of partial conventionality, e.g. “resemblance to a perfect convention”. I’ll divide my remarks into two clusters: first on (A-C), and then on (D-F).

On the original conception, the (A-C) clauses work together to explain what a convention explains. That’s why, after all, Lewis makes sure that in clause (C*) the conditional preference is conditional on the obtaining of the very fraction mentioned in clauses (A*) and (B*). But more than this is required.

On that original conception, the rationality of conforming to R, as recorded in (A), is to be explained by (common knowledge of) the expectations and preferences in (B) and (C). Where everyone has the expectations and preferences, the rationalization story rolls along nicely. But once we allow exceptions, things break down.

Consider, first, the limit case where nobody at all has the expectation or preference (so (B) and (C) are met to degree zero). Conformity to the regularity can then be entirely accidental, obtaining independently of the attitudes prevailing among those conforming. Such situations lack the defining characteristics of a convention. But (holding other factors equal) Lewis’s definition orders them by how many people in the situation conform to the regularity. So Lewis finds an ordering where there is really none to be had. That’s bad.

Consider, second, a case where the population divides evenly into two parts: those who have the preference but no expectation, and those who have the expectation but no preference. No person in any instance will have both the expectation and the preference that in the paradigm cases work together to rationally support the regularity. To build a counterexample to Lewis’s analysis of comparative conventionality out of this, consider a situation where the expectation and preference clauses are met to degree 0.4, but by the same group, which rationalizes 0.4 conformity. Now we have a situation where expectations and preferences do sustain the level of conformity, and so (all else equal) it deserves to be called a partial convention. But on Lewis’s characterization it is less of a convention than a situation where 50% of people have the preference, a non-overlapping 50% have the expectation, and 40% irrationally conform to the regularity. The correct view is that the former regularity is more conventional than the latter. Lewis says the opposite. I conclude that Lewis characterized degree of conventionality in the wrong way.
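The two scenarios can be put in miniature like this, flattening each member to flags for (conforms, expects, prefers):

```python
def clause_fractions(pop):
    """The per-clause fractions Lewis's graded definition keeps track of:
    (conformity, expectation, preference)."""
    n = len(pop)
    return tuple(sum(member[i] for member in pop) / n for i in range(3))

# Scenario 1: a single group of 40 satisfies all three clauses together.
s1 = [(1, 1, 1)] * 40 + [(0, 0, 0)] * 60

# Scenario 2: preference and expectation never co-occur; 40 members
# conform anyway, unrationalized by the combination of attitudes.
s2 = [(1, 1, 0)] * 40 + [(0, 1, 0)] * 10 + [(0, 0, 1)] * 50

print(clause_fractions(s1))  # (0.4, 0.4, 0.4)
print(clause_fractions(s2))  # (0.4, 0.5, 0.5): Lewis ranks s2 higher
```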

Let me turn to the way he handles (D-F). What’s going on here, I think, is that he’s picking out three specific ways in which what’s going on can resemble a solution to a coordination problem. But there are again multiple problems. For a start, there are the kind of lack-of-overlap problems we just saw above. A situation where 40% of the people conform, meet the relevant expectation and preference clauses, and perfectly match in preferences over all relevant situations is ranked *below* a situation where 40% of people conform, meet the relevant expectation and preference clauses, and are completely diverse in their preferences, *but* where the remaining 60% of the population have perfectly matched preferences against conformity to R. That’s no good at all!

But as well as the considerations about overlap, the details of the respects of similarity seem to me suspect. For example, consider a scenario where (A-C) are fully met, but everybody’s preferences diverge just too much to count as approximately the same, so (E) is met to degree zero. And compare that to a situation where two people have approximately the same preferences, and 98 others have completely divergent preferences; then (E) is met to degree 0.02. The first is much more similar to a perfect match of preferences than the second, but Lewis’s ranking gives the opposite verdict. (This reflects the weird feature that he loosens the clause from exact match to approximate match, and then, on *top* of that loosening, imposes a measure of degree of satisfaction. I really think that the right thing here is to stick with a measure of similarity of preference among a relevant group of people, rather than counting pairwise matches.)

I’d fold clause (F) into the discussion at this point, but my main concerns about it would really turn into concerns about whether Lewis’s model of conventions as equilibria is right, and that would take me too far afield. So I’ll pass over it in silence.

To summarize: Lewis’s characterization of degrees of conventionality misfires a lot. The most important thing wrong with it is that it doesn’t impose any requirement that its clauses be simultaneously satisfied by the same people. And that leaves it open to the kinds of problems just described.

My own proposal, which I listed at the start of this post, seems to me the natural way to fix this problem. I say: what we need to do is look for “kernels” of self-sustaining subpopulations, where we insist that each member of the kernel meets the conformity, expectation and preference conditions perfectly. The size of this kernel, as a fraction of those in the population involved in the situation, then measures how closely we approximate the original case. That fraction is what I called the “breadth” of the convention: a convention with breadth 1 involves everyone involved in any instance of the situation pulling their weight, while a convention with breadth 0.5 is one where only half are involved, but where the kernel is still just as rationally self-sustaining as in a case of perfect convention. We might introduce the neologism “breadth of subconvention” to articulate this:

A regularity R in the behaviour of P in a recurring situation S is a sub-convention of breadth y when in every instance of S there is a kernel K of the members of P such that it’s true and common knowledge among K in this instance of S that:

(A**) BEHAVIOUR CONDITION: everyone in K conforms to R
(B**) EXPECTATION CONDITION: everyone in K expects everyone in K to conform to R
(C**) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone in K conforming to R. 

and y is the fraction of P in the instance of S who are in K.

(These clauses contain a free variable K embedded in specifications of preference and expectation. So what is the content of the preferences and expectations we’re here requiring? Do the people in the kernel satisfying the conditions need to conceive of the others in K whom they expect to conform as being sufficiently numerous (of size k)? Or is it enough that they form preferences and expectations about a subgroup of those involved in the present instance, where that subgroup happens to be of size k? I go with the latter, more liberal understanding. In cases where participants’ interests are grounded in the kind of success that requires k people to cooperate, (C**) will likely not be met unless all participants believe that there are at least k of them. But that isn’t written into the clauses, and I don’t think it should be. Size might matter, but there’s no reason to think it always matters.)
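Here’s a sketch of the kernel idea with attitudes flattened to per-member flags (this loses the K-directedness of the expectations and preferences just discussed, but shows how breadth gets computed):

```python
def largest_kernel(instance):
    """Members who conform, expect conformity, and conditionally prefer it.
    With per-member flags, the largest kernel is everyone with all three."""
    return {name for name, (conforms, expects, prefers) in instance.items()
            if conforms and expects and prefers}

# One toy instance of the recurring situation.
instance = {"a": (1, 1, 1), "b": (1, 1, 1), "c": (0, 1, 0), "d": (1, 0, 1)}

K = largest_kernel(instance)
breadth = len(K) / len(instance)
print(sorted(K), breadth)  # ['a', 'b'] 0.5
```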

To see why “depth” as well as “breadth” matters, consider the following setup. Suppose that our overall population P divides into conformers C (90%) and defectors D (10%). The conformers are such that in any instance of S they will satisfy (A-C), whereas the defectors never do (for simplicity, suppose they violate all three conditions). So, if you’re a conformer, you always conform to R whenever you’re in S, because you prefer to do so if 90% of the others in that situation do, and you expect at least 90% of them to do so.

If everyone in P is present in each instance of S, this is a straightforward instance of a subconvention of breadth 0.9. The biggest kernel witnessing the truth of the above clauses is simply the set of conformers, who are all present in every case.

But now consider a variation where not all members of P are present in every case. Stipulate that the members of P present in a given instance of S are drawn randomly from the population as a whole. This will not be a subconvention of breadth 0.9, because there will be instances of S where, by chance, too many defectors are present, and the set of conformers is less than fraction 0.9 of the total involved in that situation. So the set of conformers present in a given instance is sometimes but not always a “kernel” that meets the conditions laid down. Indeed, it is not a subconvention of any positive breadth, because it could randomly happen that only defectors are selected for an instance of S, and in that instance there is no kernel of size >0 satisfying the clauses. So by the above definition it won’t be a subconvention of any positive breadth, even if such instances are exceptionally rare.
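A quick simulation of the random-draw variant (with stipulated population and draw sizes) makes the point:

```python
import random

random.seed(0)
population = ["conformer"] * 90 + ["defector"] * 10

def kernel_fraction(present):
    """Fraction of those present who are conformers (the kernel)."""
    return present.count("conformer") / len(present)

draws = [random.sample(population, 10) for _ in range(10_000)]
fractions = [kernel_fraction(d) for d in draws]

# The kernel's fraction falls below 0.9 whenever two or more defectors
# are drawn, so the "in every instance" clause fails at breadth 0.9.
print(sum(f < 0.9 for f in fractions) / len(fractions))
# And all-defector draws (no kernel at all) are possible, though rare.
```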


What we need, to avoid this, is to provide for exceptions in the “depth” of the convention, i.e. in the fraction of instances of S where the clauses are met, as Lewis does:

A regularity R in the behaviour of population P in a recurring situation S is a convention of depth x, breadth y when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A**) BEHAVIOUR CONDITION: everyone in K conforms to R
(B**) EXPECTATION CONDITION: everyone in K expects everyone in K to conform to R
(C**) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone in K conforming to R

and x is the fraction of S situations that are T situations and y is the fraction of P in the instance of T that are in K.


(I’ve written this in terms of a new recurring state T, rather than (per Lewis) talking about a fraction of the original recurring state type, to bring out the following feature. In the special case I’ve been discussing, where the largest kernel witnessing the truth of these clauses is simply the set of conformers C who are present, then when the clauses are met with depth x and breadth y with respect to S and P, they will be met with depth 1 and breadth 1 with respect to T and C. That is: in this special case, the clauses in effect require that there be a perfect subconvention with respect to some subpopulation and sub-situation of the population and situation we start from. Depth and breadth of subconventionality then measure the fraction of the overall population and situation that these “occupy”.)
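The interplay of the two parameters can be sketched as follows: fixing a breadth threshold picks out a refinement T, namely the instances whose kernels are at least that broad, and depth is then the fraction of instances falling within T. The per-instance kernel fractions below are invented:

```python
# Toy per-instance kernel fractions across five instances of S.
kernel_fractions = [0.95, 0.9, 0.9, 0.7, 0.0]

def depth_at_breadth(fractions, y):
    """Fraction of S-instances in the refinement T where breadth >= y."""
    return sum(f >= y for f in fractions) / len(fractions)

for y in (0.9, 0.7, 0.5):
    print(y, depth_at_breadth(kernel_fractions, y))
# 0.9 -> depth 0.6; 0.7 -> depth 0.8; 0.5 -> depth 0.8
# Relaxing the breadth demanded never decreases the depth attainable.
```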

What do we now think about the remaining clauses of Lewis’s definition? I think there’s no obvious motive for extending the strategy I’ve pursued to this point, of requiring that these clauses be satisfied perfectly by the kernel K. After all, (common knowledge of) the satisfaction of (A-C) already provides for the rational stability of the pattern of conformity. But equally (as we saw in one of my earlier objections to Lewis) we don’t want to measure the fraction of all those involved in the recurring situation who satisfy the clauses, else we’ll be back to the problems of lack of overlap. What we want to do is take the kernel we have secured from the subconvention conditions already set down, and look at the characteristics of the regularity that prevails among its members. To what extent is that rationally stable regularity a convention? And that brings us right up to my official proposal, repeated here:

A regularity R in the behaviour of population P in a recurring situation S is a convention of depth x, breadth y and degree z when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R. 

where x (depth) is the fraction of S-situations which are T, y (breadth) is the fraction of all Ps involved who are Ks in this instance, and z is the degree to which (A-C) obtaining resembles a coordination equilibrium that solves a coordination problem among the Ks.

The key thing to note here, compared to the previous version, is that I’ve declined to unpack the notion of “resembling a coordination equilibrium that solves a coordination problem”. For all that’s been said here, you could take the implicit analysis that Lewis’s (D*-E*) gives of this notion (now restricted to the members of the kernel), and plug that in. But earlier I objected to that characterization: it doesn’t seem to me that the fraction of people with approximately matching preferences is a good measure of similarity to the original exact match. In the absence of a plausible analysis, better to keep the notion as a working primitive (and if it doesn’t do much explanatory work, as is my current working hypothesis, analyzing that working primitive will be low down the list of priorities).

A closing remark. Lewis’s official position is neither the unrestricted (A-F) nor the quantitative (A*-F*) above. Rather, he gives a version of (A-F) in which the quantifiers throughout are replaced by ones that allow for exceptions (“almost everyone…”). But as far as I can see, the same kinds of worries arise for this case. For example, given any threshold for how many count as “almost everyone”, almost everyone can have the relevant conditional preference and almost everyone can have the relevant expectation, while it fails to be the case that almost everyone has both the preference and the expectation; and so, if almost everyone conforms to the regularity, at least some of that conformity is not rationalized by the attitudes guaranteed by the other clauses. To fix this, we can extract a “threshold” variant from the quantitative proposal I have given, which would look like this:

A regularity R in the behaviour of population P in a recurring situation S is a convention when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R. 

where almost all S-situations are T, almost all P involved in the instance of T are in K, and (A-C) obtaining is almost a coordination equilibrium that solves a coordination problem among the Ks.

Here “almost a coordination equilibrium” is to be read as “having a high enough degree of similarity to a coordination equilibrium”.