
## Follow up: GKL similarity and social choice

The previous post discusses a “distance minimizing” way of picking a compromise between agents with a diverse set of utilities. If you measure distance (better: divergence) between utility functions by square Euclidean distance, then a utilitarian compromise pops out.

I wanted now to discuss briefly a related set of results (I’m grateful for pointers and discussion with Richard Pettigrew here, though he’s not to blame for any goofs I make along the way). The basic idea here is to use a different distance/divergence measure between utilities, and look at what happens. One way to regard what follows is as a serious contender (or serious contenders) for measuring similarity of utilities. But another way of looking at this is as an illustration that the choice of similarity I made really has significant effects.

I borrowed the square Euclidean distance analysis of similarity from philosophical discussions of similarity of belief states. And the rival I now discuss is also prominent in that literature (and is all over the place in information theory). It is (generalized) Kullback-Leibler relative entropy (GKL), and it gets defined, on a pair of real valued vectors U,V in this way:

$D_{KL}(U,V):=\sum_{p\in P} U(p)\log \frac{U(p)}{V(p)} - U(p)+V(p)$

Note that when the vectors are each normalized to the same quantity, the sum of U(p) over all p is equal to the sum of V(p) over all p, and so the two latter summands cancel. In the more general case, they won’t. Kullback-Leibler relative entropy is usually applied with U and V being probability functions, which are normalized, so you normally find it in the form where it is a weighted sum of logs. Notoriously, GKL is not symmetric: the distance from U to V can be different from the distance from V to U. This matters; more anon.
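To make the definition and the asymmetry concrete, here is a minimal Python sketch. The function name `gkl` and the two example vectors are my own illustrative choices, not anything from the literature; the vectors are deliberately not normalized to the same total, so the two trailing summands do not cancel.

```python
import math

def gkl(u, v):
    """Generalized KL divergence from u to v.

    Both vectors must have strictly positive entries, since we take logs.
    """
    return sum(a * math.log(a / b) - a + b for a, b in zip(u, v))

# Illustrative utility vectors over three propositions (assumed values).
U = [1.0, 2.0, 3.0]
V = [2.0, 2.0, 1.0]

d_uv = gkl(U, V)  # divergence from U to V
d_vu = gkl(V, U)  # divergence from V to U
print(d_uv, d_vu)
```

The two printed numbers differ, which is the asymmetry at issue, and `gkl(U, U)` comes out as zero.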

(One reason I’m a little hesitant about using this as a measure of similarity between utilities in this context is the following. When we’re using it to measure similarity between beliefs or probability functions, there’s a natural interpretation of it as the expectation, from U’s perspective, of the difference between the log of U and the log of V. But comparing utilities rather than probabilities means we can’t read the formula this way. It feels to me a bit more of a formalistic enterprise for that reason. Another thing to note is that taking logs is well defined only when the relevant utilities are positive, which again deserves some scrutiny. Nevertheless….)

What happens when we take GKL as a distance (divergence) measure, and then have a compromise between a set of utilities by minimizing total sum distance from the compromise point to the input utilities? This article by Pettigrew gives us the formal results that speak to the question. The key result is that the compromise utility $U_C$ that emerges from a set of m utility functions $U_i$ is the geometrical mean:

$U_C(p)= (\prod_{i\in A} U_i(p))^{\frac{1}{m}}$.

Where the utilitarian compromise utilities arising from squared euclidean distance similarity look to the sum of individual utilities, this compromise looks at the product of individual utilities. It’s what’s called in the social choice literature a symmetrical “Nash social welfare function” (that’s because it can be viewed as a special case of a solution to a bargaining game that Nash characterized: the case where the “threat” or “status quo” point is zero utility for all). It has some interesting and prima facie attractive features—it prioritizes the worse off, in that a fixed increment of utility will maximize the product of everyone’s utilities if awarded to someone who has ex ante lowest utility. It’s also got an egalitarian flavour, in that you maximize the product of a population’s utilities by dividing up total utility evenly among the population (contrast utilitarianism, where you can distribute utility in any old way among a population and get the same overall sum, and so any egalitarian features of the distribution of goods have to rely on claims about diminishing marginal utility of those goods; which by the same token leaves us open to “utility monsters” in cases where goods have increasing utility for one member of the population). Indeed, as far as I can tell, it’s a form of prioritarianism, in that it ranks outcomes by way of a sum of utilities which are discounted by the application of a concave function (you preserve the ranking of outcomes if you transform the compromise utility function by a monotone increasing function, and in this case we can first raise it to the mth power, and then take logs, and the result will be the sum of log utilities. And since log is itself a concave function this meets the criteria for prioritarianism). Anyway, the point here is not to evaluate Nash social welfare, but to derive it.
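A toy calculation may help fix the contrast between the product rule and the sum rule. This is only a sketch with made-up numbers: a two-person population, one unit of utility to hand out, and a hypothetical helper `nash_welfare` that just multiplies the utilities together.

```python
import math

def nash_welfare(utilities):
    """Product of individual utilities (zero threat point assumed)."""
    return math.prod(utilities)

# One extra unit of utility, awarded to the worse off or to the better off.
to_worse_off = [2.0, 5.0]   # starting from [1.0, 5.0]
to_better_off = [1.0, 6.0]  # starting from [1.0, 5.0]

# The utilitarian sum cannot tell these apart; the product can.
print(sum(to_worse_off), sum(to_better_off))
print(nash_welfare(to_worse_off), nash_welfare(to_better_off))

# Egalitarian flavour: the same total of 6, split evenly versus unevenly.
even_split = [3.0, 3.0]
uneven_split = [1.0, 5.0]
print(nash_welfare(even_split), nash_welfare(uneven_split))
```

In both comparisons the sums agree, while the product favours the increment to the worse-off person and the even split.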

The formal result is proved in the Pettigrew paper, as a corollary to a very general theorem. Under the current interpretation that theorem also has the link between squared Euclidean distance and utilitarianism of the previous post as another special case. However, it might be helpful to see how the result falls out of elementary minimization (it was helpful for me to work through it, anyway, so I’m going to inflict it on you). So we start with the following characterization, where A is the set of agents whose utilities we are given:

$U_C=\textsc{argmin}_X \sum_{i\in A} D_{KL}(X,U_i)$

To find this we need to find X which makes this sum minimal (P being the set of n propositions over which utilities are defined, and A being the set of m agents):

$\sum_{i\in A} \sum_{p\in P} X(p)\log \frac{X(p)}{U_i(p)} - X(p)+U_i(p)$

Rearrange as a sum over p:

$\sum_{p\in P} \sum_{i\in A} X(p)\log \frac{X(p)}{U_i(p)} - X(p)+U_i(p)$

Since we can assign each X(p) independently of the others, we minimize this sum by minimizing each summand. Fixing p, and writing $x:=X(p)$ and $u_i:=U_i(p)$, our task now is to find the value of x which minimizes the following:

$\sum_{i\in A} x\log \frac{x}{u_i} - x+u_i$

We do this by differentiating and setting the result to zero. The result of differentiating (once you remember the product rule and that differentiating logs gives you a reciprocal) is:

$\sum_{i\in A} \log \frac{x}{u_i}$

But a sum of logs is the log of the product, and so the condition for minimization is:

$0=\log \frac{x^m}{\prod_{i\in A}u_i}$

Taking exponentials we get:

$1=\frac{x^m}{\prod_{i\in A}u_i}$

That is:

$x^m=\prod_{i\in A}u_i$

Unwinding the definitions of the constants and variables gets us the geometrical mean/Nash social welfare function as promised.
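As a sanity check on the calculus, one can compare the objective at the geometric mean with its value at nearby candidates. The sketch below does this for a single proposition; the three utility values and the helper name `total_from_compromise` are illustrative assumptions.

```python
import math

def total_from_compromise(x, us):
    """Sum over agents of the GKL term from the compromise x to each u_i."""
    return sum(x * math.log(x / u) - x + u for u in us)

# Utilities that three agents assign to one fixed proposition (assumed values).
us = [1.0, 4.0, 16.0]

# Candidate predicted by the derivation: the geometric mean of the u_i.
geo = math.prod(us) ** (1 / len(us))

# Compare against a grid of candidates from 50% below to 50% above it.
candidates = [geo * (1 + k / 100) for k in range(-50, 51)]
best = min(candidates, key=lambda x: total_from_compromise(x, us))
print(geo, best)
```

The grid search lands on the geometric mean (here approximately 4), as the first-order condition says it should.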

So that’s really neat! But there’s another question to ask here (also answered in the Pettigrew paper). What happens if we minimize sum total distance, not from the compromise utility to each of the components, but from the components to the compromise? Since GKL distance/divergence is not symmetric, this could give us something different. So let’s try it. We swap the positions of the constant and variables in the sums above, and the task becomes to minimize the following:

$\sum_{i\in A} u_i\log \frac{u_i}{x} - u_i+x$

When we come to minimize this by differentiating, we no longer have a product of functions in x to differentiate with respect to x. That makes the job easier, and leaves us with the condition:

$0=\sum_{i\in A} \left(1-\frac{u_i}{x}\right)$

Rearranging we get:

$x= \frac{1}{m} \sum_{i\in A} u_i$

and we’re back to the utilitarian compromise proposal again! (That is, this distance-minimizing compromise delivers the arithmetical mean rather than the geometrical mean of the components).
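The same kind of numerical check works in this direction. The sketch below takes the divergence from each $u_i$ to the compromise $x$, that is, the summand $u_i\log\frac{u_i}{x} - u_i + x$, and confirms on a grid that the arithmetic mean does best. The utility values and the helper name `total_to_compromise` are again illustrative assumptions.

```python
import math

def total_to_compromise(x, us):
    """Sum over agents of the GKL term from each u_i to the compromise x."""
    return sum(u * math.log(u / x) - u + x for u in us)

# Utilities that three agents assign to one fixed proposition (assumed values).
us = [1.0, 4.0, 16.0]

# Candidate predicted by the derivation: the arithmetic mean of the u_i.
mean = sum(us) / len(us)

# Compare against a grid of candidates from 50% below to 50% above it.
candidates = [mean * (1 + k / 100) for k in range(-50, 51)]
best = min(candidates, key=lambda x: total_to_compromise(x, us))
print(mean, best)
```

With the same inputs, the two directions give different compromises: 7 here, against roughly 4 for the geometric mean.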

Stepping back: what we’ve seen is that if you want to do distance-minimization (similarity-maximization, minimal-mutilation) compromise on cardinal utilities then the precise distance measure you choose really matters. Go for squared euclidean distance and you get utilitarianism dropping out. Go for the log distance of the GKL, and you get either utilitarianism or the Nash social welfare rule dropping out, depending on the “direction” in which you calculate the distances. These results are the direct analogues of results that Pettigrew gives for belief-pooling. If we assume that the way of measuring similarity/distance for beliefs and utilities should be the same (as I did at the start of this series of posts) then we may get traction on social welfare functions through studying what is reasonable in the belief pooling setting (or indeed, vice versa).

## From desire-similarity to social choice

In an earlier post, I set out a proposal for measuring distance or (dis)similarity between desire-states (if you like, between utility functions defined over a vector of propositions). That account started with the assumption that we measured strength of desire by real numbers. And the proposal was to measure the (dis)similarity between desires by the squared euclidean distance between the vectors of desirability at issue. If $\Omega$ is the finite set of n propositions at issue, we characterize similarity like this:

$d(U,V)= \sum_{p\in\Omega} (U(p)-V(p))^2$

In that earlier post, I linked this idea to “value” dominance arguments for the characteristic equations of causal decision theory. Today, I’m thinking about compromises between the desires of a diverse set of agents.

The key idea here is to take a set A of m utility functions $U_i$, and think about what compromise utility vector $U_C$ makes sense. Here’s the idea: we let the compromise $U_C$ be that utility vector which is closest overall to the inputs, where we measure overall closeness simply by adding up the distance between it and the input utilities $U_i$. That is:

$U_C = \textsc{argmin}_X \sum_i d(X,U_i)$

So what is the X which minimizes the following?

$\sum_{i\in A} \sum_{p\in\Omega}(X(p)-U_i(p))^2$

Rearranging as a sum over p:

$\sum_{p\in\Omega} \sum_{i\in A} (X(p)-U_i(p))^2$

This is a sum of n summands, each of which is non-negative, and each of which depends on a different value X(p). Since we can assign each X(p) independently of the others, we find the minimum value by minimizing each summand. To minimize the summand for p we differentiate with respect to X(p) and set the result to zero:

$\sum_{i\in A}(X(p)-U_i(p))=0$

This gives us the following value of X(p):

$X(p)=\frac{\sum_{i\in A}U_i(p)}{m}$

This tells us exactly what value $U_C$ must assign to p. It must be the average of the utilities assigned to p by the m input functions.

Suppose our group of agents is faced with a collective choice between a number of options. Then one option O is strictly preferred to the other options according to the compromise utility $U_C$ just in case the average utility the agents assign to it is greater than the average utility the agents assign to any other option. (In fact, since the population is fixed when evaluating each option, we can ignore the fact we’re taking averages: O is preferred exactly when the sum total of utilities assigned to it across the population is greater than for any other). So the procedure for social choice “choose according to the distance-minimizing compromise function” is the utilitarian choice procedure.
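Here is a small sketch of the procedure. The three utility profiles are invented for illustration, and `compromise` and `sq_dist` are hypothetical helper names. It computes the averaging compromise, checks that no nearby vector has smaller total squared distance to the inputs, and confirms that choosing by average and choosing by sum pick out the same option.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two utility vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def compromise(profiles):
    """Pointwise average of the input utility vectors."""
    m = len(profiles)
    return [sum(column) / m for column in zip(*profiles)]

def total_distance(x, profiles):
    return sum(sq_dist(x, u) for u in profiles)

# Three agents' utilities over three options (assumed values).
profiles = [
    [3.0, 1.0, 0.0],
    [0.0, 2.0, 4.0],
    [1.0, 3.0, 1.0],
]

u_c = compromise(profiles)

# Perturb each coordinate up and down; none should beat the average.
perturbed = []
for k in range(len(u_c)):
    for delta in (-0.1, 0.1):
        x = list(u_c)
        x[k] += delta
        perturbed.append(x)

# Choosing by average utility and by total utility select the same option.
totals = [sum(column) for column in zip(*profiles)]
best_by_mean = max(range(len(u_c)), key=lambda k: u_c[k])
best_by_sum = max(range(len(totals)), key=lambda k: totals[k])
print(u_c, best_by_mean, best_by_sum)
```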

That’s really all I want to observe for today. A couple of finishing up notes. First, I haven’t found a place where this mechanism for compromise choice is set out and defended (I’m up for citations though, since it seems a natural idea). Second, there is at least an analogous strategy already in the literature. In Gaertner’s A Primer in Social Choice Theory he discusses (p.112) the Kemeny procedure for social choice, which works on ordinal preference rankings over options, and proceeds by finding that ordinal ranking which is “closest” to a profile of ordinal rankings of the options by a population. Closeness is here measured by the Kemeny metric, which counts the number of pairwise preference reversals required to turn one ranking into the other. Some neat results are quoted: a Condorcet winner (the option that would win against each of the others in a pairwise plurality vote), if it exists, is always top of the Kemeny compromise ranking. As the Kemeny compromise ranking stands to the Kemeny distance metric over sets of preference orderings, so the utilitarian utility function stands to the square-distance divergence over sets of cardinal utility functions.
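For comparison, the ordinal analogue can be sketched by brute force. This is my own illustration rather than anything taken from Gaertner: rankings are strict, written best-first as tuples, distance counts the pairs of options on which two rankings disagree, and the three-voter profile is invented.

```python
from itertools import combinations, permutations

def kemeny_distance(r1, r2):
    """Number of pairs of options that the two strict rankings order differently."""
    pos1 = {option: i for i, option in enumerate(r1)}
    pos2 = {option: i for i, option in enumerate(r2)}
    return sum(
        1
        for x, y in combinations(r1, 2)
        if (pos1[x] < pos1[y]) != (pos2[x] < pos2[y])
    )

def kemeny_ranking(profile):
    """Ranking with the smallest total distance to the profile (exhaustive search)."""
    options = profile[0]
    return min(
        permutations(options),
        key=lambda r: sum(kemeny_distance(r, p) for p in profile),
    )

# Three voters' rankings, best option first (assumed profile).
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]

ranking = kemeny_ranking(profile)
print(ranking)
```

In this profile "a" beats both "b" and "c" in pairwise contests, and it does indeed come out on top of the compromise ranking. Exhaustive search over permutations is only feasible for a handful of options.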

I’ve been talking about all this as if every aspect of utility functions were meaningful. But (as discussed in recent posts) some disagree. Indeed, one very interesting argument for utilitarianism has as a premise that utility functions are invariant under level-changes, i.e. the utility function U and the utility function V represent the same underlying desire-state if there is a constant $a$ such that for each proposition p, $U(p)=V(p)+a$ (see Gaertner ch7). Now, it seems like the squared euclidean similarity measure doesn’t jibe with this picture at all. After all, if we measure the squared Euclidean distance between U and V that differ by a constant, as above, we get:

$\sum_{p\in\Omega}(U(p)-V(p))^2=\sum_{p\in\Omega}(V(p)+a-V(p))^2=n a^2$

On the one hand, on the picture just mentioned, these are supposed to be two representations of the same underlying state (if level-boosts are just a “choice of unit”) and on the other hand, they have positive dissimilarity by the distance measure I’m working with.

Now, as I’ve said in previous posts, I’m not terribly sympathetic to the idea that utility functions represent the same underlying desire-state when they’re related by a level boost. I’m happy to take the verdict of the squared euclidean similarity measure literally. After all, it was only one argument for utilitarianism as a principle of social choice that required the invariance claim–the reverse implication may not hold. In this post we have, in effect, a second independent argument for utilitarianism as a social choice mechanism that starts from a rival, richer preference structure.

But what if you were committed to the level-boosting invariance picture of preferences? Well, really what you should be thinking about in that case is equivalence classes of utility functions, differing from each other solely by a level-boost. What we’d really want, in that case, is a measure of distance or similarity between these classes, that somehow relates to the squared euclidean distance. One way forward is to find a canonical representative of each equivalence class. For example, one could choose the member of a given equivalence class that is closest to the null utility vector–from a given utility function U, you find its null-closest equivalent by subtracting a constant equal to the average utility it assigns to propositions: $U_0=U-\frac{\sum_{p\in\Omega} U(p)}{n}$.

Another way to approach this is to look at the family of squared euclidean distances between level-boosted equivalents of two given utility functions. In general, these distances will take the form

$\sum_{p\in \Omega} ((U(p)-\alpha) -(V(p) -\beta))^2=\sum_{p\in \Omega} (U(p)-V(p) -\gamma)^2$

(Where $\gamma=\alpha-\beta$.) You find the minimum element in this set of distances (the closest the two equivalence classes come to each other) by differentiating with respect to gamma and setting the result to zero. That is:

$0=\sum_{p\in \Omega} (U(p)-V(p) -\gamma)$,

which rearranging gives:

$\gamma=\frac{\sum_{p\in \Omega} (U(p)-V(p))}{n}=\frac{\sum_{p\in \Omega} U(p)}{n}-\frac{\sum_{p\in \Omega} V(p)}{n}$

Working backwards, set $\alpha:=\frac{\sum_{p\in \Omega} U(p)}{n}$ and $\beta:=\frac{\sum_{p\in \Omega} V(p)}{n}$, and we have defined two level boosted variants of the original U and V which minimize the distance between the classes of which they are representatives (in the square-euclidean sense). But note these level boosted variants are just $U_0$ and $V_0$. That is: minimal distance (in the square-euclidean sense) between two equivalence classes of utility functions is achieved by looking at the squared euclidean distance between the representatives of those classes that are closest to the null utility.
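A quick numerical illustration of this equivalence, with invented utility vectors and hypothetical helper names (`center`, `shifted_distance`): the distance at the minimizing $\gamma$ matches the distance between the mean-subtracted representatives, and a pure level boost has positive raw distance $n a^2$ but zero distance once both vectors are centred.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two utility vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def center(u):
    """Null-closest representative: subtract the average utility."""
    mean = sum(u) / len(u)
    return [x - mean for x in u]

def shifted_distance(u, v, gamma):
    """Distance between level-boosted variants whose boosts differ by gamma."""
    return sum((a - b - gamma) ** 2 for a, b in zip(u, v))

# Two utility functions over three propositions (assumed values).
u = [1.0, 2.0, 6.0]
v = [0.0, 5.0, 1.0]

# The minimizing gamma is the difference of the two averages.
gamma_star = sum(u) / len(u) - sum(v) / len(v)
min_by_gamma = shifted_distance(u, v, gamma_star)
by_representatives = sq_dist(center(u), center(v))
print(gamma_star, min_by_gamma, by_representatives)

# A pure level boost by a = 2 over n = 3 propositions.
boosted = [x + 2.0 for x in u]
print(sq_dist(u, boosted), sq_dist(center(u), center(boosted)))
```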

This is a neat result to have in hand. I think the “minimum distance between two equivalence classes” is better motivated than simply picking arbitrary representatives of the two families, if we want a way of extending the squared-Euclidean measure of similarity to utilities which are assumed to be invariant under level boosts. But this last result shows that we can choose (natural) representatives of the equivalence classes generated and measure the distance between them to the same effect. It also shows us that the social choice compromise which minimizes distance between families of utility can be found by (a) using the original procedure above for finding the utility function $U_C$ selected as a minimum-distance compromise between the representatives of each family of utility functions; and (b) selecting the family of utility functions that are level boosts of $U_C$. Since the level boosts wash out of the calculation of the relative utilities of a set of options, all the members of the $U_C$ family will agree on which option to choose from a given set.

I want to emphasize again: my own current view is that the complexity introduced in the last few paragraphs is unnecessary (since my view is that utilities that differ by a constant from one another represent distinct desire-states). But I think you don’t have to agree with me on this matter to use the minimum distance compromise argument for utilitarian social choice.

## How emotions might constrain interpretation

Joy is appropriate when you learn that something happens that you *really really* want. Despair is appropriate when you learn that something happens that you *really really* don’t want to happen. Emotional indifference is appropriate when you learn that something happens which you neither want nor don’t want–which is null for you. And there are grades of appropriate emotional responses—from joy to happiness, to neutrality, to sadness, to despair. I take it that we all know the differences in the intensity of the feeling in each case, and have no trouble distinguishing the valence as positive or negative.

More than just level and intensity of desire matters to the appropriateness of an emotional response. You might not feel joy in something you already took for granted, for example. Belief-like as well as desire-like states matter when we assess an overall pattern of belief/desire/emotional states as to whether they “hang together” in an appropriate way–whether they are rationally coherent. But levels and intensities of desire obviously matter (I think).

Suppose you were charged with interpreting a person about whose psychology you knew nothing beforehand. I tell you what they choose out of the options facing them in a wide variety of circumstances, in response to varying kinds of evidence. This is a hard task for you, even given the rich data, but if you assumed the person is rational you could make progress. But if *all* you did was attribute beliefs and desires which (structurally) rationalize the choices and portray the target as responding rationally to the evidence, then there’d be a distinctive kind of in-principle limit built into the task. If you attributed utilities and credences which make the target’s choices maximize expected utility, and evolve by conditionalization on evidence, then you’d get a fix on what the target prefers to what, but not, in any objective sense, how much more they prefer one thing to another, or whether they are choosing x over y because x is the “lesser of two evils” or the “greater of two goods”. If you like, think of two characters facing the same situation–an enthusiast who just really likes the way the world is going, but mildly prefers some future developments to others, and the distraught one, who thinks the world has gone to the dogs, but regards some future developments as even worse than others. You can see how the choice-dispositions of the two given the same evidence could match despite their very different attitudes. So given *only* information about the choice-dispositions of a target, you wouldn’t know whether to interpret the target as an enthusiast or their distraught friend.

While the above gloss is impressionistic, it reflects a deep challenge to the attempt to operationalize or otherwise reduce belief-desire psychology to patterns of choice-behaviour. It receives its fullest formal articulation in the claim that positive affine transformations of a utility function will preserve the “expected utility property”. (Any positive monotone transformation of a utility function will preserve the same ordering over options. The mathematically interesting bit here is that the positive affine transformations of a utility function guarantee that the pattern between preferences over outcomes and preferences over acts that bring about those outcomes, mediated by credences in the act-outcome links, is preserved).
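The contrast between affine and merely monotone transformations can be seen in a toy case. Everything in this sketch is invented for illustration: three outcomes, two acts described as lotteries over them, and an "enthusiast" and a "distraught" agent obtained by shifting one baseline utility function up or down by 100.

```python
def expected_utility(lottery, utility):
    """Credence-weighted utility of an act, given as outcome -> probability."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())

def preferred(acts, utility):
    """Name of the act with the highest expected utility."""
    return max(acts, key=lambda name: expected_utility(acts[name], utility))

# Baseline utilities over outcomes and two available acts (assumed values).
utility = {"low": 0.0, "mid": 6.0, "high": 10.0}
acts = {
    "safe": {"mid": 1.0},
    "gamble": {"low": 0.5, "high": 0.5},
}

# Positive affine transformations: level shifts and a rescaling.
enthusiast = {o: u + 100.0 for o, u in utility.items()}
distraught = {o: u - 100.0 for o, u in utility.items()}
rescaled = {o: 2.0 * u + 3.0 for o, u in utility.items()}

# A monotone but non-affine transformation: same ordering over outcomes.
squared = {o: u ** 2 for o, u in utility.items()}

for name, util in [("baseline", utility), ("enthusiast", enthusiast),
                   ("distraught", distraught), ("rescaled", rescaled),
                   ("squared", squared)]:
    print(name, preferred(acts, util))
```

The enthusiast and the distraught agent make the same choice as the baseline agent, so choice alone cannot tell them apart. The squared utilities rank the outcomes identically but flip the choice between the acts.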

One reaction to this in-principle limitation is to draw the conclusion that really, there are no objective facts about the level of desire we each have in an outcome, or how much more desirable we find one thing than another. A famous consequence of drawing that conclusion is that no objective sense could be made out of questions like: do I desire this pizza slice more or less than you do? Or questions like: does the amount by which I desire the pizza more than the crisps exceed the amount you desire the pizza more than the crisps? And clearly if desires aren’t “interpersonally comparable” in these sorts of ways, certain ways of appealing to them within accounts of how it’s appropriate to trade off one person’s desires against another’s won’t make sense. A Rawlsian might say: if there’s pizza going spare, give it to the person for whom things are going worst (for whom the current situation, pre-pizza, is most undesirable). A utilitarian might say: if everyone is going to get pizza or crisps, and everyone prefers pizza to crisps, give the pizza to the person who’ll appreciate it the most (i.e. prefers pizza over crisps more than anyone else). If the whole idea of interpersonal comparisons of level and differences of desirability is nonsense, however, then those proposals write cheques that the metaphysics of attitudes can’t pay.

(As an aside, it’s worth noting at this point that you could have Rawlsian or utilitarian distribution principles that work with quantities other than desire—some kind of objective “value of the outcome for each person”. It seems to me that if the metaphysics of value underwrites interpersonally comparable quantities like the levels of goodness-for-Sally for pizza, and goodness-difference-between-pizza-and-crisps-for-Harry, then the metaphysics of desires should be such that Sally and Harry’s desire-state will, if tuned in correctly, reflect these levels and differences.)

It’s not only the utilitarian and Rawlsian distribution principles (framed in terms of desires) that have false metaphysical presuppositions if facts about levels and differences in desire are not a thing. Intraindividual ties between intensities of emotional reaction and strength of desire, and between type of emotional reaction and valence of desire, will have false metaphysical presuppositions if facts about an individual’s desire are invariant under affine transformation. Affine transformations can change the “zero point” on the scale on which we measure desirability, and shrink or grow the differences between desirabilities. But we can’t regard zero-points or strengths of gaps as merely projections of the theorist (“arbitrary choices of unit and scale”) if we’re going to tie them to real rational constraints on type and intensity of emotional reaction.

However. Suppose in the interpretive scenario I gave you, you knew not only the choice-behaviour of your target in a range of varying evidential situations, but also their emotional responses to the outcome of their acts. Under the same injunction to find a (structurally) rationalizing interpretation of the target, you’d now have much more to go on. When they have emotional reactions rationally linked to indifference, you would attribute a zero-point in the level of desirability. When an outcome is met with joy, and another with mere happiness, you would attribute a difference in desire (of that person, for that outcome) that makes sense of both. Information about emotions, together with an account of the rationality of emotions, allows us to set the scale and unit in interpreting an individual, in a way choice-behaviour alone struggles to. As a byproduct, we would then have an epistemic path to interpersonal comparability of desires. And in fact, this looks like an epistemic path that’s pretty commonly available in typical interpersonal situations–the emotional reactions of others are not *more* difficult to observe than the intentions with which they act or the evidence that is available to them. Emotions, choices and a person’s evidence are all interestingly epistemically problematic, but they are “directly manifestable” in a way that contrasts with the beliefs and desires that mesh with them.

The epistemic path suggests a metaphysical path to grounding levels and relative intensities of desires. Just as you can end up with a metaphysical argument against interpersonal comparability of desires by committing yourself to grounding facts about desires in patterns of choice-behaviour, and then noting the mathematical limits of that project, you can get, I think, a metaphysical vindication of interpersonal comparability of desire by including in the “base level facts” upon which facts about belief and desire are grounded facts about the type, intensity and valence of intentional emotional states. As a result, the metaphysical presuppositions of the desire-based Rawlsian and utilitarian distribution principles are met, and our desires have the structure necessary to capture and reflect level and valence of any good-for-x facts that might feature in a non-desire based articulation of those kinds of principles.

In my book The Metaphysics of Representation I divided the task of grounding intentionality into three parts. First, grounding base-level facts about choice and perceptual evidence (I did this by borrowing from the teleosemantics literature). Then grounding belief-desire intentional facts in the base-level facts, via a broadly Lewisian metaphysical form of radical interpretation. (The third level concerned representational artefacts like words, but needn’t concern us here). In these terms, what I’m contemplating is to add intentional emotional states to the base level, using that to vindicate a richer structure of belief and desire.

Now, this is not the only way to vindicate levels and strength of desires (and their interpersonal comparability) in this kind of framework. I also argue in the book that the content-fixing notion of “correct interpretation” should use a substantive conception of “rationality”. The interpreter should not just select any old structurally-rationalizing interpretation of their target, but will go for the one that makes them closest to an ideal, where the ideal agent responds to their reasons appropriately. If an ideal agent’s strength and levels of desire are aligned, for example, to the strength and level of value-for-the-agent present in a situation, then this gives us a principled way to select between choice-theoretically equivalent interpretations of a target, grounding choices of unit and scale and interpersonal comparisons. I think that’s all good! But I think that including emotional reactions as constraining factors in interpretation can help motivate the hypothesis that there will be facts about the strength and level of desire *of the ideal agent*, and gives a bottom-up data-based constraint on such attributions that complements the top-down substantive-rationality constraint on attributions already present in my picture.

I started thinking about this topic with an introspectively-based conviction that *of course* there are facts about how much I want something, and whether I want it or want it not to happen. I still think all this. But I hope that I’ve now managed to identify how those convictions connect to their roles in a wider theoretical edifice–their rational interactions with *obvious* truths about features of our emotional lives, the role of these in distribution principles, which give a fuller sense of what is at stake if we start denying that the metaphysics of attitudes has this rich structure. I can’t see much reason to go against this, *unless* you are in the grip of a certain picture of how attitudes get metaphysically grounded in choice-behaviour. And I like a version of that picture! But I’ve also sketched how the very links to emotional states give you a version of that kind of metaphysical theory that doesn’t have the unwelcome, counterintuitive consequences it’s often associated with.

## Proximal desires.

How might we measure the proximity or similarity of two belief states? Suppose they are represented in each case as a function from propositions to real numbers between 0 and 1, representing their respective degrees of belief. Is it possible to find a sensible and formally tractable measure of how similar these two states are?

How might we measure the proximity or similarity of two desire states? Suppose they are represented in each case as a function from propositions to real numbers, representing how desirable the agent finds each proposition being true. Is it possible to find a sensible and formally tractable measure of how similar these two states are?

The TL;DR of what follows is: I think we can find a measure of both (or better, a measure of the proximity of pairs of combined belief-desire states). And this idea of proximity between belief-desire psychologies is key to explaining the force of theoretical rationality constraints (probabilism) and means-end practical rationality constraints (causal expected utility theory). Furthermore, it’s the notion we need to articulate the role of “principles of charity” in metasemantics.

The first question above is one that has arisen prominently in accuracy-first formal epistemology. As the name suggests, the starting point of that project is a measure of the accuracy of belief states. Richard Pettigrew glosses the accuracy of a credence function at a world as its “proximity to the ideal credence at that world” (Accuracy and the laws of credence, p.47). If you buy Pettigrew’s main arguments for features of belief-proximity in chapter 4 of this book, then it’s a mathematical consequence that belief-proximity is what’s known as an “additive Bregman divergence”, and if you in addition think that the distance from belief b to belief b* is always the same as the distance from belief b* to belief b (i.e. proximity is symmetric) then one can prove, essentially, that the right way to measure the proximity of belief states Alpha and Beta is by taking the “squared Euclidean distance”, i.e. to take each proposition, take the difference between the real number representing Alpha’s credence in it and that representing Beta’s credence in it, take the square of this difference, and sum up the results over all propositions.

Now, once you have this measure of proximity to play with, accuracy-firsters like Pettigrew can put it to work in their arguments for the rational constraints on belief. Accuracy of a belief in w is proximity to the ideal belief state in w; if the ideal belief state for an agent x in w is one that matches the truth values of each proposition (“veritism”) then one can extract from the measure of proximity a measure of accuracy, and go on to prove, for example, that a non-probabilistic belief state b will be “accuracy dominated”, i.e. there will be some alternative belief state b* which is *necessarily* more accurate than it.
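A two-proposition case shows what accuracy domination looks like under the squared Euclidean measure. The specific credences are my own toy example: `b` assigns 0.2 to both p and its negation (so it is not a probability function), and `b_star` assigns 0.5 to each.

```python
def sq_dist(b, c):
    """Squared Euclidean distance between two credence vectors."""
    return sum((x - y) ** 2 for x, y in zip(b, c))

# Credences are listed over (p, not-p). Under veritism the ideal credence
# at a world matches the truth values there.
worlds = [(1.0, 0.0), (0.0, 1.0)]

b = (0.2, 0.2)       # non-probabilistic: credences sum to 0.4
b_star = (0.5, 0.5)  # probabilistic alternative

for w in worlds:
    print(w, sq_dist(b, w), sq_dist(b_star, w))

# b_star is strictly closer to the ideal credence at every world.
dominated = all(sq_dist(b_star, w) < sq_dist(b, w) for w in worlds)
print(dominated)
```

Whichever world is actual, `b_star` is closer to the truth than `b`, which is the sense in which `b` is dominated.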

So far, so familiar. I like this way of relating theoretical rational constraints like probabilism to what’s ultimately valuable in belief–truth. But I’m also interested in the notion of proximity for other reasons. In particular, when working in metasemantics, I want to think about principles of interpretation that take the following shape:

(I) On the basis of the interpreter’s knowledge of some primary data, and given constraints that tie possible belief states to features of that primary data, the interpreter is in a position to know that the target of interpretation has a belief state within a set C.

(II) The interpreter attributes to the target of interpretation that belief state within C which is closest to belief state m.

To fix ideas: the set C in (I) might arise out of a process of finding a probability-utility pair which rationalizes the target’s choice behaviour (i.e. always makes the option the target chooses the one which maximizes expected utility, by their lights, among the options they choose between). The magnetic belief state “m” in (II) might be the ideal belief state to have, by the interpreter’s lights, given what they know about the target’s evidential setting. Or it might be the belief state the interpreter would have in the target’s evidential setting.
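Step (II) can be sketched in a few lines of Python, on the assumption that the set C is finite and that proximity is measured by squared Euclidean distance. The candidate belief states, the magnetic state `m` and the function names are all hypothetical.

```python
def squared_euclidean(a, b):
    """Sum of squared differences in credence over a shared set of propositions."""
    return sum((a[p] - b[p]) ** 2 for p in a)


def interpret(candidates, m):
    """Step (II): return the member of C that is closest to the magnetic state m."""
    return min(candidates, key=lambda c: squared_euclidean(c, m))


# Step (I) is assumed to have delivered this (finite) set C of belief states.
C = [
    {"p": 0.9, "q": 0.1},
    {"p": 0.6, "q": 0.5},
    {"p": 0.2, "q": 0.8},
]
m = {"p": 0.7, "q": 0.4}  # the magnetic belief state

attributed = interpret(C, m)
print(attributed)  # {'p': 0.6, 'q': 0.5}
```

Note that `min` simply returns the first of any tied candidates, so the sketch takes no stand on how ties for closeness should be handled.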

There are lots of refinements we might want to add (allowing m to be non-unique, catering for situations in which there are several elements in C that are tied for closeness to m). We might want to clarify whether (I) and (II) are principles of practical interpretation, somehow mapping the processes or proper outputs of a real-life flesh and blood interpreter, or whether this is intended as a bit of theory of ideal interpretation, carried out on the basis of total “knowledge” of primary facts about the target. But I’ll set all that aside.

The thing I want to highlight is that step (II) of the process above makes essential use of a proximity measure. And it’s pretty plausible that we’re here shopping in the same aisle as the accuracy-first theorists. After all, a truth-maximizing conception of principles of interpretation would naturally want to construe (II) as attributing to the subject the most accurate belief state within the set C, and we’ll get that if we set the “ideal” credence (in a given world) to be the credal state that matches the truth values at that world, in line with Pettigrew’s veritism, and understand proximity in the way Pettigrew encourages us to. Pettigrew in fact defends his characterization of proximity independently of any particular identification of what the ideal credences are. If you were convinced by Pettigrew’s discussion, then even if the “ideal credence” m for the purposes of interpretation is different from the “ideal credence” for the purposes of the most fundamental doxastic evaluation, you’ll still think that the measure of proximity—additive Bregman divergence/squared Euclidean distance–is relevant in both cases.

That’s the end of the (present) discussion as far as belief goes. I want to turn to an extension to this picture that becomes pressing when we think of this in the context of principles of interpretation. For in the implementations that I am most interested in, what we get out of step (I) is not a set of belief states alone, but a set of belief-desire psychologies—a pairing of credence and utility functions, for example. Now, it’s possible that the second step of interpretation, (II), cares only about what goes on with belief—picking the belief-desire psychologies whose belief component is closest to the truth, to the evidence, or to the belief state component of some other relevantly magnetic psychological state. But the more natural version of this picture wouldn’t simply forget about the desires that are also being attributed. And if it is proximity between belief-desire psychologies in C and magnetic belief-desire psychology m that is at issue, we are appealing to a proximity not between belief states alone, but proximity between pairs of belief-desire states.

If desire states are represented by a function from propositions to real numbers (degrees of desirability) then there’s clearly a straight formal extension of the above method available to us. If we used squared Euclidean distance as a measure of the proximity or similarity of a pair of belief states, use exactly the same formula for measuring the proximity or similarity of desire! But Pettigrew’s arguments for the characteristics which select that measure do not all go over. In particular, Pettigrew’s most extended discussion is in defence of a “decomposition” assumption that makes essential use of notions (e.g. “well-calibrated counterpart of the belief state”) that do not have any obvious analogue for belief-desire psychologies.

Is there anything to be said for the squared Euclidean distance measure of proximity between belief-desire psychologies, in the absence of an analogue of what Pettigrew says in the special case of proximity of belief states? Well, one thing we can note is that, as it extends Pettigrew’s measure of the proximity of belief states, it’s consistent with it: a straight generalization is the natural first hypothesis for belief-desire proximity to try, relative to the Pettigrew starting point. What I want to now discuss is a way of getting indirect support for it. What I’ll argue is that it can do work for us analogous to the work that it does for the probabilist in the accuracy framework.

To get the inaccuracy framework off the ground, recall, Pettigrew commits to the identification of an ideal belief state at each world. The ideal belief state at w is that belief state whose levels of confidence in each proposition matches the truth value of that proposition at w (1 if the proposition is true, 0 if the proposition is false). To add something similar for desire, instead of truth values, let’s start from a fundamental value function defined over the worlds, V, measured by a real number. You can think of the fundamental valuation relation as fixed objectively (the objective goodness of the world in question), fixed objectively relative to an agent (the objective goodness of the world for that agent), as a projection of values embraced by the particular agent, or some kind of mix of the above. Pick your favourite and we’ll move on.

I say: the ideal degree of desirability for our agent to attach to the proposition that w is the case is V(w). But what is the ideal degree of desirability for other, less committal propositions? Here’s a very natural thought (I’ll come back to alternatives later): look at which world would come about were p the case, and the ideal desirability of p is just V(w) for that w which p counterfactually implies. (This, by the way, is a proposal that makes heavy reliance on the Stalnakerian idea that for every world there is a unique closest world where p). So we extend V(w), defined over worlds, to V*(p), defined over propositions, via counterfactual connections of this kind.

If we have this conception of ideal desires, and also the squared Euclidean measure of proximity between desire states, then a notion of “distance from the ideal desire state” drops out. Call this measure the misalignment of a desire state. If we have the squared Euclidean measure of proximity between combined belief-desire states, then what drops out is a notion of “distance from the ideal belief-desire state”, which is simply the sum of the inaccuracy of its belief component and the misalignment of its desire component.
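The additivity claim can be checked on a small example. In this sketch the two propositions, the ideal desirabilities `v_star` and the agent's states `b` and `d` are all made-up numbers; the point is only that the distance between the combined vectors equals inaccuracy plus misalignment.

```python
def squared_distance(x, y):
    """Sum of squared differences over a shared set of propositions."""
    return sum((x[p] - y[p]) ** 2 for p in x)


def distance_from_ideal(b, d, truth, v_star):
    """Inaccuracy of the belief component plus misalignment of the desire component."""
    inaccuracy = squared_distance(b, truth)
    misalignment = squared_distance(d, v_star)
    return inaccuracy + misalignment


# Hypothetical example at a world where p is true and q is false.
truth = {"p": 1.0, "q": 0.0}    # ideal credences (truth values)
v_star = {"p": 3.0, "q": -1.0}  # ideal desirabilities
b = {"p": 0.5, "q": 0.5}        # the agent's credences
d = {"p": 2.0, "q": -1.0}       # the agent's degrees of desire

total = distance_from_ideal(b, d, truth, v_star)

# The same number computed directly on the combined vectors <b, d> and <truth, V*>.
combined = {**{("b", p): b[p] for p in b}, **{("d", p): d[p] for p in d}}
ideal = {**{("b", p): truth[p] for p in truth}, **{("d", p): v_star[p] for p in v_star}}

print(total, squared_distance(combined, ideal))  # 1.5 1.5
```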

The fundamental result that accuracy-firsters point to as a vindication of probabilism is this: unless a belief state b is a convex combination of truth values (i.e. a probability function) then there will be a b* which is necessarily more accurate than b. In this setting, the same underlying result (as far as I can see; there are a few nice details about finitude and boundedness to sort out) delivers this: unless belief-desire state <b,d> is a convex combination over w of vectors of the form <truth-value at w, V-value at w>, then there will be some alternative psychology <b*,d*> which will necessarily be closer to the ideal psychology (more accurate-and-aligned) than is <b,d>.

What must undominated belief-desire psychologies be like? We know they must be convex combinations of <truth value at w, V-value at w> pairs for varying w. The b component will then be a convex combination of truth values with weights k(w), i.e. a probability function that invests credence k(w) in w. More generally, both the b and d components are expectations of random variables with weights k(w). b(p) will be the expectation of the indicator random variable for proposition p, and d(p) the expectation of the value-of-p random variable. The expectation of the value-of-p random variable turns out to be equal to the sum over all possible values v of the following: v multiplied by the agent’s degree of belief in the counterfactual conditional “if p had been the case, then the value of the world would be v”. And that, in effect, is Gibbard-Harper’s version of causal decision theory.
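Here is a small worked instance of that equality, as a sketch only. The three worlds, the weights, the values and the selection function `closest_p_world` (standing in for the Stalnakerian "unique closest world where p") are all invented for the example; possible values of a world are written `v` in the code.

```python
from fractions import Fraction as F

worlds = ["w1", "w2", "w3"]
k = {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 4)}  # weights k(w), summing to 1
V = {"w1": 10, "w2": 0, "w3": 4}                   # fundamental value of each world

p = {"w1", "w2"}  # a proposition, modelled as the set of worlds where it is true

# Hypothetical selection function: the unique closest p-world to each world.
closest_p_world = {"w1": "w1", "w2": "w2", "w3": "w2"}

# b(p): expectation of the indicator variable for p under the weights k.
b_p = sum(k[w] for w in worlds if w in p)

# d(p): expectation of the value-of-p variable, i.e. V at the closest p-world.
d_p = sum(k[w] * V[closest_p_world[w]] for w in worlds)


def credence_in_counterfactual(v):
    """Credence that, had p been the case, the value of the world would be v."""
    return sum(k[w] for w in worlds if V[closest_p_world[w]] == v)


# Gibbard-Harper form: sum over values v of v times credence in the counterfactual.
gibbard_harper = sum(v * credence_in_counterfactual(v) for v in set(V.values()))

print(b_p, d_p, gibbard_harper)  # 3/4 5 5
```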

If the sketch above is correct, then measuring proximity of whole psychologies by squared euclidean distance (or more generally, an additive Bregman divergence), will afford a combined accuracy-domination argument for probabilism and value-domination argument for causal decision theory. That’s nice!

Notice that there’s some obvious modularity in the argument. I already noted that we could treat V(w) as objective, relativized or subjective value. Further, we get the particular Gibbard-Harper form of causal decision theory because we extended the ideal V over worlds to ideal V* over propositions via counterfactual conditionals concerning which world would obtain if p were the case. If we instead defined the ideal V* in p as the weighted average of the values of worlds, weighted by the conditional chance of that world obtaining given p, then we’d end up with an expected-chance formulation of causal decision theory. If we defined the ideal V* in p via combinations of counterfactuals about chance, we would derive a Lewisian formulation of causal decision theory. If we reinterpret the conditional in the formulation given above as an indicative conditional, then we get a variant of evidential decision theory, coinciding with Jeffrey’s expected decision theory only if the probability of the relevant conditional is always equal to the corresponding conditional probability (that thesis, though, is famously problematic).

Okay, so let’s sum up. What has happened here is that, for the purposes of formulating a particular kind of metasemantics of belief and desire, we need a notion of proximity of whole belief-desire psychologies to one another. Now, Pettigrew has explicitly argued for a specific way of measuring proximity for the belief side of the psychology. The natural thing to do is to extend his arguments to descriptions of psychological states including desires as well as beliefs. But unfortunately, his arguments for that specific way of handling proximity look too tied to belief. However, we can provide a more indirect abductive argument for the straight generalization of this way of measuring proximity over belief-desire psychologies by (a) endorsing Pettigrew’s arguments for the special case of belief; and (b) noting that the straight generalization of this would provide a uniform vindication of both probabilism and standard rational requirements on desire as well as belief.

This, at least, makes me feel that I should be pretty comfortable appealing to a notion of “proximity to the magnetic belief-desire state m” in formulating metasemantics in the style above, and measuring this by squared Euclidean distance, at least insofar as I am bought in to the conception of accuracy that Pettigrew sketches.

Let me make one final note. I’ve been talking throughout as if we all understood what real-valued “degrees of desire” are. And the truth is, I believe I do understand this. I think that I have neutral desire for some worlds/propositions, positive desire for others, negative for a third. I think that we can measure and compare the gap between the desirability of two propositions: the difference between the desirability of eating cake and eating mud is much greater than the difference between the desirability of overnight oats and porridge. I think there are facts of the matter about whether you and I desire the same proposition equally, or whether I desire it more than you, or you desire it more than me.

But famously, some are baffled by interpersonal comparisons of utility, or features of the utility-scale, of the kind I instinctively like. If you think of attributing utility as all about finding representations that vindicate choice behaviour, interpersonal comparisons will be as weird as the idea of an interpersonal choice. The whole project of measuring proximity between desirability states via functions on their representations as real values might look like a weird starting point. If you google the literature on similarity measures for utility, you’ll find a lot of work on similarity of preference orderings e.g. by counting how many reversals of the orderings it takes to turn one into another. You might think this is a much less controversial starting point than what I’m doing, and that I need to do a whole heap more work to earn the right to my starting point.

I think the boot is on the other foot. The mental metasemantics in which I aim to deploy this notion of proximity denies that all there is to attributing utility is to find a representation that vindicates the agent’s choice behaviour. That’s step I, but step II goes beyond this to play favourites among the set of vindicatory psychological states. By the same token, the mental metasemantics sketched grounds interpersonal comparisons of desirability between various agents, by way of facts about the proximity of the desirability of the agent’s psychology to the magnetic psychological state m.

There’s a kind of dialectical stalemate here. If interpersonal comparisons are a busted flush, the prospects look dim for any kind of proximity measure of the kind I’m after here (i.e. one that extends the proximity implicit in the accuracy-first framework). If however, the kind of proximity measures I’ve been discussing make sense, then we can use them to ground the real-value representations of agents’ psychological states that make possible interpersonal comparisons. I don’t think either I or my more traditional operationalizing opponent here should be throwing shade at the other at this stage of development; rather, each should be allowed to develop their overall account of rational psychology, and at the end of the process we can come back and compare notes about whose starting assumptions were ultimately more fruitful.

## Comparative conventionality

The TL;DR summary of what follows is that we should quantify the conventionality of a regularity (David-Lewis-style) as follows:

A regularity R in the behaviour of population P in a recurring situation S, is a convention of depth x, breadth y and degree z when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R.

where x (depth) is the fraction of S-situations which are T, y (breadth) is the fraction of all Ps involved who are Ks in this instance, and z is the degree to which (A-C) obtaining resembles a coordination equilibrium that solves a coordination problem among the Ks.

From grades of conventionality so defined, we can characterize in the obvious way a partial ordering of regularities by whether one is more of a convention than another. What I have set out differs in several respects from what Lewis himself proposed along these lines. The rest of the post spells out why.

The first thing to note is that in Convention Lewis revises and re-revises what it takes to be a convention. The above partial version is a generalization of his early formulations in the book. Here’s a version of his original:

A regularity R in the behaviour of a population P in a recurring situation S is a convention if and only if it is true that, and common knowledge in P that:

(A) BEHAVIOUR CONDITION: everyone conforms to R
(B) EXPECTATION CONDITION: everyone expects everyone else to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone prefers that they conform to R conditionally on everyone else conforming to R.

where (C) holds because S is a coordination problem and uniform conformity to R is a coordination equilibrium in S.

A clarificatory note: in some conventions (e.g. a group of friends meeting in the same place week after week) the population in question are all present in instances of the recurring situation. But in others (languages, road driving conventions) the recurring situation involves more or less arbitrary selection of pairs, triples, etc. of individuals from a far larger population. When we read the clauses, the intended reading is that the quantifiers “everyone” be restricted just to those members of the population who are present in the relevant instance of the recurring situation. The condition is then that it’s common knowledge instance-by-instance *between conversational participants* or *between a pair of drivers* what they’ll do, what they expect, what they prefer, and so on. That matters! For example, it might be that strictly there is no common knowledge at all among *everyone on the road* about what side of the road to drive on. I may be completely confident that there’s at least one person within the next 200 miles not following the relevant regularity. Still, I may share common knowledge with each individual I encounter, that in this local situation we are going to conform, that we have the psychological states backing that up, etc. (For Lewis’s discussion of this, see his discussion of generality “in sensu diviso” over instances).

Let me now tell the story about how Lewis’s own proposal arose. First, we need to see his penultimate characterization of a convention:

A regularity R in the behaviour of P in a recurring state S, is a perfect convention when it’s common knowledge among P in any instance of S that:

(A) BEHAVIOUR CONDITION: everyone conforms to R
(B) EXPECTATION CONDITION: everyone expects everyone else to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone prefers that they conform to R conditionally on everyone else conforming to R.
(D) GENERAL PREFERENCE CONDITION: everyone prefers that anyone conform to R conditionally on all but one conforming to R.
(E) COOPERATION CONDITION: everyone has approximately the same preferences regarding all possible combinations of actions
(F) There exists an alternative regularity R* incompatible with R, which also meets the analogue of (C) and (D).

The explicit appeal to coordination problems and their solution by coordination equilibria has disappeared. Replacing them are the three clauses (D-F). In (D) and (E) Lewis ensures that the scenario resembles recurring games of pure cooperation in two specific, independent respects. Games of pure cooperation have exact match of preferences over all possible combinations of outcomes (cf. (E)’s approximate match). And because of this perfect match, if any one person prefers to conform conditionally on others conforming, all others share that preference too (cf. (D)). So by requiring (D) we preserve a structural feature of coordination problems, and by requiring (E) we require some kind of approximation to a coordination problem. (F) on the other hand is a generalization of the condition that these games have more than one “solution” in the technical sense, and so are coordination *problems*.

It’s striking that, as far as I can see, Lewis says nothing about what further explanatory significance (beyond being analytic of David Lewis’s concept of convention) these three features enjoy. That contrasts with the explanatory power of (A-C) being true and common knowledge, which is at the heart of the idea of a rationally self-sustaining regularity in behaviour. I think it’s well worth keeping (A-C) and (D-F) separate in one’s mind when thinking through these matters, if only for this reason.

Here’s the Lewisian proposal to measure degree of conventionality:

A regularity R in the behaviour of P in a recurring situation S, is a convention to at least degree <z,a,b,c,d,e,f> when it’s common knowledge among P in at least fraction z of instances of S that:

(A*) BEHAVIOUR CONDITION: everyone in some fraction a of P conforms to R
(B*) EXPECTATION CONDITION: everyone in some fraction b of P expects a fraction of at least a of everyone else in P to conform to R
(C*) SPECIAL PREFERENCE CONDITION: everyone in some fraction c of P prefers that they conform to R conditionally on everyone in fraction a of P conforming to R.
(D*) GENERAL PREFERENCE CONDITION: everyone in some fraction d of P prefers that anyone conform to R conditionally on everyone in fraction a of P conforming to R.
(E*) COOPERATION CONDITION: everyone in some fraction e of P has approximately the same preferences regarding all possible combinations of actions
(F*) there exists an alternative regularity R* incompatible with R in fraction f of cases, which also meets the analogue of (C) and (D).

The degree of conventionality of R is then defined to be the set of tuples such that R is a convention to degree at least that tuple. A partial order of comparative conventionality can then be defined in the obvious way.

While measuring the degree to which the clauses of the characterization of perfect conventionality are met is a natural idea, there’s just no guarantee that it tracks anything we might want from a notion of partial conventionality, e.g. “resemblance to a perfect convention”. I’ll divide my remarks into two clusters: first on (A-C), and then on (D-F).

On the original conception, the (A-C) clauses work together in order to explain what a convention explains. That’s why, after all, Lewis makes sure that in clause (C*) the conditional preference is conditional on the obtaining of the very fraction mentioned in clauses (A*) and (B*). But more than this is required.

On that original conception, the rationality of conformity to (A) is to be explained by (common knowledge of) the expectations and preferences in (B) and (C). Where everyone has the expectations and preferences, the rationalization story rolls along nicely. But once we allow exceptions, things break down.

Consider, first, the limit case where nobody at all has the expectation or preference (so (B,C) are met to degree zero). Conformity to the regularity can then be entirely accidental, obtaining independently of the attitudes prevailing among those conforming. Such situations lack the defining characteristics of a convention. But (holding other factors equal) Lewis’s definition orders them by how many people in the situation conform to the regularity. So, Lewis finds an ordering where there is really none to be had. That’s bad.

Consider, second, a case where the population divides evenly into two parts: those who have the preference but no expectation, and those who have the expectation but no preference. No person in any instance will have both the expectation and preference that in the paradigm cases work together to rationally support the regularity. To build a counterexample to Lewis’s analysis of comparative conventionality out of this, consider a situation where the expectation and preference clauses are met to degree 0.4, but by the same group, which rationalizes 0.4 conformity. Now we have a situation where expectations and preferences do sustain the level of conformity, and so (all else equal) it deserves to be called a partial convention. But on Lewis’s characterization it is less of a convention than a situation where 50% of people have the preference, a non-overlapping 50% have the expectation, and 40% irrationally conform to the regularity. The correct view is that the former regularity is more conventional than the latter. Lewis says the opposite. I conclude Lewis characterized the notion of degree of convention in the wrong way.
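The counterexample can be made concrete with a toy population of 100. This is only a sketch: the `Agent` record, the restriction to the three fractions for behaviour, expectation and preference, and the way the 40 conformers in the second case are split across the two halves are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    conforms: bool
    expects: bool
    prefers: bool


def clause_fractions(pop):
    """Clause-by-clause fractions: (conforming, expecting, preferring)."""
    n = len(pop)
    return (
        sum(a.conforms for a in pop) / n,
        sum(a.expects for a in pop) / n,
        sum(a.prefers for a in pop) / n,
    )


def joint_fraction(pop):
    """Fraction of agents who conform AND expect AND prefer."""
    return sum(a.conforms and a.expects and a.prefers for a in pop) / len(pop)


def at_least(t1, t2):
    """Componentwise comparison of degree tuples."""
    return all(x >= y for x, y in zip(t1, t2))


# Case 1: the same 40 agents conform, expect and prefer.
case1 = [Agent(True, True, True)] * 40 + [Agent(False, False, False)] * 60

# Case 2: 50 prefer without expecting, 50 expect without preferring, 40 conform.
case2 = (
    [Agent(True, False, True)] * 20 + [Agent(False, False, True)] * 30
    + [Agent(True, True, False)] * 20 + [Agent(False, True, False)] * 30
)

print(clause_fractions(case1), clause_fractions(case2))  # (0.4, 0.4, 0.4) (0.4, 0.5, 0.5)
print(at_least(clause_fractions(case2), clause_fractions(case1)))  # True
print(joint_fraction(case1), joint_fraction(case2))  # 0.4 0.0
```

Counting clause by clause, the second case comes out at least as high on every component, even though nobody in it satisfies the three conditions together.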

Let me turn to the way he handles (D-F). What’s going on here, I think, is that he’s picking up three specific ways in which what’s going on can resemble a solution to a coordination problem. But there are again multiple problems. For a start, there are the kind of lack-of-overlap problems we just saw above. A situation where 40% of the people conform, and meet the relevant expectations and preference clause, and perfectly match in preferences over all relevant situations, is ranked *below* situations where 40% of people conform, meet the relevant expectations and preference clause, and are completely diverse in their preferences *but* the remaining 60% of the population has perfectly matched preferences against conformity to R. That’s no good at all!

But as well as the considerations about overlap, the details of the respects of similarity seem to me suspect. For example, consider a scenario where (A-C) are fully met, and everybody has preferences that diverge just too much to count as approximately the same, so (E) is met to degree zero. And compare that to a situation where two people have approximately the same preferences, and 98 others have completely divergent preferences. Then (E) is met to degree 0.02. The first is much more similar to perfect match of preferences than the second, but Lewis’s ranking gives the opposite verdict. (This reflects the weird feature that he loosens the clause from exact match to approximate match, and then on *top* of that loosening, imposes a measure of degree of satisfaction. I really think that the right thing here is to stick with a measure of similarity of preference among a relevant group of people, rather than counting pairwise exact match).

I’d fold clause (F) into the discussion at this point, but my main concerns about it would really turn into concerns about whether Lewis’s model of conventions as equilibria is right, and that’d take me too far afield. So I’ll pass over it in silence.

To summarize. Lewis’s characterization of degrees of conventionality looks like it misfires a lot. The most important thing wrong with it is that it doesn’t impose any sort of requirement that its clauses be simultaneously satisfied. And that leaves it open to the kind of problems above.

My own proposal, which I listed at the start of this post, seems to me to be the natural way to fix this problem. I say: what we need to do is look for “kernels” of self-sustaining subpopulations, where we insist that each member of the kernel meets the conformity, expectation and preference conditions perfectly. The size of this kernel, as a fraction of those in the population involved in the situation, then measures how closely we approximate the original case. That fraction I called the “depth” of the convention, where a convention with depth 1 involves everyone involved in any instance of the situation pulling their weight, and a convention with depth 0.5 being one where only half are involved, but where that is still just as rationally self-sustaining as a case of perfect convention. We might introduce the neologism “depth of subconvention” to articulate this:

A regularity R in the behaviour of P in a recurring situation S, is a sub-convention of depth x when in every instance of S there is a kernel K of the members of P such that it’s true and common knowledge among K in this instance of S that:

(A**) BEHAVIOUR CONDITION: everyone in K conforms to R
(B**) EXPECTATION CONDITION: everyone in K expects everyone in K to conform to R
(C**) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone in K conforming to R.

and x is the fraction of P in the instance of S who are in K.

(These clauses contain a free variable K embedded in specifications of preference and expectation. So what is the content of the preferences and the expectations we’re here requiring? Do the people in the kernel satisfying the conditions need to conceive of the others in K who they expect to conform as being large enough (size k)? Or is it enough that they form preferences and expectations about a subgroup of those involved in the present instance, where that subgroup happens to be of size k? I go with the latter, more liberal understanding. In cases where participants’ interests are grounded in the kind of success that requires k people to cooperate, then (C**) will likely not be met unless all participants have the belief that there are at least k of them. But that isn’t written into the clauses, and I don’t think it should be. Size might matter, but there’s no reason to think it always matters.)

To see why “breadth” as well as “depth” matters, consider the following setup. Suppose that our overall population P divides into conformers C (90%) and the defectors D (10%). The conformers are such that in any instance of S they will satisfy (A-C), whereas the defectors never do (for simplicity, suppose they violate all three conditions). So, if you’re a conformer, you always conform to R whenever you’re in S, because you prefer to do so if 90% of the others in that situation do, and you expect at least 90% of them to do so.

If everyone in P is present in each instance of S, this will be a straightforward instance of a partial subconvention, to degree 0.9. The biggest kernel witnessing the truth of the above clauses is simply the set of conformers, who are all present in every case.

But now consider a variation where not all members of P are present in every case. Stipulate that the members of P present in a given instance of S are drawn randomly from the population as a whole. This will not be a partial convention to degree 0.9. That is because there will be instances of S where by chance, too many defectors are present, and the set of conformers is less than the fraction 0.9 of the total involved in that situation. So the set of conformers present in a given instance is sometimes but not always a “kernel” that meets the conditions laid down. Indeed, it is not a convention to any positive degree, because it could randomly be that only defectors are selected for an instance of S, and in that instance there is no kernel of size >0 satisfying the clauses. So by the above definition it won’t be a partial convention to any positive degree, even if such instances are exceptionally rare.
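To put rough numbers on this, here is a sketch under two assumptions that go beyond the setup above: each instance of S involves exactly n = 10 people, and each is drawn independently with a 0.9 chance of being a conformer (as if sampling from a very large population). The function name `prob_conformers` is made up for the example.

```python
from fractions import Fraction
from math import comb


def prob_conformers(n, j, q=Fraction(9, 10)):
    """Binomial probability that exactly j of the n people present are conformers."""
    return comb(n, j) * q**j * (1 - q) ** (n - j)


n = 10

# Chance that an instance contains only defectors: tiny, but not zero.
p_all_defectors = prob_conformers(n, 0)

# Chance that conformers make up less than 0.9 of those present.
p_below_090 = sum(
    prob_conformers(n, j) for j in range(n + 1) if Fraction(j, n) < Fraction(9, 10)
)

print(p_all_defectors)     # 1/10000000000
print(float(p_below_090))  # roughly 0.26
```

On these assumptions, all-defector instances are about one in ten billion, yet they are enough to stop the regularity counting as a partial convention to any positive degree; and roughly a quarter of instances fall short of the 0.9 mark.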

What we need to avoid this is to provide for exceptions to the “breadth” of the convention, i.e. the instances of S where the clauses are met, as Lewis does:

A regularity R in the behaviour of population P in a recurring situation S, is a convention of depth x, breadth y when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A**) BEHAVIOUR CONDITION: everyone in K conforms to R
(B**) EXPECTATION CONDITION: everyone in K expects everyone in K to conform to R
(C**) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone in K conforming to R

and x is the fraction of S situations that are T situations and y is the fraction of P in the instance of T that are in K.

(I’ve written this in terms of a new recurring state T, rather than (per Lewis) talking about a fraction of the original recurring state type, to bring out the following feature. In the special case I’ve been discussing, where the largest kernel witnessing the truth of these clauses is simply those conformers in C who are present, then when the clauses are met with depth x and breadth y with respect to S and P, they will be met with depth 1 and breadth 1 with respect to T and C. That is: in this special case, the clauses in effect require there be a perfect subconvention with respect to some subpopulation and sub-situation of the population and situation we start from. Depth and breadth of subconventionality are then measuring the fraction of the overall population and state that these “occupy”.)
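One way to see how x and y interact is to fix a threshold y and ask what fraction x of instances clear it. The sketch below does this for a handful of invented instances, on the simplifying assumptions that the kernel in an instance is just the set of conformers present and that "the fraction of P in K" is read as "at least y". Both the data and that reading are mine.

```python
from fractions import Fraction


def kernel_fraction(instance):
    """Fraction of those present who belong to the kernel (here: the conformers)."""
    return Fraction(instance.count("conformer"), len(instance))


def fraction_of_instances(instances, y):
    """x: fraction of S-instances that count as T, i.e. whose kernel fraction is >= y."""
    t_instances = [i for i in instances if kernel_fraction(i) >= y]
    return Fraction(len(t_instances), len(instances))


# Four hypothetical instances of S, each with ten participants.
instances = [
    ["conformer"] * 10,
    ["conformer"] * 9 + ["defector"],
    ["conformer"] * 8 + ["defector"] * 2,
    ["defector"] * 10,
]

for y in (Fraction(1), Fraction(9, 10), Fraction(4, 5)):
    print(y, fraction_of_instances(instances, y))
# 1 1/4
# 9/10 1/2
# 4/5 3/4
```

Lowering the threshold on the kernel lets more instances count as T, so a single regularity is witnessed by several (x, y) pairs rather than one.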

What do we now think about the remaining clauses of Lewis’s definition? I think there’s no obvious motive for extending the strategy I’ve pursued to this point, of requiring these clauses be satisfied perfectly by the kernel K. After all, (common knowledge of) the satisfaction of (A-C) already provides for the rational stability of the pattern of conformity. But equally (as we saw in one of my earlier objections to Lewis) we don’t want to measure the fraction of all those involved in the recurring situation who satisfy the clauses, else we’ll be back to problems of lack of overlap. What we want to do is take the kernel we have secured from subconvention conditions already set down, and look at the characteristics of the regularity that prevails among them. To what extent is that rationally stable regularity a convention? And that brings us right up to my official proposal, repeated here:

A regularity R in the behaviour of population P in a recurring situation S, is a convention of depth x, breadth y and degree z when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R.

where x (depth) is the fraction of S-situations which are T, y (breadth) is the fraction of all Ps involved who are Ks in this instance, and z is the degree to which (A-C) obtaining resembles a coordination equilibrium that solves a coordination problem among the Ks.

The key thing to note here, compared to the previous version, is that I’ve declined to unpack the notion of “resembling a coordination equilibrium that solves a coordination problem”. For all that’s been said here, you could look at the implicit analysis that Lewis’s (D*-E*) gives of this notion (now restricted to the members of the kernel), and plug that in. But earlier I objected to that characterization: it doesn’t seem to me that the fraction of people with approximately matching preferences is a good measure of similarity to the original. In the absence of a plausible analysis, better to keep the notion as a working primitive (and if it doesn’t do much explanatory work, as is my current working hypothesis, analyzing that working primitive will be low down the list of priorities).

A closing remark. Lewis’s official position is neither the unrestricted (A-F) nor the quantitative (A*-F*) above. Rather, he gives a version of (A-F) in which quantifiers throughout are replaced by ones that allow for exceptions (“almost everyone…”). But as far as I can see, the same kinds of worries arise for this case. For example, given any threshold for how many count as “almost everyone”, almost everyone can have the relevant conditional preference, and almost everyone can have the relevant expectation, without its being the case that almost everyone has both the preference and the expectation. So if almost everyone conforms to a regularity, at least some of that conformity is not rationalized by the attitudes guaranteed by the other clauses. To fix this, we can extract a “threshold” variant from the quantitative proposal I have made, which would look like this:

A regularity R in the behaviour of population P in a recurring situation S, is a convention when there is a recurring situation T that refines S, and in each instance of T there is a subpopulation K of P, such that it’s true and common knowledge among K in that instance that:

(A) BEHAVIOUR CONDITION: everyone in K conforms to R
(B) EXPECTATION CONDITION: everyone in K expects everyone else in K to conform to R
(C) SPECIAL PREFERENCE CONDITION: everyone in K prefers that they conform to R conditionally on everyone else in K conforming to R.

where almost all S-situations are T, almost all P involved in the instance of T are in K, and (A-C) obtaining is almost a coordination equilibrium that solves a coordination problem among the Ks.

Here “almost a coordination equilibrium” is to be read as “having a high enough degree of similarity to a coordination equilibrium”.
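The overlap worry that motivates the threshold variant can be checked with a toy count. This is only a sketch under assumed numbers: a population of 100 and a hypothetical “almost everyone” threshold of 0.9, neither of which is fixed by anything above.

```python
# Illustrative assumptions: 100 agents, "almost everyone" means at least 90%.
population = set(range(100))
THRESHOLD = 0.9

has_preference = set(range(0, 90))     # agents 0..89 have the conditional preference
has_expectation = set(range(10, 100))  # agents 10..99 have the expectation
has_both = has_preference & has_expectation  # agents 10..89 only

def almost_everyone(group):
    """True when the group reaches the threshold fraction of the population."""
    return len(group) / len(population) >= THRESHOLD

print(almost_everyone(has_preference))   # each attitude alone meets the threshold
print(almost_everyone(has_expectation))
print(almost_everyone(has_both))         # but the conjunction need not
```

Requiring a single kernel K in which everyone satisfies (A-C) together avoids this, since the "almost" qualification is then applied once, to the size of K, rather than clause by clause.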

## Taking things for granted

If you believe p, do you believe you believe p?

Here’s one model for thinking about this. You have the first order belief—you believe p. On top of that, you have some extra mechanism (call it doxastic introspection) that monitors your internal state and extracts information about what beliefs you have. If that extra mechanism is working reliably in this instance, you’ll end up with the (true) belief that you believe p.

On the introspective model, it’s easy to see how first and second order beliefs can get out of alignment. One forms the first order belief, and the extra mechanism for some reason doesn’t work (maybe it’s unreliable in edge cases, and this is an edge case of having a first order belief). So you end up with the false belief that you don’t believe p, or (more modestly) suspending judgement on the issue. Only if we had a magical 100% reliable introspective mechanism should we expect the original conditional to be always true.

There’s a rival way of thinking about this: the entailment model. On this model, the basic doxastic attitude is not belief, but a propositional attitude we can call “taking for granted”. Whenever you take p for granted in the relevant sense, it automatically follows that you believe p; and it also follows that you believe that you believe p, and so on. So long as the only way humans get to believe p is by taking p for granted, it’ll follow that whenever you believe p, you believe that you believe p. So the original conditional is always true, and not by any magical flawless introspective mechanism, but a “common cause” psychological structure that ensures the first order and higher order belief are formed together.

(Compatibly with this, it might be the case that sometimes you believe you believe p, even though you don’t believe p. After all, there’s nothing in the entailment model that guarantees that when you don’t believe p, you believe you don’t believe p. You’d get that additional result if you added to the assumptions above that the only way humans have of forming the higher order belief that they believe p is by taking p for granted. But as things stand, the entailment model allows that your most basic attitude can be: taking for granted that one believes that p. And that doesn’t itself require you to believe that p).

What might “taking for granted” be, such that the entailment relations mentioned above hold? Here I’m taking a leaf out of work I’ve been doing on common knowledge and public information. There, I’ve been starting from a psychological attitude I’ve called “taking for granted among a group G” (or “treating as public among G”). The idea is that things we take for granted among a group are things we hold fixed in deliberation even when simulating other group members’ perspectives. So, for example, I might see a car about to pull out in front of you from a side street, but also see that you can’t see the car. In one sense, I hold fixed in my own deliberation about what to do next that the car is about to pull out. But I do not hold that proposition fixed in the stronger sense, because in simulating your perspective (and so expected choices) in the next few seconds, most of the scenarios involve no car pulling out. On the other hand, that drivers will slam on the brakes when they see a new obstacle in their way, that things fall downward when dropped, that every driver wants to avoid crashes: these are all things I hold fixed in simulating any relevant perspective. They are things that I take for granted among the group consisting of me and you. What I take for granted between us has an important role in rationalizing my actions in situations of interdependent decision.

It’s plausible that x taking-p-for-granted among G entails (i) x believes p (since they hold p fixed in their own deliberation); (ii) x believes that all Gs believe p (since they hold p fixed in their simulations of other group members’ deliberations). Further iterations also follow. I’ve got work elsewhere that lays out a bunch of minimal conditions on this attitude which deliver the result: for x to take-p-for-granted among G is for x to believe that it is commonly believed that p (where common belief is understood as the standard infinite conjunction of iterated higher order belief conditions).

But consider now the limiting singleton case, where x takes-p-for-granted among the group {x}. Following the pattern above, that requires inter alia that (i) x believes p; (ii) x believes that everyone in {x} believes p. The latter is equivalent to: x believes that x believes p. So this primitive attitude of taking for granted, in the strong sense relevant to group deliberation, has as its limiting singleton case an attitude which satisfies the conditions of the entailment model.

Now, it’s a contingent matter whether human psychology contains an attitude like taking-p-for-granted-among-G. But suppose it does do so. Then it would seem otiose for it to contain an additional primitive attitude of first-order belief, when the limiting de se singleton case of taking-for-granted-among-{x} could do the job. Now, it does the job by way of an attitude that is more committal than belief, in one sense. Taking-p-for-granted is correctly held only when p holds and the world also meets some logically independent condition q (which includes that one believes that p). But crucially, these extra conditions on taking-for-granted are self-vindicating. When one takes-for-granted among {oneself} that p, then one can go wrong if not-p. But one cannot go wrong by it failing to be the case that one believes p, because ex hypothesi, taking p for granted entails believing p. And this goes for all the extra conditions that it takes for taking-for-granted to be correct that go beyond what it takes for believing-p to be correct. So even though “taking for granted” is stronger than belief, it’s no riskier.

On this model of human psychology, rather than having to deal with an array of primitive attitudes with metacognitive contents (I believe that p, I believe that I believe that p, etc), we work with attitudes with simple first-order content, but which have an internal functional role which does the work for which you’d otherwise need metacognitive content. There can then be, in addition, really genuine cases of basic attitudes with true metacognitive content (as when I take for granted that I believe p, but do not take for granted p). And there may be specialized situations where that true metacognitive thinking is necessary or helpful. But for the general run of things, we’ll get by with first-order content alone.

Why might we hesitate to go with the entailment model? Well, if we had clear counterinstances to the original conditional, we’d need to be able to describe how they arise. And counterinstances do seem possible. Someone might, for example, accept bets about how they will act in the future (e.g. bet that they’d pick up the right hand box in an attempt to get cake) but, when the moment comes, act in another way (e.g. choose the left hand box). The final behaviour is in line with a belief that there’s cake in the left hand box; the earlier betting behaviour is in line with the agent failing to believe that they believe there’s cake in that box (it is in line, instead, with a belief that they believe there’s cake in the other box).

Now these kinds of cases are easily explained by the introspection model as cases where the introspective mechanism misfires. Indeed, that model essentially posits a special purpose mechanism plugging away in all the normal ways, just so we can say this about the recherche cases where first and higher order beliefs seem to come apart. What can the friend of the entailment model say about this?

There are two strategies here.

One is to appeal to “fragmentation”. The person concerned has a fragmented mind, one fragment of which includes a taking-for-granted-p, and another of which doesn’t (and instead includes a taking-for-granted-I-believe-p, or perhaps even a taking-for-granted-not-p). The fragments are dominant in different practical situations. If one already thinks that fragments indexed to distinct practical situations are part of what we need to model minds, then it’s no new cost to deploy the resources to account for the kind of case just sketched. By contrast to the introspective model, we don’t have any special machinery functioning in the normal run of cases, but rather a special (but independently motivated) phenomenon arising which accounts for what happens in the rare cases where first and higher order belief come apart.

Another strategy that’s available is to loosen the earlier assumption that the only way that humans believe p is via taking p for granted. One insists that this is the typical way that humans believe p (and so, typically, when one believes p that’s because one takes p for granted, and hence believes one believes p too). But one allows that there are also states of “pure” believing-that-p, which match only the first-order functional role of taking for granted. (Compare: most of us think there are acceptance-states other than belief, pretense say, which are like belief in many ways, but where acceptance-that-p is tied to acting-as-if-p only for a restricted range of contexts. Just so, on this account pure belief will be an artificially restricted version of taking-for-granted, not the usual stock in trade of our cognitive processing, but something which we can get into if need demands, or lapse into as the result of unusual circumstances).

(I don’t want to pin anybody else with either of these models. But I should say that when I’m thinking about the entailment model, I have in mind certain things that Stalnaker says in defence of principles like the conditional from which I start: the idea that believing you believe p when you believe p is the default case, and that it is failures of that tie that require justification, not the other way around.)

## A simple formal model of how charity might resolve underdetermination

To a first approximation, decision theoretic representation theorems take a bunch of information about (coherent) choices of an agent x, and spit out probability-utility pairs that (structurally) rationalize each of those choices. Call those the agential candidates for x’s psychology.

Problems arise if there are too many agential candidates for x’s psychology—if we cannot, for example, rule out hypotheses where x believes that the world beyond her immediate vicinity is all void, and where her basic desires solely concern the distribution of properties in that immediate bubble. And I’ve argued in other work that we do get bubble-and-void problems like these.

I also argued in that work that you could resolve some of the resulting underdetermination by appealing to substantive, rather than structural rationality. In particular, I said we make a person more substantively rational by representing her IBE inferences by inferences to genuinely good explanations (like the continued existence of things when they leave her immediate vicinity) than some odd bubble-and-void surrogate.

So can we get a simple model for this? One option is the following. Suppose there are some “ideal priors” $Pr_i$ that encode all the good forms of inference to the best explanation. And suppose we’re given total information about the total evidence $E$ available to x (just as we were given total information about her choice-dispositions). Then we can construct an ideal posterior probability, $Pr_i(\cdot|E)$, which gives the ideal doxastic attitudes to have in x’s evidential situation. Now, we can’t simply assume that x is epistemically ideal: there’s no guarantee that there’s any probability-utility pair among the agential candidates for x’s psychology whose first element matches $Pr_i(\cdot|E)$. But if we spot ourselves a metric of closeness between probability functions, we can consider the following way of narrowing down the choice-theoretic indeterminacy: the evidential-and-agential candidates for x’s psychology will be those agential candidates for x’s psychology whose first component is maximally close to the probability function $Pr_i(\cdot|E)$.
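The selection step just described can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the three toy worlds, the numbers, the function names, and above all the closeness metric, which is a placeholder (total variation distance) rather than the richer notion of closeness the model ultimately needs.

```python
def conditionalize(prior, evidence):
    """Condition a prior (dict: world -> probability) on evidence (a set of worlds)."""
    total = sum(p for w, p in prior.items() if w in evidence)
    return {w: (p / total if w in evidence else 0.0) for w, p in prior.items()}

def distance(p, q):
    """Placeholder closeness metric: total variation distance (an assumption)."""
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)

def closest_candidates(candidates, target):
    """Return the (probability, utility) candidates whose probability is nearest target."""
    scored = [(distance(prob, target), (prob, util)) for prob, util in candidates]
    best = min(d for d, _ in scored)
    return [cand for d, cand in scored if d == best]

# Hypothetical ideal prior over three toy worlds.
ideal_prior = {"persist_E": 0.6, "void_E": 0.1, "persist_notE": 0.3}
evidence = {"persist_E", "void_E"}  # x's total evidence E
ideal_posterior = conditionalize(ideal_prior, evidence)

# Two agential candidates, each assumed to rationalize x's choices equally well.
ordinary = ({"persist_E": 0.8, "void_E": 0.2, "persist_notE": 0.0},
            {"persist_E": 1.0, "void_E": 0.0, "persist_notE": 1.0})
bubble = ({"persist_E": 0.05, "void_E": 0.95, "persist_notE": 0.0},
          {"persist_E": 0.0, "void_E": 1.0, "persist_notE": 0.0})

selected = closest_candidates([ordinary, bubble], ideal_posterior)
print(selected == [ordinary])
```

The utilities play no role in this first version of the selection; they are simply carried along with the probability function that is being compared to the ideal posterior.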

(One warning about the closeness metric we need: I think you’ll get the wrong results if this were simply a matter of measuring the point-wise similarity of attitudes. Roughly, if you can trace the doxastic differences between two belief states to a single goof that one agent made that the other didn’t, those can be similar even if there are lots of resulting divergences. And a belief state which diverged in many different unrelated ways, but where the resulting differences are less far reaching, should in the relevant sense be less similar to one of the originals than the originals are to each other. A candidate example: the mashed up state which agrees with both where they agree, and then where they diverge agrees with one or the other at random. So a great deal is packed into this rich closeness ordering. But also: I take it to be a familiar enough notion that is okay to use in these contexts.)

So, in any case, that’s my simple model of how evidential charity can combine with decision-theoretic representation to yield the results—with the appeals to substantive rationality packed into the assumption of ideal priors, and the use of the closeness metric being another significant theoretic commitment.

I think we might want to add some further complexity, since it looks like we’ve been appealing to substantive rationality only as it applies to the epistemic side of the coin, and one might equally want to appeal to constraints of substantive rationality on utilities. So along with the ideal priors you might posit ideal “final values” (say, functions from properties of worlds to numbers, which we’d then aggregate, e.g. sum, to determine the ideal utilities to assign to a world). By pairing that with the ideal posterior probability we get an ideal probability-utility pair, relative to the agent’s evidence (I’m assuming that evidence doesn’t impact the agent’s final values; if it does in a systematic way, then that can be built into this model). Now, given an overall measure of closeness between arbitrary probability-utility pairs (rather than simply between probability functions) we can replicate the earlier proposal in a more general form: the evidential-and-agential candidates for x’s psychology will be those agential candidates which are maximally close to the pair $(Pr_i(\cdot|E), U_i)$.

(As before, this measure of closeness between psychologies will have to do a lot of work. In this case, it’ll have to accommodate rationally permissible idiosyncratic variation in utilities. Alternatively, and this is possible either for the ideal priors or the ideal final values/utilities, we could start from a set of ideal priors and ideal final values, and do something a bit more complex with the selection mechanism: e.g. pick out the member(s) of the set of ideal psychologies and the set of agential candidate psychologies which are closest to one another, attribute the latter to the agent as their actual psychology, and the former as the proper idealization of their psychology. This allows different agents to be associated systematically with different ideal psychologies.)

This is a description of interpretation-selection that relies heavily on substantive rationality. It is an implementation of the idea that when interpreting others we maximize how favourable a psychology we give them—this maximizing thought is witnessed in the story above by the role played by closeness to an ideal psychology.

I also talked in previous posts about a different kind of interpretation-selection. This is interpretation selection that maximizes, not objective favourability, but similarity to the psychology of the interpreter themself. We can use a variant of the simple model to articulate this. Rather than starting with ideal priors, we let the subscript “i” above indicate that we are working with the priors of the flesh and blood interpreter. We start with this prior, and feed it x’s evidence, in order to get a posterior probability tailored to x’s evidential situation (though processed in the way the interpreter would do). Likewise, rather than working with ideal final values, we start from the final values of the flesh and blood interpreter (if they regard some of their values as idiosyncratic, perhaps this characterizes a space of interpreter-sanctioned final values; that’s formally like allowing the set of ideal final values in the earlier implementation). From that point on, however, interpretation selection is exactly as before. The selected interpretation of x is that one among the agential candidates to be her psychology that is closest to the interpreter’s psychology as adjusted and tailored to x’s evidential situation. This is exactly the same story as before, except with the interpreter’s psychology playing the role of the ideal.

Neither of these is yet in a form in which it could be a principle of charity implementable by a flesh and blood agent themselves (neither is a principle of epistemic charity). They presuppose, in particular, that one has total access to x’s choice dispositions, and to her total evidence. In general, one will only have partial information at best about each. One way to start to turn it into a simple model of epistemic charity would be to think of there being a set of possible choice-dispositions that, for all we flesh-and-blood interpreters know, could be the choice-dispositions of x. Likewise for her possible evidential states. But relative to each pair of complete choice-dispositions and evidence characterizing our target x, either one of the stories above could be run, picking out a “selected interpretation” for x in that epistemic possibility (if there’s a credal weighting given to each choice-evidence pair, the interpretation inherits that credal weighting).

In order for a flesh and blood interpreter (even one with insane computational powers) to implement the above, they would need to have knowledge of the starting psychologies on the basis of which the underdetermination is to be resolved (also the ability to reliably judge closeness). If the starting psychology is the interpreter’s own psychology, as on the second, similarity-maximizing reading of the story, then what we need to act is massive amounts of introspection. If the starting point is an ideal psychology, however, then in order for the recipe to be usable by a flesh and blood interpreter with limited information, they would need to be aware of what the ideal was: what the ideal priors are, and what the ideal final values are. If part of the point is to model interpretation by agents who are flawed in the sense of having non-ideal priors and final values (somewhat epistemically biased, somewhat immoral agents) then this is an interesting but problematic thing to credit them with. If they are aware of the right priors, what excuse do they have for the wrong ones? If they know the right final values, why aren’t they valuing things that way?

An account—even an account with this level of abstraction built in—should I think allow for uncertainty and false belief about what the ideal priors and final values are, among the flesh and blood agents who are deploying epistemic charity. So as well as giving our interpreter a set of epistemic possibilities for x’s evidence and choices, we will add in a set of epistemic possibilities for what the ideal priors and values in fact are. But the story is just the same: for any quadruple of x’s evidence, x’s choices, the ideal priors and ideal values, we run the story as given to select an interpretation. And credence distributions on an interpreter’s part across these valuations will be inherited as a credence distribution across the interpretations.
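The way credences over epistemic possibilities get inherited by interpretations can be sketched as a simple weighted tally. The scenario labels, the credences, and the lookup table standing in for the selection story are all hypothetical placeholders; in the model proper, the selection would be the closeness-maximizing procedure run on each quadruple.

```python
from collections import defaultdict

def interpretation_credences(possibilities, select):
    """Push the interpreter's credences over scenarios forward onto interpretations.

    possibilities: list of (credence, scenario) pairs, where a scenario is a
    quadruple (x's evidence, x's choices, ideal priors, ideal values).
    select: function from a scenario to the interpretation it selects.
    """
    result = defaultdict(float)
    for credence, scenario in possibilities:
        result[select(scenario)] += credence
    return dict(result)

# Hypothetical scenarios, labelled by strings for illustration only.
s1 = ("E1", "choices1", "priors1", "values1")
s2 = ("E1", "choices2", "priors1", "values1")
s3 = ("E2", "choices1", "priors2", "values1")

# Stand-in for running the selection story on each scenario (an assumption).
toy_selection = {s1: "interpretation_A", s2: "interpretation_A", s3: "interpretation_B"}

credences = interpretation_credences(
    [(0.5, s1), (0.25, s2), (0.25, s3)],
    toy_selection.get,
)
print(credences)
```

Distinct scenarios can select the same interpretation, in which case their credences simply add.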

With that as our model of epistemic charity, we can then identify two ways of understanding how an “ideal” interpreter would interpret x, within the similarity-maximization story.

The first idealized similarity-maximization model says that the ideal interpreter knows the total facts about the psychology of an interpreter, y, and also total information about x’s evidence and choices. You feed all that information into the story as given, and you get one kind of result for what the ideal interpretation of x is (one that is relative to y, and in particular, y’s priors and values).

The second idealized similarity-maximization model says that the ideal interpreter knows the total facts about her own psychology, as well as total information about x’s evidence and choices. The ideal interpreter is assumed to have the ideal priors and values, and so maximizing similarity to that psychology just is maximizing closeness to the ideal. So if we feed all this information into the story as given, we get a characterization of the ideal interpretation of x that is essentially the same as the favourability-maximization model that I started with.

Ok, so this isn’t yet to argue for any of these models as the best way to go. But if the models are good models of the ways that charity would work, then they might help to fix ideas and explore the relationships among them.

## Maximizing similarity and charity: redux

This is a quick post (because it’s the last beautiful day of the year). But in the last post, I was excited by the thought that a principle of epistemic charity that told you to maximize self-similarity in interpretation would correspond to a principle of metaphysical charity in which the correct belief/desire interpretation of an individual maximized knowledge, morality, and other ideal characteristics.

That seemed nice, because similarity-maximization seemed easier to defend as a reliable practical interpretative principle than maximizing morality/knowledge directly. The similarity-maximization seems to presuppose only that interpreter and interpretee are (with high enough objective probability) cut from the same cloth. A practical knowledge/morality maximization version of charity, on the other hand, looks like it has to get into far more contentious background issues.

But I think this line of thought has a big problem. It’s based on the thought that the facts about belief and desire are those that the ideal interpreter would attribute. If the ideal interpreter is an omniscient saint (and let’s grant that this is built into the way we understand the idealization) then similarity-maximization will make the ideal interpreter choose theories of any target that make them as close to an omniscient saint as possible, i.e. maximize knowledge and morality.

Alright. But the thing is that similarity maximization as practiced by ordinary human beings is reliable, if it is, because (with high enough probability) we resemble each other in our flaws as well as our perfections. My maximization of Sally’s psychological similarity to myself may produce warranted beliefs because I’m a decent sample of human psychology. But a hypothetical omniscient saint is not even hypothetically a decent sample of human psychology. The ideal interpreter shouldn’t be maximizing Sally’s psychological similarity to themself, but rather her similarity to some representative individual (like me).

Now, you might still get an interesting principle of metaphysical charity out of similarity-maximization, even if you have to make it agent-relative by having the ideal interpreter maximize similarity to x, for some concrete individual x (if you like, this ideal interpreter is x’s ideal interpretive advisor). If you have this relativization built into metaphysical charity, you will have to do something about the resulting dangling parameter: maybe go for a kind of perspectival relativism about psychological facts, or try to generalize this away as a source of indeterminacy. But it’s not the morality-and-knowledge maximization I originally thought resulted.

I need to think about this dialectic some more: it’s a little complicated. Here’s another angle to approach the issue. You could just stick with characterizing “ideal interpreter” as I originally did, as omniscient saints going through the same de se process as we ourselves do in interpreting others, and stipulate that belief/desire facts are what those particular ideal interpreters say they are. A question, if we do this, is whether this would undercut a practice of flesh and blood human beings (FAB) interpreting others by maximizing similarity to themselves. Suppose FAB recognizes two candidate interpretations available of a target, where similarity-to-FAB ranks interpretation A over B, whereas similarity-to-an-omniscient-saint ranks B over A. In that situation, won’t the stipulation about what fixes the belief/desire facts mean that FAB should go for B, rather than A? But similarity-maximization charity would require the opposite.

One issue here is whether we could ever find a case instantiating this pattern which doesn’t have a pathological character. For example, if cases of this kind needed FAB to identify a specific thing that the omniscient agent knows, that they do not know, then they’d be committed to the Moorean proposition “the omniscient saint knows p, but I do not know p”. So perhaps there’s some more room to explore whether the combination of similarity-maximization and metaphysical charity I originally put forward could be sustained as a package-deal. But for now I think the more natural pairing with similarity-maximization is the disappointingly relativistic kind of metaphysics given above.

## From epistemic to metaphysical charity

I’ll start by recapping a little about epistemic charity. The picture was that we can get some knowledge of other minds from reliable criterion-based rules. We become aware of the behaviour-and-circumstances B of an agent, and form the belief that they are in S, in virtue of a B-to-S rule we have acquired through nature or nurture. But this leaves a lot of what we think we ordinarily know about other minds unexplained (mental states that aren’t plausibly associated with specific criteria). Epistemic charity is a topic-specific rule (a holistic one) which takes us from the evidence acquired e.g. through criterion-based rules like the above, to belief and desire ascriptions. The case for some topic-specific rule will have to be made by pointing to problems with topic-neutral rules that might be thought to do the job (like IBE). Once that negative case is made we can haggle about the character of the subject-specific rule in question.

If we want to make the case that belief-attributions are warranted in the Plantingan sense, the central question will be whether (in worlds like our own, in application to the usual targets, and in normal circumstances) the rule of interpreting others via the charitable instruction to “maximize rationality” is a reliable one. That’s surely a contingent matter, but it might be true. But we shouldn’t assume that just because a rule like this is reliable in application to humans, that we can similarly extend it to other entities—animals and organizations and future general AI.

There’s also the option of defending epistemic charity as the way we ought to interpret others, without saying it leads to beliefs that are warranted in Plantinga’s sense. One way of doing that would be to emphasize and build on some of the pro-social aspects of charity. The idea is that we maximize our personal and collective interests by cooperating, and defaulting to charitable interpretation promotes cooperation. One could imagine charity being not very truth-conducive, and these points about its pragmatic benefits obtaining, especially if we each take advantage of others’ tendency to charitably interpret us by hiding our flaws as best we can. Now, if we let this override clear evidence of stupidity or malignity, then the beneficial pro-social effects might be outweighed by constant disappointment as people fail to meet our confident expectations. So this may work best as a tie-breaking mechanism, where we maximize individual and collective interest by being as pro-social as possible under constraints of respecting clear evidence.

I think the strongest normative defence of epistemic charity will have to mix and match a bit. It may be that some aspects of charitable interpretation (e.g. restricting the search space to “theories” of other minds of a certain style, e.g. broadly structurally rational) look tempting targets to defend as reliable, in application to creatures like us. But as we give the principles of interpretation-selection greater and greater optimism bias, they get harder to defend as reliable, and it’s more tempting to reach for a pragmatic defence.

All this was about epistemic charity, and is discussed in the context of flesh and blood creatures forming beliefs about other minds. There’s a different context in which principles of charity get discussed, and that’s in the metaphysics of belief and desire. The job in that case is to take a certain range of ground-floor facts about how an agent is disposed to act and the perceptual information available to them (and perhaps their feelings and emotions too) and then select the most reason-responsive interpretation of all those base-level facts. The following is then proposed as a real definition of what it is for an agent to believe that p or desire that q: it is for that belief or desire to be part of the selected interpretation.

Metaphysical charity says what it is for someone to believe or desire something in the first place, doesn’t make reference to any flesh and blood interpreter, and a fortiori doesn’t have its base facts confined to those to which flesh and blood interpreters have access. But the notable thing is that (at this level of abstract definition) it looks like principles of epistemic and metaphysical charity can be paired. Epistemic charity describes, inter alia, a function from a bunch of information about acts/intentions and perceivings to overall interpretations (or sets of interpretations, or credence distributions over sets of interpretations). It looks like you can generate a paired principle of metaphysical charity out of this by applying that function to a particular rich starting set: the totality of (actual and counterfactual) base truths about the intentions/perceivings of the target. (We’ll come back to slippage between the two on the way).

It’s no surprise, then, that advocates of metaphysical charity have often framed the theory in terms of what an “ideal interpreter” would judge. We imagine a super-human agent whose “evidence base” were the totality of base facts about our target, and ask what interpretation (or set of interpretations, or credences over sets of interpretations) they would come up with. An ideal interpreter implementing a maximize-rationality principle of epistemic charity would pick out the interpretation which maximizes rationality with respect to the total base facts, which is exactly what metaphysical charity selected as the belief-and-desire fixing theory. (What happens if the ideal interpreter would deliver a set of interpretations, rather than a single one? That’d correspond to a tweak on metaphysical charity where agreement among all selected interpretations suffices for determinate truth. What if it delivers a credence distribution over such a set? That’d correspond to a second tweak, where the degree of truth is fixed by the ideal interpreter’s credence.)

You could derive metaphysical charity from epistemic charity by adding (some refinement of) an ideal-interpreter bridge principle: saying that what it is for an agent to believe that p/desire that q is for it to be the case that an ideal interpreter, with awareness of all and only a certain range of base facts, would attribute those attitudes to them. Granted this, and also the constraint that any interpreter ought to conform to epistemic charity, anything we say about epistemic charity will induce a corresponding metaphysical charity. The reverse does not hold. It is perfectly consistent to endorse metaphysical charity, but think that epistemic charity is all wrong. But with this ideal-interpreter bridge set up, whatever we say about epistemic charity will carry direct implications for the metaphysics of mental content.

Now metaphysical charity relates to the reliability of epistemic charity in one very limited respect. Given metaphysical charity, epistemic charity is bound to be reliable in one very restricted range of cases: a hypothetical case where a flesh and blood interpreter has total relevant information about the base facts, and so exactly replicates the ideal interpreter, counterfactuals about whom fix the relevant facts. Now, these cases are pure fiction: they do not arise in the actual world. And they cannot be straightforwardly used as the basis for a more general reliability principle.

Here’s a recipe that illustrates this, that I owe to Ed Elliott. Suppose that our total information about x is Z, which leaves open the two total patterns of perceivings/intendings A and B. Ideal interpretation applied to A delivers interpretation 1, the same applied to B delivers interpretation 2. 1 is much more favourable than 2. Epistemic charity applied to limited information Z tells us to attribute 1. But there’s nothing in the ideal interpreter/metaphysical charity picture that tells us A/1 is more likely to come about than B/2.
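Here is a toy rendering of the recipe in Python, purely for illustration. The favourability scores and the probabilities assigned to A and B are made-up numbers, not anything taken from Elliott or from the theory. The point is only that the hit rate of charity on the limited information Z depends on how likely A is, and the ideal-interpreter picture leaves that open.

```python
# Toy model of the recipe. All names and numbers are hypothetical.

# Ideal interpretation: total pattern of perceivings/intendings -> interpretation.
IDEAL = {"A": 1, "B": 2}

# Favourability of each interpretation (interpretation 1 is much more favourable).
FAVOURABILITY = {1: 0.9, 2: 0.2}

# Limited information Z leaves both total patterns open.
OPEN_GIVEN_Z = ["A", "B"]


def charitable_attribution(open_patterns):
    """Attribute the most favourable interpretation among those left open."""
    candidates = {IDEAL[p] for p in open_patterns}
    return max(candidates, key=FAVOURABILITY.get)


def reliability(prob_of_a):
    """Chance that the charitable attribution is the true interpretation,
    given an assumed probability that the actual total pattern is A."""
    prob = {"A": prob_of_a, "B": 1.0 - prob_of_a}
    attributed = charitable_attribution(OPEN_GIVEN_Z)
    return sum(prob[p] for p in OPEN_GIVEN_Z if IDEAL[p] == attributed)


for p in (0.9, 0.5, 0.1):
    print(f"P(A) = {p}: charity attributes interpretation "
          f"{charitable_attribution(OPEN_GIVEN_Z)}, "
          f"correct with probability {reliability(p):.1f}")
```

Charity attributes interpretation 1 whatever value is fed in for the probability of A, so its reliability simply tracks that probability. Nothing in `IDEAL` or `FAVOURABILITY` constrains it.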

On the other hand, consider the search-space restrictions, say to interpretations that make a creature rational, or rational-enough. If we have restricted the search space in this way for any interpreter, then we have an ex ante guarantee that whatever the ideal interpreter comes up with, it’ll be an interpretation within their search space, i.e. one that makes the target rational, or rational-enough. So constraints on the interpretive process will be self-vindicating, if we add metaphysical charity/ideal interpreter bridges to the package, though as we saw, maximizing aspects of the methodology will not be.

I think it’s very tempting for fans of epistemic charity to endorse metaphysical charity. It’s not at all clear to me whether fans of metaphysical charity should take on the burden of defending epistemic charity. If they do, then the key question will be the normative status of any maximizing principles they embrace as part of the characterization of charity.

Let me just finish by emphasizing both the flexibility and the limits to this package deal. The flexibility comes because you can understand “maximize reasonableness within search-space X” or indeed “maximize G-ness within search-space X” in all sorts of ways, and the bulk of the above discussion will go through. That means we can approach epistemic charity by fine-tuning for the maximization principle that allows us the best chance of normative success. On the other hand, there are some approaches that are very difficult to square with metaphysical charity or ideal interpreters. I mentioned in the previous post a “projection” or “maximize similarity to one’s own psychology” principle, which has considerable prima facie attraction (after all, the idea that humans have quite similar psychologies looks like a decent potential starting point). It’ll be complex translating that into a principle of metaphysical charity. What psychology would the ideal interpreter have, similarity to which must be maximized?

Well, perhaps we can make this work: perhaps the ideal interpreter, being ideal, would be omniscient and saintly? If so, perhaps this form of epistemic charity would predict a kind of knowledge-and-morality-maximization principle in the metaphysical limit. So this is a phenomenon worth noting: metaphysical knowledge-and-morality maximization could potentially be derived either from epistemic similarity-maximization or epistemic knowledge-and-morality maximization. The normative defences these epistemologies of other minds call for would be very different.

## Epistemic charity as proper function.

Our beliefs about the specific beliefs and desires of others are not formed directly on the basis of manifest behaviour or circumstances, simply because in general individual beliefs and desires are not paired up in a one-to-one fashion with specific behaviour/circumstances (that is what I took away from the circularity objection to behaviourism). And with Plantinga, let’s set aside the suggestion we base such attributions in an inference by IBE. As discussed in the last post, the Plantingan complaint is that IBE is only somewhat reliable, and (on a Plantingan theory) this means it could only warrant a rather tenuous, unfirm belief that the explanation is right.

(Probably I should come back to that criticism. It seems important to Plantinga’s case that he thinks there would be close competitors to the other-minds hypothesis, if we were to construe attributions as the result of IBE, as the case for the comparative lack of reliability of IBE is very much stronger when we’re considering picking one out of a bunch of close competitor theories, than when e.g. there’s one candidate explanation that stands out a mile from the field, particularly when we remember we are interested only in reliability in normal circumstances. But surely there are some scientific beliefs that we initially form tentatively by an IBE which we end up believing very firmly, when the explanation they are a part of has survived a long process of testing and confirmation. So this definitely could do with more examination, to see if Plantinga’s charge stands up. It seems to me that Wright’s notion of wide vs. narrow cognitive roles might be helpful here, the thought being that the physicalistic explanatory hypotheses we might arrive at by IBE tend to have multiple manifestations and so admit of testing and confirmation in ways that are not just “more of the same” (think: Brownian motion vs statistical mechanical phenomena as distinct manifestations of an atomic theory of matter).)

What I’m now going to examine is a candidate solution to the second problem of other minds that can sit within a broadly Plantingan framework. Just as with criterion-based inferential rules that on the Plantingan account underpin ascriptions of pain, intentions, perceivings, and the like, the idea will be that we have special purpose belief forming mechanisms that generate (relatively firm) ascriptions of belief and desire. Unlike the IBE model, we’re not trying to subsume the belief formations within some general purpose topic-neutral belief forming mechanism, so it won’t be vulnerable in the way IBE was.

What is the special purpose belief forming mechanism? It’s a famous one: charitable interpretation. The rough idea is that one attributes the most favourable among the available overall interpretations that fit with the data one has about that person. In this case, the “data” may be all those specific criterion-based ascriptions: stuff like what the person sees, how they are intentionally acting, what they feel, and so on. In a more full-blown version, we would have to factor in other things (e.g. the beliefs they express through language and other symbolic acts; the influence of inductive generalizations made on the basis of previous interpretations, etc).

What is it for an interpretation to be “more favourable” than another? And what is it for a belief-desire interpretation to fit with a set of perceivings, intentions, feelings etc? For concreteness, I’ll take the latter to be fleshed out in terms of rational coherence between perceptual input and belief change and means-end coherence of beliefs and desires with intentions, and the like—structural rationality constraints playing the role that in IBE, formal consistency might play. And I’ll take favourability to be cashed out as the subject being represented as favourably as is possible—believing as they ought, acting on good reasons, etc.

Now, if this were to fit within the Plantingan project, it has to be the case that there is a component of our cognitive system that goes for charitable interpretation and issues in (relatively firm) ascriptions of mental states to others. Is that even initially plausible? We all have experience of being interpreted uncharitably, and complaining about it. We all know, if we’re honest, that we are inclined to regard some people as stupid or malign, including in cases where there’s no very good direct evidence for that.

I want to make two initial points here. The first is that we need to factor in some of the factors mentioned earlier in order to fairly evaluate the hypothesis here. Particularly relevant will be inductive generalizations from previous experience. If your experience is that everyone you’ve met from class 22B is a bully who wants to cause you pain, you might reasonably not be that charitable to the next person you meet from class 22B, even if the evidence about that person directly is thin on the ground. I’d expect the full-dress version of charity to instruct us to form the most favourable attributions consistent with those inductive generalizations we reasonably hold onto (clearly, there’ll be some nuance in spelling this out, since we will want to allow that sufficient acquaintance with a person allows us to start thinking of them as a counterexample to generalizations we have previously held). For similar reasons, an instruction to be as charitable as possible won’t tell you to assume that every stranger you meet is saintly and omniscient, and merely behaving in ways that do not manifest this out of a concern not to embarrass you (or some such reason). For starters, it’s somewhat hard to think of decent reasons why omniscient saints would act as everyday people do (just ask those grappling with the problem of evil how easy this is), and for seconds, applied to those people with whom we have most interaction, such hypotheses wouldn’t stand much scrutiny. We have decent inductive grounds for thinking that, generically, people’s motives and information lie within the typical human band. What charity tells us to do is pick the most favourable interpretation consistent with this kind of evidence.
(Notice that even if these inductive generalizations eventually take most of the strain in giving a default interpretation of another, charity is still epistemically involved insofar as (i) charity was involved in the interpretations which form the base from which the inductive generalization was formed; and (ii) we are called on to modify our inductively-grounded attributions on the fly when someone does something that doesn’t fit with them).

Further, the hypothesis that we have a belief-attributing disposition with charity as its centrepiece is quite consistent with this being defeasible, and quite often defeated. For example, here’s one way human psychology might be. We are inclined by default to be charitable in interpreting others, but we are also set up to be sensitive to potential threats from people we don’t know. Human psychology incorporates this threat-detection system by giving us a propensity to form negative stereotypes of outgroups on the basis of beliefs about bad behaviour or attitudes of salient members of those outgroups. So when these negative stereotypes are triggered, this overrides our underlying charitable disposition with some uncharitable default assumptions encoded in the stereotype. (In Plantingan terms, negative stereotype formation would not be a part of our cognitive structure aimed at truth, but rather one aimed at pragmatic virtues, such as threat-avoidance). Only where the negative stereotypes are absent would we then expect to find the underlying signal of charitable interpretation.

So again: is it even initially plausible that we actually engage in charitable interpretation? The points above suggest we should certainly not test this against our practice in relation to members of outgroups that may be negatively stereotyped. So we might think about this in application to friends and family. As well as being in-groups rather than out-groups, these are also cases where we have a lot of direct (criterion-based) evidence about their perceivings, intendings, feelings over time, so cases where we would expect to be less reliant on inductive generalizations and the like. I think in those cases charity is at least an initially plausible candidate as a principle constraining our interpretative practice. As some independent evidence of this, we might note Sarah Stroud’s account of the normative commitments constitutive of being a friend, which includes an epistemic bias towards charitable interpretation. Now, her theory of this says that it is the special normatively significant relation of friendship that places an obligation of charity upon us, and that is not my conjecture. But insofar as she is right about the phenomenology of friendship as including an inclination to charity, then I think this supports the idea that charitable interpretation is at least one of our modes of belief attribution. It’s not the cleanest case (because the very presence of the friendship relation is a potential confound) but I think it’s enough to motivate exploring the hypothesis.

So suppose that human psychology does work roughly along the lines just sketched, with charitable-ascription the default, albeit defeasible and overridable. If this is to issue in warranted ascriptions within a Plantingan epistemology, then not only does charitable interpretation have to be a properly-functioning part of our cognitive system, but it would have to be a part that’s aimed at truth, and which reliably issues in true beliefs. Furthermore, it’d have to very reliably issue in true beliefs, if it is, by Plantingan lights, to warrant our firm beliefs about the mental lives of others.

Both aspects might raise eyebrows. There are lots of things one could say in praise of charitable interpretation that are fundamentally pragmatic in character. Assuming the best of others is a pro-social thing to do. Everyone is the hero in their own story, and they like to learn that they are heroes in other people’s stories too. So expressing charitable interpretations of others is likely to strengthen relationships, enable cooperation, and prompt reciprocal charity. All that is good stuff! It might be built up into an ecological rationale for building charitable interpretation into one’s dealing with in-group members (more generally, positive stereotypes), just as threat-avoidance might motivate building cynical interpretation into one’s dealing with out-group members (more generally, negative stereotypes). But if we emphasize this kind of benefit of charitable interpretation, we are building a case for a belief forming mechanism that aims at sociability, not one aimed at truth. (We’re also undercutting the idea that charity is a default that is overridden by e.g. negative stereotypes–it suggests instead different stances in interpretation are tied to the different relationships).

It’s easiest to make the case that an interpretative disposition that is charitable is aimed at truth if we can make the case that it is reliable (in normal circumstances). What do we make of that?

Again, we shouldn’t overstate what it takes for charity to be reliable. We don’t have to defend the view that it’s reliable to assume that strangers are saints, since charity doesn’t tell us to do that (it wouldn’t make it to the starting blocks of plausibility if it did). The key question will be whether charitable interpretation will be a reliable way of interpreting those with whom we have long and detailed acquaintance (so that the data that dominates is local to them, rather than inductive generalizations). The question is something like the following: are humans generally such that, among the various candidate interpretations that are structurally rationally compatible with their actions, perceptions, feelings (of the kind that friends and family would be aware of) the most favourable is the truest?

Posed that way, that’s surely a contingent issue—and something to which empirical work would be relevant. I’m not going to answer it here! But what I want to say is that if this is a reliable procedure in the constrained circumstances envisaged, then the prospects start to look good for accommodating charity within a Plantingan setup.

Now, even if charity is reliable, there remains the threat it won’t be reliable enough to vindicate the firmness of the confidence I have that family and strangers on the street believe that the sun will rise tomorrow, and so forth. (This is to avoid the analogue of the problem Plantinga poses for inference to the best explanation). This will guide the formulation of exactly how we characterize charity: it had better not just say that we endorse the most charitable interpretation that fits the relevant data, with the firmness of that belief unspecified, but also say something about the firmness of such beliefs. For example, it could be that charity tells us to distribute our credence over interpretations in a way that respects how well they rationalize the evidence available so far. In that case, we’d predict that beliefs and desires common to almost all favourable candidates are ascribed much more firmly than beliefs and desires which are part of the very best interpretation, but not of nearby candidates. And we’d make the case that e.g. a belief that the sun will rise tomorrow is going to be part of almost all such candidates. (If we make this move, we need to allow the friend of topic-neutral IBE to make a similar one. Plantinga would presumably say that many of the candidates to be “best explanations” of data, when judged on topic neutral grounds, are essentially sceptical scenarios with respect to other minds. So I think we can see how this response could work here, but not in the topic-neutral IBE setting).
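To make the credence-distribution version concrete, here is a small Python sketch. The candidate interpretations, their rationalization scores, and the rule that credence is proportional to score are all assumptions made for illustration; nothing in the argument fixes them.

```python
# Hypothetical candidate interpretations: each pairs a set of ascribed
# attitudes with a score for how well it rationalizes the evidence so far.
CANDIDATES = [
    ({"sun will rise tomorrow", "the bus is late", "wants coffee"}, 5.0),
    ({"sun will rise tomorrow", "the bus is late", "wants tea"}, 4.0),
    ({"sun will rise tomorrow", "the bus is on time", "wants tea"}, 3.0),
]


def credences(candidates):
    """Assumed rule: credence in an interpretation is proportional to its score."""
    total = sum(score for _, score in candidates)
    return [score / total for _, score in candidates]


def firmness(attitude, candidates):
    """Credence that the target has the attitude: the summed credence of
    every candidate interpretation that includes it."""
    return sum(
        cred
        for (attitudes, _), cred in zip(candidates, credences(candidates))
        if attitude in attitudes
    )


for attitude in ("sun will rise tomorrow", "wants coffee"):
    print(f"{attitude}: {firmness(attitude, CANDIDATES):.2f}")
```

On these made-up numbers the belief shared by every candidate is ascribed with full confidence, while “wants coffee”, though part of the top-scoring interpretation, gets a credence of under a half.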

Three notes before I finish. The first is that even if charity as I categorized it (as a kind of justification-and-reason maximizing principle) isn’t vindicated as a special purpose interpretive principle, it illustrates the way that interpretive principles with very substantial content could play an epistemological role in solving the other problem of other minds. For example, a mirror-image principle would be to pick the most cynical interpretation. Among creatures who are naturally malign dissemblers, that may be reliable, and so a principle of cynicism could be vindicated on exactly parallel lines. And if in fact all humans are pretty similar in their final desires and general beliefs, then a principle of projection, where one by default assumes that other creatures have the beliefs and desires that you, the interpreter, have yourself, might be reliable in the same way. And so that too could be given a backing. (Note that this would not count as a topic-neutral inference by analogy. It would be a topic-specific inference concerned with psychological attribution alone, and so one which could in principle issue in much firmer beliefs than a general purpose mechanism which has to avoid false positives in other areas).

Second, the role for charity I have set out above is very different from the way that it’s handled by e.g. Davidson and the Davidsonians (in those moments where they are using it as an epistemological principle, rather than something confined to the metaphysics of meaning). This kind of principle is contingent, and though we could insist that it is somehow built into the very concept of “belief”, that would just be to make the concept of belief somewhat parochial, in ways that Davidsonians would not like.

The third thing I want to point out is that if we think of epistemic charity as grounded in the kind of considerations given above, we should be very wary about analogical extensions of interpretative practices to creatures other than humans. For it could be that epistemic charity is reliable when restricted to people, but utterly unreliable when applied, for example, to Klingons. And if that’s so, then extending our usual interpretative practice to a “new normal” involving Klingons won’t give us warranted beliefs at all. More realistically, there’s often a temptation to extend belief and desire attributions to non-human agents such as organizations, and perhaps, increasingly, AI systems. But if the reliance on charity is warranted only because of something about the nature of the original and paradigmatic targets of interpretation (humans mainly, and maybe some other naturally occurring entities such as animals and naturally formed groups) that makes it reliable, then it’ll continue to be warranted in application to these new entities only if they have a nature which also makes it reliable. It’s perfectly possible that the incentive structures of actually existing complex organizations are just not such that we should “assume the best” of them, as we perhaps should of real people. I don’t take a stand on this, but I do flag it up as something that needs separate evaluation.