Monthly Archives: November 2020

Follow up: GKL similarity and social choice

The previous post discusses a “distance minimizing” way of picking a compromise between agents with a diverse set of utilities. If you measure distance (better: divergence) between utility functions by squared Euclidean distance, then a utilitarian compromise pops out.

I wanted now to discuss briefly a related set of results (I’m grateful for pointers and discussion with Richard Pettigrew here, though he’s not to blame for any goofs I make along the way). The basic idea here is to use a different distance/divergence measure between utilities, and look at what happens. One way to regard what follows is as a serious contender (or serious contenders) for measuring similarity of utilities. But another way of looking at this is as an illustration that the choice of similarity I made really has significant effects.

I borrowed the squared Euclidean distance analysis of similarity from philosophical discussions of similarity of belief states. And the rival I now discuss is also prominent in that literature (and is all over the place in information theory). It is (generalized) Kullback-Leibler relative entropy (GKL), and it gets defined, on a pair of real valued vectors U,V in this way:

D_{KL}(U,V):=\sum_{p\in P} U(p)\log \frac{U(p)}{V(p)} - U(p)+V(p)

Note that when the vectors are each normalized to the same quantity, the sum of U(p) over all p is equal to the sum of V(p) over all p, and so the two latter summands cancel. In the more general case, they won’t. Kullback-Leibler relative entropy is usually applied with U and V being probability functions, which are normalized, so you normally find it in the form where it is a weighted sum of logs. Notoriously, GKL is not symmetric: the distance from U to V can be different from the distance from V to U. This matters; more anon.
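To make the definition and the asymmetry concrete, here is a minimal numerical sketch. The function name `gkl` and the two utility vectors are illustrative assumptions of mine, not anything taken from the Pettigrew paper; the code simply transcribes the formula above and requires all entries to be positive.

```python
import math

def gkl(u, v):
    """Generalized Kullback-Leibler divergence between two positive vectors.

    Transcribes sum_p [ U(p) * log(U(p)/V(p)) - U(p) + V(p) ].
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if any(a <= 0 or b <= 0 for a, b in zip(u, v)):
        raise ValueError("all entries must be positive for the logs to be defined")
    return sum(a * math.log(a / b) - a + b for a, b in zip(u, v))

# Hypothetical utility vectors over three propositions (illustrative numbers only).
U = [1.0, 2.0, 3.0]
V = [2.0, 2.0, 1.0]

print(gkl(U, V))  # divergence from U to V
print(gkl(V, U))  # divergence from V to U: in general a different number
print(gkl(U, U))  # a vector has zero divergence from itself
```

On these inputs the two directions come apart, which is the asymmetry that matters later on.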

(One reason I’m a little hesitant with using this as a measure of similarity between utilities in this context is the following. When we’re using it to measure similarity between beliefs or probability functions, there’s a natural interpretation of it as the expectation from U’s perspective of the difference between the log of U and the log of V. But comparing utilities rather than probabilities means we can’t read the formula this way. It feels to me a bit more of a formalistic enterprise for that reason. Another thing to note is that taking logs is well defined only when the relevant utilities are positive, which again deserves some scrutiny. Nevertheless….)

What happens when we take GKL as a distance (divergence) measure, and then have a compromise between a set of utilities by minimizing total sum distance from the compromise point to the input utilities? This article by Pettigrew gives us the formal results that speak to the question. The key result is that the compromise utility U_C that emerges from a set of m utility functions U_i is the geometrical mean:

U_C(p)= (\prod_{i\in A} U_i(p))^{\frac{1}{m}}.

Where the utilitarian compromise utilities arising from squared euclidean distance similarity look to the sum of individual utilities, this compromise looks at the product of individual utilities. It’s what’s called in the social choice literature a symmetrical “Nash social welfare function” (that’s because it can be viewed as a special case of a solution to a bargaining game that Nash characterized: the case where the “threat” or “status quo” point is zero utility for all). It has some interesting and prima facie attractive features—it prioritizes the worse off, in that a fixed increment of utility will maximize the product of everyone’s utilities if awarded to someone who has ex ante lowest utility. It’s also got an egalitarian flavour, in that you maximize the product of a population’s utilities by dividing up total utility evenly among the population (contrast utilitarianism, where you can distribute utility in any old way among a population and get the same overall sum, and so any egalitarian features of the distribution of goods have to rely on claims about diminishing marginal utility of those goods; which by the same token leaves us open to “utility monsters” in cases where goods have increasing utility for one member of the population). Indeed, as far as I can tell, it’s a form of prioritarianism, in that it ranks outcomes by way of a sum of utilities which are discounted by the application of a concave function (you preserve the ranking of outcomes if you transform the compromise utility function by a monotone increasing function, and in this case we can first raise it to the mth power, and then take logs, and the result will be the sum of log utilities. And since log is itself a concave function this meets the criteria for prioritarianism). Anyway, the point here is not to evaluate Nash social welfare, but to derive it.
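The two features just mentioned (priority to the worse off, and the egalitarian flavour) can be checked on a toy example. This is only a sketch under assumed numbers: the three-person population, the baseline utilities and the function name `nash_welfare` are all hypothetical.

```python
import math

def nash_welfare(utilities):
    """Product of individual utilities (the quantity the Nash rule ranks by)."""
    return math.prod(utilities)

# Hypothetical population of three, with unequal starting utilities.
baseline = [1.0, 2.0, 5.0]
increment = 1.0

# Award the same fixed increment to each person in turn.
products = []
sums = []
for i in range(len(baseline)):
    boosted = list(baseline)
    boosted[i] += increment
    products.append(nash_welfare(boosted))
    sums.append(sum(boosted))

print(products)  # the product is largest when the worst off person gets the increment
print(sums)      # the utilitarian sum is the same whoever gets it

# Dividing a fixed total evenly beats the uneven divisions tried here.
total = sum(baseline)
equal_split = [total / len(baseline)] * len(baseline)
uneven_splits = [[1.0, 2.0, 5.0], [2.0, 3.0, 3.0], [2.5, 2.5, 3.0]]
print(nash_welfare(equal_split), [nash_welfare(s) for s in uneven_splits])
```

The comparison with the sums shows the contrast with utilitarianism drawn above: the sum is indifferent to who receives the increment, while the product is not.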

The formal result is proved in the Pettigrew paper, as a corollary to a very general theorem. Under the current interpretation that theorem also has the link between squared Euclidean distance and utilitarianism of the previous post as another special case. However, it might be helpful to see how the result falls out of elementary minimization (it was helpful for me to work through it, anyway, so I’m going to inflict it on you). So we start with the following characterization, where A is the set of agents whose utilities we are given:

U_C=\textsc{argmin}_X \sum_{i\in A} D_{KL}(X,U_i)

To find this we need to find X which makes this sum minimal (P being the set of n propositions over which utilities are defined, and A being the set of m agents):

\sum_{i\in A} \sum_{p\in P} X(p)\log \frac{X(p)}{U_i(p)} - X(p)+U_i(p)

Rearrange as a sum over p:

\sum_{p\in P} \sum_{i\in A} X(p)\log \frac{X(p)}{U_i(p)} - X(p)+U_i(p)

Since we can assign each X(p) independently of the others, we minimize this sum by minimizing each summand. Fixing p, and writing x:=X(p) and u_i:=U_i(p), our task now is to find the value of x which minimizes the following:

\sum_{i\in A} x\log \frac{x}{u_i} - x+u_i

We do this by differentiating and setting the result to zero. The result of differentiating (once you remember the product rule and that differentiating logs gives you a reciprocal) is:

\sum_{i\in A} \log \frac{x}{u_i}

But a sum of logs is the log of the product, and so the condition for minimization is:

0=\log \frac{x^m}{\prod_{i\in A}u_i}

Taking exponentials we get:

1=\frac{x^m}{\prod_{i\in A}u_i}

That is:

x^m=\prod_{i\in A}u_i

Unwinding the definitions of the constants and variables gets us the geometrical mean/Nash social welfare function as promised.
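For anyone who prefers a numerical sanity check to calculus, here is a small sketch for a single fixed proposition. The utilities in `us` are made-up numbers and the helper names are my own; the check just confirms that nudging x away from the geometric mean in either direction increases the sum being minimized.

```python
import math

def gkl_point(x, u):
    """One summand of GKL at a fixed proposition: x*log(x/u) - x + u."""
    return x * math.log(x / u) - x + u

def total_from_compromise(x, us):
    """Sum over agents of the divergence from the candidate x to each u_i."""
    return sum(gkl_point(x, u) for u in us)

def geometric_mean(us):
    return math.exp(sum(math.log(u) for u in us) / len(us))

# Hypothetical utilities that three agents assign to one proposition.
us = [1.0, 4.0, 16.0]
g = geometric_mean(us)
best = total_from_compromise(g, us)

# Moving away from the geometric mean makes the total larger.
for eps in (-0.5, -0.01, 0.01, 0.5):
    print(eps, total_from_compromise(g + eps, us) - best)
```

Each printed difference is positive, as expected given that the second derivative of the summed expression, m/x, is positive for positive x.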

So that’s really neat! But there’s another question to ask here (also answered in the Pettigrew paper). What happens if we minimize sum total distance, not from the compromise utility to each of the components, but from the components to the compromise? Since GKL distance/divergence is not symmetric, this could give us something different. So let’s try it. We swap the positions of the constant and variables in the sums above, and the task becomes to minimize the following:

\sum_{i\in A} u_i\log \frac{u_i}{x} - u_i+x

When we come to minimize this by differentiating with respect to x, we no longer have a product of functions in x to differentiate. That makes the job easier, and we end up with the constraint:

0=\sum_{i\in A} \left(1-\frac{u_i}{x}\right)

Rearranging we get:

x= \frac{1}{m} \sum_{i\in A} u_i

and we’re back to the utilitarian compromise proposal again! (That is, this distance-minimizing compromise delivers the arithmetical mean rather than the geometrical mean of the components).
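The reversed direction can be checked numerically in the same spirit. Again this is a sketch with assumed inputs: the utilities in `us` and the helper name `total_to_compromise` are hypothetical, and the summand is the GKL divergence from each u_i to the candidate x.

```python
import math

def total_to_compromise(x, us):
    """Sum over agents of the divergence from each u_i to the candidate x."""
    return sum(u * math.log(u / x) - u + x for u in us)

# Hypothetical utilities that three agents assign to one proposition.
us = [1.0, 4.0, 16.0]
arithmetic_mean = sum(us) / len(us)
geometric_mean = math.exp(sum(math.log(u) for u in us) / len(us))

best = total_to_compromise(arithmetic_mean, us)

# Moving away from the arithmetic mean makes the total larger...
for eps in (-0.5, -0.01, 0.01, 0.5):
    print(eps, total_to_compromise(arithmetic_mean + eps, us) - best)

# ...and in this direction the geometric mean does worse than the arithmetic mean.
print(total_to_compromise(geometric_mean, us) - best)
```

So on the same inputs the two directions of minimization pick out different compromise points.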

Stepping back: what we’ve seen is that if you want to do distance-minimization (similarity-maximization, minimal-mutilation) compromise on cardinal utilities then the precise distance measure you choose really matters. Go for squared Euclidean distance and you get utilitarianism dropping out. Go for the log distance of the GKL, and you get either utilitarianism or the Nash social welfare rule dropping out, depending on the “direction” in which you calculate the distances. These results are the direct analogues of results that Pettigrew gives for belief-pooling. If we assume that the way of measuring similarity/distance for beliefs and utilities should be the same (as I did at the start of this series of posts) then we may get traction on social welfare functions through studying what is reasonable in the belief pooling setting (or indeed, vice versa).

From desire-similarity to social choice

In an earlier post, I set out a proposal for measuring distance or (dis)similarity between desire-states (if you like, between utility functions defined over a vector of propositions). That account started with the assumption that we measured strength of desire by real numbers. And the proposal was to measure the (dis)similarity between desires by the squared Euclidean distance between the vectors of desirability at issue. If \Omega is the finite set of n propositions at issue, we characterize similarity like this:

d(U,V)= \sum_{p\in\Omega} (U(p)-V(p))^2

In that earlier post, I linked this idea to “value” dominance arguments for the characteristic equations of causal decision theory. Today, I’m thinking about compromises between the desires of a diverse set of agents.

The key idea here is to take a set A of m utility functions U_i, and think about what compromise utility vector U_C makes sense. Here’s the idea: we let the compromise U_C be that utility vector which is closest overall to the inputs, where we measure overall closeness simply by adding up the distance between it and the input utilities U_i. That is:

U_C = \textsc{argmin}_X \sum_i d(X,U_i)

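So what is the X which minimizes the following?

\sum_{i\in A} \sum_{p\in\Omega}(X(p)-U_i(p))^2

Swapping the order of summation, this is:

\sum_{p\in\Omega} \sum_{i\in A} (X(p)-U_i(p))^2

This is a sum of n summands, one for each proposition p, each of which is non-negative and depends only on the single value X(p). So you find the minimum value by minimizing each summand separately. And to minimize the summand for p we differentiate with respect to X(p) and set the result to zero:

0=\sum_{i\in A} 2(X(p)-U_i(p))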

This gives us the following value of X(p):

X(p)=\frac{\sum_{i\in A}U_i(p)}{m}

This tells us exactly what value U_C must assign to p. It must be the average of the utilities assigned to p by the m input functions.

Suppose our group of agents is faced with a collective choice between a number of options. Then one option O is strictly preferred to the other options according to the compromise utility U_C just in case the average utility the agents assign to it is greater than the average utility the agents assign to any other option. (In fact, since the population is fixed when evaluating each option, we can ignore the fact we’re taking averages: O is preferred exactly when the sum total of utilities assigned to it across the population is greater than for any other). So the procedure for social choice “choose according to the distance-minimizing compromise function” is the utilitarian choice procedure.
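Here is a minimal sketch of the whole procedure on a toy case. The three agents, the option names and the utility numbers are all hypothetical, as are the helper names; the code computes the averaging compromise, checks that perturbing it increases total squared distance, and confirms that choosing by average and choosing by sum agree.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two utility vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def total_sq_dist(x, profile):
    """Sum of squared distances from candidate x to every input utility."""
    return sum(sq_dist(x, u) for u in profile)

def compromise(profile):
    """Pointwise average of the input utility vectors."""
    m = len(profile)
    return [sum(column) / m for column in zip(*profile)]

# Hypothetical profile: three agents, utilities over three options.
options = ["pizza", "crisps", "salad"]
profile = [
    [6.0, 2.0, 1.0],
    [1.0, 5.0, 3.0],
    [5.0, 2.0, 2.0],
]

u_c = compromise(profile)
base = total_sq_dist(u_c, profile)

# Nudging any coordinate of the compromise increases the total distance.
nudged_totals = []
for k in range(len(u_c)):
    for eps in (-0.1, 0.1):
        nudged = list(u_c)
        nudged[k] += eps
        nudged_totals.append(total_sq_dist(nudged, profile))

# Choosing by average utility and choosing by summed utility pick the same option.
totals = [sum(column) for column in zip(*profile)]
best_by_average = options[max(range(len(options)), key=lambda k: u_c[k])]
best_by_sum = options[max(range(len(options)), key=lambda k: totals[k])]
print(u_c, best_by_average, best_by_sum)
```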

That’s really all I want to observe for today. A couple of finishing up notes. First, I haven’t found a place where this mechanism for compromise choice is set out and defended (I’m up for citations though, since it seems a natural idea). Second, there is at least an analogous strategy already in the literature. In Gaertner’s A Primer in Social Choice Theory he discusses (p.112) the Kemeny procedure for social choice, which works on ordinal preference rankings over options, and proceeds by finding that ordinal ranking which is “closest” to a profile of ordinal rankings of the options by a population. Closeness is here measured by the Kemeny metric, which counts the number of pairwise preference reversals required to turn one ranking into the other. Some neat results are quoted: a Condorcet winner (the option that would beat each of the others in a pairwise majority vote), if it exists, is always top of the Kemeny compromise ranking. As the Kemeny compromise ranking stands to the Kemeny distance metric over sets of preference orderings, so the utilitarian utility function stands to the squared-distance divergence over sets of cardinal utility functions.
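To see the analogy in action, here is a brute-force sketch of the Kemeny procedure for strict rankings. It is my own illustration rather than Gaertner’s presentation: the distance counts the pairs of options on which two rankings disagree (some presentations scale this count differently), the three-voter profile is hypothetical, and trying every permutation is only feasible for a handful of options.

```python
from itertools import combinations, permutations

def kemeny_distance(r1, r2):
    """Number of pairs of options that the two strict rankings order differently."""
    pos1 = {opt: i for i, opt in enumerate(r1)}
    pos2 = {opt: i for i, opt in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b])
    )

def kemeny_ranking(profile):
    """Ranking with the smallest total distance to the profile (brute force)."""
    options = profile[0]
    return min(
        permutations(options),
        key=lambda cand: sum(kemeny_distance(cand, r) for r in profile),
    )

# Hypothetical profile of three voters; "a" beats "b" 2-1 and beats "c" 3-0,
# so "a" is the Condorcet winner.
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
compromise_ranking = kemeny_ranking(profile)
print(compromise_ranking)
```

In this profile the Condorcet winner does come out on top of the compromise ranking, in line with the result quoted above.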

I’ve been talking about all this as if every aspect of utility functions were meaningful. But (as discussed in recent posts) some disagree. Indeed, one very interesting argument for utilitarianism has as a premise that utility functions are invariant under level-changes, i.e. the utility function U and the utility function V represent the same underlying desire-state if there is a constant a such that for each proposition p, U(p)=V(p)+a (see Gaertner ch7). Now, it seems like the squared Euclidean similarity measure doesn’t jibe with this picture at all. After all, if we measure the squared Euclidean distance between U and V that differ by a constant, as above, we get:

d(U,V)=\sum_{p\in\Omega} (U(p)-V(p))^2=\sum_{p\in\Omega} a^2=na^2

On the one hand, on the picture just mentioned, these are supposed to be two representations of the same underlying state (if level-boosts are just a “choice of unit”) and on the other hand, they have positive dissimilarity by the distance measure I’m working with.

Now, as I’ve said in previous posts, I’m not terribly sympathetic to the idea that utility functions represent the same underlying desire-state when they’re related by a level boost. I’m happy to take the verdict of the squared euclidean similarity measure literally. After all, it was only one argument for utilitarianism as a principle of social choice that required the invariance claim–the reverse implication may not hold. In this post we have, in effect, a second independent argument for utilitarianism as a social choice mechanism that starts from a rival, richer preference structure.

But what if you were committed to the level-boosting invariance picture of preferences? Well, really what you should be thinking about in that case is equivalence classes of utility functions, differing from each other solely by a level-boost. What we’d really want, in that case, is a measure of distance or similarity between these classes, that somehow relates to the squared euclidean distance. One way forward is to find a canonical representative of each equivalence class. For example, one could choose the member of a given equivalence class that is closest to the null utility vector–from a given utility function U, you find its null-closest equivalent by subtracting a constant equal to the average utility it assigns to propositions: U_0=U-\frac{\sum_{p\in\Omega} U(p)}{n}.

Another way to approach this is to look at the family of squared euclidean distances between level-boosted equivalents of two given utility functions. In general, these distances will take the form

\sum_{p\in \Omega} ((U(p)-\alpha) -(V(p) -\beta))^2=\sum_{p\in \Omega} (U(p)-V(p) -\gamma)^2

(Where \gamma=\alpha-\beta.) You find the minimum element in this set of distances (the closest the two equivalence classes come to each other) by differentiating with respect to \gamma and setting the result to zero. That is:

0=\sum_{p\in \Omega} (U(p)-V(p) -\gamma),

which rearranging gives:

\gamma=\frac{\sum_{p\in \Omega} (U(p)-V(p))}{n}=\frac{\sum_{p\in \Omega} U(p)}{n}-\frac{\sum_{p\in \Omega} V(p)}{n}

Working backwards, set \alpha:=\frac{\sum_{p\in \Omega} U(p)}{n} and \beta:=\frac{\sum_{p\in \Omega} V(p)}{n}, and we have defined two level boosted variants of the original U and V which minimize the distance between the classes of which they are representatives (in the squared Euclidean sense). But note these level boosted variants are just U_0 and V_0. That is: minimal distance (in the squared Euclidean sense) between two equivalence classes of utility functions is achieved by looking at the squared Euclidean distance between the representatives of those classes that are closest to the null utility.
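A quick numerical sketch of this result, with made-up utility vectors and helper names of my own: the distance between the mean-centred representatives U_0 and V_0 should match the smallest distance found when we scan over relative level boosts \gamma.

```python
def center(u):
    """Null-closest representative: subtract the average utility from each entry."""
    mean = sum(u) / len(u)
    return [x - mean for x in u]

def sq_dist(u, v):
    """Squared Euclidean distance between two utility vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def shifted_dist(u, v, gamma):
    """Squared distance between level-boosted variants with relative shift gamma."""
    return sum((a - b - gamma) ** 2 for a, b in zip(u, v))

# Hypothetical utility vectors over three propositions.
U = [1.0, 4.0, 7.0]
V = [2.0, 2.0, 5.0]

between_representatives = sq_dist(center(U), center(V))

# Scan a grid of relative level boosts and keep the smallest distance found.
gammas = [k / 100 for k in range(-500, 501)]
grid_minimum = min(shifted_dist(U, V, g) for g in gammas)
best_gamma = min(gammas, key=lambda g: shifted_dist(U, V, g))

print(between_representatives, grid_minimum, best_gamma)
```

Here the best \gamma found on the grid is the difference between the two averages, matching the formula derived above.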

This is a neat result to have in hand. I think the “minimum distance between two equivalence classes” is better motivated than simply picking arbitrary representatives of the two families, if we want a way of extending the squared-Euclidean measure of similarity to utilities which are assumed to be invariant under level boosts. But this last result shows that we can choose (natural) representatives of the equivalence classes generated and measure the distance between them to the same effect. It also shows us that the social choice compromise which minimizes distance between families of utility can be found by (a) using the original procedure above for finding the utility function U_C selected as a minimum-distance compromise between the representatives of each family of utility functions; and (b) selecting the family of utility functions that are level boosts of U_C. Since the level boosts wash out of the calculation of the relative utilities of a set of options, all the members of the U_C family will agree on which option to choose from a given set.

I want to emphasize again: my own current view is that the complexity introduced in the last few paragraphs is unnecessary (since my view is that utilities that differ by a constant from one another represent distinct desire-states). But I think you don’t have to agree with me on this matter to use the minimum distance compromise argument for utilitarian social choice.

How emotions might constrain interpretation

Joy is appropriate when you learn that something happens that you *really really* want. Despair is appropriate when you learn that something happens that you *really really* don’t want to happen. Emotional indifference is appropriate when you learn that something happens which you neither want nor don’t want–which is null for you. And there are grades of appropriate emotional responses—from joy to happiness, to neutrality, to sadness, to despair. I take it that we all know the differences in the intensity of the feeling in each case, and have no trouble distinguishing the valence as positive or negative.

More than just level and intensity of desire matters to the appropriateness of an emotional response. You might not feel joy in something you already took for granted, for example. Belief-like as well as desire-like states matter when we assess an overall pattern of belief/desire/emotional states as to whether they “hang together” in an appropriate way–whether they are rationally coherent. But levels and intensities of desire obviously matter (I think).

Suppose you were charged with interpreting a person about whose psychology you knew nothing beforehand. I tell you what they choose out of the options facing them in a wide variety of circumstances, in response to varying kinds of evidence. This is a hard task for you, even given the rich data, but if you assumed the person is rational you could make progress. But if *all* you did was attribute beliefs and desires which (structurally) rationalize the choices and portray the target as responding rationally to the evidence, then there’d be a distinctive kind of in-principle limit built into the task. If you attributed utility and credences which make the target’s choices maximize expected utility, and evolve by conditionalization on evidence, then you’d get a fix on what the target prefers to what, but not, in any objective sense, how much more they prefer one thing to another, or whether they are choosing x over y because x is the “lesser of two evils” or the “greater of two goods”. If you like, think of two characters facing the same situation–an enthusiast who just really likes the way the world is going, but mildly prefers some future developments to others, and the distraught one, who thinks the world has gone to the dogs, but regards some future developments as even worse than others. You can see how the choice-dispositions of the two given the same evidence could match despite their very different attitudes. So given *only* information about the choice-dispositions of a target, you wouldn’t know whether to interpret the target as an enthusiast or their distraught friend.

While the above gloss is impressionistic, it reflects a deep challenge to the attempt to operationalize or otherwise reduce belief-desire psychology to patterns of choice-behaviour. It receives its fullest formal articulation in the claim that positive affine transformations of a utility function will preserve the “expected utility property”. (Any positive monotone transformation of a utility function will preserve the same ordering over options. The mathematically interesting bit here is that the positive affine transformations of a utility function guarantee that the pattern between preferences over outcomes and preferences over acts that bring about those outcomes, mediated by credences in the act-outcome links, are all preserved).
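The contrast between affine and merely monotone transformations can be illustrated with a toy decision problem. Everything here is an assumption made for illustration: the outcome names, the utilities, the two acts and their credences. The sketch checks that a positive affine rescaling, and a pure level shift of the enthusiast/distraught kind, leave the expected-utility ranking of acts alone, while a monotone but non-affine transformation (a square root) keeps the ordering of outcomes but flips the ranking of acts.

```python
def expected_utility(lottery, utility):
    """Credence-weighted average utility of an act's possible outcomes."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())

def preferred_act(acts, utility):
    """Name of the act with the highest expected utility."""
    return max(acts, key=lambda name: expected_utility(acts[name], utility))

# Hypothetical utilities over three outcomes.
utility = {"low": 0.0, "mid": 1.0, "high": 2.2}

# Two hypothetical acts, each mapping outcomes to credences.
acts = {
    "safe": {"mid": 1.0},
    "gamble": {"low": 0.5, "high": 0.5},
}

# Positive affine transformation: a*u + b with a > 0.
affine = {o: 3.0 * u - 10.0 for o, u in utility.items()}
# Pure level shift: the "distraught" counterpart of the original "enthusiast".
distraught = {o: u - 100.0 for o, u in utility.items()}
# Monotone but non-affine transformation: same ordering of outcomes.
concave = {o: u ** 0.5 for o, u in utility.items()}

print(preferred_act(acts, utility))     # ranking under the original utilities
print(preferred_act(acts, affine))      # unchanged by the affine rescaling
print(preferred_act(acts, distraught))  # unchanged by the level shift
print(preferred_act(acts, concave))     # can differ under a non-affine transform
```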

One reaction to this in-principle limitation is to draw the conclusion that really, there are no objective facts about the level of desire we each have in an outcome, or how much more desirable we find one thing than another. A famous consequence of drawing that conclusion is that no objective sense could be made out of questions like: do I desire this pizza slice more or less than you do? Or questions like: does the amount by which I desire the pizza more than the crisps exceed the amount you desire the pizza more than the crisps? And clearly if desires aren’t “interpersonally comparable” in these sorts of ways, certain ways of appealing to them within accounts of how it’s appropriate to trade off one person’s desires against another’s won’t make sense. A Rawlsian might say: if there’s pizza going spare, give it to the person for whom things are going worst (for whom the current situation, pre-pizza, is most undesirable). A utilitarian might say: if everyone is going to get pizza or crisps, and everyone prefers pizza to crisps, give the pizza to the person who’ll appreciate it the most (i.e. prefers pizza over crisps more than anyone else). If the whole idea of interpersonal comparisons of level and differences of desirability is nonsense, however, then those proposals write cheques that the metaphysics of attitudes can’t pay.

(As an aside, it’s worth noting at this point that you could have Rawlsian or utilitarian distribution principles that work with quantities other than desire—some kind of objective “value of the outcome for each person”. It seems to me that if the metaphysics of value underwrites interpersonally comparable quantities like the levels of goodness-for-Sally for pizza, and goodness-difference-between-pizza-and-crisps-for-Harry, then the metaphysics of desires should be such that Sally and Harry’s desire-state will, if tuned in correctly, reflect these levels and differences.)

It’s not only the utilitarian and Rawlsian distribution principles (framed in terms of desires) that have false metaphysical presuppositions if facts about levels and differences in desire are not a thing. Intraindividual ties between intensities of emotional reaction and strength of desire, and between type of emotional reaction and valence of desire, will have false metaphysical presuppositions if facts about an individual’s desire are invariant under affine transformation. Affine transformations can change the “zero point” on the scale on which we measure desirability, and shrink or grow the differences between desirabilities. But we can’t regard zero-points or strengths of gaps as merely projections of the theorist (“arbitrary choices of unit and scale”) if we’re going to tie them to real rational constraints on type and intensity of emotional reaction.

However. Suppose in the interpretive scenario I gave you, you knew not only the choice-behaviour of your target in a range of varying evidential situations, but also their emotional responses to the outcome of their acts. Under the same injunction to find a (structurally) rationalizing interpretation of the target, you’d now have much more to go on. When they have emotional reactions rationally linked to indifference, you would attribute a zero-point in the level of desirability. When an outcome is met with joy, and another with mere happiness, you would attribute a difference in desire (of that person, for that outcome) that makes sense of both. Information about emotions, together with an account of the rationality of emotions, allows us to set the scale and unit in interpreting an individual, in a way choice-behaviour alone struggles to. As a byproduct, we would then have an epistemic path to interpersonal comparability of desires. And in fact, this looks like an epistemic path that’s pretty commonly available in typical interpersonal situations–the emotional reactions of others are not *more* difficult to observe than the intentions with which they act or the evidence that is available to them. Emotions, choices and a person’s evidence are all interestingly epistemically problematic, but they are “directly manifestable” in a way that contrasts with the beliefs and desires that mesh with them.

The epistemic path suggests a metaphysical path to grounding levels and relative intensities of desires. Just as you can end up with a metaphysical argument against interpersonal comparability of desires by committing oneself to grounding facts about desires in patterns of choice-behaviour, and then noting the mathematical limits of that project, you can get, I think, a metaphysical vindication of interpersonal comparability of desire by including in the “base level facts” upon which facts about belief and desire are grounded facts about the type, intensity and valence of intentional emotional states. As a result, the metaphysical presuppositions of the desire-based Rawlsian and utilitarian distribution principles are met, and our desires have the structure necessary to capture and reflect level and valence of any good-for-x facts that might feature in a non-desire based articulation of those kinds of principles.

In my book The Metaphysics of Representation I divided the task of grounding intentionality into three parts. First, grounding base-level facts about choice and perceptual evidence (I did this by borrowing from the teleosemantics literature). Then grounding belief-desire intentional facts in the base-level facts, via a broadly Lewisian metaphysical form of radical interpretation. (The third level concerned representational artefacts like words, but needn’t concern us here). In these terms, what I’m contemplating is to add intentional emotional states to the base level, using that to vindicate a richer structure of belief and desire.

Now, this is not the only way to vindicate levels and strength of desires (and their interpersonal comparability) in this kind of framework. I also argue in the book that the content-fixing notion of “correct interpretation” should use a substantive conception of “rationality”. The interpreter should not just select any old structurally-rationalizing interpretation of their target, but will go for the one that makes them closest to an ideal, where the ideal agent responds to their reasons appropriately. If an ideal agent’s strength and levels of desire are aligned, for example, to the strength and level of value-for-the-agent present in a situation, then this gives us a principled way to select between choice-theoretically equivalent interpretations of a target, grounding choices of unit and scale and interpersonal comparisons. I think that’s all good! But I think that including emotional reactions as constraining factors in interpretation can help motivate the hypothesis that there will be facts about the strength and level of desire *of the ideal agent*, and gives a bottom-up data-based constraint on such attributions that complements the top-down substantive-rationality constraint on attributions already present in my picture.

I started thinking about this topic with an introspectively-based conviction that *of course* there are facts about how much I want something, and whether I want it or want it not to happen. I still think all this. But I hope that I’ve now managed to identify how those convictions connect to their roles in a wider theoretical edifice–their rational interactions with *obvious* truths about features of our emotional lives, the role of these in distribution principles, which give a fuller sense of what is at stake if we start denying that the metaphysics of attitudes has this rich structure. I can’t see much reason to go against this, *unless* you are in the grip of a certain picture of how attitudes get metaphysically grounded in choice-behaviour. And I like a version of that picture! But I’ve also sketched how the very links to emotional states give you a version of that kind of metaphysical theory that doesn’t have the unwelcome, counterintuitive consequences it’s often associated with.