Critical Postmodern Readings, Part 2: Finishing Lyotard

Last time we looked at the introduction to Lyotard’s The Postmodern Condition: A Report on Knowledge. That introduction already contained much of what gets fleshed out in the rest of the short book, so I’m going to mostly summarize stuff until we hit anything that requires serious critical thought.

The first chapter goes into how computers have changed the way we view knowledge. It was probably an excellent insight that required argument at the time. Now it’s obvious to everyone. Humans used to gain knowledge by reading books and talking to each other. It was a somewhat qualitative experience. The nature of knowledge has shifted with (big) data and machine learning. It’s very quantitative. It’s also a commodity to be bought and sold (think Facebook/Google).

It is a little creepy to understand Lyotard’s prescience. He basically predicts that multinational corporations will have the money to buy this data, and owning the data gives them real-world power. He predicts knowledge “circulation” in a similar way to money circulation.  Here’s a part of the prediction:

The reopening of the world market, a return to vigorous economic competition, the breakdown of the hegemony of American capitalism, the decline of the socialist alternative, a probable opening of the Chinese markets …

Other than the decline of the socialist alternative (which seems to have had a recent surge), Lyotard has a perfect prediction of how computerization of knowledge actually affected the world in the 40 years since he wrote this.

Chapter two reiterates the idea that scientific knowledge (i.e. the type discussed above) is different than, and in conflict with, “narrative” knowledge. There is also a legitimation “problem” in science. The community as a whole must choose gatekeepers seen as legitimate who decide what counts as scientific knowledge.

I’ve written about why I don’t see this as a problem like Lyotard does, but I’ll concede the point that there is a legitimation that happens, and it could be a problem if those gatekeepers change the narrative to influence what is thought of as true. There are even known instances of political biases making their way into schools of scientific thought (see my review of Galileo’s Middle Finger by Alice Dreger).

Next Lyotard sets up the framework for thinking about this. He uses Wittgenstein’s “language game” concept. The rules of the game can never legitimate themselves. Even small modifications of the rules can greatly alter meaning. And lastly (I think this is where he differs from Wittgenstein), each speech act is an attempt to alter the rules. Since agreeing upon the current set of rules is a social contract, it is necessary to understand the “nature of social bonds.”

This part gets a little weird to me. He claims that classically society has been seen either as a unified whole or divided in two. The rules of the language games in a unified whole follow standard entropy (they get more complicated and chaotic and degenerate). The divided in two conception is classic Marxism (bourgeoisie/proletariat).

Even if it gets a bit on the mumbo-jumbo side through this part, I think his main point is summarized by this quote:

For it is impossible to know what the state of knowledge is—in other words, the problems its development and distribution are facing today—without knowing something of the society within which it is situated.

This doesn’t seem that controversial to me considering I’ve already admitted that certain powers can control the language and flow of knowledge. Being as generous as possible here, I think he’s just saying we have to know how many of these powers there are and who has the power and who legitimated that power before we can truly understand who’s forming these narratives and why.

In the postmodern world, we have a ton of different institutions all competing for their metanarrative to be heard. Society is more fractured than just the two divisions of the modern world. But each of these institutions also has a set of rules for their language games that constrains them.  For example, the language of prayer has a different set of rules from an academic discussion at a university.

Chapters 7-9 seem to me to be where the most confusion on both the part of Lyotard and the reader can occur. He dives into the concept of narrative truth and scientific truth. You can already feel Lyotard try to position scientific truth to be less valuable than it is and narrative truth more valuable.

Lyotard brings up the classic objections to verification and falsification (namely a variant on Hume’s Problem of Induction). How does one prove one’s proof and evidence of a theory is true? How does one know the laws of nature are consistent across time and space? How can one say that a (scientific) theory is true merely because it cannot be falsified?

These were much more powerful objections in Lyotard’s time, but much of science now takes a Bayesian epistemology (even if they don’t admit to this terminology). We believe what is most probable, and we’re open to changing our minds if the evidence leads in that direction. I addressed this more fully a few years ago in my post: Does Bayesian Epistemology Suffer Foundational Problems?

… drawing a parallel between science and nonscientific (narrative) knowledge helps us understand, or at least sense, that the former’s existence is no more—and no less—necessary than the latter’s.

These sorts of statements are where things get tricky for me. I buy the argument that narrative knowledge is important. One can read James Baldwin and gain knowledge and empathy of a gay black man’s perspective that changes your life and the way you see the world. Or maybe you read Butler’s performative theory of gender and suddenly understand your own gender expression in a new way. Both of these types of narrative knowledge could even be argued to be a “necessary” and vital part of humanity.

I also agree science is a separate type of knowledge, but I also see science as clearly more necessary than narrative knowledge. If we lost all of James Baldwin’s writings tomorrow, it would be a tragedy. If we lost the polio vaccine tomorrow, it would be potentially catastrophic.

It’s too easy to philosophize science into this abstract pursuit and forget just how many aspects of your life it touches (your computer, the electricity in your house, the way you cook, the way you get your food, the way you clean yourself). Probably 80% of the developed world would literally die off in a few months if scientific knowledge disappeared.

I’ll reiterate that Lyotard thinks science is vastly important. He is in no way saying the problems of science are crippling. The above quote is more about raising narrative knowledge to the same level of importance as science than about devaluing science (Lyotard might point to the disastrous consequences of convincing a nation of the narrative that the Aryan race is superior). For example, he says:

Today the problem of legitimation is no longer considered a failing of the language game of science. It would be more accurate to say that it has itself been legitimated as a problem, that is, as a heuristic driving force.

Anyway, getting back to the main point. Lyotard points out that the problem of legitimating knowledge is essentially modern, and though we should be aware of the difficulties, we shouldn’t be too concerned with it. The postmodern problem is the grand delegitimation of various narratives (and one can’t help but hear Trump yell “Fake News” while reading this section of Lyotard).

Lyotard spends several sections developing a theory of how humans do science, and he develops the language of “performativity.” It all seems pretty accurate to me, and not really worth commenting on (i.e. it’s just a description). He goes into the issues Gödel’s Incompleteness Theorem caused for positivists. He talks about the Bourbaki group. He talks about the seeming paradox of having to look for counterexamples while simultaneously trying to prove the statement to be true.

I’d say the most surprising thing is that he gets this stuff right. You often hear about postmodernists hijacking math/science to make their mumbo-jumbo sound more rigorous. He brings up Brownian motion and modeling discontinuous phenomena with differentiable functions to ease analysis and how the Koch curve has a non-whole number dimension. These were all explained without error and without claiming they imply things they don’t imply.

Lyotard wants to call these unintuitive and bizarre narratives about the world that come from weird scientific and mathematical facts “postmodern science.” Maybe it’s because we’ve had over forty more years to digest this, but I say: why bother? To me, this is the power of science. The best summary I can come up with is this:

Narrative knowledge must be convincing as a narrative; science is convincing despite the unconvincing narrative it suggests (think of the EPR paradox in quantum mechanics or even the germ theory of disease when it was first suggested).

I know I riffed a bit harder on the science stuff than a graduate seminar on the book would. Overall, I thought this was an excellent read. It seems more relevant now than when it was written, because it cautions about the dangers of powerful organizations buying a bunch of data and using that to craft narratives we want to hear while delegitimating narratives that hurt them (but which might be true).

We know now that this shouldn’t be a futuristic, dystopian fear (as it was in Lyotard’s time). It’s really happening with targeted advertising and the rise of government propaganda and illegitimate news sources propagating through our social media feeds. We believe what the people with money want us to believe, and it’s impossible to free ourselves from it until we understand the situation with the same level of clarity that Lyotard did.

The Carter Catastrophe

I’ve been reading Manifold: Time by Stephen Baxter. The book is quite good so far, and it presents a fascinating probabilistic argument that humans will go extinct in the near future. It is sometimes called the Carter Catastrophe, because Brandon Carter first proposed it in 1983.

I’ll use Bayesian arguments, so you might want to review some of my previous posts on the topic if you’re feeling shaky. One thing we didn’t talk all that much about is the idea of model selection. This is the most common thing scientists have to do. If you run an experiment, you get a bunch of data. Then you have to figure out the most likely reason for what you see.

Let’s take a basic example. We have a giant tub of golf balls, and we can’t see inside the tub. There could be 1 ball or a million. We’re told the owner accidentally dropped a red ball in at some point. All the other balls are the standard white golf balls. We decide to run an experiment where we draw a ball out, one at a time, until we reach the red one.

First ball: white. Second ball: white. Third ball: red. We stop. We’ve now generated a data set from our experiment, and we want to use Bayesian methods to give the probability of there being three total balls or seven or a million. In probability terms, we need to calculate the probability that there are x balls in the tub given that we drew the red ball on the third draw. Any time we see this language, our first thought should be Bayes’ theorem.

Define A_i to be the model of there being exactly i balls in the tub. I’ll use “3” inside of P( ) to be the event of drawing the red ball on the third try. We have to make a finiteness assumption, and although this is one of the main critiques of the argument, we can examine what happens as we let the size of the bound grow. Suppose for now the tub can only hold 100 balls.

A priori, we have no idea how many balls are in there, so we’ll assume all “models” are equally likely. This means P(A_i)=1/100 for all i. By Bayes’ theorem we can calculate:

P(A_3|3) = \frac{P(3|A_3)P(A_3)}{\sum_{i=1}^{100}P(3|A_i)P(A_i)}

= \frac{(1/3)(1/100)}{(1/100)\sum_{i=3}^{100}1/i} \approx 0.09

So there’s around a 9% chance that there are only 3 balls in the tub. That bottom summation remains exactly the same when computing P(A_n | 3) for any n and equals about 3.69, and the (1/100) cancels out every time. For n < 3 the probability is zero, since we drew three balls. So, using 1/3.69 ≈ 0.27, we can compute explicitly that for n ≥ 3:

P(A_n|3)\approx \frac{1}{n}(0.27)

This is a decreasing function of n, and this shouldn’t be surprising at all. It says that as we guess there are more and more balls in the tub, the probability of that guess goes down. This makes sense, because it’s unreasonable to think we’d see the red one that early if there are actually 100 balls in the tub.
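
If you would rather check these numbers than trust my arithmetic, here is a minimal sketch of the same computation. The function name and its parameters are made up for illustration, and it assumes exactly the setup above: a uniform prior over 1 to 100 balls and a likelihood of 1/n for drawing the red ball on the third try when there are n ≥ 3 balls.

```python
from fractions import Fraction

def posterior_red_on_draw(draw=3, max_balls=100):
    """Return {n: P(A_n | red ball on draw number `draw`)} under a uniform prior."""
    prior = Fraction(1, max_balls)
    # With n balls in random order the red one is equally likely to be in any
    # of the n positions, and it cannot show up on draw 3 if n < 3.
    likelihood = {
        n: (Fraction(1, n) if n >= draw else Fraction(0))
        for n in range(1, max_balls + 1)
    }
    evidence = sum(likelihood[n] * prior for n in likelihood)
    return {n: likelihood[n] * prior / evidence for n in likelihood}

post = posterior_red_on_draw()
tail_sum = sum(Fraction(1, i) for i in range(3, 101))

print(float(tail_sum))   # the bottom summation, roughly 3.69
print(float(post[3]))    # roughly 0.09
print(float(post[100]))  # much smaller than post[3]
```

Exact fractions are used only so that the posterior sums to exactly 1; plain floats would do just as well for a sanity check.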

There are lots of ways to play with this. What happens if our tub could hold millions but we still assume a uniform prior? It just takes all the probabilities down, but the general trend is the same: it becomes less and less reasonable to assume a large total number of balls given that we found the red one so early.

You could also only care about this “earliness” idea and redo the computations where you ask how likely is A_n given that we found the red ball by the third try. This is actually the more typical way the problem is formulated in the Doomsday arguments. It’s more complicated, but the same idea pops out, and this should make intuitive sense.
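
A rough sketch of that "by the third try" variant, under the same assumed uniform prior on 1 to 100 balls, could look like the following. The only change is the likelihood: the chance of having found the red ball within the first k draws when there are n balls is min(k, n)/n. The function name is hypothetical.

```python
from fractions import Fraction

def posterior_found_by(k=3, max_balls=100):
    """Return {n: P(A_n | red ball found within the first k draws)}, uniform prior."""
    prior = Fraction(1, max_balls)
    # If n <= k we are certain to have found it; otherwise the red ball sits
    # in one of the first k of n equally likely positions.
    likelihood = {n: Fraction(min(k, n), n) for n in range(1, max_balls + 1)}
    evidence = sum(likelihood[n] * prior for n in likelihood)
    return {n: likelihood[n] * prior / evidence for n in likelihood}

by_third = posterior_found_by()
print(float(by_third[3]), float(by_third[50]), float(by_third[100]))
```

The posterior is flat for n up to 3 and then falls off like 1/n, so the same "small tubs are favored" pattern appears.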

Part of the reason these computations were somewhat involved is that we tried to get a distribution on the natural numbers. But we could have tried to compare heuristically to get a super clear answer (homework for you). What if we only had two choices “small number of total balls (say 10)” or “large number of total balls (say 10,000)”? With equal priors on the two, you’d find there is over a 99% chance that the “small” hypothesis is correct.
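
For anyone who wants to check their homework, here is one way to set up the two-hypothesis comparison. This is only a sketch: the function and the `prior_small` parameter are my own illustrative names, and the likelihood is the same 1/n used above for drawing the red ball on the third try.

```python
def posterior_small(small=10, large=10_000, draw=3, prior_small=0.5):
    """P(small tub | red ball on draw number `draw`) for two competing hypotheses."""
    like_small = 1 / small if small >= draw else 0.0
    like_large = 1 / large if large >= draw else 0.0
    numerator = like_small * prior_small
    denominator = numerator + like_large * (1 - prior_small)
    return numerator / denominator

print(posterior_small())                  # equal priors: about 0.999
print(posterior_small(prior_small=0.01))  # prior heavily favoring "large": still about 0.91
```

The second call shows how strongly the data pulls: even if you start out 99% sure the tub is large, a red ball on the third draw leaves you about 91% sure it is small.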

Here’s the leap. Now assume the fact that you exist right now is random. In other words, you popped out at a random point in the existence of humans. So the totality of humans to ever exist are the white balls and you are the red ball. The same type of argument above applies, and it says that the most likely thing is that you aren’t born at some super early point in human history. In fact, it’s unreasonable from a probabilistic standpoint to think that humans will continue much longer at all given your existence.

The “small” total population of humans is far, far more likely than the “large” total population, and the interesting thing is that this remains true even if you mess with the uniform prior. You could assume it is much more likely a priori for humans to continue to make improvements and colonize space and develop vaccines giving a higher prior for the species existing far into the future. But unfortunately the Bayesian argument will still pull so strongly in favor of humans ceasing to exist in the near future that one must conclude it is inevitable and will happen soon!

Anyway. I’m travelling this week, so I’m sorry if there are errors in those calculations. I was in a hurry and never double checked them. The crux of the argument should still make sense even if you don’t get my exact numbers. There’s also a lot of interesting and convincing rebuttals, but I don’t have time to get into them now (including the fact that unlikely hypotheses turn out to be true all the time).

Validity in Interpretation Chapter 5

You know the drill by now. These are just notes from my reading of E.D. Hirsch, Jr.’s Validity in Interpretation. We have finally reached the last chapter. The main thrust of this last chapter is on how to tell whether our interpretation is valid. It rehashes a lot of stuff we’ve already covered, and it gives some examples of putting the theory to use.

The first point is that we can often trick ourselves into self-validating an invalid interpretation. Hirsch doesn’t use the term, but this is a direct application of confirmation bias to literary interpretation. If we go into a text thinking it must mean something, then try to find confirmation of this interpretation, we will always find it and will overlook conflicting evidence. This is not the correct way to validate an interpretation (or anything for that matter!).

We are led back to the hermeneutic circle, because some of the evidence will only appear after a hypothesis about the interpretation has been formed. In the next section, Hirsch doesn’t say this, but he essentially argues for a Bayesian theory of interpretation. The process of validation is to take all the hypotheses and then figure out which one is most likely correct based on the evidence. As new evidence comes in, we revise our view.

All that matters are the relative probabilities. Sometimes two interpretations are equally likely, and then we say both are valid. The point is not to have one victorious theory, but to have a way to measure how likely each is in terms of the others.

Personal Note: Whenever someone brings up probabilistic reasoning in the arts (or even history) the same sorts of objections get raised. The assignment of a probability is arbitrary. You can make up whatever priors you want to skew the results in favor of your pet interpretation. These are very recent debates that came decades after this book was published. Surprisingly, Hirsch gives the same answers to these objections that we still give.

First, we already speak in probabilities when analyzing interpretations. I think it is “extremely unlikely” that the word “plastic” means the modern substance in this 1744 poem, because it hadn’t been invented yet. It is “likely” that this poem is about the death of a loved one, because much of Donne’s work is about death. These statements assign relative probabilities to the likelihood of the interpretation, but they try to mask this.

By clearly stating what we are doing, and coming up with actual quantities that can be disputed and argued for, we make our reasoning more explicit and less prone to error. If we pretend that we are not dealing with probabilities, then our arguments and reasoning become sloppy.

As usual, when determining probabilities, we need to figure out the narrowest class that the work under consideration fits in. A good clarifying example is the broad classification of women vs men. Women live longer on average than men. But when we pick a specific woman and a specific man, it would be insane to argue that the woman will probably live longer based only on that broad class. If we note that the woman is a sedentary smoker with lung cancer, and the man is an Olympic marathon runner, then these narrower classes improve our probability judgments.

This was the point of having an entire chapter on genre. We must analyze the intrinsic genre of a work to find the narrowest class that it fits in. This gives us a prior probability for certain types of interpretation. Then we can continue the analysis, updating our views as we encounter more or less evidence.

Hirsch then goes on to talk about the principle of falsifiability as we know it from science. Rather than confirming our hypothesis, we should come up with plausible evidence that would conclusively falsify the interpretation. He goes on to give a bunch of subtle examples that would take a lot of time to explain here. For simplicity, we could go back to the plastic example. If a poem dates before 1907, then any interpretation that requires the substance meaning of the word plastic is false.

He ends the section by reminding us that we always have to think in context. There are no rules of interpretation that can be stated generally and be practical in all situations. There are always exceptions. The interpretive theory in this book is meant as a starting point or provisional guide. This is also true of all methods of interpretation (think of people who always do a “Marxist reading” or “feminist reading” of a text).

I’ll end with a quote:

“While there is not and cannot be any method or model of correct interpretation, there can be a ruthlessly critical process of validation to which many skills and many hands may contribute. Just as any individual act of interpretation comprises both a hypothetical and a critical function, so the discipline of interpretation also comprises the having of ideas and the testing of them.”

Decision Theory 1

Today we’ll start looking at a branch of math called Decision Theory. It uses the types of things in probability and statistics that we’ve been looking at to make rational decisions. In fact, in the social sciences when bias/rationality experiments are done, seeing how closely people’s decisions match these optimal decisions is the baseline definition of rationality.

Today’s post will just take the easiest possible scenarios to explain the terms. I think most of this stuff is really intuitive, but all the textbooks and notes I’ve looked at make this way more complicated and confusing. This basically comes from doing too much too fast and not working basic examples.

Let’s go back to our original problem which is probably getting old by now. We have a fair coin. It gets flipped. I have to bet on either heads or tails. If I guess wrong, then I lose the money I bet. If I guess right, then I double my money. The coin will be flipped 100 times. How should I bet?

Let’s work a few things out. A decision function is a function from the space of random variables {X} (technically we can let {X} be any probability space) to the set of possible actions. Let’s call {A=\{0,1\}} our set of actions where {0} corresponds to choosing tails and {1} corresponds to heads. Our decision function is a function that assigns to each flip a choice of picking heads or tails, {\delta: X \rightarrow A}. Note that in this example {X} is also just a discrete space corresponding to the 100 flips of the coin.

We now define a loss function, {L:X\times A \rightarrow \mathbb{R}}. To make things easy, suppose we bet 1 cent every time. Then our loss is {1} cent every time we guess wrong and {-2} cents if we guess right. Because of the awkwardness of thinking in terms of loss (i.e. a negative loss is a gain) we will just invert it and use a utility function in this case which measures gains. Thus {U=-1} when we guess wrong and {U=2} when we guess right. Notationally, suppose {F: X\rightarrow A} is the function that tells us the outcome of each flip. Explicitly,

\displaystyle U(x_i, \delta(x_i)) = \begin{cases} -1 \ \text{if} \ F(x_i) \neq \delta(x_i) \\ 2 \ \text{if} \ F(x_i) = \delta(x_i) \end{cases}

The last thing we need is the risk involved. The risk is just the expected value of the loss function (or the negative of the expected value of the utility). Suppose our decision function is to pick {0} every time. Then our expected utility is just {100(1/2(-1)+1/2(2))=50}. This makes sense, because half the time we expect to lose and half we expect to win. But we double our money on a win, so we expect a net gain. Thus our risk is {-50}, i.e. there is no risk involved in playing this way!

This is a weird example, because in the real world we have to make our risk function up and it does not usually have negative expected value, i.e. there is almost always real risk in a decision. Also, our typical risk will still be a function. It is only because everything is discrete that some concepts have been combined which will need to be pulled apart later.

The other reason this is weird is that even though there are {2^{100}} different decision functions (one choice of heads or tails for each of the 100 flips), they all have the same risk because of the symmetry and independence of everything. In general, each decision function will give a different risk, and they are ordered by this risk. Any minimum risk decision function is called “admissible” and it corresponds to making a rational decision.
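
The claim that every decision function has the same risk can be checked by brute force on a shrunken version of the game. With 100 flips there are far too many decision functions to list, so this sketch assumes only 4 flips and the same utility as above (2 for a right guess, -1 for a wrong one). The helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

def utility(guess, outcome):
    # Same utility as in the text: +2 for a right guess, -1 for a wrong one.
    return 2 if guess == outcome else -1

def expected_utility(decision, n_flips):
    """Average total utility of `decision` over all equally likely flip sequences."""
    outcomes = list(product((0, 1), repeat=n_flips))
    total = sum(
        sum(utility(g, f) for g, f in zip(decision, flips))
        for flips in outcomes
    )
    return Fraction(total, len(outcomes))

n_flips = 4
# A decision function here is just a tuple of guesses, one per flip.
risks = {d: -expected_utility(d, n_flips) for d in product((0, 1), repeat=n_flips)}

print(len(risks))           # 16 decision functions for 4 flips
print(set(risks.values()))  # a single risk value, -2, shared by all of them
```

Each flip contributes an expected utility of 1/2 no matter what we guess, which is why the risk is -2 here and -50 in the 100-flip game.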

I want to point out that if you have the most rudimentary programming skills, then you don’t have to know anything about probability, statistics, or expected values to figure these things out in these simple toy examples. Let’s write a program to check our answer (note that you could write a much simpler program which is only about 5 lines, has no functions, etc to do this):

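import random
import numpy as np
import pylab

def flip():
    # Fair coin: 0 is tails, 1 is heads.
    return random.randint(0, 1)

def simulate(money, bet, choice, length):
    # Bet `bet` on `choice` for `length` flips and return the final amount.
    for i in range(length):
        tmp = flip()
        if choice == tmp:
            money += 2*bet  # right guess: utility of 2
        else:
            money -= bet    # wrong guess: utility of -1
    return money

results = []
for i in range(1000):
    results.append(simulate(10, 1, 0, 100))

# Plot the final amount for each of the 1000 trials.
pylab.plot(results)
pylab.title('Coin Experiment Results')
pylab.xlabel('Trial Number')
pylab.ylabel('Money at the End of the Trial')
pylab.show()

print(np.mean(results))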

This Python program runs the given scenario 1000 times. You start with 10 cents. You play the betting game with 100 flips. We expect to end with 60 cents (we start with 10 and have an expected gain of 50). The plot shows that sometimes we end with way more, and sometimes we end with way less (in these 1000 we never end with less than we started with, but note that is a real possibility, just highly unlikely):


It clearly hovers around 60. The program then spits out the average after 1000 simulations and we get 60.465. If we run the program a bunch of times we get the same type of thing over and over, so we can be reasonably certain that our above analysis was correct (supposing a frequentist view of probability it is by definition correct).

Eventually we will want to jump this up to continuous variables. This means doing an integral to get the expected value. We will also want to base our decision on data we observe, i.e. inform our decisions instead of just deciding on what to do ahead of time and then plugging our ears, closing our eyes, and yelling, “La, la, la, I can’t see what’s happening.” When we update our decision as the actions happen it will just update our probability distributions and turn it into a Bayesian decision theory problem.
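
As a preview, here is roughly what the continuous version of the risk looks like in the usual textbook formulation. The density p(x) is notation I'm introducing here as an assumption; it plays the role that the equally likely coin flips played above.

```latex
R(\delta) = \mathbb{E}\left[ L(X, \delta(X)) \right] = \int L(x, \delta(x)) \, p(x) \, dx
```

In the discrete coin example the integral collapses to a sum, which is why the risk there came out as a single number.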

So you have that to look forward to. Plus some fun programming/pictures should be in the future where we actually do the experiment to see if it agrees with our analysis.