## Some probability theory

I’ll explain what I see as the point behind some elementary concepts of probability theory. The primary source is *An Introduction to Probability Theory* by Christel and Stefan Geiss.

## The basis

The idea behind probability theory is to treat uncertainty with mathematical rigour. The first necessary component is a (non-empty) set of possible outcomes; for example, when rolling a d4, this would be {1, 2, 3, 4}. When arriving at a traffic light, {red, yellow, green} is a passable set of outcomes (at least as the traffic lights are around here). The lifetime of an electronic device could have the set {1, 2, 3, 4, …}, that is, the set of natural numbers, indicating how many days the device functions. If there is news on the radio every half an hour, the time one has to wait for the next news after turning on the radio lies on the real line between 0 and 30 minutes; in math, [0, 30[ (0 is included, 30 is not).

### Sigma-algebra

A sigma-algebra is defined by a certain set of properties, which I’ll list a bit later. The idea behind a sigma-algebra is to collect the events (sets of outcomes) whose probability one might want to measure.

Taking the d4 as an example, the largest possible sigma-algebra is the power set (the set of all subsets) of {1, 2, 3, 4}, which means the set containing {} (the empty set), {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4} and {1, 2, 3, 4}, for a total of 16 sets (16 = 2^4, which is not a coincidence; it is also the reason for using a d4, since a d6 would have involved 64 subsets).
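The power set above can be enumerated mechanically; here is a small Python sketch (Python is my choice here, not something the text prescribes) that builds all subsets of the d4 outcomes and confirms the count of 16:

```python
from itertools import chain, combinations

def power_set(outcomes):
    """All subsets of the given outcome set, from {} up to the full set."""
    items = list(outcomes)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

d4 = {1, 2, 3, 4}
subsets = power_set(d4)
print(len(subsets))  # 16 = 2^4
```

Swapping `d4` for a six-element set yields 64 subsets, matching the d6 remark above.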

What if one is only interested in the end result being even or odd? The following sets also form a sigma-algebra: {}, {1, 3}, {2, 4}, {1, 2, 3, 4}.

#### The properties of a sigma-algebra

Let E be a non-empty set. A sigma-algebra on E, written sigma(E), always contains the empty set and E. The idea is that the probability of nothing happening and that of something happening are always known. In addition, if any subset A of E is part of sigma(E), the complement of A (“not A”) is also part of sigma(E). The idea is that if the probability of A is measurable, the probability of “not A” must also be measurable. Further, if any finite or countable group of subsets of E is part of sigma(E), so is their union, which means that for any group of measurable events, one can measure the chance of at least one of them happening.
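For a finite E these properties can be checked directly. The following Python sketch (my own illustration, not from the lecture notes) tests a candidate collection against the axioms; on a finite set, closure under pairwise unions is enough to give closure under all countable unions:

```python
def is_sigma_algebra(E, collection):
    """Check the sigma-algebra axioms for a collection of subsets of a finite set E."""
    E = frozenset(E)
    sets = {frozenset(s) for s in collection}
    if frozenset() not in sets or E not in sets:
        return False                   # must contain the empty set and E
    for A in sets:
        if E - A not in sets:          # closed under complement
            return False
    for A in sets:
        for B in sets:
            if A | B not in sets:      # closed under (finite) unions
                return False
    return True

E = {1, 2, 3, 4}
even_odd = [set(), {1, 3}, {2, 4}, {1, 2, 3, 4}]
print(is_sigma_algebra(E, even_odd))         # True
print(is_sigma_algebra(E, [set(), {1}, E]))  # False: complement {2, 3, 4} missing
```

The second check shows why not every collection qualifies: including {1} forces its complement {2, 3, 4} to be included as well.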

From these follow a lot of things; see the PDF for more detail on the process and results.

### Probability measure

A probability measure intuitively assigns a weight to every set that lives in the sigma-algebra. To take the d4 again, the weight assigned to every outcome is 1/4 (0.25 or 0,25 or 25% for those who fear fractions) if the die is fair (and is rolled fairly). If the die is loaded, the probability of {4} could be 1/2 (50% aka 0,5 aka 0.5), while that of every other number could be 1/6 (about 17% or 0,17 or 0.17).

The rules that a probability measure P must conform to in order to be called a probability measure are as follows. The probability that something happens is 1, which is written P(E) = 1. If A and B are disjoint subsets of E (they share no elements; for example, the even and the odd numbers, or the number 1 and cats), the probability that something out of A or B happens equals the probability that something out of A happens plus the probability that something out of B happens. In symbols, P(A or B) = P(A) + P(B) for disjoint A, B. This applies to all countable groups of pairwise disjoint subsets. A third rule (which in fact already follows from the first two) is that the probability that something out of A does not happen equals one minus the probability that something out of A does happen, which is written P(not A) = 1 − P(A). It follows that the probability of nothing happening is zero.
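These rules can be made concrete for the fair d4. The sketch below (my illustration; the text above only gives the rules in prose) builds a measure from per-outcome weights and checks the three properties with exact fractions:

```python
from fractions import Fraction

# Fair d4: each outcome carries weight 1/4, as in the example above.
weights = {1: Fraction(1, 4), 2: Fraction(1, 4),
           3: Fraction(1, 4), 4: Fraction(1, 4)}

def P(A):
    """Probability of event A: the sum of the weights of its outcomes."""
    return sum(weights[x] for x in A)

E = {1, 2, 3, 4}
A, B = {1, 3}, {2, 4}           # disjoint events
print(P(E))                      # 1
print(P(A | B) == P(A) + P(B))   # True: additivity for disjoint sets
print(P(E - A) == 1 - P(A))      # True: the complement rule
```

Replacing the weights with those of the loaded die (1/2 for 4, 1/6 for the rest) passes the same checks, since the weights still sum to 1.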

#### The connection to sigma-algebra

In addition to every probability measure requiring a sigma-algebra to even be defined, there is another connection between the rules that define them. Every sigma-algebra on E includes the empty set and E; likewise, the probabilities of both of these are always defined. If A is part of sigma(E), “not A” also lives there; compare this to the fact that if the probability of A is known, so is that of “not A”. The final part of the connection is that summing up probabilities and taking a union of subsets work in a similar way (there is an easy way of making any countable group of sets disjoint: take the first set as is, take the second but remove any overlap with the first, take the third and remove any parts that overlap with the first or the second, and so forth).
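The disjointification trick in the parenthesis is short enough to sketch directly; this Python version (an illustration of the trick, not code from the source) keeps the union of the sets unchanged while making them pairwise disjoint:

```python
def disjointify(sets):
    """Turn A1, A2, ... into disjoint B1, B2, ... with the same union:
    each B_n is A_n minus everything already seen in A_1, ..., A_{n-1}."""
    seen = set()
    result = []
    for A in sets:
        result.append(set(A) - seen)
        seen |= set(A)
    return result

parts = disjointify([{1, 2}, {2, 3}, {3, 4}])
print(parts)  # [{1, 2}, {3}, {4}]
```

The resulting pieces are disjoint, so the additivity rule applies to them directly, which is exactly why the trick is useful.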

The existence of this connection is obvious; there is no sense in building the definitions in a way that does not produce these connections. Still, they are useful guidelines for remembering the definitions, since it is sufficient to only remember one of them and the other can be deduced from it.

## Elaboration on d4

I won’t build any more theory, since it is well presented in the lecture notes and this is long enough as is. Go read those if you are really interested and already know some mathematics. The notation that follows is occasionally a bit clumsy, but there are reasons for it. Anything in curly brackets indicates a set.

The measurable space ({1, 2, 3, 4}, {{}, {1, 3}, {2, 4}, {1, 2, 3, 4}}) can be used to determine the probabilities of getting an even or an odd number with a d4. First, assuming a fair die, the relevant probability measure is defined by P({1, 3}) = 1/2 (it follows that P({2, 4}) = 1/2). The probability of rolling, for example, a 3 is not well-defined, because {3} is not part of the sigma-algebra in use. One can think of this as a friend telling you whether the result was even or odd, but not what the exact number rolled was. Using the loaded die introduced earlier, the relevant probability measure would be characterised by P({1, 3}) = 2/6 = 1/3, from which it follows that P({2, 4}) = 4/6 = 2/3.
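The even/odd probabilities for both dice can be verified numerically. This sketch (my own; it assumes, as stated earlier in the text, that the loaded die gives 4 a weight of 1/2 and the other faces 1/6 each) computes them with exact fractions:

```python
from fractions import Fraction

def P(weights, A):
    """Probability of event A under the given per-outcome weights."""
    return sum(weights[x] for x in A)

fair = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}
loaded = {1: Fraction(1, 6), 2: Fraction(1, 6),
          3: Fraction(1, 6), 4: Fraction(1, 2)}

odd, even = {1, 3}, {2, 4}
print(P(fair, odd), P(fair, even))      # 1/2 1/2
print(P(loaded, odd), P(loaded, even))  # 1/3 2/3
```

Note that `P(loaded, odd)` only needs the weights of 1 and 3; the friend reporting “odd” never reveals which of the two was rolled, which mirrors the point about {3} not being measurable here.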

Note that with the measurable space given, one could just as well flip a coin; it also has two options, though they would be heads and tails rather than numbers, which could in turn be mapped to the real line to give numeric results.