## Mathematical proofs

In which I shall relate a few ways of building mathematical proofs. Should be useful for several kinds of problem solving.

## Basic structure

The basic structure of mathematical problem solving is simple: There are certain assumptions (or axioms or definitions) and a certain desired outcome. The assumptions are used to get the desired outcome.

For example (disregard if not interested; this is a toy example, and the substance continues after it), Bolzano’s theorem states that if there is a continuous function f such that f(a) < 0 and f(b) > 0 (f is a mapping from real line to real line, a < b), then there exists c between a and b such that f(c) = 0.

So, assume that one has a continuous function, such as g(x) = x^2, that has g(0) = 0 and g(2) = 4. Can Blozano’s theorem be used to prove that there exists z between 0 and 2 such that g(z) = 1? As Bolzano’s theorem only says that if the function gets positive and negative values, it also gets zero, it can’t be directly applied. The trick is to define h(x) = g(x) -1. Now h(0) = -1, h (2) = 3, and hence the there is z between 0 and 2 such that h(z) = 0, and hence g(z) = 1. Tacitly I assumed that continuous function minus a constant is still continuous, which would also have to be proven.

### Indirect proof a.k.a. proof by contradiction

Also known as reductio ad absurdum, the idea of an indirect proof is that one assumes that what is being proven is actually false and from that follows a contradiction. The point of indirect proofs is that they give a free assumption to play with, essentially. That is: If one assumes A and B and must prove C, an indirect proof would mean assuming A, B, and “not C”, and find any contradiction.

It is easy to make a mistake in formulating the antithesis (“not C”): Take the definition of a continuous function, which says that for all x, for all positive e there exists a positive d so that if the absolute value of z-x is less than d, then absolute value of f(z)-f(x) is less than e. If something is not continuous, it means that there exists and x for which there exists a positive e such that for all positive d |z-x| < d and |f(z)-f(x)| is at least as large as e. This is a fairly simple concept (continuous function on the real line), which just happens to look scary.

An indirect proof can use any other tool in the box; it gives a free assumption and is never actually harmful, though often useless.

### Constructive proof

A proof can be said to be constructive when there is an item that must be shown to exist and this is done by actually constructing the item in question. Constructive proofs are often cumbersome (if rigorous) and longer than uncontructive ones. The reason for appreciating constructive proofs is that they are concrete: Something is actually built. It makes understanding the proof a lot easier.

For example, a function is (Riemann-)integrable if one can put rectangles below and above it such that as leaner rectangles are used, the approximations grow arbitrarily accurate. The problem is that if the function is not a very simple one, building the rectangles is difficult. Hence people instead learn a bunch of rules and tables so that they don’t need to and can instead do easy calculus. Or, in case of people who study physics, handwave it all away with an infinite number of infinitesimals. (One can treat them rigorously, but…)

One can also name unconstructive proofs, if one wants to. For example, everything that uses the axiom of choice is unconstructive (I know of no exceptions and have hard time figuring out how to create one, but someone will hopefully come and tell me I am wrong). Some hardcore mathematicians only accept constructive proofs; they are consequently known as constructivists, and are a rare breed. The scope of what they can prove is greatly limited.

### Proof by exhaustion

Proof by exhaustion means dealing with every special case in order, one-by-one. Proof by exhaustion is often long, ponderous, boring, and avoided at all costs by most mathematicians. The most famous example if the four colour theorem, which essentially says that given any map, one can colour the nations that share borders using different colours and only use four colours in the process (this means that they must actually have a small bit of common border; single point is not sufficient). It was proven by a computer that essentially went through all the interesting situations. Five- and six-colour theorems can be proven in the conventional way with relative ease.

### Mathematical induction

Normal induction works as follows: The sun has risen every morning that I remember. Hence, it will probably rise tomorrow morning, too. Pretty sensible, though it often goes wrong in nasty ways (every black-skinned person I have met thus far…).

Mathematical induction is used to prove things that apply to all natural numbers (1, 2, 3, …; 0 may or may not be included), or to anything that can be numbered by them, such as the propositions of propositional logic.

For example, the sum 1 + 2 + 3 + … + n = (n+1)*n/2. (E.g. 1 + 2+ 3 + 4 +5 = 6*5/2 = 3*5 = 15).

The first step is to check that the equation works when n = 1. This is often trivial. Particularly: 1 = 2*1/2 is indeed true.

The second step is to assume that the equation works for some natural number k, which is not specified. That is: 1+ 2 + … + k = k*(k+1)/2. This step is not particularly strenous. That is: Assume the equation holds for n = k.

The third step, which is the actual substance of the proof, is to see that the equation now holds for n = k+1. In this specific example, it must be shown that 1 + 2 + … + k + (k+1) = (k+1)*(k+2)/2. The assumption made in step two will be useful here. (k+1)(k+2)/2 = (k+1)*k/2 + (k+1)*2/2. By the assumption in the second step the first term equals 1 + 2 + … + k; that is, the formula looks like 1 + 2 + 3 + … + k + (k+1)*2/2, where the last term is simply k+1. Hence, 1 + … + k + k+1 = (k+2)*(k+1)/2.

The first step established that the equation holds for n = 1. The second and third established that if it works for n = k, it also holds for n = k+1. That is: Because it works in the first case, it must also work in the second case, and hence also in the third case, and so forth. Hence it works for all n, as it well should.

### Proof by handwaving

The most powerful tool wielded in a seminar of lecture, proof by handwaving involves drawing pretty pictures, writing down some equations, and appealing to the common sense of whoever is subjected to the explanation and the vigorous hand movements. Phrases such as “Clearly”, “One can easily show that”, “It can be proven that”, “Gauss has proven that”, “this is left as an exercise for the reader/student” are often used to great effect. Proof by handwaving can even be used to prove false statements, which makes it the strongest method catalogued herein, even if not the most rigorous.

For real life examples, see such achievements in accurate film-craft as everything by Michael Moore and the “documentaries” titled “The great global warming swindle” and “Expelled! No intelligence allowed” (or something to that effect). Even if they are correct in some parts, such facts are established by vigorous handwaving and propaganda, and hence can’t really by trusted. (Disclaimer: I have seen part of the global warming propaganda and a film or two by Moore.)

## Some probability theory

I’ll explain what I see as the point behind some elementary concepts of probability theory. The primary source is Introduction to Probability theory by Geiss, Christel and Stefan.

## The basis

The idea behind probability theory is to treat uncertainty with mathematical rigour. The first necessary component is a (non-empty) set of events; for example, if rolling a d4, this would be {1, 2, 3, 4}. When arriving to traffic lights, {red, yellow, green} is a passable set of events (at least as the traffic light are around here). The lifetime of an electronic device could have the set {1, 2, 3, 4, …}; that is, the set of natural numbers, which indicates how many days the device functions. If there are news in the radio every half an hour, the time one has to wait for the next news after turning on the radio creates the real line between 0 and 30 minutes; in math, [0, 30[ (0 is included, 30 is not).

### Sigma-algebra

Sigma-algebra is defined by a certain set of properties, which I’ll list a bit later. The idea behind sigma-algebra is to list the collections of events one might want to measure the probability of.

Taking the d4 as an example, the largest possible sigma-algebra is the set of subsets or power set of {1, 2, 3, 4}, which means a set that contains {} (the empty set) {1} {2} {3} {4} {1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4} { 1, 2, 3} {1, 2, 4} {1, 3, 4} {2, 3, 4} {1, 2, 3, 4} for a total of 16 sets (16 = 2^4 which is not a coincidence, and it also is the reason for using d4; d6 would have involved 64 subsets).

What if one is only interested in the end result being even or odd? The following sets also form a sigma-algebra: {}, {1, 3}, {2, 4}, {1, 2, 3, 4}.

#### The properties of a sigma-algebra

Let E be a non-empty set. The sigma-algebra of A always contains the empty set and E. The idea is that the propability of nothing happening and that of something happening are always known. In addition, if any subset A of E is part of sigma(E), the complement of A (negation of A) is also part of sigma(E). The idea is that if the probability of A is measurable, the probability of “not A” must also be measurable. Further, if any (finite or countable) group of subsets of E are part of sigma(E), so is their union, which means that for any group of measurable events, one can measure the chance of at least one of them happening.

From these follow a lot of things; see the PDF for more detail on the process and results.

### Probability measure

Probability measure intuitively assign weight to every set that lives in the sigma-algebra. To take the d4 again, the weight added to every outcome is 1/4 (0.25 or 0,25 or 25% for those who fear fractions) if the die is fair (and is rolled fairly). If the die is weighted, the probability of {4} could be 1/2 (50% aka 0,5 aka 0.5), while that of every other number could be 1/6 (about 17% or 0,17 or 0.17).

The rules that probability measures must conform to in order to be called probability measures are as follows: The probability that something happens is 1, which is written P(E) = 1. If A and B are disjoint subsets of E (they are entirely distinct; for example, even and odd number or the number 1 and cats), the probability that something out of A or B happens equals the probability that something out of A happens plus the probability that something out of B happens. In symbols, P(A or B) = P(A) + P(B) for disjoint A, B. This applies to all numerable groups of subsets. The third rule is that the probability that something out of A does not happen equals one minus the probability that something out of A does happen, which is equivalent to P(not A) = 1 – P(A). It follows that the probability of nothing happening is zero.

#### The connection to sigma-algebra

In addition to every probability measure requiring a sigma-algebra to even be defined, there is another connection between the rules that defined them. Every sigma-algebra of E always includes the empty set and E; likewise, the probabilities for both of these are always defined. Likewise, if A is part of sigma(E), “not A” also lives there. Contrast to the fact that if the probability of A is known, so is that of not A. The final part of the connection is that summing up probabilities and taking a union of subsets work in similar way (there is an easy way of making any numerable group of sets disjoint; take the first set as is, take the second but remove any overlap with the first, take the third and remove any parts that overlap with the first or the second part and so forth).

The existence of this connection is obvious; there is no sense in building the definitions in a way that does not produce these connections. Still, they are useful guidelines for remembering the definitions, since it is sufficient to only remember one of them and the other can be deduced from it.

## Elaboration on d4

I won’t build any more theory, since it is well-presented in the lecture notes and this is long enough as is. Go read those if you are really interested and already know some mathematics. The notation that follows is occasionally a bit clumsy, but there are reasons for it. Anything in square brackets indicates a set.

The measurable space ({1, 2, 3, 4}, {{}, {1, 3}, {2, 4}, {1, 2, 3, 4}}) can be used to determine the probabilities of getting an even or odd number with a d4. First, assuming a fair die, the relevant probability measure is defined by P({1, 3}) = 1/2 (it follows that P({2, 4}) = 1/2). The probability of rolling for example 3 is not well-defined, because {3} is not part of the sigma-algebra in use. One can think of this as a friend telling that the result was even or odd, but not what the exact number rolled was. Using the loaded die introduced earlier, the relevant probability measure would be characterised by P({1, 3}) = 2/6 = 1/3 from which it follows that P({2, 4}) = 4/6 = 2/3.

Note that with the measurable space given one could as well flip a coin; it would have two options, though they would be heads and tails, not numbers, but they could be mapped to the real line to give numeric results.