## Mathematics is and is not formalistic

Some philosophy of mathematics. And of philosophy.

The question I will cover is: Is mathematics purely formal manipulation of strings of symbols?

First, the trivial answers: Yes, since all results, definitions and theorems could in principle be reduced to such. No, because doings mathematics takes imagination, creativity, and whatever other positive-sounding things I would like to mention. (It also requires manipulation of formal systems, occasionally known as abstract nonsense or computation or calculation.)

There was this philosopher named Immanuel Kant. He proved (by very nonrigorous standards and with many hidden assumptions) a number of seemingly contradictory results called antinomies (of pure reason), such that the world is infinite and finite both. I stated something similar above, but it is fairly easy to see that I cheated, since my two statements are actually about different things.

Mathematics is formalistic in that the mathematical results: theorems, definitions and proofs – can be stated formalistically. Mathematics as in the activity that mathematicians do is not so, as it involves building and understanding (often complicated) arguments, seeing analogies, seeing past analogies, and other quite non-formalistic activities.

There is more to the question. Namely, is mathematics something discovered by or created by people? (One could ask this of natural sciences, I think.) Mathematics is discovered in that all the theorems and results are fixed, given a particular set of axioms (which means assumptions): Given proposition is true or false (or undeterminable due to Gödel’s incompleteness or even meaningless in the same sense that not every string of letters is a word or has meaning) based on the axioms in use.

Mathematics is created in that the particular results achieved, axiomatics considered, and so on, are determined by the questions mathematicians ask and answer. Physics, statistics, and various other fields of inquiry motivate plenty of mathematics, for example.

What is our conclusion? It is that the answer one finds is not important, but what philosophy does is highlight concepts that are hard to separate otherwise. This is what I find useful in philosophy. This is what I wish more philosophers recognised.

(Post inspired by this one.)

## Mathematical proofs

In which I shall relate a few ways of building mathematical proofs. Should be useful for several kinds of problem solving.

## Basic structure

The basic structure of mathematical problem solving is simple: There are certain assumptions (or axioms or definitions) and a certain desired outcome. The assumptions are used to get the desired outcome.

For example (disregard if not interested; this is a toy example, and the substance continues after it), Bolzano’s theorem states that if there is a continuous function f such that f(a) < 0 and f(b) > 0 (f is a mapping from real line to real line, a < b), then there exists c between a and b such that f(c) = 0.

So, assume that one has a continuous function, such as g(x) = x^2, that has g(0) = 0 and g(2) = 4. Can Blozano’s theorem be used to prove that there exists z between 0 and 2 such that g(z) = 1? As Bolzano’s theorem only says that if the function gets positive and negative values, it also gets zero, it can’t be directly applied. The trick is to define h(x) = g(x) -1. Now h(0) = -1, h (2) = 3, and hence the there is z between 0 and 2 such that h(z) = 0, and hence g(z) = 1. Tacitly I assumed that continuous function minus a constant is still continuous, which would also have to be proven.

### Indirect proof a.k.a. proof by contradiction

Also known as reductio ad absurdum, the idea of an indirect proof is that one assumes that what is being proven is actually false and from that follows a contradiction. The point of indirect proofs is that they give a free assumption to play with, essentially. That is: If one assumes A and B and must prove C, an indirect proof would mean assuming A, B, and “not C”, and find any contradiction.

It is easy to make a mistake in formulating the antithesis (“not C”): Take the definition of a continuous function, which says that for all x, for all positive e there exists a positive d so that if the absolute value of z-x is less than d, then absolute value of f(z)-f(x) is less than e. If something is not continuous, it means that there exists and x for which there exists a positive e such that for all positive d |z-x| < d and |f(z)-f(x)| is at least as large as e. This is a fairly simple concept (continuous function on the real line), which just happens to look scary.

An indirect proof can use any other tool in the box; it gives a free assumption and is never actually harmful, though often useless.

### Constructive proof

A proof can be said to be constructive when there is an item that must be shown to exist and this is done by actually constructing the item in question. Constructive proofs are often cumbersome (if rigorous) and longer than uncontructive ones. The reason for appreciating constructive proofs is that they are concrete: Something is actually built. It makes understanding the proof a lot easier.

For example, a function is (Riemann-)integrable if one can put rectangles below and above it such that as leaner rectangles are used, the approximations grow arbitrarily accurate. The problem is that if the function is not a very simple one, building the rectangles is difficult. Hence people instead learn a bunch of rules and tables so that they don’t need to and can instead do easy calculus. Or, in case of people who study physics, handwave it all away with an infinite number of infinitesimals. (One can treat them rigorously, but…)

One can also name unconstructive proofs, if one wants to. For example, everything that uses the axiom of choice is unconstructive (I know of no exceptions and have hard time figuring out how to create one, but someone will hopefully come and tell me I am wrong). Some hardcore mathematicians only accept constructive proofs; they are consequently known as constructivists, and are a rare breed. The scope of what they can prove is greatly limited.

### Proof by exhaustion

Proof by exhaustion means dealing with every special case in order, one-by-one. Proof by exhaustion is often long, ponderous, boring, and avoided at all costs by most mathematicians. The most famous example if the four colour theorem, which essentially says that given any map, one can colour the nations that share borders using different colours and only use four colours in the process (this means that they must actually have a small bit of common border; single point is not sufficient). It was proven by a computer that essentially went through all the interesting situations. Five- and six-colour theorems can be proven in the conventional way with relative ease.

### Mathematical induction

Normal induction works as follows: The sun has risen every morning that I remember. Hence, it will probably rise tomorrow morning, too. Pretty sensible, though it often goes wrong in nasty ways (every black-skinned person I have met thus far…).

Mathematical induction is used to prove things that apply to all natural numbers (1, 2, 3, …; 0 may or may not be included), or to anything that can be numbered by them, such as the propositions of propositional logic.

For example, the sum 1 + 2 + 3 + … + n = (n+1)*n/2. (E.g. 1 + 2+ 3 + 4 +5 = 6*5/2 = 3*5 = 15).

The first step is to check that the equation works when n = 1. This is often trivial. Particularly: 1 = 2*1/2 is indeed true.

The second step is to assume that the equation works for some natural number k, which is not specified. That is: 1+ 2 + … + k = k*(k+1)/2. This step is not particularly strenous. That is: Assume the equation holds for n = k.

The third step, which is the actual substance of the proof, is to see that the equation now holds for n = k+1. In this specific example, it must be shown that 1 + 2 + … + k + (k+1) = (k+1)*(k+2)/2. The assumption made in step two will be useful here. (k+1)(k+2)/2 = (k+1)*k/2 + (k+1)*2/2. By the assumption in the second step the first term equals 1 + 2 + … + k; that is, the formula looks like 1 + 2 + 3 + … + k + (k+1)*2/2, where the last term is simply k+1. Hence, 1 + … + k + k+1 = (k+2)*(k+1)/2.

The first step established that the equation holds for n = 1. The second and third established that if it works for n = k, it also holds for n = k+1. That is: Because it works in the first case, it must also work in the second case, and hence also in the third case, and so forth. Hence it works for all n, as it well should.

### Proof by handwaving

The most powerful tool wielded in a seminar of lecture, proof by handwaving involves drawing pretty pictures, writing down some equations, and appealing to the common sense of whoever is subjected to the explanation and the vigorous hand movements. Phrases such as “Clearly”, “One can easily show that”, “It can be proven that”, “Gauss has proven that”, “this is left as an exercise for the reader/student” are often used to great effect. Proof by handwaving can even be used to prove false statements, which makes it the strongest method catalogued herein, even if not the most rigorous.

For real life examples, see such achievements in accurate film-craft as everything by Michael Moore and the “documentaries” titled “The great global warming swindle” and “Expelled! No intelligence allowed” (or something to that effect). Even if they are correct in some parts, such facts are established by vigorous handwaving and propaganda, and hence can’t really by trusted. (Disclaimer: I have seen part of the global warming propaganda and a film or two by Moore.)

## Some probability theory

I’ll explain what I see as the point behind some elementary concepts of probability theory. The primary source is Introduction to Probability theory by Geiss, Christel and Stefan.

## The basis

The idea behind probability theory is to treat uncertainty with mathematical rigour. The first necessary component is a (non-empty) set of events; for example, if rolling a d4, this would be {1, 2, 3, 4}. When arriving to traffic lights, {red, yellow, green} is a passable set of events (at least as the traffic light are around here). The lifetime of an electronic device could have the set {1, 2, 3, 4, …}; that is, the set of natural numbers, which indicates how many days the device functions. If there are news in the radio every half an hour, the time one has to wait for the next news after turning on the radio creates the real line between 0 and 30 minutes; in math, [0, 30[ (0 is included, 30 is not).

### Sigma-algebra

Sigma-algebra is defined by a certain set of properties, which I’ll list a bit later. The idea behind sigma-algebra is to list the collections of events one might want to measure the probability of.

Taking the d4 as an example, the largest possible sigma-algebra is the set of subsets or power set of {1, 2, 3, 4}, which means a set that contains {} (the empty set) {1} {2} {3} {4} {1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4} { 1, 2, 3} {1, 2, 4} {1, 3, 4} {2, 3, 4} {1, 2, 3, 4} for a total of 16 sets (16 = 2^4 which is not a coincidence, and it also is the reason for using d4; d6 would have involved 64 subsets).

What if one is only interested in the end result being even or odd? The following sets also form a sigma-algebra: {}, {1, 3}, {2, 4}, {1, 2, 3, 4}.

#### The properties of a sigma-algebra

Let E be a non-empty set. The sigma-algebra of A always contains the empty set and E. The idea is that the propability of nothing happening and that of something happening are always known. In addition, if any subset A of E is part of sigma(E), the complement of A (negation of A) is also part of sigma(E). The idea is that if the probability of A is measurable, the probability of “not A” must also be measurable. Further, if any (finite or countable) group of subsets of E are part of sigma(E), so is their union, which means that for any group of measurable events, one can measure the chance of at least one of them happening.

From these follow a lot of things; see the PDF for more detail on the process and results.

### Probability measure

Probability measure intuitively assign weight to every set that lives in the sigma-algebra. To take the d4 again, the weight added to every outcome is 1/4 (0.25 or 0,25 or 25% for those who fear fractions) if the die is fair (and is rolled fairly). If the die is weighted, the probability of {4} could be 1/2 (50% aka 0,5 aka 0.5), while that of every other number could be 1/6 (about 17% or 0,17 or 0.17).

The rules that probability measures must conform to in order to be called probability measures are as follows: The probability that something happens is 1, which is written P(E) = 1. If A and B are disjoint subsets of E (they are entirely distinct; for example, even and odd number or the number 1 and cats), the probability that something out of A or B happens equals the probability that something out of A happens plus the probability that something out of B happens. In symbols, P(A or B) = P(A) + P(B) for disjoint A, B. This applies to all numerable groups of subsets. The third rule is that the probability that something out of A does not happen equals one minus the probability that something out of A does happen, which is equivalent to P(not A) = 1 – P(A). It follows that the probability of nothing happening is zero.

#### The connection to sigma-algebra

In addition to every probability measure requiring a sigma-algebra to even be defined, there is another connection between the rules that defined them. Every sigma-algebra of E always includes the empty set and E; likewise, the probabilities for both of these are always defined. Likewise, if A is part of sigma(E), “not A” also lives there. Contrast to the fact that if the probability of A is known, so is that of not A. The final part of the connection is that summing up probabilities and taking a union of subsets work in similar way (there is an easy way of making any numerable group of sets disjoint; take the first set as is, take the second but remove any overlap with the first, take the third and remove any parts that overlap with the first or the second part and so forth).

The existence of this connection is obvious; there is no sense in building the definitions in a way that does not produce these connections. Still, they are useful guidelines for remembering the definitions, since it is sufficient to only remember one of them and the other can be deduced from it.

## Elaboration on d4

I won’t build any more theory, since it is well-presented in the lecture notes and this is long enough as is. Go read those if you are really interested and already know some mathematics. The notation that follows is occasionally a bit clumsy, but there are reasons for it. Anything in square brackets indicates a set.

The measurable space ({1, 2, 3, 4}, {{}, {1, 3}, {2, 4}, {1, 2, 3, 4}}) can be used to determine the probabilities of getting an even or odd number with a d4. First, assuming a fair die, the relevant probability measure is defined by P({1, 3}) = 1/2 (it follows that P({2, 4}) = 1/2). The probability of rolling for example 3 is not well-defined, because {3} is not part of the sigma-algebra in use. One can think of this as a friend telling that the result was even or odd, but not what the exact number rolled was. Using the loaded die introduced earlier, the relevant probability measure would be characterised by P({1, 3}) = 2/6 = 1/3 from which it follows that P({2, 4}) = 4/6 = 2/3.

Note that with the measurable space given one could as well flip a coin; it would have two options, though they would be heads and tails, not numbers, but they could be mapped to the real line to give numeric results.

## Teaching is like game mastering

I spend this weekend prepping s person for the mathematics part of the matriculation examination. The student was a pretty smart guy, but he did not have much routine in twisting numbers and letters around.

I could not help noticing some parallels between teaching and running games.

### The matter of preparation

When running a game, I have some possible events that I can confront the player characters with. Extensive planning is just work and encourages railroading (when done by me). I also learn the system well enough so that I don’t need to check the books that often, if at all.

When teaching, I had some ideas about what subjects to handle and possibly some specific problems to solve or tricks to show. I did not build a script of the teaching session. I did go through the MAOL formula book (contains most of the formulae used in the matriculation examination and can evn be used in said examination). I still have an excellent handle of almost all of the material, differential equations excluded. Maybe I’ll ask Thalin to give me a quick summary or alternative check out some literature. Anyway. I familiarised myself with the “rules”. The parallels are clear, I hope.

### The flow of the session

When running a game, I usually have prepared enough toget the thing going and then follow up with improvisation that springs from player choices and the dice. This leads to quite dynamic gameplay, but does have some drawbacks, too. One relevant strength is that I can change the direction or emphasis of the game based on player input, verbal or nonverbal. This would work a lot better if I actually noticed the nonverbal cues, as opposed to what I do now.

When teaching, I had something to start with (nested functions, understanding derivatives to be the rate of change or angle of a graph). I explained something from different angles until the Samuli, the student, understood what I was talking about. If a difficult problem came up, I constructed a series of problems so that it started with very simple and basic, every problem was a bit more complicated than the previous one, and the difficult situation was culmination of the series. The longest series was probably three or four, so nothing terribly elaborate. Likewise, if something was easy, we could skip past it and get on to the harder stuff.

### The skills required

Roleplaying and game mastering are both skills. They can be learned, improved, or get rusty. Ditto with teaching.

I think the following skills are all central to both running a game and teaching, if interpreted sufficiently broadly:

- Gauging interest: Are people yawning or eager and focused?
- Building suitable obstacles/problems that are not trivial, yet are not too difficult.
- Clear communication: Explaining/describing things so that shared understanding of the subject matter/fiction is built. See anchors by Bruce (Tumac).
- Leaving room for the student/players: Teacher/GM knows more about the problem/obstacle than the student/players does/do. Yet, the student is the one who should solve the problem, and players the ones who deal with an obstacle. No use setting up a problem or obstacle and then solving it by yourself or having a GMPC solve all problems.
- Judgement: When a solution or mode of action is suggested, teacher/GM is the one who judges how well it works. Sometimes a simple mistake is done, sometimes the solution is flawless. GM does have the luxury of letting dice arbitrate some events, but even then the difficulty or modifiers of those rolls are up to the GM. (There are some games where this is not true. E.g. Primetime adventures, Beast hunters, Agon. This all IIRC; I have never played any of those.)
- Quick thinking. This one is obvious and general enough to not be worth extensive commentary.

### Other styles

I had the luxury of only teaching a single person. This is good. It is very exhaustive. I’d say that teaching a group demands slower pace and is probably more conductive to preparing. Reasoning: If everyone must understand a subject, more numerous angles of presentation are useful. It is often hard to improvise multiple ways of doing the same thing (at least for me it is). Hence, preparing them ahead of them is something between viable and necessary.

Does this apply to roleplaying? Does a larger number of players imply easier or more useful preparation?

In my experience (I have never run a game with many players; four is probably the upper limit), solo games move fast. It takes huge amounts of prep to not have to improvise in solo game. When there are many players, OOG chatter is more prominent, the characters interact with each other, and generally everything takes more time. This means that less content is used in the same time and hence preparation is more possible.

On the other hand, multiple player characters means more complex plans can be formulated and practically achieved. I’d say that the time planning takes means that adjudicating such is not a huge burden, as opposed to the rapid-fire approach a single player is likely to take.

Also, when there are several PCs, the material can be more generic, because nobody assumes that every moment of the game is relevant to every character in a very personal and intimate way. I hope.

### Conclusions and a warning

There are clear parallels between teaching and game mastering. The different styles, prep-wise, are quite similar. (Sandbox play would be roughly equivalent to having a huge menu of problems for the studen to choose among; there is similarity between a textbook and a published setting).

The warning part: Don’t use roleplaying to teach a lesson. Like, “greed is teh evilness”. Punish every character that does a greedy thing, reward every generous action. Players will see it and resent it. Be open about such an undertone and add it as a setting or rules premise, like a setting where the generous are held in esteem and greed as a sign of possession by evil spirits. Let players challenge the notions and don’t fiat a conclusion you like. Provide a playground, not a brainwashing session.

This applies to teaching ethics, too. Teach something and people will resist it just because. Give something that people can interact with and they will form their own opinions about it and actually learn something.

### Disclaimer

I have no training in pedagogy. Take my opinions with a grain of salt. They are opinions, not facts.

## A diversion: On square circles

This is a semiformal proof. A formal one would be hard to understand without pictures and would require me checking out the English translations of a number of words, which I am not inclined to do.

A square is defined as an ordered set of four points, no three of which are on the same line, and further that the angles thus created are all right (there is a bit more, but it is not too relevant). Circle is defined as a set of points that are from given distance (radius) from a given point. Because a square does not exist in hyperbolic geometry, it is sufficient to think about Euclidean geometry. Assume that there is a square circle. Let A and B be two points of the circle that are part of the same edge (i.e. that edge does not go through the circle’s center). Let C be a point that is between A and B. Distance from The center of the circle to C is lesser than the distance to A or B, so it is also lesser than the radius. Accordin to one axiom I am a bit too lazy to check out, it is possible to find D so that D is behind C when observed from the circle’s center and D’s distance from the center is the circle’s radius. Hence D is part of the circle. D is not anyof the original points that defined the circle, because if it was either of the unnamed ones, the definition of rectangle would get messy, and if it were A or B, C would also be A or B, which contradicts the choice of C. Hence, there are no square circles. Qued est demonstratum.

This diversion due to another discussion.

## The difference between philosophy and math

Mathematics is the art of proving p.

Philosophy is the art of justifying p.

A proof is almost always a justification.