# Blog

**2017-02-25 14:50:20**

## The Importance of Estimation Error

Recently, I've noticed a few great examples of misleading uses of numbers in news articles.

On 15 Feb, BBC News published a breaking news article with the headline
"UK unemployment falls by 7,000 to 1.6m".
This fall of 7,000 sounds big, but when compared to the total of 1.6m it
is insignificant. The change could more accurately be described as a fall from 1.6m to 1.6m.

But there is a greater problem with this figure. In the
original Office for National Statistics (ONS) report,
the fall of 7,000 was accompanied by a 95% confidence interval of ±80,000.
When calculating figures about large populations (such as unemployment levels), it is impossible to ask every person in the UK whether they
are employed or not. Instead, data is gathered from a sample and this is used to estimate the total number. The 95% confidence interval
gives an idea of the accuracy of this estimation: 95% of the time, the true number will lie within the confidence interval. Therefore, we can
think of the 95% confidence interval as being a range in which the figure lies (although this is not strictly true, it is a helpful way to think
about it).

Compared to the size of its confidence interval (±80,000), the fall of 7,000 is almost indistinguishable from zero. This means that it
cannot be said with any confidence whether the unemployment level rose or fell. This is demonstrated in the following diagram.
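As a quick check of the arithmetic, the following minimal sketch builds the interval for the change from the two figures quoted above. The helper name `interval` is purely illustrative.

```python
def interval(estimate, margin):
    """Return the (low, high) ends of estimate ± margin."""
    return estimate - margin, estimate + margin

# Figures quoted above: a fall of 7,000 with a 95% confidence interval of ±80,000.
low, high = interval(-7_000, 80_000)

# If zero lies inside the interval, a rise and a fall are both plausible.
contains_zero = low <= 0 <= high

print(f"Change in unemployment: between {low:+,} and {high:+,}")
print("Could be a rise or a fall:", contains_zero)
```

The interval runs from a fall of 87,000 to a rise of 73,000, so zero sits comfortably inside it.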

To be fair to the BBC, the headline of the article changed to "UK wage growth outpaces inflation"
once the article was upgraded from breaking news to a complete article, and a mention of the lack of confidence in the change was added.

On 23 Feb, I noticed another BBC News article with misleading figures: "Net migration to UK falls by 49,000".
This 49,000 is the difference between
322,000 (net migration for the year ending 2015) and
273,000 (net migration for the year ending 2016).
However both these figures are estimates: in the original ONS report,
they were placed in 95% confidence intervals of ±37,000 and ±41,000 respectively. As can be seen in the diagram below,
there is a significant portion where these intervals overlap, so it cannot be said with any confidence whether or not net migration actually fell.
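The overlap can be worked out directly from the quoted figures. The sketch below also estimates a margin for the fall itself, under the assumption (mine, not the ONS's) that the two estimates are independent, so that their margins combine in quadrature.

```python
import math

def overlap(a, b):
    """Return the overlapping part of two (low, high) intervals, or None."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None

# Estimates and 95% margins quoted above (years ending 2015 and 2016).
est_2015, margin_2015 = 322_000, 37_000
est_2016, margin_2016 = 273_000, 41_000

ci_2015 = (est_2015 - margin_2015, est_2015 + margin_2015)
ci_2016 = (est_2016 - margin_2016, est_2016 + margin_2016)
shared = overlap(ci_2015, ci_2016)

# ASSUMPTION: the two estimates are independent, so the margin of the
# difference is the square root of the sum of the squared margins.
fall = est_2015 - est_2016
fall_margin = math.hypot(margin_2015, margin_2016)

print("Intervals overlap on:", shared)
print(f"Fall: {fall:,} ± {fall_margin:,.0f}")
```

Under that assumption the margin on the fall is larger than the fall itself, which agrees with the conclusion drawn from the overlapping intervals.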

Perhaps the blame for this questionable figure lies with the ONS, as it appeared prominently in their report while the discussion of its
accuracy was fairly well hidden. Although I can't shift all blame from the journalists: they should really be investigating the quality of these
figures, however well advertised their accuracy is.

Both articles criticised here appeared on BBC News. This is not due to the BBC being especially bad with figures, but simply due to the
fact that I spend more time reading news on the BBC than in other places, so noticed these figures there. A quick Google search reveals that the unemployment figure was
also reported, with little to no discussion of accuracy, by
The Guardian,
the Financial Times, and
Sky News.


**2016-05-04 16:15:21**

## Euro 2016 Stickers

Back in 2014, I calculated the expected cost of
filling a Panini World Cup sticker album. I found that you should expect to buy
4505 stickers, or 1285 if you order the last 100 from the Panini website (this
includes the last 100). This would cost £413.24 or £133.99
respectively.

Euro 16 is getting close, so it's sticker time again. For the Euro 16
album there are 680 stickers to collect, 40 more than 2014's 640 stickers.
Using the same calculation method as before,
to fill the Euro 16 album, you should expect to buy 4828
stickers (£442.72), or 1400 (£134.32) if you order the last 100.
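The calculation method is not restated in this post, but the quoted sticker counts are close to what the standard coupon collector formula gives. The sketch below assumes every sticker is equally likely and that stickers are bought one at a time. The function name and its `ordered` parameter are illustrative.

```python
def expected_purchases(album_size, ordered=0):
    """Expected number of stickers bought to fill an album.

    ASSUMPTION: each sticker is equally likely and independent of the others.
    Random stickers are bought until only `ordered` are missing, then those
    are ordered directly (and counted in the total).
    """
    # While j stickers are still missing, each purchase is new with
    # probability j / album_size, so it takes album_size / j purchases
    # on average to find the next new one.
    random_part = sum(album_size / j for j in range(ordered + 1, album_size + 1))
    return random_part + ordered

for size in (640, 680):
    print(size, round(expected_purchases(size), 1),
          round(expected_purchases(size, ordered=100), 1))
```

The results land within a sticker of the figures above. Small differences may come from rounding or from details such as stickers being sold in packs.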

This, however, does not tell the whole story. Anyone who has collected
stickers as a child or an adult will know that half the fun comes from
swapping your doubles with friends. Getting stickers this way is not
taken into account in the above numbers.

### Simulating a Sticker Collection

Including swaps makes the situation more complicated: too complicated
to easily calculate the expected cost of a full album. Instead, a different
method is needed. The cost of filling an album can be estimated by
simulating the collection lots of times and taking the average of the cost of
filling the album in each simulation. With enough simulations, this estimate
will be very close to the expected cost.

To get an accurate estimation, simulations are run,
calculating the running average as they go, until the running averages after recent simulations
are close together. (In the examples, I look for the four most recent running averages to be within 0.01.)
The plot below shows how the running average changes as more simulations are performed.
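The stopping rule described above might be sketched as follows. This is not the code used for the plots: it buys stickers one at a time rather than in packs, and `min_runs` and `max_runs` are safeguards I have added so that the loop neither stops on a lucky early streak nor runs forever.

```python
import random

def stickers_to_fill(album_size, rng):
    """Buy random stickers one at a time until the album is full."""
    have = set()
    bought = 0
    while len(have) < album_size:
        have.add(rng.randrange(album_size))
        bought += 1
    return bought

def estimate_expected(album_size, tol=0.01, window=4,
                      min_runs=100, max_runs=200_000, seed=0):
    """Average simulations until the last `window` running averages
    lie within `tol` of each other. Returns (estimate, runs used)."""
    rng = random.Random(seed)
    total = 0
    recent = []
    for run in range(1, max_runs + 1):
        total += stickers_to_fill(album_size, rng)
        recent = (recent + [total / run])[-window:]
        settled = len(recent) == window and max(recent) - min(recent) <= tol
        if run >= min_runs and settled:
            break
    return total / run, run

# A small 20-sticker album keeps the example quick to run.
estimate, runs = estimate_expected(20, tol=0.05, min_runs=2000)
print(f"Estimated {estimate:.1f} stickers after {runs} simulations")
```

For a 20-sticker album the exact expectation is about 72 stickers, so the estimate should settle near that value.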

The simulations estimate the number of stickers needed as 4500. This is
very close to the 4505 I calculated in 2014.

Now that the simulations are set up, they can be used to see what happens if you have friends to swap with.

### What Should I Do?

The plots below show how the number of stickers you each need to buy changes based on how many friends you have.

In both these cases, having friends reduces the number of stickers you need to buy significantly, with your first few friends
making the most difference.
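The swapping rules behind these plots are not spelled out here, so the sketch below uses a deliberately generous model of my own: friends take turns buying single stickers, and any sticker the buyer does not need is handed straight to a friend who does. Everyone keeps buying until every album is full.

```python
import random

def stickers_per_person(album_size, group_size, rng):
    """Stickers bought per person until every album in the group is full.

    ASSUMPTION: perfect swapping. A sticker goes to the buyer if they need
    it, otherwise to the first friend who needs it, otherwise it is wasted.
    """
    albums = [set() for _ in range(group_size)]
    bought = 0
    while any(len(album) < album_size for album in albums):
        buyer = bought % group_size
        sticker = rng.randrange(album_size)
        bought += 1
        others = [p for p in range(group_size) if p != buyer]
        for person in [buyer] + others:
            if sticker not in albums[person]:
                albums[person].add(sticker)
                break
    return bought / group_size

def average_per_person(album_size, group_size, runs=400, seed=0):
    """Average of stickers_per_person over `runs` simulated collections."""
    rng = random.Random(seed)
    total = sum(stickers_per_person(album_size, group_size, rng)
                for _ in range(runs))
    return total / runs

# A small 30-sticker album keeps the example quick to run.
alone = average_per_person(30, 1)
with_friends = average_per_person(30, 5)
print(f"Alone: {alone:.0f} each; in a group of five: {with_friends:.0f} each")
```

Even in this toy version, collecting in a group cuts the number of stickers each person buys well below the solo figure.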

Ordering the last 100 stickers looks to be a better idea than ordering no stickers. But how many stickers should you order to
minimise the cost? When you order stickers, you are guaranteed to get those that you need, but they cost more: ordered stickers cost 14p
each, while stickers in 6 pack multipacks come out at just 9.2p each. The next plot shows how the cost changes based on how many you order.
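For a lone collector, the trade-off between the two prices can be sketched with the coupon collector formula. This assumes equally likely stickers bought one at a time at 9.2p each, and ignores postage and any limit on order size, so the pound figures will not match the plots exactly. The names below are illustrative.

```python
PACK_PRICE = 0.092   # pounds per sticker in multipacks (from the text)
ORDER_PRICE = 0.14   # pounds per ordered sticker (from the text)
ALBUM_SIZE = 680

def expected_cost(album_size, ordered):
    """Expected cost of buying random stickers until `ordered` are missing,
    then ordering those. ASSUMPTION: stickers are equally likely."""
    bought = sum(album_size / j for j in range(ordered + 1, album_size + 1))
    return PACK_PRICE * bought + ORDER_PRICE * ordered

costs = [expected_cost(ALBUM_SIZE, k) for k in range(ALBUM_SIZE + 1)]
best_order = min(range(ALBUM_SIZE + 1), key=costs.__getitem__)

print(f"Order none: £{costs[0]:.2f}")
print(f"Order all:  £{costs[ALBUM_SIZE]:.2f}")
print(f"Cheapest:   order {best_order} for £{costs[best_order]:.2f}")
```

In this simplified model the cheapest option lies between the two extremes, which is the shape the curves show.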

Each of the coloured curves represents a group of a different size. For each group, ordering no stickers works out the most
expensive—this is expected as so many stickers must be bought to find the last few stickers—and ordering all the stickers also works
out as not the best option. The best number to order
is somewhere in the middle, where the curve reaches its lowest point. The minimum points on each of these curves are summarised in the
next plots:

Again, having friends to swap with dramatically reduces the cost of filling an album. In fact, it will almost definitely pay off in future
swaps if you go out right now and buy starter packs for all your friends...


**2015-10-21 14:27:38**

## How to Kick a Conversion

This post also appeared on the Chalkdust Magazine blog.

If you're like me, then you will be disappointed that all of the home nations have been knocked out of the Rugby World Cup. If you're *really* like me, doing some maths related to rugby will cheer you up...

The scoring system in rugby awards points in packets of 3, 5 and 7. This leads to a number of interesting questions that you can find in my guest puzzle on Alex Bellos's Guardian blog. In this blog post, we will focus on another area of rugby: conversion kicking.

### Conversion Kicks

When a try is scored by putting the ball down behind the line, the scoring team gets to take a conversion kick. This kick must be taken in line with where the try was scored but it is up to the kicker how far away the kick should be taken. But how far back should the ball be taken to make the kick easiest?

One way to answer this question is to look to maximise the angle between the posts which the kicker will have to aim at: if the kick is taken too close to or too far from the goal line there will be a very thin angle to aim at. Somewhere between these extremes there will be a maximum angle to aim at.

When looking to maximise this angle, we can use one of the 'circle theorems' which have tormented many generations of GCSE maths students: 'angles subtended by the same arc at the circumference are equal'. This means that if a circle is drawn going through both posts, then the angle made at any point on this circle will be the same.

A larger circle drawn through the posts will give a smaller angle. If a vertical line is drawn which just touches the right of the circle, then the point at which it touches the circle will be the best place on this line to take a kick. This is because any other point on the line will be on a larger circle and so make a smaller angle.
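The tangent-circle argument can be checked numerically. For a line tangent to a circle through both posts, the distance from the goal line to the tangent point is the geometric mean of the distances from the line to the two posts. The sketch below compares this with a brute-force search for the widest angle. The 5.6 m gap between the posts is my assumption of the standard width, and the function names are illustrative.

```python
import math

POST_GAP = 5.6  # metres between the posts (ASSUMPTION: standard width)

def angle(near, back):
    """Angle (radians) between the posts, seen from `back` metres behind
    the goal line, for a try scored `near` metres outside the nearer post."""
    far = near + POST_GAP
    return math.atan2(far, back) - math.atan2(near, back)

def best_distance_numeric(near, step=0.01, max_back=60.0):
    """Search a grid of distances for the one with the widest angle."""
    candidates = [i * step for i in range(1, int(max_back / step) + 1)]
    return max(candidates, key=lambda back: angle(near, back))

def best_distance_geometric(near):
    """Tangent length from the circle argument: sqrt(near * far)."""
    return math.sqrt(near * (near + POST_GAP))

near = 10.0
print(f"Search:   {best_distance_numeric(near):.2f} m")
print(f"Geometry: {best_distance_geometric(near):.2f} m")
```

The two answers agree to within the grid spacing.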

Using this method for circles of different sizes leads to the following diagram, which shows where the kick should be taken for every position a try could be scored:

This, however, is not the best place to take the kick.

### Taking Account of Height

When a try is scored near the posts, the above method recommends a position from where the ball must be kicked at an impossibly steep angle to go over. To deal with this problem, we are going to have to look at the situation from the side.

When kicked, the ball will travel along a parabola (ignoring air resistance and wind as their effects will be small [citation needed]). Given a distance from the posts, there will be two angles which the ball can be kicked at and just make it over the bar. Kicking at any angle between these two will lead to a successful conversion. Again, we have an angle which we would like to maximise.

However, the position where this angle is maximised is very unlikely to also maximise the angle we looked at earlier. To find the best place to kick from, we need to find a compromise point where both angles are quite big.
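The two angles come from solving the parabola for the height of the bar. In the sketch below, the ball is treated as a point kicked from ground level with no air resistance. The 3 m bar height and the 25 m/s kick speed are my assumptions for illustration, not values from the model used in this post.

```python
import math

G = 9.81          # gravitational acceleration in m/s^2
BAR_HEIGHT = 3.0  # metres (ASSUMPTION: standard crossbar height)

def height_at(distance, speed, theta):
    """Height of the ball after travelling `distance` metres horizontally."""
    t = math.tan(theta)
    return distance * t - G * distance**2 * (1 + t * t) / (2 * speed**2)

def clearing_angles(distance, speed, height=BAR_HEIGHT):
    """Return the two launch angles (radians) that just reach `height`,
    or None if the kick is too weak to reach the bar at all."""
    # Setting height_at(...) equal to `height` gives a quadratic in tan(theta):
    #   k * t**2 - distance * t + (height + k) = 0,  with k as below.
    k = G * distance**2 / (2 * speed**2)
    disc = distance**2 - 4 * k * (height + k)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    low = math.atan((distance - root) / (2 * k))
    high = math.atan((distance + root) / (2 * k))
    return low, high

low, high = clearing_angles(distance=30.0, speed=25.0)  # hypothetical kick
print(f"Between {math.degrees(low):.1f} and {math.degrees(high):.1f} degrees")
```

Any launch angle between the two returned values clears the bar in this simplified model.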

To do this, imagine that the kicker is standing inside a large sphere. For each point on the sphere, kicking the ball at the point will either lead to it going over or missing. We can draw a shape on the sphere so that aiming inside the shape will lead to scoring. Our sensible kicker will aim at the centre of this shape.

But our kicker will not be able to aim perfectly: there will be some random variation. We can predict that this variation will follow a Kent distribution, which is like a normal distribution but on the surface of a sphere. We can use this distribution to calculate the probability that our kicker will score. We would like to maximise this probability.

The Kent distribution can be adjusted to reflect the accuracy of the kicker. Below are the optimal kicking positions for an inaccurate, an average and a very accurate kicker.

As you might expect, the less accurate kicker should stand slightly further forwards to make it easier to aim. Perhaps surprisingly, the good kicker should stand further back when between the posts than when in line with the posts.

The model used to create these results could be further refined. Random variation in the speed of the kick could be introduced. Or the kick could be made to have more variation horizontally than vertically: there are parameters in the Kent distribution which allow this to be easily adjusted. In fact, data from players could be used to determine the best position for each player to kick from.

In addition to analysing conversions, this method could be used to determine the probability of scoring 3 points from any point on the pitch. This could be used in conjunction with the probability of scoring a try from a line-out to decide whether kicking a penalty for the posts or into touch is likely to lead to the most points.

Estimating the probability of scoring from a line-out is a difficult task, though. Perhaps this will give you something to think about during the remaining matches of the tournament.


**2015-10-08 04:38:58**

## How Much Will I Win on the New National Lottery?

This post also appeared on the Chalkdust Magazine blog. You can read the excellent second issue of Chalkdust here, including the £100 prize crossnumber which I set.

From today, the National Lottery's Lotto draw has 59 balls instead of 49. You may be thinking that this means there is now much less chance of winning. You would be right, except the prizes are also changing.

Camelot, who run the lottery, are saying that you are now "more likely to win a prize" and "more likely to become a millionaire". But what do these changes actually mean?

### The Changes

Until yesterday, Lotto had 49 balls. From today, there are 59 balls. Each ticket still has six numbers on it and six numbers, plus a bonus ball, are still chosen by the lottery machine. The old prizes were as follows:

| Requirement | Estimated Prize |
| --- | --- |
| Match all 6 normal balls | £2,000,000 |
| Match 5 normal balls and the bonus ball | £50,000 |
| Match 5 normal balls | £1,000 |
| Match 4 normal balls | £100 |
| Match 3 normal balls | £25 |
| 50 randomly picked tickets | £20,000 |

The prizes have changed to:

| Requirement | Estimated Prize |
| --- | --- |
| Match all 6 normal balls | £2,000,000 |
| Match 5 normal balls and the bonus ball | £50,000 |
| Match 5 normal balls | £1,000 |
| Match 4 normal balls | £100 |
| Match 3 normal balls | £25 |
| Match 2 normal balls | Free lucky dip entry in next Lotto draw |
| One randomly picked ticket | £1,000,000 |
| 20 other randomly picked tickets | £20,000 |

### Probability of Winning a Prize

The probability of winning each of these prizes can be calculated. For example, the probability of matching all 6 balls in the new lotto is $$\mathbb{P}(\mathrm{matching\ ball\ 1})\times \mathbb{P}(\mathrm{matching\ ball\ 2})\times...\times\mathbb{P}(\mathrm{matching\ ball\ 6})$$ $$=\frac{6}{59}\times\frac{5}{58}\times\frac{4}{57}\times\frac{3}{56}\times\frac{2}{55}\times\frac{1}{54}$$ $$=\frac{1}{45057474},$$ and the probability of matching 4 balls in the new lotto is $$(\mathrm{number\ of\ different\ ways\ of\ picking\ four\ balls\ out\ of\ six})\times\mathbb{P}(\mathrm{matching\ ball\ 1})\times\\...\times\mathbb{P}(\mathrm{matching\ ball\ 4})\times\mathbb{P}(\mathrm{not\ matching\ ball\ 5})\times\mathbb{P}(\mathrm{not\ matching\ ball\ 6})$$ $$=15\times\frac{6}{59}\times\frac{5}{58}\times\frac{4}{57}\times\frac{3}{56}\times\frac{53}{55}\times\frac{52}{54}$$ $$=\frac{3445}{7509579}.$$ In the second calculation, it is important to include the probabilities of not matching the other balls to prevent double counting the cases when more than 4 balls are matched.
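Both fractions can be reproduced exactly by counting combinations instead of multiplying ball by ball. The two approaches are equivalent. The function name `p_match` is illustrative.

```python
from fractions import Fraction
from math import comb

def p_match(k, balls, picks=6):
    """Probability that exactly k of the `picks` numbers on a ticket are
    among the `picks` main balls drawn from `balls`."""
    # Choose which k ticket numbers match, then which of the unpicked
    # numbers fill the remaining drawn balls.
    ways = comb(picks, k) * comb(balls - picks, picks - k)
    return Fraction(ways, comb(balls, picks))

print(p_match(6, 59))  # all six balls in the new draw
print(p_match(4, 59))  # exactly four balls in the new draw
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions above.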

Calculating a probability for every prize and then adding them up gives the probability of winning a prize. In the old draw, the probability of winning a prize was \(0.0186\). In the new draw, it is \(0.1083\). So Camelot are correct in claiming that you are now more likely to win a prize.
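Adding up the matching prizes can be sketched as below. The raffle prizes are left out, which is my simplification: with millions of tickets sold they barely change the total. The function names are illustrative.

```python
from math import comb

def p_match(k, balls, picks=6):
    """Probability of matching exactly k of the main balls."""
    return comb(picks, k) * comb(balls - picks, picks - k) / comb(balls, picks)

def p_any_prize(balls, smallest_winning_match):
    """Probability of matching enough balls to win something.
    ASSUMPTION: raffle prizes are ignored."""
    return sum(p_match(k, balls) for k in range(smallest_winning_match, 7))

old = p_any_prize(balls=49, smallest_winning_match=3)
new = p_any_prize(balls=59, smallest_winning_match=2)
print(f"Old draw: {old:.4f}, new draw: {new:.4f}")
```

Almost all of the increase comes from the new prize for matching two balls.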

But not all prizes are equal: these probabilities do not take into account the values of the prizes. To analyse the actual winnings, we're going to have to look at the expected amount of money you will win. But first, let's look at Camelot's other claim: that under the new rules you are more likely to become a millionaire.

### Probability of Winning £1,000,000

In the old draw, the only way to win a million pounds was to match all six balls. The probability of this happening was \(0.00000007151\) or \(7.151\times 10^{-8}\).

In the new lottery, a million pounds can be won either by matching all six balls or by winning the millionaire raffle. This will lead to different probabilities of winning on Wednesdays and Saturdays due to different numbers of people buying tickets. Based on expected sales of 16.5 million tickets on Saturdays and 8.5 million tickets on Wednesdays, the chances of becoming a millionaire on a Wednesday or Saturday are \(0.0000001398\) (\(1.398\times 10^{-7}\)) and \(0.00000008280\) (\(8.280\times 10^{-8}\)) respectively.
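These two probabilities can be reproduced by adding the chance of the jackpot to the chance of holding the single raffle-winning ticket. Adding the two ignores the tiny chance of both happening at once. The sales figures are the expected values quoted above.

```python
from math import comb

JACKPOT_WAYS = comb(59, 6)  # number of possible draws of 6 balls from 59
EXPECTED_SALES = {"Wednesday": 8_500_000, "Saturday": 16_500_000}

def p_million(tickets_sold):
    """Chance that one ticket wins £1,000,000 in a single draw."""
    p_jackpot = 1 / JACKPOT_WAYS
    p_raffle = 1 / tickets_sold  # one raffle millionaire per draw
    return p_jackpot + p_raffle

chances = {day: p_million(sold) for day, sold in EXPECTED_SALES.items()}
for day, chance in chances.items():
    print(f"{day}: {chance:.3e}")
```

The Wednesday figure is the larger of the two because fewer tickets share the one raffle prize.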

These are both higher than the probability of winning a million in the old draw, so again Camelot are correct: you are now more likely to become a millionaire...

But the new chances of becoming a millionaire are actually even higher. The probabilities given above are the chances of winning a million in a given draw. But if two balls are matched, you win a lucky dip: you could win a million in the next draw without buying another ticket. We should include this in the probability calculated above, as you are still becoming a millionaire due to the original ticket you bought.

In order to count this, let \(A_W\) and \(A_S\) be the probabilities of winning a million in a given draw (as given above) on a Wednesday or a Saturday, let \(B_W\) and \(B_S\) be the probabilities of winning a million in this draw or due to future lucky dip tickets on a Wednesday or a Saturday (the values we want to find) and let \(p\) be the probability of matching two balls. We can write $$B_W=A_W+pB_S$$ and $$B_S=A_S+pB_W$$ since the probability of winning a million is the probability of winning in this draw (\(A\)) plus the probability of winning a lucky dip ticket and winning in the next draw (\(pB\)). Substituting and rearranging, we get $$B_W=\frac{A_W+pA_S}{1-p^2}$$ and $$B_S=\frac{A_S+pA_W}{1-p^2}.$$

Using this (and the values of \(A_S\) and \(A_W\) calculated earlier) gives us probabilities of \(0.0000001493\) (\(1.493\times 10^{-7}\)) and \(0.00000009736\) (\(9.736\times 10^{-8}\)) of becoming a millionaire on a Wednesday and a Saturday respectively. These are both significantly higher than the probability of becoming a millionaire in the old draw (\(7.151\times 10^{-8}\)).
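The pair of equations can be evaluated in a few lines. The sketch below recomputes \(A_W\), \(A_S\) and \(p\) from the figures given earlier, using the expected sales of 8.5 million and 16.5 million tickets. The variable names are mine.

```python
from math import comb

draws = comb(59, 6)

# Probability of matching exactly two balls, which wins a lucky dip.
p = comb(6, 2) * comb(53, 4) / draws

# Chance of a million in a single draw: jackpot plus the one raffle prize.
a_wed = 1 / draws + 1 / 8_500_000
a_sat = 1 / draws + 1 / 16_500_000

# Solutions of B_W = A_W + p * B_S and B_S = A_S + p * B_W.
b_wed = (a_wed + p * a_sat) / (1 - p**2)
b_sat = (a_sat + p * a_wed) / (1 - p**2)

print(f"Wednesday: {b_wed:.3e}")
print(f"Saturday:  {b_sat:.3e}")
```

Substituting the results back into the original pair of equations is a useful check that the rearrangement was done correctly.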

Camelot's two claims—that you are more likely to win a prize and you are more likely to become a millionaire—are both correct. It sounds like the new lottery is a great deal, but so far we have not taken into account the size of the prizes you will win and have only shown that a very rare event will become slightly less rare. Probably the best way to measure how good a lottery is is by working out the amount of money you should expect to win, so let's now look at that.

### Expected Prize Money

To find the expected prize money, we must multiply the value of each prize by the probability of winning that prize and then add them up, or, in other words,

$$\sum_\mathrm{prizes}\mathrm{value\ of\ prize}\times\mathbb{P}(\mathrm{winning\ prize}).$$
Once this has been calculated, the chance of winning due to a free lucky dip entry must be taken into account as above.

In the old draw, after buying a ticket for £2, you could expect to win 78p or 83p on a Wednesday or Saturday respectively. In the new draw, the expected winnings have changed to 58p and 50p (Wednesday and Saturday respectively). Expressed in this way, it can be seen that although the headline changes look good, the overall value for money of the lottery has significantly decreased.

Looking on the bright side, this does mean that the lottery will make even more money that it can put towards charitable causes: the lottery remains an excellent way to donate your money to worthy charities!


**2014-06-21 08:47:00**

## Tennis Maths

### The Smallest Share of Points & Serving Stats

With World Cup fever taking over, you may have forgotten that Wimbledon is just a few days away.

### Tennis Scoring

Tennis matches are split into sets (three sets for ladies' matches, five sets for men's), which are in turn split into games. The players take it in turns to serve for a game. The scoring in a game is probably best explained with a flowchart:

To win a set, a player must win at least six games and two more games than their opponent. If the score reaches six games all, then a tie break is played. In this tie break, the first player to win at least seven points and two points more than their opponent wins. In the final set there is no tie break, so matches can last a long time.

### Winning with the Smallest Share of Points

Due to the way tennis is split into sets and games, the player who wins the most points will not necessarily win the match. This got me thinking: what is the smallest proportion of points which can be won while still winning the tennis match?

First, let's consider a men's match. In order to win with the lowest proportion of points, our player should lose two sets without winning a point and win the other three sets. In the two lost sets, the opponent should win 6-0, taking every point: in total the opponent will win 48 points in these sets.

Leaving the final set for now, the other two sets are won by our player. To win these with the smallest proportion of the points, they should be won 7-6 on a tie break. In the 6 lost games, the opponent should take all the points. In the won games and the tie break, our player should win by two points with the lowest total score. (Winning with more than the lowest total score will mean both players win an equal number of extra points, moving the proportion of points our player wins closer to 50%, higher than it needs to be.)

Therefore, our player will win 4 points out of 6 in the games he wins, win 0 out of 4 points in the games he loses and win the tie break 7 points to 5. This means that in total our player will win 62 points out of 144 in the two won sets.

For the same reason as above, the final set should be won with the lowest total score: 6-4. Using the same scores for each game, our player wins 24 points out of 52.

Overall, our player has won 86 points out of 244, a mere **35%** of the points.

If the match is a ladies' match then the same analysis will work, but with each player winning one less set. This gives our player 55 points out of 148, **37%** of the points.

This result demonstrates why tennis remains exciting through the whole match. The way tennis is split into sets and games means that our opponent can win 65% of the points but if the pressure gets to them at the most important points, our player can still win the match. This makes for a far more interesting competition than a simple race to one hundred points which could quickly become a foregone conclusion.
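The point totals can be checked by adding up the games described above. The sketch only verifies the arithmetic of this particular match; it does not prove that no smaller share is possible. The helper names are illustrative.

```python
def add(*scores):
    """Add up (our points, opponent's points) pairs."""
    return tuple(map(sum, zip(*scores)))

def times(score, n):
    """Repeat a (our points, opponent's points) score n times."""
    return (score[0] * n, score[1] * n)

WON_GAME, LOST_GAME, TIE_BREAK = (4, 2), (0, 4), (7, 5)

lost_set = times(LOST_GAME, 6)                                          # lost 0-6
tie_break_set = add(times(WON_GAME, 6), times(LOST_GAME, 6), TIE_BREAK)  # won 7-6
final_set = add(times(WON_GAME, 6), times(LOST_GAME, 4))                # won 6-4

def smallest_share(sets_to_win):
    """Return (our points, total points) for the match described above."""
    n = sets_to_win - 1  # sets lost to love, and sets won on a tie break
    ours, theirs = add(times(lost_set, n), times(tie_break_set, n), final_set)
    return ours, ours + theirs

for label, sets_to_win in (("Men's", 3), ("Ladies'", 2)):
    ours, total = smallest_share(sets_to_win)
    print(f"{label} match: {ours} of {total} points ({ours / total:.0%})")
```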

### Comparing Players with Serving Stats

During tennis matches, players are often compared using statistics such as the percentages of serves which are successful. Imagine a match between Player A and Player B.

In the first set, Player A and Player B are successful with 100% and 92% of their serves respectively. In the second set, these figures are 56% and 48%. Player A clearly looks to be the better server, as they have a higher percentage in each set. However if we look at the two sets in more detail:

| | Player A | Player B |
| --- | --- | --- |
| First Set | 20/20 | 67/73 |
| Second Set | 45/80 | 13/27 |
| Total | 65/100 | 80/100 |

Overall, Player B has an 80% serve success rate, while Player A only manages 65%.

This is an example of Simpson's paradox: a trend which appears in the set-by-set data disappears when the data is combined. This occurs because when we look at the set-by-set percentages, the total number of serves is not taken into account: Player A served more in the second set so their overall percentage will be closer to 56%; Player B served more in the first set so their overall percentage will be closer to 92%.
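The reversal is easy to confirm from the table. This short sketch compares the players set by set and then overall, using exact fractions. The dictionary layout and function names are illustrative.

```python
from fractions import Fraction

# (successful serves, total serves) for each set, taken from the table above.
serves = {
    "Player A": {"First Set": (20, 20), "Second Set": (45, 80)},
    "Player B": {"First Set": (67, 73), "Second Set": (13, 27)},
}

def set_rate(player, set_name):
    """Serve success rate for one player in one set."""
    made, tried = serves[player][set_name]
    return Fraction(made, tried)

def overall_rate(player):
    """Serve success rate for one player across both sets."""
    made = sum(m for m, _ in serves[player].values())
    tried = sum(t for _, t in serves[player].values())
    return Fraction(made, tried)

for set_name in ("First Set", "Second Set"):
    a, b = set_rate("Player A", set_name), set_rate("Player B", set_name)
    print(f"{set_name}: A {float(a):.0%}, B {float(b):.0%}")

a, b = overall_rate("Player A"), overall_rate("Player B")
print(f"Overall: A {float(a):.0%}, B {float(b):.0%}")
```

Player A has the higher rate in each set, yet Player B has the higher rate overall.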
