Hi Ed,
I've designed a web-based bingo game (www.paulspages.co.uk/avalaff - it's
just for fun, see below), and recently added some basic odds calculations.
I'm not a mathematician, and found most of the odds-formula explanations on
the Web completely incomprehensible. Yours (the "gambling" page with the Urn
formula and bingo odds example) was the only one that made sense, so thank
you very much!
Now for the predictable bit - I've got a problem (hopefully an interesting
one), and I wonder if you can help me.
I use the Urn formula to predict which draw each game will end on. To do
this I divide the odds for each successive draw into the number of cards in
play for that game. The result nearest to 1 (e.g. 1515/1 with 1500 cards in
play) is my prediction. It's crude, but surprisingly effective - over 100
games its average error is around 1.5 draws. However I'm aware that the
calculations aren't correct for some games, and I wonder if you could help
me, or point me in the right direction.
My game began life as a lampoon of a UK satellite TV gaming channel called
Avago (I wanted to draw attention to the wallet-emptying nature of these
things, but now I've become interested in the game theory). Avago's game
uses a 5 X 5 gamecard grid, with 75 possible numbers. The free square at the
centre is optional. Each column on a card can only contain values from a
15-number range, so the first column has numbers between 1 and 15, the
second between 16 and 30, and so on.
Each game involves matching a specific pattern of squares, and the problem
arises because some of the patterns don't include squares from every column.
This reduces the range of possible matches, while still leaving 75 possible
values being drawn. The most extreme example is a single-column pattern,
where only 15 of the 75 balls are candidate matches.
Intuition tells me that these "subset" patterns should take longer to
complete than ones where all 75 values are candidates. On the other hand,
logic tells me that with a random draw it shouldn't make any difference,
since all possible sequences of numbers are equally likely to be drawn.
It turns out that when running just two cards, the game takes roughly the
same number of draws for single-column and single-row patterns, indicating
that the odds against a single card winning are the same for subset and
full-range patterns. However when running larger numbers of cards (between
500 and 3000 in my game) the single-column pattern takes consistently longer
than the single-row (by 10 - 15 draws), but has a greater tendency to
produce multiple winners. Even I've managed to work out that this is because
the subset patterns make it more likely that multiple cards will need the
same numbers, although I've no idea how to express this mathematically.
So am I right in thinking that subset patterns don't affect the odds against
each card? And how (if at all) can I calculate the effect of subset patterns
on groups of cards? I'd really appreciate any advice you could give me.
Thanks,
Paul Stephens.
PS I saw that you're a Mozilla fan. Unfortunately my game only runs in
Internet Explorer, as I've used a lot of IE-only programming features in the
graphics. Sorry!
Incidentally, in case you're interested, here's my javascript code for the
Urn formula. I've assumed it's correct, as it gives the same results as the
examples on your page. No doubt this can all be reduced down to about two
lines, but as I said, I'm no mathematician. As you can see, I've even kept
to your coloured-balls analogy - didn't dare change a thing, in case it
didn't work!
function urn (red, green, draws, need) {
return (factor(red) / factor(red - need)) *
((factor(green) / factor(green-(draws-need)))) *
((factor(draws) / (factor(draws - need)* factor(need)))) *
(factor((red + green) - draws) / factor(red+green))
}
function factor(num) {
var i, res = 1
for (i = 1; i <= num; i++) {
res = res * i
}
return res
}
----------------------------------------
Which is more likely to come first in Bingo: a horizontal row,
or a vertical row?
On a single card, any bingo is as likely as any other; you need
the five specific numbers on that line to be chosen, and any one is
as likely as any other. (If the center square is free, then the lines
through the center are a bit easier, but since it affects one vertical
line and one horizontal line, it does not change this result. For the
rest of the analysis, I will ignore "free", but it will have little effect
on the results.)
However, bingo is not played on a single card, but on many different cards
at the same time (even many different cards by the same player at one time,
and there are typically many players). In a typical bingo game there may
be hundreds or even thousands of cards in play at once. (I did stop by
Paul's game long enough to sit in for one round, playing the minimum of
2 cards, and there were about 2500 cards in play at that time.)
However, the cards have something in common. Due to Bingo tradition,
the first column only has numbers in the range 1 to 15, the second column
16 to 30, and so on. As a result of this, there is less variability in
the vertical rows across many cards than there is in the horizontal rows.
If you look at the complete space of Bingo cards, there are only 15-choose-5
or 3003 different sets of 5 numbers that can appear in each vertical row of
the Bingo card (though there are 5 different groups of 3003 for the 5 rows).
By contrast, there are 15^5 = 759375 different horizontal rows, of which
each card has 5.
The result of this is that you are more likely to get multiple simultaneous
winners in vertical rows -- either because two cards have the same set of
five numbers in the row (which *must* happen on some cards if you have
more than 3003 of them) or because they have overlapping sets and one of
the overlapping numbers is called last (which becomes likely with much
smaller numbers of cards).
However, since each individual row is as likely as any other, this must
mean that collectively, the horizontal rows are more likely to contain
the first Bingo.
As a typical example, suppose that after 25 numbers have been called,
the distribution of the 25 numbers into the 5 groups 1-15 etc. is
3 in one, 4 in another, 5, 6, and 7. (This is probably the most likely
distribution, since there are 120 permutations of it and fewer of all
the more evenly-distributed ones.) Out of the 3003 different possible
bingo sets in each vertical rows, in two of there it's still impossible
to have a bingo, while in the others there are, respectively, 1, 6, and
21 of the 3003 sets which have already won, or a bit under 1% of all
possible cards. Out of the 759375 possible horizontal sets, there are
3*4*5*6*7 = 2520 winning sets. Since there are 5 of these sets on
each card, this gives about 1.7% winning cards.
Since a typical Bingo has hundreds of cards, the winner is likely to
occur before this. But earlier, the likelihood of a vertical winner is
even slimmer. Suppose only 20 numbers are called in the distribution
2, 3, 4, 5, 6. Then only 1 of the 3003 vertical combinations in one
group and 6 of another are winners (0.23%) while 2*3*4*5*6 = 720
horizontal sets are winners, or about 0.5% of all cards.
A full analysis would have to perform a weighted average over all the
possible distributions of numbers. This includes more widely distributed
sets in which the likelihood of vertical bingos is greater and horizontal
bingos is less, as well as flatter distributions in which horizontal
bingos are more likely and vertical bingos are less likely. But this
gives a rough feel for how the numbers go.
Joseph DeVincentis
--------------------------------------
Hi, Ed,
I've started thinking about the question of what lines are likely to
win in Bingo. I'm pretty sure the horizontal and diagonal 4-lines are
most likely and equiprobable. I'm not sure about the vertical 4-line
versus the horizontal 5-lines (it may depend on the number of boards
in play). Is there any previous work? I didn't notice any references
other than the mention of Paul Stephens's interest.
But in http://www.mathpuzzle.com/gamble.html you write:
> Getting a line in one direction isn't correlated to
> another line, so we can use the same trick as the dice example.
which is false. Events A and B are defined to be independent just
when P(A&B)=P(A)P(B). But if A and B are any two lines (or disjoint
subsets), then P(A&B) < P(A)P(B) (unless there are so few draws that
all are zero). Consider, for instance, the situation after five
draws, when P(A)>0, P(B)>0, and P(A&B)=0.
Still, I suspect the events are approximately independent (especially
after many draws), in which case your analysis is approximately
correct.
Dan Hoey
----------------------------------------
There's gotta be something basic I'm missing about the question (or some
unwarranted assumption I'm making), as it seems obvious to me that the
probability of a horizontal bingo would be the same as that of a
vertical one.
Also, the bingo analysis on your gambling odds page doesn't seem quite
right to me. You list the probability of a line bingo in X draws as:
1 - (1-Urn[5,70,X,5])^8 * (1-Urn[4,71,X,4])^4
According to this, the probability of getting a bingo within the first
71 draws would be 0.957172348. But after 71 draws, a bingo is
guaranteed. (There are only 4 numbers left unpicked, but five rows (or
columns) on the bingo card.
The problem is that there is a correlation between winning and losing on
different lines, but your analysis assumes there isn't. For example,
after five picks, winning one one line guarantees not winning on all the
others. After 71 picks, not winning on four horizontal (or vertical)
lines guarantees winning on the 5th.
On another note, is the Urn[] function just the hypergeometric
distribution calculated differently?
Thanks,
Joseph Kisenwether
--------------------------------------------------
[ Note ... the following is based on a misinterpretation
of Bingo (5x15 board). The analysis contains a few nice
ideas, so I'm posting it. ]
I provide several levels of proof that a column is much likelier to come
before a row. My final analysis will give the exact probabilities:
Level 0:
There are more columns. Each is much easier to attain.
Level 1: (Gives probability at least .84375)
1. If you pick 14 numbers out of your 75, the probability that the top
row will be empty is less than 1/32.
(The probability is 60/75*59/74*...*47/62)
2. If you pick 14 numbers out of your 75, the probability that there is
some empty row is less than 5/32.
(The probability that one of a collection events occurs is at most the
sum of the individual probabilities. There are 5 events, each of
probability less than 1/32.)
3. The probability that a column occurs before a row does is greater than
27/32 = .84375
(Let us suppose the bingo machine breaks down after 61 picks. Since there
are 15 columns, Dirichlet's pigeonhole principle tells us that some
column receives at least 5 balls and is full. However, 2. above tells us
that the probability that there is some row that contains none of the 14
remaining balls is less than 5/32. Therefore with probability greater
than 27/32 we have no full row.)
Level 2: Gives probability .966656
1. The probability that among the first 54 balls there is a full column
is:
Sum_{j=1}^{j=10} Binomial(5*(15-j),21)*(-1)^(j+1)*Binomial(15,j) /
Binomial(75,21)
=82931251895342643/84141409366045768=0.985618
(We use the inclusion-exclusion principle to count the number of ways to
pick 21 (=75-54) balls from 75 so that at least one column is empty.
The Inclusion-Exclusion principle says that given a collection of sets
A1,A2,...,An, the number of elements in their union is calculated by
summing the sizes, subtracting the sum of the sizes of the pairwise
intersections, adding the sizes of the 3-wise intersections and so on.
If A1,A2,...,A15 are the sets of choices of 21 balls from 75 with
column 1,2,...,15 empty, then the size of each j-wise intersection is
Binomial(5*(15-j),21). The (-1)^(k+1) term comes from the alternating
positive, negative nature of the inclusion-exclusion principle and the
Binomial(15,j) from the number of j-wise intersections.)
2. The probability that among the first 54 balls there is a full row is:
Sum_{j=1}^{j=3} Binomial(15*(5-j),21)*(-1)^(j+1)*Binomial(5,j) /
Binomial(75,21)
=62515034815532970771695159=0.018961
3. The probability that among the first 54 balls either a full row occurs
or a full column does not occur is at most 0.033344.
(From the above, a row occurs with probability 0.018961, and a column
does not occur with probability 0.014382, giving a probability of either
of at most 0.033344)
4. There is a probability at least 0.966656 of there being a full column
before there is a full row.
(If the machine breaks down after 54 balls, with probability 1-0.033344=
0.966656, the first full column has happened and the first full row has
not happened.)
Level 3: Gives probability 0.967058
1. The probability that by the ith drawing, you have no full row and a
full column is precisely:
Sum_{k=1}^{k=5}Sum_{l=1}^{l=14}Binomial(5,k)Binomial(15,l)(-1)^(k+l+1)
Binomial(kl,75-i)/Binomial(75,i)
(A slight extension of the inclusion-exclusion principle says that if
G_1,G_2,...,G_g are a collection of good sets and B_1,B_2,...,B_b
are a collection of bad sets, the number of points that are in some good
set but no bad set is calculated by:
summing over all non-empty subcollections of good sets {G_i: i in
gamma} and all possibly-empty subcollections of bad sets {B_i: i in beta}
(-1)^(1+|gamma|+|beta|) multiplied by the size of the total
intersection of the subcollections. Here G_1,G_2,...,G_15 are the good
sets consisting of choices of i points from 75 with column 1,2,...,15
full. B_1,B_2,...,B_5 are the bad sets consiting of choices of i
points from 75 with column 1,2,...,5 full.
To get our sum, we look at non-empty subcollections of 15-l sets (of
which there are Binomial(15,l) for l between 1 and 14 exclusive) and
possibly-empty subcollections of 5-k bad sets (of which there are
Binomial(5,k) for k between 1 and 5 exclusively). Members of the
intersection can be found by choosing the 75-i ommitted points from the
k remaining rows and l remaining columns. Hence Binomial(lk,75-i) is
the size of this intersection.
)
2. The probability of getting a column before a row is at least 0.9670582
(Set i=54.)
Level 4: Gives precise probability of 0.997667
1. The probability that on the ith go on a board of size r x c there is
no full row or full column is:
sum_{k=0}^{k=r}sum_{l=0}^{l=c}Binomial(r,k)Binomial(c,l)(-1)^(k+l+r+c)
Binomial(kl,cr-i)/Binomial(cr,i)
(Simple use of Inclusion-Exclusion Principle)
2. The probability that on the ith go on a board of size r x c the first
full column occurs is, and there is no full row is:
sum_{k=0}^{k=r}sum_{l=0}^{l=c-1}(-1)^(k+l+r+c+1)Binomial(r,k)
Binomial(c-1,l)Binomial(kl,(c-1)r-i)/Binomial(cr-1,i-1)
(It is the probability that the other elements of the column have been
filled: Binomial((c-1)r,i-r)/Binomial(cr-1,i-1)
multiplied by the probability that no full columns or rows exist when
you remove the relevant column.)
3. Therefore the probability that a full column occurs is
sum_{i=0}^{i=rc}sum_{k=0}^{k=r}sum_{l=0}^{l=c-1}
(-1)^(k+l+r+c+1)Binomial(r,k)Binomial(c-1,l)Binomial(kl,(c-1)r-i)/
Binomial(cr-1,i-1).
(Add the above over all i)
4. Similarly the probability that the full row and column occur at the
same time is
sum_{i=0}^{i=rc}sum_{k=0}^{k=r-1}sum_{l=0}^{l=c-1}(-1)^(r+c+k+l)
Binomial(r-1,k)Binomial(c-1,l)Binomial(kl,cr-i)/Binomial(cr-1,i-1)
For the 5x15 rectangle, a column occurs first with probability
502530292980359140939/269109591435396365102064
(approx 0.997667)
a row occurs first with probability
126186432521578618379823831264815079746329159797008
(approx 0.001867)
they occur together with probability
1472155383922610547623/3162037699365907289949252
(approx 0.000466)
The odds: 2147:1 that they occur at the same time, 535:1 that a row
occurs first, 428:1 that a column does not occur first.
L T Pebody