Hi Ed, I've designed a web-based bingo game (www.paulspages.co.uk/avalaff - it's just for fun, see below), and recently added some basic odds calculations. I'm not a mathematician, and found most of the odds-formula explanations on the Web completely incomprehensible. Yours (the "gambling" page with the Urn formula and bingo odds example) was the only one that made sense, so thank you very much! Now for the predictable bit - I've got a problem (hopefully an interesting one), and I wonder if you can help me. I use the Urn formula to predict which draw each game will end on. To do this I divide the odds for each successive draw into the number of cards in play for that game. The result nearest to 1 (e.g. 1515/1 with 1500 cards in play) is my prediction. It's crude, but surprisingly effective - over 100 games its average error is around 1.5 draws. However I'm aware that the calculations aren't correct for some games, and I wonder if you could help me, or point me in the right direction. My game began life as a lampoon of a UK satellite TV gaming channel called Avago (I wanted to draw attention to the wallet-emptying nature of these things, but now I've become interested in the game theory). Avago's game uses a 5 X 5 gamecard grid, with 75 possible numbers. The free square at the centre is optional. Each column on a card can only contain values from a 15-number range, so the first column has numbers between 1 and 15, the second between 16 and 30, and so on. Each game involves matching a specific pattern of squares, and the problem arises because some of the patterns don't include squares from every column. This reduces the range of possible matches, while still leaving 75 possible values being drawn. The most extreme example is a single-column pattern, where only 15 of the 75 balls are candidate matches. Intuition tells me that these "subset" patterns should take longer to complete than ones where all 75 values are candidates. On the other hand, logic tells me that with a random draw it shouldn't make any difference, since all possible sequences of numbers are equally likely to be drawn. It turns out that when running just two cards, the game takes roughly the same number of draws for single-column and single-row patterns, indicating that the odds against a single card winning are the same for subset and full-range patterns. However when running larger numbers of cards (between 500 and 3000 in my game) the single-column pattern takes consistently longer than the single-row (by 10 - 15 draws), but has a greater tendency to produce multiple winners. Even I've managed to work out that this is because the subset patterns make it more likely that multiple cards will need the same numbers, although I've no idea how to express this mathematically. So am I right in thinking that subset patterns don't affect the odds against each card? And how (if at all) can I calculate the effect of subset patterns on groups of cards? I'd really appreciate any advice you could give me. Thanks, Paul Stephens. PS I saw that you're a Mozilla fan. Unfortunately my game only runs in Internet Explorer, as I've used a lot of IE-only programming features in the graphics. Sorry! Incidentally, in case you're interested, here's my javascript code for the Urn formula. I've assumed it's correct, as it gives the same results as the examples on your page. No doubt this can all be reduced down to about two lines, but as I said, I'm no mathematician. As you can see, I've even kept to your coloured-balls analogy - didn't dare change a thing, in case it didn't work! function urn (red, green, draws, need) { return (factor(red) / factor(red - need)) * ((factor(green) / factor(green-(draws-need)))) * ((factor(draws) / (factor(draws - need)* factor(need)))) * (factor((red + green) - draws) / factor(red+green)) } function factor(num) { var i, res = 1 for (i = 1; i <= num; i++) { res = res * i } return res } ---------------------------------------- Which is more likely to come first in Bingo: a horizontal row, or a vertical row? On a single card, any bingo is as likely as any other; you need the five specific numbers on that line to be chosen, and any one is as likely as any other. (If the center square is free, then the lines through the center are a bit easier, but since it affects one vertical line and one horizontal line, it does not change this result. For the rest of the analysis, I will ignore "free", but it will have little effect on the results.) However, bingo is not played on a single card, but on many different cards at the same time (even many different cards by the same player at one time, and there are typically many players). In a typical bingo game there may be hundreds or even thousands of cards in play at once. (I did stop by Paul's game long enough to sit in for one round, playing the minimum of 2 cards, and there were about 2500 cards in play at that time.) However, the cards have something in common. Due to Bingo tradition, the first column only has numbers in the range 1 to 15, the second column 16 to 30, and so on. As a result of this, there is less variability in the vertical rows across many cards than there is in the horizontal rows. If you look at the complete space of Bingo cards, there are only 15-choose-5 or 3003 different sets of 5 numbers that can appear in each vertical row of the Bingo card (though there are 5 different groups of 3003 for the 5 rows). By contrast, there are 15^5 = 759375 different horizontal rows, of which each card has 5. The result of this is that you are more likely to get multiple simultaneous winners in vertical rows -- either because two cards have the same set of five numbers in the row (which *must* happen on some cards if you have more than 3003 of them) or because they have overlapping sets and one of the overlapping numbers is called last (which becomes likely with much smaller numbers of cards). However, since each individual row is as likely as any other, this must mean that collectively, the horizontal rows are more likely to contain the first Bingo. As a typical example, suppose that after 25 numbers have been called, the distribution of the 25 numbers into the 5 groups 1-15 etc. is 3 in one, 4 in another, 5, 6, and 7. (This is probably the most likely distribution, since there are 120 permutations of it and fewer of all the more evenly-distributed ones.) Out of the 3003 different possible bingo sets in each vertical rows, in two of there it's still impossible to have a bingo, while in the others there are, respectively, 1, 6, and 21 of the 3003 sets which have already won, or a bit under 1% of all possible cards. Out of the 759375 possible horizontal sets, there are 3*4*5*6*7 = 2520 winning sets. Since there are 5 of these sets on each card, this gives about 1.7% winning cards. Since a typical Bingo has hundreds of cards, the winner is likely to occur before this. But earlier, the likelihood of a vertical winner is even slimmer. Suppose only 20 numbers are called in the distribution 2, 3, 4, 5, 6. Then only 1 of the 3003 vertical combinations in one group and 6 of another are winners (0.23%) while 2*3*4*5*6 = 720 horizontal sets are winners, or about 0.5% of all cards. A full analysis would have to perform a weighted average over all the possible distributions of numbers. This includes more widely distributed sets in which the likelihood of vertical bingos is greater and horizontal bingos is less, as well as flatter distributions in which horizontal bingos are more likely and vertical bingos are less likely. But this gives a rough feel for how the numbers go. Joseph DeVincentis -------------------------------------- Hi, Ed, I've started thinking about the question of what lines are likely to win in Bingo. I'm pretty sure the horizontal and diagonal 4-lines are most likely and equiprobable. I'm not sure about the vertical 4-line versus the horizontal 5-lines (it may depend on the number of boards in play). Is there any previous work? I didn't notice any references other than the mention of Paul Stephens's interest. But in http://www.mathpuzzle.com/gamble.html you write: > Getting a line in one direction isn't correlated to > another line, so we can use the same trick as the dice example. which is false. Events A and B are defined to be independent just when P(A&B)=P(A)P(B). But if A and B are any two lines (or disjoint subsets), then P(A&B) < P(A)P(B) (unless there are so few draws that all are zero). Consider, for instance, the situation after five draws, when P(A)>0, P(B)>0, and P(A&B)=0. Still, I suspect the events are approximately independent (especially after many draws), in which case your analysis is approximately correct. Dan Hoey ---------------------------------------- There's gotta be something basic I'm missing about the question (or some unwarranted assumption I'm making), as it seems obvious to me that the probability of a horizontal bingo would be the same as that of a vertical one. Also, the bingo analysis on your gambling odds page doesn't seem quite right to me. You list the probability of a line bingo in X draws as: 1 - (1-Urn[5,70,X,5])^8 * (1-Urn[4,71,X,4])^4 According to this, the probability of getting a bingo within the first 71 draws would be 0.957172348. But after 71 draws, a bingo is guaranteed. (There are only 4 numbers left unpicked, but five rows (or columns) on the bingo card. The problem is that there is a correlation between winning and losing on different lines, but your analysis assumes there isn't. For example, after five picks, winning one one line guarantees not winning on all the others. After 71 picks, not winning on four horizontal (or vertical) lines guarantees winning on the 5th. On another note, is the Urn[] function just the hypergeometric distribution calculated differently? Thanks, Joseph Kisenwether -------------------------------------------------- [ Note ... the following is based on a misinterpretation of Bingo (5x15 board). The analysis contains a few nice ideas, so I'm posting it. ] I provide several levels of proof that a column is much likelier to come before a row. My final analysis will give the exact probabilities: Level 0: There are more columns. Each is much easier to attain. Level 1: (Gives probability at least .84375) 1. If you pick 14 numbers out of your 75, the probability that the top row will be empty is less than 1/32. (The probability is 60/75*59/74*...*47/62) 2. If you pick 14 numbers out of your 75, the probability that there is some empty row is less than 5/32. (The probability that one of a collection events occurs is at most the sum of the individual probabilities. There are 5 events, each of probability less than 1/32.) 3. The probability that a column occurs before a row does is greater than 27/32 = .84375 (Let us suppose the bingo machine breaks down after 61 picks. Since there are 15 columns, Dirichlet's pigeonhole principle tells us that some column receives at least 5 balls and is full. However, 2. above tells us that the probability that there is some row that contains none of the 14 remaining balls is less than 5/32. Therefore with probability greater than 27/32 we have no full row.) Level 2: Gives probability .966656 1. The probability that among the first 54 balls there is a full column is: Sum_{j=1}^{j=10} Binomial(5*(15-j),21)*(-1)^(j+1)*Binomial(15,j) / Binomial(75,21) =82931251895342643/84141409366045768=0.985618 (We use the inclusion-exclusion principle to count the number of ways to pick 21 (=75-54) balls from 75 so that at least one column is empty. The Inclusion-Exclusion principle says that given a collection of sets A1,A2,...,An, the number of elements in their union is calculated by summing the sizes, subtracting the sum of the sizes of the pairwise intersections, adding the sizes of the 3-wise intersections and so on. If A1,A2,...,A15 are the sets of choices of 21 balls from 75 with column 1,2,...,15 empty, then the size of each j-wise intersection is Binomial(5*(15-j),21). The (-1)^(k+1) term comes from the alternating positive, negative nature of the inclusion-exclusion principle and the Binomial(15,j) from the number of j-wise intersections.) 2. The probability that among the first 54 balls there is a full row is: Sum_{j=1}^{j=3} Binomial(15*(5-j),21)*(-1)^(j+1)*Binomial(5,j) / Binomial(75,21) =62515034815532970771695159=0.018961 3. The probability that among the first 54 balls either a full row occurs or a full column does not occur is at most 0.033344. (From the above, a row occurs with probability 0.018961, and a column does not occur with probability 0.014382, giving a probability of either of at most 0.033344) 4. There is a probability at least 0.966656 of there being a full column before there is a full row. (If the machine breaks down after 54 balls, with probability 1-0.033344= 0.966656, the first full column has happened and the first full row has not happened.) Level 3: Gives probability 0.967058 1. The probability that by the ith drawing, you have no full row and a full column is precisely: Sum_{k=1}^{k=5}Sum_{l=1}^{l=14}Binomial(5,k)Binomial(15,l)(-1)^(k+l+1) Binomial(kl,75-i)/Binomial(75,i) (A slight extension of the inclusion-exclusion principle says that if G_1,G_2,...,G_g are a collection of good sets and B_1,B_2,...,B_b are a collection of bad sets, the number of points that are in some good set but no bad set is calculated by: summing over all non-empty subcollections of good sets {G_i: i in gamma} and all possibly-empty subcollections of bad sets {B_i: i in beta} (-1)^(1+|gamma|+|beta|) multiplied by the size of the total intersection of the subcollections. Here G_1,G_2,...,G_15 are the good sets consisting of choices of i points from 75 with column 1,2,...,15 full. B_1,B_2,...,B_5 are the bad sets consiting of choices of i points from 75 with column 1,2,...,5 full. To get our sum, we look at non-empty subcollections of 15-l sets (of which there are Binomial(15,l) for l between 1 and 14 exclusive) and possibly-empty subcollections of 5-k bad sets (of which there are Binomial(5,k) for k between 1 and 5 exclusively). Members of the intersection can be found by choosing the 75-i ommitted points from the k remaining rows and l remaining columns. Hence Binomial(lk,75-i) is the size of this intersection. ) 2. The probability of getting a column before a row is at least 0.9670582 (Set i=54.) Level 4: Gives precise probability of 0.997667 1. The probability that on the ith go on a board of size r x c there is no full row or full column is: sum_{k=0}^{k=r}sum_{l=0}^{l=c}Binomial(r,k)Binomial(c,l)(-1)^(k+l+r+c) Binomial(kl,cr-i)/Binomial(cr,i) (Simple use of Inclusion-Exclusion Principle) 2. The probability that on the ith go on a board of size r x c the first full column occurs is, and there is no full row is: sum_{k=0}^{k=r}sum_{l=0}^{l=c-1}(-1)^(k+l+r+c+1)Binomial(r,k) Binomial(c-1,l)Binomial(kl,(c-1)r-i)/Binomial(cr-1,i-1) (It is the probability that the other elements of the column have been filled: Binomial((c-1)r,i-r)/Binomial(cr-1,i-1) multiplied by the probability that no full columns or rows exist when you remove the relevant column.) 3. Therefore the probability that a full column occurs is sum_{i=0}^{i=rc}sum_{k=0}^{k=r}sum_{l=0}^{l=c-1} (-1)^(k+l+r+c+1)Binomial(r,k)Binomial(c-1,l)Binomial(kl,(c-1)r-i)/ Binomial(cr-1,i-1). (Add the above over all i) 4. Similarly the probability that the full row and column occur at the same time is sum_{i=0}^{i=rc}sum_{k=0}^{k=r-1}sum_{l=0}^{l=c-1}(-1)^(r+c+k+l) Binomial(r-1,k)Binomial(c-1,l)Binomial(kl,cr-i)/Binomial(cr-1,i-1) For the 5x15 rectangle, a column occurs first with probability 502530292980359140939/269109591435396365102064 (approx 0.997667) a row occurs first with probability 126186432521578618379823831264815079746329159797008 (approx 0.001867) they occur together with probability 1472155383922610547623/3162037699365907289949252 (approx 0.000466) The odds: 2147:1 that they occur at the same time, 535:1 that a row occurs first, 428:1 that a column does not occur first. L T Pebody