Dear Ed,

It's been quite some time since I last wrote, but I wanted to tell
you some things I discovered about Don Woods' Twenty Questions puzzle.
Inspired by Mark VanDine, I wrote a genetic algorithm to find solutions
to the puzzle, and came up with 53 with a score of 18 (but none scoring
19 or 20). That is enough to do a bit of statistical analysis of the
answers. I hope you find this interesting.

First, a few comments about the test, how I scored, and how I designed
the algorithm; then the answer sets, some statistics, and a few
additional comments:

1) Question 7 asks about the answer that appears most often. My
interpretation of the question is that if no single answer occurs
more frequently than all the others (i.e. there is a tie at the
top) then there is no correct answer to the question.

2) If the answers to questions 10 and 17 are both E then a paradox
arises. In those cases I stipulate that neither answer is correct
and neither is wrong. That has three main impacts:
a) The score for a set of answers including the paradoxical pair
is well defined (the score is the number of _correct_ answers,
per the puzzle instructions, which would not include 10 and 17).
b) As a result of (a), every set of answers has a definite score.
There is a finite (though large) number of distinct answer sets,
so the maximum possible score is well defined, and thus the the
correct answer to question 20 cannot be D.
c) Question 9 calls for excluding questions with wrong answers from
the sum it specifies; in the paradox case neither of the
paradoxical pair of answers is wrong, thus they are included in
the sum if otherwise appropriate.

3) The algorithm always counts E as an incorrect answer to question 20.
Although that may result in an incorrect score for some sets, if E
were a correct answer then that would imply that I want to steer the
evolution of solutions away from those with answer E.

4) Algorithm details:
I wrote the code in Java, which allowed me to make use of the
standard Set classes to easily cull duplicate answer sets and
perform straightforward sorts on various criteria. The framework
ends up being highly adaptable to various problems, the tricky
part being to write a routine to score the various candidates.
Perhaps C++ could have done all the same things for me, but I
don't know it.

My algorithm starts with a single random answer set in the gene
pool. At each generation, 100 mutants of each member of the gene
pool are created and scored. All the mutants are then sorted on
their scores, and the top four are selected for the next
generation's gene pool. Also added to the next generation's gene
pool is a single mutant from the middle (score-wise) of the list,
to throw in a little extra variation. Duplicates are culled within
each generation and among best answer sets.

The algorithm starts with the assumption that the correct answer to
question 20 is A, but if it sees an answer set with B or C as the
answer for 20 then it rescores those answers assuming that that
answer is correct. If the rescoring gives 19 (for B) or 20 (for C)
then the answer assumed correct is changed appropriately. (That
never happened.) I also tested with both 'B' and 'C' as the
initial correct answer for 20, but never saw an answer set with a
score greater than 17 (B) or 18 (C).

For the results I provide below, I set the program to start with a
new random parent after every 25000 generations. I ran it overnight
on my 1.4 GHz Athlon system, and it went through 290 complete cycles,
but no new solutions were found after the sixth cycle.

In the answer sets listed below, correct answers are capital letters
and incorrect ones are lower case. The answers are given as a
single contiguous string, in order by question number.

5) Answer sets scoring 18 points:

6) Some interesting statistics:
Every answer set has question 11 wrong. Moreover, they all have the
same wrong answer, B. Furthermore, all but two answer sets have
question 15 wrong. Other than 11 and 15, the only two questions
that have wrong answers in any of these sets are 1 and 13, once each.

The answer to question 16 is B in every set, correct every time. In
most or all of those cases there are other possible correct answers.
Corresponding to answer B, however, questions 7 and 8 have the same
answer in every set. In order for those answers to be correct (and
they are) every set has these properties: there is a single most
frequent answer, and each other answer occurs the same number of
times as another answer. ( A typical distribution of answer
frequencies is 6, 4, 4, 3, 3.) Moreover, questions 7 and 8 having
the same answer means there is no correct answer to question 11,
which is consistent with question 11 always being answered
incorrectly, but doesn't explain why every set has the _same_ wrong
answer. That may instead be because of question 4, which is
answered B in 52 of the 53 sets. That requires questions 11 and
16 to also have answer B.

The answer to question 19 is C in 51 of the sets and E in the other
two. That appears to be related to the above discussion of questions
4, 7, 8, 11, 15, and 16.

The tightly coupled questions 10 and 17 are always both answered
correctly, usually (43 times) as 10D, 17E. Both of the other two
alternatives with both answers correct are represented. 10D, 17E
is accompanied by 13C 41 times; the two alternatives are accompanied
by 13D 8 times, 13C once, and 13 A (incorrect) once. No tight
logical connection between 10,17 and 13 is apparent to me.

Question 1 has answer A 48 times. One of the other times the answer
is incorrect, and A would have been the only correct answer.

Question 2 never has answer A, B, or C. A would almost never be
correct because of the trend in question 4, but B would have been
correct in several places, and C in many.

7) What does it all mean?
Beats me. One of the most surprising things to me is the ubiquity
of the answers to 7 and 8 matching, especially considering the
demands that that puts on the distribution of answers and the fact
that that leads to question 11 having no correct answer.

It is possible that the observed results are characteristic of the
solution method. The fact that questions 4, 7, 8, 11, 15, 16, and
to some extent 19 are all coupled together fairly tightly would tend
to cause a genetic algorithm to "stick to" any answer set that
maximized the score for that subset of the questions. In 290 trials,
however, only one pattern for those answers emerged -- perhaps it is
the only one.

It may also be interesting to observe how the analysis of results
like these assisted in the analysis of the relationships among the
questions in a puzzle like this one.


John Bollinger



Hi, here are my best scores so far :

1.D* 2.C* 3.B* 4.A* 5.B* 6.A* 7.E* 8.C* 9.B
10.D* 11.B* 12.C* 13.A* 14.E 15.D* 16.E* 17.E* 18.B* 19.E* 20.C

18 :
assuming 20 (E) is false .
BUT as it is false I get a bonus point
(I have some difficulties with question 20)

1.B* 2.A* 3.D* 4.A* 5.C* 6.A* 7.D* 8.E 9.D
10.A* 11.E* 12.B* 13.A* 14.C* 15.D* 16.E* 17.D* 18.B* 19.E* 20.E

The (french) site has been fixed, and the translation is ok now :

best regard,

Jacques Tramu



18 :
> assuming 20 (E) is false .
> BUT as it is false I get a bonus point
> (I have some difficulties with question 20)
> 1.B* 2.A* 3.D* 4.A* 5.C* 6.A* 7.D* 8.E 9.D
> 10.A* 11.E* 12.B* 13.A* 14.C* 15.D* 16.E* 17.D* 18.B* 19.E* 20.E

Question 20 was designed to create difficulties. [:-)]
If 20 (E) is false, then it is possible to get the maximum score
while getting question 20 right. However, that does not mean you
get a point for 20 (E); it would just mean that some OTHER set of
answers can also achieve the maximum score.

To put it another way, suppose 20 (E) is correct. Then the only
way to get the top score is to get question 20 wrong. If that is
true, then the top score cannot be achieved by answering 20 (E),
since 20 (E) is not wrong. So if you do answer 20 (E), that answer
must be wrong. Thus your score of 18 (which I haven't checked the
rest of) is really only 17.

-- Don Woods


20 questions, score of 18,

I wrote a C program. I only found one solution with a score this high.
Due to possible bugs and draconian heuristics, I can't be sure there
isn't a better solution.

Robert Jenkins