Just back from an job interview, one question really sucks! Any idea? It is given 45 mintues to state your thought (either how to do it, or why you can not do it). (Type from what I remember)
1) You choose 5000 different intergers (you can choose any 5000 integers you want to use, just be sure they are different from each other). 2) Take 200 numbers from them by random to form a set. Repeat this many times so you will have many such set. If you like, you can assume these sets are saved to disk (cause there are "many" of them), and step 3 is to test them one by one. 3) For each set (200 numbers), can you design a method that can QUICKLY test whether a certain number k is in it or not? (k belongs to the initial 5000 numbers) 4) Repeat step 3 many times with different k. 5) Require: the memory and time is limited, which means: the most naive method of scaning through (linear or use a hashtable) each set to test k, and repeat for each set will not give you credit. 6) If your method needs some calculation time, make sure the test step takes as littile time as possible (if you have whatever "preprocess", they can take longer time). 7) If you think this can not be done (you can not do much better than the naive method above), please state your reasons (what will be the limit). ----------------- Try to think it over before read my "wrong" method, in case it may mislead you. :-)) .. .. .. .. .. What I think is to find a method to "hash" these 200 integers to one or two small values. My answer at that time, which unluckly did not work, was to use prime numbers: Select 5000 prime numbers. For each 200 randomly selected numbers, times them up to get a "hash value" v. When test, just see if v can be divided by k. So the test step only has a divide operation, and since we "hash" 200 numbers to one value, this does not need to use much memory since we only need to keep one v for each set. However, when I came back from interview, I found my method will not work with 5000 initial numbers: times 200 prime number will get value out of range even if we use "long" type data. Prime numbers are so big, even if I define my own number type, it is quite possible to use more memory than the 200 number set itself! Maybe this can not be done, I am thinking, but I can not say why it can not be done efficiently. :-(