OT: Probability question that vexes me

jiansonz

Diabloii.Net Member
OT: Probability question that vexes me

I can´t for the life of me figure this out - how to think to be able to solve it. :tongue:


The problem:

Imagine an urn with 256 scraps of paper in it. Each scrap has a unique number written on it, 1-256.

I draw a scrap and note the number. Then I put the scrap back into the urn, shuffle and draw another number, etc.

I assume that all numbers are equally likely to be drawn and that each 'drawing event' is independent on the others.


The question:

A. How many 'drawing events' do I need to perform until I, with 95% probability, have seen each and every number at least once?

B. Same, but with 99% probability.



(Note: I have no problem solving this for a specific number. (I hope!)

The probability for not drawing the number I am looking for is 255/256 each drawing event. I am looking for a 5% probability to not draw that number in x events.

Thus: (255/256)^x=0.05
x~=765.4
(meaning I need 766 drawing events to get the probabilty over the 95% mark)

But how do I do this for all the numbers at the same time!??!)
 

Kijya

Diabloii.Net Member
I'm still not an expert in this field but I'll give it a try:

Chance to find a new number (Y) when you've found X numbers:
Y = (256-X)/256

So far I think I should be correct :scratch: now we have a specific chance to find a new number baised on how many numbers we have drawn. So how many more tries (T) do we have to draw to find a new number when we have drawn X unique numbers?

T*Y = 1
T= 1/Y = 256/(256-X)

Note that I set 100% its factor 1, now we should have the avrage needed new tries to get a new number for each given X. Letting X go from 1 to 255 this give us a increasing number of new tries for each new X which atleast sounds logical if you ask me.

Now for the needed tries to find X unique numbers I would say that you'll have to add all the "new tries needed" togheter:
n
?(256/(256-K)) <o:p></o:p>
k=1


Hmm, don't know if that could answer any of your questions or not but might get you some new ideas you could try out. I hate probability mathematics ...
From my equations I can atm just think about that you would need about 772 tries to get ~95% of the numbers (243 numbers). My equations doesn't work for finding the last one ether ...
 

Psychic Watch

Diabloii.Net Member
This thread might be of some use here.


I'm not sure if it was conclusive, but the gist seemed to be that after N runs giving X% probability, you'd have seen X% of the possible choices.


So maybe you had the answer from the start? To see 95% of the 256 choices, 766 runs?
 

ReservoirDog

Diabloii.Net Member
The problem:

Imagine an urn with 256 scraps of paper in it. Each scrap has a unique number written on it, 1-256.

I draw a scrap and note the number. Then I put the scrap back into the urn, shuffle and draw another number, etc.

I assume that all numbers are equally likely to be drawn and that each 'drawing event' is independent on the others.


The question:

A. How many 'drawing events' do I need to perform until I, with 95% probability, have seen each and every number at least once?

B. Same, but with 99% probability.
Each draw is independent of the previous draw and therefore would not reduce or increase the probability of any number coming out.



 

jiansonz

Diabloii.Net Member
Each draw is independent of the previous draw and therefore would not reduce or increase the probability of any number coming out.
True, but (as Kijya pointed out) for each "new number" (one I didn´t have before) I draw, the harder it gets to find the ones I am still missing.


Using Kijya´s reasoning, I have calculated an expected value for the number of draws needed. It´s 1568 draws.

OneFromBeyond said:
Is that a 95% probability or a 95% confidence interval?
Well, if I can manage to calcuate a confidence interval from my EV of 1568, then I have the answer...



 

jgreg7

Diabloii.Net Member
I usually love these sorts of problems, but couldn't figure it out either!

However, it bugged me enough that I ran some simulations. 1600 draws only matched all 256 numbers in about 60% of the trials. It takes about 2225 draws to reach 95% probability of success, and ~2800 for 99% probability.

Where did this come up?
 

Kijya

Diabloii.Net Member
I usually love these sorts of problems, but couldn't figure it out either!

However, it bugged me enough that I ran some simulations. 1600 draws only matched all 256 numbers in about 60% of the trials. It takes about 2225 draws to reach 95% probability of success, and ~2800 for 99% probability.

Where did this come up?
My calculations are baised on avrage chance to get a new number when you've drawn X numbers. I'm not quite sure what "avrage" in this case really means, maybe it's 50%/50%. Then the 60% you calcuated wouldn't be far off. I think :scratch:

If you want 95% probability that "avrage" sould be modefied in some way if you ask me.

hmm ...



 

jakotaco

Diabloii.Net Member
hmm, lets say there were X pieces of paper in the urn

then there would be X^X different ways of drawing X notes, each as probable.

out of those X^X ways of drawing X notes there would be a (X/X*(X-1)/X*(X-2)/X...*(...)*(1/X))= (X!)/(X^X) chance to pick any of the X! combinations of drawing one of each.

If you were to draw another note (draw X+1 notes : X different numbers) there would be X^(X+1) different ways of drawing the X+1 notes out of the urn, out of which there would be ((X^2+X/2)*X!)/(X^X)) chance that it would be 1 of each. (can someone control these numbers?)

Nevermind, this ain't going anywhere... besides I just wasted about one hour of the little time I have to study for a really big test tomorrow... in biochemistry
 

Psychic Watch

Diabloii.Net Member
I usually love these sorts of problems, but couldn't figure it out either!

However, it bugged me enough that I ran some simulations. 1600 draws only matched all 256 numbers in about 60% of the trials. It takes about 2225 draws to reach 95% probability of success, and ~2800 for 99% probability.

Where did this come up?

What was the average "completeness" of the 256 numbers at each of those stages? As in, how many of the 256 numbers -were- seen after N trials?



 

jgreg7

Diabloii.Net Member
No worries. I still wish I could have found an elegant, closed form solution. If one turns up please post it for my peace of mind. :wink3:
What was the average "completeness" of the 256 numbers at each of those stages? As in, how many of the 256 numbers -were- seen after N trials?
I didn't track it that way. I just had the computer pick N slips, and then score it as a success (is all 256 were picked at least once) or failure. Here's how the data came out:
Code:
#drawn  sims   success
1000    100    0%
1500    100    38%
1600    100    60%
2000    100    91%
2200    250    92%
2225    250    94%
2300    250    96%
2500    100    98%
2600    250    97%
2700    250    98%
2750    250    100%
2800    250    99%
3000    100    100%


 

Psychic Watch

Diabloii.Net Member
Some further simulations confirmed the "completeness" idea - one will see X% of the possible choices after N trials giving X% chance of success.


A bit of digging turned up an "expected value" closed-form solution:

...the expected fraction of colors that have not been drawn after k draws:

d(0,k) = [(N-1)/N]^k


To have 95% confidence of leaving none of the N colors undrawn, we
need the expected value of d(0,k) to be just 5% of 1/N. Thus, with
N=256 colors, the required value of k is

k = log(.05/256)/log(255/256) = 2182.2
From this page.


The 1568 number was from this formula:
N*(ln N + 0.5772)


The only quibble is that the numbers don't quite match up to a 50% confidence level (1594). Or maybe the expected value isn't necessarily 50/50?
 

jgreg7

Diabloii.Net Member
A bit of digging turned up an "expected value" closed-form solution:
Nice find!:thumbsup:

I'll have to read the page in more detail to fully understand the underlying math. However by applying the same formula I get:
k(99%) = ln(.01/256)/ln(255/256) = 2594
k(50%) = ln(.5/256)/ln(255/256) = 1594​
So the 50% value does seem to match your earlier figure for the expected value. (Did you use base 10 logs instead of natural logs?) I'm pleased that the actual values match my simulations reasonably well, considering how few runs I did of each condition.



 
Top