Discrete Distributions

Discrete random numbers take on only a countable number of values (typically integers). Each distribution has an associated probability mass function (pmf), $p_m(k; \cdot)$, defined as the probability that the returned random number is $k$. The arguments after $k$ represent possible parameters of the distribution. Thus, let $X(\cdot)$ represent the random number generator for a particular distribution. Then

$$p_m(k; \cdot) = \text{Probability}\{X(\cdot) = k\}.$$

It will be useful to define the discrete indicator function, $I_S(k)$, where $S$ is a set of integers (often represented by an interval): $I_S(k) = 1$ if $k \in S$, otherwise $I_S(k) = 0$. This convenient notation restricts a particular functional form to the range where it applies. Also, the formulas below make use of the following definition:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$

binomial (n, p, size=None)

This random number models the number of successes in n independent trials of a random experiment where the probability of success in each trial is p.
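As a rough illustration of the description above, the sketch below draws binomial samples and compares their average with the theoretical mean n*p. The seed, the parameter values, and the use of `numpy.random.RandomState` are assumptions made only for this example.

```python
import numpy as np

# Fixed seed (an arbitrary choice) so the sketch is reproducible.
rng = np.random.RandomState(0)

n, p = 10, 0.3  # illustrative parameters, not taken from the text
samples = rng.binomial(n, p, size=100_000)

# Each sample counts successes in n trials, so it must lie in 0..n.
in_range = bool(samples.min() >= 0 and samples.max() <= n)

# The sample mean should be close to the theoretical mean n * p.
sample_mean = float(samples.mean())
print(in_range, sample_mean, n * p)
```

With this many samples the printed mean should sit within a few hundredths of n*p, although the exact value depends on the seed.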

geometric (p, size=None)

This random number models the number of (independent) attempts required to obtain a success where the probability of success on each attempt is p.
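A minimal sketch of this behaviour, assuming an illustrative success probability and seed: because the generator counts attempts up to and including the first success, every sample is at least 1, and the average should be near 1/p.

```python
import numpy as np

rng = np.random.RandomState(1)  # arbitrary seed for reproducibility

p = 0.25  # illustrative probability of success on each attempt
attempts = rng.geometric(p, size=100_000)

# At least one attempt is always needed to obtain a success.
min_attempts = int(attempts.min())

# The expected number of attempts is 1 / p.
mean_attempts = float(attempts.mean())
print(min_attempts, mean_attempts, 1.0 / p)
```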

hypergeometric (ngood, nbad, nsample, size=None)

Imagine a probability theorist's favorite urn filled with $n_g$ "good" objects and $n_b$ "bad" objects (the ngood and nbad arguments). In other words, there are two types of objects in a jar. The hypergeometric random number models how many "good" objects will be present when $N$ items (the nsample argument) are taken out of the urn without replacement.
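The urn picture can be checked numerically. The sketch below, with made-up urn contents and an arbitrary seed, draws hypergeometric samples and verifies that the number of "good" objects stays within the range the urn allows, and that its average is close to nsample * ngood / (ngood + nbad).

```python
import numpy as np

rng = np.random.RandomState(2)  # arbitrary seed for reproducibility

ngood, nbad, nsample = 7, 13, 5  # illustrative urn contents
draws = rng.hypergeometric(ngood, nbad, nsample, size=50_000)

# Without replacement, the count of good objects is bounded by the urn:
# at least nsample - nbad (if positive) and at most min(nsample, ngood).
lowest = max(0, nsample - nbad)
highest = min(nsample, ngood)
within_bounds = bool(draws.min() >= lowest and draws.max() <= highest)

# Theoretical mean of the hypergeometric distribution.
expected_mean = nsample * ngood / (ngood + nbad)
sample_mean = float(draws.mean())
print(within_bounds, sample_mean, expected_mean)
```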

logseries (p, size=None)

A random number whose pmf has terms proportional to those of the Taylor series expansion of $\log(1 - p)$. It has been used in biological studies to model the species abundance distribution.
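To make the Taylor-series connection concrete: since $\log(1-p) = -\sum_{k \ge 1} p^k / k$, normalizing the terms $p^k / k$ gives a pmf of the form $-p^k / (k \log(1-p))$ for $k \ge 1$. The sketch below uses this standard form (the helper name `logseries_pmf` and the value of p are hypothetical, chosen for illustration) and checks that the probabilities sum to 1.

```python
import math

def logseries_pmf(k, p):
    """Standard log-series pmf: -p**k / (k * log(1 - p)) for k >= 1."""
    if k < 1:
        return 0.0  # the distribution has no mass below k = 1
    return -p**k / (k * math.log(1.0 - p))

p = 0.6  # illustrative parameter, 0 < p < 1

# Summing enough terms should give a total probability very close to 1,
# because the terms p**k / k decay geometrically.
total = sum(logseries_pmf(k, p) for k in range(1, 200))
print(total)
```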

multinomial (n, pvals, size=None)

This generator produces random vectors of length $N$, where N = len (pvals). The shape of the returned array is always the shape indicated by size + (N,). The multinomial distribution is a generalization of the binomial distribution. This time, n trials of an experiment are independently repeated, but each trial results in one of $N$ possible outcomes. The returned integers $k_1, k_2, \ldots, k_N$ count how many trials produced each outcome, so that $\sum_{i=1}^{N} k_i = n$.
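The shape rule and the sum constraint can be seen in a short sketch. The probabilities, the number of trials, the requested size, and the seed below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(3)  # arbitrary seed for reproducibility

n = 20                   # number of trials per random vector
pvals = [0.2, 0.3, 0.5]  # illustrative outcome probabilities (sum to 1)
size = (4, 2)            # requested shape of the collection of vectors

counts = rng.multinomial(n, pvals, size=size)

# The returned shape is size + (N,), where N = len(pvals).
expected_shape = size + (len(pvals),)

# Every random vector holds counts that add up to the n trials.
row_sums = counts.sum(axis=-1)
print(counts.shape, expected_shape, row_sums)
```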

$$p_m(k_1, k_2, \ldots, k_N; \cdot) = \text{Probability}\{X(\cdot) = [k_1, k_2, \ldots, k_N]\}$$

where pvals = $[p_1, p_2, \ldots, p_N]$. It must be true that $\sum_{j=1}^{N} p_j = 1$. Therefore, as long as $\sum_{j=1}^{N-1} p_j \le 1$, the last entry in pvals is computed as $1 - \sum_{j=1}^{N-1} p_j$.

negative_binomial (n, p, size=None)

Models the number of extra independent trials (beyond n) required to accumulate a total of n successes, where the probability of success on each trial is p. Equivalently, this random number models the number of failures encountered while accumulating n successes during independent trials of an experiment that succeeds with probability p.

$$p_m(k; n, p) = \binom{k + n - 1}{n - 1} p^n (1 - p)^k \, I_{[0,\infty)}(k).$$
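The pmf, a binomial coefficient times $p^n (1-p)^k$ for $k \ge 0$, can be evaluated directly for integer n. The following is a sketch under that assumption; the helper name `negative_binomial_pmf` and the parameter values are hypothetical. It checks that the probabilities sum to 1 and that the mean number of failures matches the standard value n(1 - p)/p.

```python
import math

def negative_binomial_pmf(k, n, p):
    """pmf of the number of failures k before the n-th success (integer n)."""
    if k < 0:
        return 0.0  # indicator function: no mass for negative k
    return math.comb(k + n - 1, n - 1) * p**n * (1.0 - p)**k

n, p = 3, 0.4  # illustrative parameters

# Truncate the infinite support where the tail is negligible.
ks = range(400)
total = sum(negative_binomial_pmf(k, n, p) for k in ks)
mean_failures = sum(k * negative_binomial_pmf(k, n, p) for k in ks)

print(total, mean_failures, n * (1.0 - p) / p)
```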

poisson (lam=1.0, size=None)

This random number counts the number of successes in n independent experiments (where the probability of success in each experiment is p) in the limit as $n \to \infty$ and $p \to 0$ such that $\lambda = np > 0$ is a constant. It can be used, for example, to model how many typographical errors are on each page of a book.
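The limiting relationship can be illustrated numerically. The sketch below compares the binomial pmf with p = lam / n against the standard Poisson pmf $e^{-\lambda} \lambda^k / k!$ (a formula not shown in the text above) and records how the largest gap shrinks as n grows. The helper names and the choice lam = 2.0 are assumptions for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(k, lam):
    """Standard Poisson pmf: exp(-lam) * lam**k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0  # illustrative rate, held constant while n grows
gaps = []
for n in (10, 100, 10_000):
    p = lam / n  # keep n * p fixed at lam
    gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
              for k in range(10))
    gaps.append(gap)
    print(n, gap)
```

The recorded gaps should decrease as n increases, which is the sense in which the binomial distribution approaches the Poisson distribution.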

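zipf (a, size=None)

The probability mass function of this random number (also called the zeta distribution) is

$$p_m(k; a) = \frac{1}{\zeta(a)\, k^{a}} \, I_{[1,\infty)}(k)$$

where

$$\zeta(a) = \sum_{n=1}^{\infty} \frac{1}{n^{a}}$$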

is the Riemann zeta function. Zipf distributions have been shown to characterize use of words in a natural language (like English), the popularity of library books, and even the use of the web. The Zipf distribution describes collections that have a few items whose probability of selection is very high, a medium number of items whose probability of selection is medium, and a huge number of items whose probability of selection is very low.
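The "few very likely items, many unlikely items" shape can be seen in samples. The sketch below assumes the generator is available as `zipf(a, size)` on `numpy.random.RandomState` and uses a = 2, for which the zeta function has the known value $\zeta(2) = \pi^2 / 6$, so the probability of drawing 1 is $6 / \pi^2$. The seed and sample count are arbitrary.

```python
import math
import numpy as np

rng = np.random.RandomState(4)  # arbitrary seed for reproducibility

a = 2.0  # illustrative exponent; must be greater than 1
samples = rng.zipf(a, size=200_000)

# The most likely value is 1, with probability 1 / zeta(a).
frac_one = float(np.mean(samples == 1))
expected_one = 6.0 / math.pi**2  # 1 / zeta(2)

# Most of the mass sits on small values, but the tail is very long.
frac_small = float(np.mean(samples <= 10))
largest = int(samples.max())
print(frac_one, expected_one, frac_small, largest)
```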
