Select numbers from an integer list in a certain percentage? [duplicate]

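I need to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: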

import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            # Each key is the start of this choice's interval on [0, total).
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    # Walk the interval start points in order; the last choice whose start
    # point is <= rand is the one selected.
    choice = None
    for key in sorted(space):
        if rand < key:
            break
        choice = space[key]
    return choice




    Since version 1.7.0, NumPy has a choice function that supports probability distributions.

    from numpy.random import choice
    draw = choice(list_of_candidates, number_of_items_to_pick,
                  p=probability_distribution)

    Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.
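
To make the call above concrete, here is a minimal sketch with made-up data. The candidate names and probabilities are assumptions chosen for illustration (they reuse the WHITE/RED/GREEN weights that appear later in this thread); only `numpy.random.choice` and its `p` and `replace` parameters come from NumPy itself.

```python
import numpy as np

# Hypothetical example data: three candidates and their selection probabilities.
list_of_candidates = ["WHITE", "RED", "GREEN"]
probability_distribution = [0.90, 0.08, 0.02]  # must sum to 1

# Five draws with replacement (the default behavior).
draws = np.random.choice(list_of_candidates, 5, p=probability_distribution)

# Two draws without replacement: no candidate can appear twice.
unique_draws = np.random.choice(
    list_of_candidates, 2, replace=False, p=probability_distribution
)
print(draws, unique_draws)
```

With replace=False the number of items requested cannot exceed the number of candidates.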

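    import random

    def choose_index(probabilities):
        """Return index k with probability probabilities[k]."""
        cmf = probabilities[0]
        choice = random.random()
        for k in range(len(probabilities) - 1):
            if choice <= cmf:
                return k
            cmf += probabilities[k + 1]
        # Reached only for the last index (also absorbs floating-point round-off).
        return len(probabilities) - 1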

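    There is no need to sort your probabilities or build a vector with your cmf, and it terminates as soon as it finds its choice. Memory: O(1), time: O(N), with an average running time of ~N/2.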


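    import random

    def choose_index(weights):
        # Same loop, but accepts unnormalized weights (plain lists work too).
        total = sum(weights)
        probabilities = [w / total for w in weights]
        cmf = probabilities[0]
        choice = random.random()
        for k in range(len(probabilities) - 1):
            if choice <= cmf:
                return k
            cmf += probabilities[k + 1]
        # Reached only for the last index (also absorbs floating-point round-off).
        return len(probabilities) - 1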
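
    If your list of weighted choices is relatively static and you want to sample frequently, you can do one O(N) preprocessing step and then do the selection in O(1), using the functions in this related answer.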

    # run only when `choices` changes.
    preprocessed_data = prep(weight for _,weight in choices)
    # O(1) selection
    value = choices[sample(preprocessed_data)][0]
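
The `prep` and `sample` functions from the related answer are not reproduced in this thread. As a rough sketch of what such a pair could look like, the following uses Vose's alias method, a standard technique with O(N) setup and O(1) sampling. Whether the linked answer uses exactly this method is an assumption; the function bodies below are illustrative only and assume non-negative weights with a positive sum.

```python
import random

def prep(weights):
    """Build alias tables in O(N) (Vose's alias method; illustrative sketch)."""
    weights = list(weights)
    n = len(weights)
    total = sum(weights)
    # Scale the weights so that their average is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates the mass needed to top up the small one.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Anything left over has scaled weight 1, up to round-off.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias

def sample(tables):
    """Return a weighted random index in O(1)."""
    prob, alias = tables
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
preprocessed_data = prep(weight for _, weight in choices)
value = choices[sample(preprocessed_data)][0]
print(value)
```

The tables only need to be rebuilt when the weights change, which is what makes this attractive for a static list that is sampled many times.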

    If you happen to have Python 3, and are afraid of installing numpy or writing your own loops, you could do:

    import itertools, bisect, random
    def weighted_choice(choices):
       weights = list(zip(*choices))[1]
       return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                    random.uniform(0, sum(weights)))][0]
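
On Python 3.6 and later, the standard library also offers `random.choices`, which accepts weights directly. The sketch below is an alternative to the bisect approach above, not part of that answer; the `choices` data is borrowed from elsewhere in this thread for illustration.

```python
import random

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
items, weights = zip(*choices)

# One weighted draw; random.choices always returns a list, hence the [0].
value = random.choices(items, weights=weights, k=1)[0]

# Many draws at once (always with replacement).
sample = random.choices(items, weights=weights, k=1000)
print(value, sample.count("WHITE"))
```

The weights do not need to sum to 1; they are treated as relative weights.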


    import random
    import bisect

    def weighted_choice(choices):
        total, cumulative = 0, []
        for c, w in choices:
            total += w
            cumulative.append((total, c))
        r = random.uniform(0, total)
        # return index
        return bisect.bisect(cumulative, (r,))
        # return item string
        #return choices[bisect.bisect(cumulative, (r,))][0]

    # define choices and relative weights
    choices = [("WHITE",90), ("RED",8), ("GREEN",2)]
    tally = [0 for item in choices]
    n = 100000
    # tally up n weighted choices
    for i in range(n):
        tally[weighted_choice(choices)] += 1
    print([t/sum(tally)*100 for t in tally])
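
    Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0s containing a single 1 that indicates which bin was chosen. The code defaults to making just one draw, but you can pass in the number of draws to make, and the counts per bin will be returned.

    If the weights vector does not sum to 1, it will be normalized so that it does.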

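    import numpy as np

    def weighted_choice(weights, n=1):
        weights = np.asarray(weights, dtype=float)
        # normalize so the weights sum to 1
        if np.sum(weights) != 1:
            weights = weights / np.sum(weights)
        draws = np.random.random_sample(size=n)
        # cumulative weights, with a leading 0, serve as the histogram bin edges
        weights = np.cumsum(weights)
        weights = np.insert(weights, 0, 0.0)
        counts = np.histogram(draws, bins=weights)
        # np.histogram returns (counts, bin_edges); return only the counts
        return counts[0]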

    import random

    def weighted_choice(choices, weights):
        total = sum(weights)
        threshold = random.uniform(0, total)
        for k, weight in enumerate(weights):
            total -= weight
            if total < threshold:
                return choices[k]
        # threshold was (numerically) 0: fall back to the last choice
        return choices[-1]

    import numpy as np
    weights = [0.1, 0.3, 0.5] # weights for the items at index 0, 1, 2
    # The weights should sum to 1. NumPy treats the last entry as the remaining
    # probability, so index 2 is effectively drawn with probability 0.6 here.
    # To normalize, divide each weight by the sum of all weights.
    trials = 1 #number of trials
    num_item = 1 #number of items that can be picked in each trial
    selected_item_arr = np.random.multinomial(num_item, weights, trials)
    # gives number of times an item was selected at a particular index
    # this assumes selection with replacement
    # one possible output
    # selected_item_arr
    # array([[0, 0, 1]])
    # say if trials = 5, then a possible output could be
    # selected_item_arr
    # array([[1, 0, 0],
    #   [0, 0, 1],
    #   [0, 0, 1],
    #   [0, 1, 0],
    #   [0, 0, 1]])

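    Now let's assume we have to draw 3 items in 1 trial. You can think of three kinds of balls, R, G and B, present in proportions given by the weights array; the following is one possible outcome: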

    num_item = 3
    trials = 1
    selected_item_arr = np.random.multinomial(num_item, weights, trials)
    # selected_item_arr can give output like :
    # array([[1, 0, 2]])


    num_binomial_trial = 5
    weights = [0.1,0.9] #say an unfair coin weights for H/T
    num_experiment_set = 1
    selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
    # possible output
    # selected_item_arr
    # array([[1, 4]])
    # i.e. H came up 1 time and T came up 4 times in 5 binomial trials; one set contains 5 binomial trials.
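
    One way is to draw a random number over the total of all the weights and then use the cumulative weights as the limit points for each variable. Here is a crude implementation as a generator.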

    import random

    def rand_weighted(weights):
        """
        Generator which uses the weights to generate
        weighted random values
        """
        sum_weights = sum(weights.values())
        cum_weights = {}
        current_weight = 0
        for key, value in sorted(weights.items()):
            current_weight += value
            cum_weights[key] = current_weight
        while True:
            sel = random.uniform(0, 1) * sum_weights
            # pick the first key whose cumulative weight exceeds sel
            for key, value in sorted(cum_weights.items()):
                if sel < value:
                    break
            yield key
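
    I needed to do something really fast and really simple, and after searching for ideas I ended up building this template. The idea is to receive the weighted values from an API in the form of JSON, which is simulated here by a dict.

    It is then converted into a list in which each value is repeated in proportion to its weight, and random.choice is used to select a value from that list.

    I tried running it with 10, 100 and 1000 iterations. The distribution seems pretty solid.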

    import random

    def weighted_choice(weighted_dict):
        """Input example: dict(apples=60, oranges=30, pineapples=10)"""
        # weights must be non-negative integers: each key is repeated weight times
        weight_list = []
        for key in weighted_dict.keys():
            weight_list += [key] * weighted_dict[key]
        return random.choice(weight_list)

    Using numpy

    import numpy as np

    def choice(items, weights):
        return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]
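
The one-liner above makes a single draw per call. If many draws are needed, the same cumulative-sum idea can be vectorized with `np.searchsorted`. This is a sketch of one possible variant, not part of the answer above; the name `choice_many` and its `size` parameter are hypothetical.

```python
import numpy as np

def choice_many(items, weights, size):
    """Draw `size` items with probability proportional to `weights`."""
    cum = np.cumsum(weights, dtype=float)
    # Uniform draws in [0, total).
    draws = np.random.random_sample(size) * cum[-1]
    # side="right" maps each draw to the first index whose cumulative
    # weight is strictly greater than the draw, so zero-weight items
    # are never selected.
    idx = np.searchsorted(cum, draws, side="right")
    # Guard against round-off pushing a draw onto the top edge.
    idx = np.minimum(idx, len(cum) - 1)
    return [items[i] for i in idx]

print(choice_many(["WHITE", "RED", "GREEN"], [90, 8, 2], 10))
```

Because the cumulative sum is computed once and each lookup is a binary search, this costs O(N + k log N) for k draws.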
