Select numbers from an integer list in a certain percentage? [duplicate]

xiaoxingxing · 2 years ago · python · 534

Original title: Select numbers from an integer list in a certain percentage? [duplicate]

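I need to write a weighted version of random.choice, in which each element in the list has a different probability of being selected. This is what I came up with: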

```
import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight, *_ in choices:  # *_ drops any extra items, as documented
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    # list() is needed on Python 3, where dict.keys() returns a view.
    for key in sorted(list(space) + [current]):
        if rand < key:
            return choice
        # .get() avoids a KeyError on the final key, which is not in space.
        choice = space.get(key)
    return None
```

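This function seems overly complex and ugly to me. I'm hoping someone here can suggest ways to improve it, or an alternative approach. Efficiency matters less to me than clean, readable code.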

Original link: https://stackoverflow.com//questions/71454106/select-numbers-from-an-integer-list-in-a-certain-percentage


Answer by Ronan Paixão
This answer was accepted.

As of version 1.7.0, NumPy has a choice function that supports probability distributions.
```
from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)
```
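Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also pass the keyword replace=False to draw without replacement, so that no item is picked more than once.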
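A small runnable sketch of the same idea. It uses the numpy.random.Generator interface (np.random.default_rng, available since NumPy 1.17) instead of the legacy numpy.random.choice, and the candidate names and probabilities are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only to make runs repeatable
candidates = ["WHITE", "RED", "GREEN"]  # illustrative data
probabilities = [0.9, 0.08, 0.02]  # same order as candidates, must sum to 1

# Ten independent draws; replace=True is the default, so repeats are allowed.
draws = rng.choice(candidates, size=10, p=probabilities)

# Two draws without replacement: the same candidate cannot come up twice.
distinct = rng.choice(candidates, size=2, replace=False, p=probabilities)
print(draws, distinct)
```

The legacy numpy.random.choice call in the answer accepts the same size, replace and p arguments.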
2 years ago

Answer by Aku

I may be too late to contribute anything useful, but here is a simple, short and very efficient snippet:

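```
import random

def choose_index(probabilities):
    # Expects probabilities that already sum to 1.
    choice = random.random()  # uniform in [0, 1)
    cmf = 0.0  # running cumulative mass
    for k, p in enumerate(probabilities):
        cmf += p
        if choice < cmf:
            return k
    # Only reached if round-off leaves the total slightly below 1.
    return len(probabilities) - 1
```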

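There is no need to sort your probabilities or to build a vector of the cumulative mass function (cmf), and the loop stops as soon as it finds its choice. Memory: O(1). Time: O(N), with about N/2 steps on average when the probabilities are roughly uniform.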
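To see why the scan works, it helps to separate the random draw from the search. The hypothetical helper index_for_draw below takes the uniform draw u as an argument instead of calling random.random(), and returns the first index whose running total exceeds u:

```python
def index_for_draw(probabilities, u):
    """Return the first index k whose running sum of probabilities exceeds u."""
    cmf = 0.0
    for k, p in enumerate(probabilities):
        cmf += p
        if u < cmf:
            return k  # stop as soon as the interval containing u is found
    return len(probabilities) - 1  # round-off guard if the sum is slightly below 1


# Running sums for [0.2, 0.5, 0.3] are 0.2, 0.7 and 1.0, so the intervals are
# [0, 0.2) -> 0, [0.2, 0.7) -> 1 and [0.7, 1.0) -> 2.
print(index_for_draw([0.2, 0.5, 0.3], 0.6))  # 1
```

Each index owns an interval whose width equals its probability, so a uniform u lands in interval k with exactly probability p_k.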

If you have weights instead, just normalize them first:

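```
import random

def choose_index(weights):
    total = sum(weights)
    probabilities = [w / total for w in weights]  # a plain list can't be divided by a number
    choice = random.random()
    cmf = 0.0
    for k, p in enumerate(probabilities):
        cmf += p
        if choice < cmf:
            return k
    return len(probabilities) - 1  # round-off guard
```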

2 years ago

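Answer by AShelly
If your list of weighted choices is relatively static and you want to sample from it often, you can run a one-time O(N) preprocessing step and then make each selection in O(1) using the functions from the related answer.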
```
# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)

# O(1) selection
value = choices[sample(preprocessed_data)][0]
```
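The prep and sample functions from the related answer are not reproduced on this page. One well-known technique with exactly these costs (O(N) setup, O(1) per draw) is Walker's alias method. The sketch below uses Vose's variant and reuses the names prep and sample so that it fits the snippet above, but it is an illustration and not necessarily the code from the linked answer:

```python
import random


def prep(weights):
    """O(N) setup for Walker's alias method (Vose's variant).

    Returns (prob, alias): column i keeps index i with probability prob[i],
    otherwise it hands the draw to alias[i].
    """
    weights = list(weights)  # accept generators, as in the snippet above
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # average column height is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        lo = small.pop()
        hi = large.pop()
        prob[lo] = scaled[lo]
        alias[lo] = hi  # the rest of column lo is filled by index hi
        scaled[hi] = (scaled[hi] + scaled[lo]) - 1.0
        if scaled[hi] < 1.0:
            small.append(hi)
        else:
            large.append(hi)
    for i in small + large:
        prob[i] = 1.0  # leftovers are full columns, up to round-off
    return prob, alias


def sample(tables):
    """O(1) draw: pick a column uniformly, then keep it or take its alias."""
    prob, alias = tables
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]


choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]  # illustrative data
preprocessed_data = prep(weight for _, weight in choices)
value = choices[sample(preprocessed_data)][0]
```

The setup loop pairs an under-full column with an over-full one until every column is exactly full, so each draw costs one random index and one comparison regardless of N.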
2 years ago

Answer by personal_cloud
If you happen to have Python 3 and are wary of installing numpy or writing your own loops, you can do this:
```
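import itertools, bisect, random

def weighted_choice(choices):
   weights = list(zip(*choices))[1]
   cum_weights = list(itertools.accumulate(weights))
   # Draw up to the last running total itself: a separate sum(weights) can
   # differ from it by float round-off and push bisect past the end.
   return choices[bisect.bisect(cum_weights, random.uniform(0, cum_weights[-1]))][0]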
```
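Because you can build anything out of a bag of pipe fittings! Although... I have to admit that Ned's answer, while slightly longer, is easier to understand.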
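Since Python 3.6 the standard library also covers this directly: random.choices accepts a weights (or precomputed cum_weights) argument and a count k. A minimal sketch with made-up data:

```python
import random

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]  # illustrative data
items, weights = zip(*choices)  # split the (item, weight) pairs

# random.choices always returns a list, so take element [0] for a single pick.
pick = random.choices(items, weights=weights)[0]

# k draws at once, with replacement.
picks = random.choices(items, weights=weights, k=1000)
print(pick, picks.count("WHITE"))
```

When the same weights are reused many times, passing cum_weights=list(itertools.accumulate(weights)) instead of weights saves random.choices from converting the weights to running totals on every call.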
2 years ago

Answer by Tony Veijalainen

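I looked at the other thread that was pointed out and came up with this variation in my own coding style. It returns the index of the choice so the results can be tallied, but returning the string instead is simple (the alternative return is commented out):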

```
import random
import bisect

try:
    range = xrange  # Python 2: use the lazy xrange
except NameError:
    pass  # Python 3: range is already lazy

def weighted_choice(choices):
    total, cumulative = 0, []
    for c, w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]

tally = [0] * len(choices)

n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1

# 100.0 * ... keeps the division in floating point on Python 2 as well
print([100.0 * t / sum(tally) for t in tally])
```

2 years ago

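Answer by murphsp1
Here is another version of weighted_choice that uses numpy. Pass in a weight vector and it returns an array of 0s containing a single 1 that indicates which bin was chosen. By default the code makes a single draw, but you can pass in the number of draws to make, in which case it returns the count for each bin.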

If the weight vector does not sum to 1, it is normalized so that it does.
```
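import numpy as np

def weighted_choice(weights, n=1):
    # Always normalize: testing a float sum with != 1 is unreliable, and
    # dividing an already-normalized vector by 1.0 is harmless.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    draws = np.random.random_sample(size=n)

    edges = np.cumsum(weights)
    edges = np.insert(edges, 0, 0.0)
    edges[-1] = 1.0  # absorb round-off so no draw falls past the last bin edge

    counts = np.histogram(draws, bins=edges)
    return counts[0]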
```
2 years ago

Answer by Mark

A general solution:

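```
import random

def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]
    # Round-off (or a threshold of exactly 0) can leave total >= threshold.
    return choices[-1]
```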

2 years ago

Answer by Nsquare

Another approach, assuming the weights have the same indices as the corresponding elements in the item array:

```
import numpy as np
weights = [0.1, 0.3, 0.6]  # weights for the items at index 0, 1, 2
# The weights should sum to 1; divide each weight by the total to normalize them.
# (NumPy treats the last entry as absorbing whatever probability is left over.)
trials = 1  # number of trials
num_item = 1  # number of items picked in each trial
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# Each row gives how many times the item at each index was selected.
# This assumes selection with replacement.
# one possible output:
# selected_item_arr
# array([[0, 0, 1]])
# if trials = 5, a possible output could be
# selected_item_arr
# array([[1, 0, 0],
#        [0, 0, 1],
#        [0, 0, 1],
#        [0, 1, 0],
#        [0, 0, 1]])
```

Now suppose we have to draw 3 items in 1 trial. Think of three balls R, G and B whose weights are in the ratio given by the weights array; a possible outcome is:

```
num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# selected_item_arr can give output like:
# array([[1, 0, 2]])
```

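You can also think of the number of items to pick as the number of binomial/multinomial trials in one set. So the idea above can also be expressed as: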

```
num_binomial_trial = 5
weights = [0.1, 0.9]  # e.g. the H/T weights of an unfair coin
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
# possible output
# selected_item_arr
# array([[1, 4]])
# i.e. H came up 1 time and T came up 4 times in 5 binomial trials; one set contains 5 binomial trials.
```
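A row of multinomial counts says how many times each index was picked. If the concrete picks are needed, np.repeat can expand the counts back into items. The sketch below assumes item labels R, G and B and uses the newer Generator API; note that np.repeat groups the picks by index, so they are shuffled afterwards if draw order should look random:

```python
import numpy as np

rng = np.random.default_rng()
items = np.array(["R", "G", "B"])  # illustrative labels for indices 0, 1, 2
weights = [0.1, 0.3, 0.6]  # sums to 1

counts = rng.multinomial(3, weights)  # how often each index was drawn in 3 picks
picked = np.repeat(items, counts)  # each label repeated as often as it was drawn
rng.shuffle(picked)  # optional: drop the grouping by index
print(counts, picked)
```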

2 years ago

Answer by Perennial

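One way is to draw a random number up to the sum of all the weights and then use the cumulative weights as the cutoff point for each variable. Here is a rough implementation as a generator.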

```
import random

def rand_weighted(weights):
    """
    Generator which uses the weights (a dict of key -> weight) to generate
    weighted random values
    """
    sum_weights = sum(weights.values())
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.items()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        # No int() truncation here, so float weights work as well as integers.
        sel = random.uniform(0, 1) * sum_weights
        for key, value in sorted(cum_weights.items()):
            if sel < value:
                break
        yield key
```
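The inner loop rescans every cumulative weight on each draw. A variant that keeps the same sorted-keys idea but finds the cutoff with a binary search (bisect) is sketched below; the name rand_weighted_bisect is hypothetical. Because the generator never ends, itertools.islice is a convenient way to take a fixed number of values:

```python
import bisect
import itertools
import random


def rand_weighted_bisect(weights):
    """Infinite generator of keys from a {key: weight} dict, chosen by weight."""
    keys = sorted(weights)
    cum = list(itertools.accumulate(weights[k] for k in keys))
    total = cum[-1]
    while True:
        # bisect_right returns the first position whose running total exceeds
        # the draw, which also skips keys with zero weight.
        i = bisect.bisect_right(cum, random.random() * total)
        yield keys[min(i, len(keys) - 1)]  # min() only guards against round-off


first_five = list(itertools.islice(rand_weighted_bisect({"a": 3, "b": 1}), 5))
print(first_five)
```

The cumulative list is built once, so each draw costs O(log N) instead of O(N log N) for re-sorting plus a linear scan.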

2 years ago

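Answer by Stas Baskin
I needed something very quick and very simple, and starting from ideas I found while searching, I ended up building this template. The idea is to receive the weighted values as JSON from an API, simulated here by a dict.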

These are then converted into a list in which each value is repeated in proportion to its weight, and random.choice picks a value from that list.

I tried running 10, 100 and 1000 iterations, and the distribution looked quite solid.
```
import random

def weighted_choice(weighted_dict):
    """Input example: dict(apples=60, oranges=30, pineapples=10)"""
    # Weights must be non-negative integers; the list gets sum(weights) entries.
    weight_list = []
    for key, weight in weighted_dict.items():
        weight_list += [key] * weight
    return random.choice(weight_list)
```
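The expanded list has one entry per unit of weight (100 entries for the example dict). Assuming non-negative integer weights with at least one positive, dividing every weight by their greatest common divisor keeps the proportions and shortens the list, so 60/30/10 becomes 6/3/1. A sketch with the hypothetical name weighted_choice_compact:

```python
import math
import random
from functools import reduce


def weighted_choice_compact(weighted_dict):
    """Same list-repetition idea, with the weights first divided by their GCD.

    Assumes non-negative integer weights, at least one of them positive.
    """
    divisor = reduce(math.gcd, weighted_dict.values())  # gcd(0, x) == x, so zeros are fine
    weight_list = []
    for key, weight in weighted_dict.items():
        weight_list += [key] * (weight // divisor)
    return random.choice(weight_list)


print(weighted_choice_compact(dict(apples=60, oranges=30, pineapples=10)))
```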
2 years ago

Answer by blue_note

Using numpy:

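```
import numpy as np

def choice(items, weights):
    # argmin over the boolean mask returns the first index whose normalized
    # cumulative weight is >= the random draw.
    return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]
```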
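For many draws at once, the same cumulative-sum idea can be vectorized with np.searchsorted instead of np.argmin over a boolean mask. The name choices_vectorized is hypothetical, and the sketch samples with replacement:

```python
import numpy as np


def choices_vectorized(items, weights, n, rng=None):
    """Draw n items with replacement, with probability proportional to weights."""
    rng = np.random.default_rng() if rng is None else rng
    cum = np.cumsum(weights, dtype=float)
    u = rng.random(n) * cum[-1]  # n uniform draws in [0, total)
    # side="right": first index whose running total is strictly greater than u,
    # so zero-weight items are never selected.
    idx = np.searchsorted(cum, u, side="right")
    idx = np.minimum(idx, len(cum) - 1)  # guard against round-off at the top end
    return [items[i] for i in idx]


print(choices_vectorized(["a", "b", "c"], [5, 3, 2], 8))
```

Drawing all n values in one call replaces a Python-level loop per draw with a single binary search over the array of draws.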

2 years ago
