Python NumPy: beginning and ending indexes of True values in a boolean array [duplicate]


Problem description:

Good evening,

Is there an efficient way of getting all the beginning and ending indexes of True values in a boolean array?
Let's say I have this array:
x = np.array([np.nan, 11, 13, np.nan, np.nan, np.nan, 9, 3, np.nan, 3, 4, np.nan])

I use np.isnan(x) so I get:
[True, False, False, True, True, True, False, False, True, False, False, True]

I would like to end up with an array or list containing only the indexes of the NaN values: a single index for an isolated NaN, or the beginning and ending indexes for a run of consecutive NaN values:
[0, [3, 5], 8, 11]

Do I have to loop over the array myself and write a function, or is there an efficient NumPy way of doing it?

I already have something running, but since I have to deal with hundreds of thousands of values per array, and with multiple arrays as well, it takes time.

Solution 1: [1]

You can use groupby from the itertools module:

from itertools import groupby
import numpy as np

lst = []
# Group consecutive (index, is_nan) pairs by the is_nan flag
for is_nan, grp in groupby(enumerate(np.isnan(x)), key=lambda pair: pair[1]):
    if is_nan:  # only keep runs of NaN
        idx = [i for i, _ in grp]
        lst.append([idx[0], idx[-1]] if len(idx) > 1 else idx[0])

Output:

>>> lst
[0, [3, 5], 8, 11]

Solution 2: [2]

You can use boolean operations, shifting the np.isnan output to the left and to the right:

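# is the value a NaN?
m = np.isnan(x)
# is the preceding value not a NaN? (True at index 0: nothing precedes it)
m2 = np.r_[True, ~m[:-1]]
# is the following value not a NaN? (True at the last index: nothing follows it)
m3 = np.r_[~m[1:], True]

# keep the NaN positions that start or end a run
out = np.where((m&m2)|(m&m3))[0]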

Output:

array([ 0,  3,  5,  8, 11])
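
The flat output above does not say which indices are isolated NaNs and which ones pair up as the two ends of a run. The following is a minimal sketch, assuming the same shifted-mask idea, that keeps the starts and ends separate and then builds the list format from the question. The function name nan_runs is hypothetical (it is not part of NumPy), and padding the shifted masks with True is an assumption made so that a run touching either edge of the array is still counted.

```python
import numpy as np

def nan_runs(x):
    """Return NaN positions as single indices or [start, end] pairs (end inclusive)."""
    m = np.isnan(x)
    # A run starts where a NaN has no NaN before it (or sits at index 0).
    starts = np.flatnonzero(m & np.r_[True, ~m[:-1]])
    # A run ends where a NaN has no NaN after it (or sits at the last index).
    ends = np.flatnonzero(m & np.r_[~m[1:], True])
    # Starts and ends come out in matching order, so they can be zipped.
    return [int(s) if s == e else [int(s), int(e)] for s, e in zip(starts, ends)]

x = np.array([np.nan, 11, 13, np.nan, np.nan, np.nan, 9, 3, np.nan, 3, 4, np.nan])
print(nan_runs(x))  # [0, [3, 5], 8, 11]
```

Only the final list comprehension runs in Python, and it iterates once per run rather than once per element, so the cost stays low when the NaNs are clustered.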

Solution 3: [3]

I have a function in my library haggis called haggis.npy_util.mask2runs which does almost exactly what you want:

mask2runs(np.isnan(x))

The result will be a two-column array containing the start (inclusive) and end (exclusive) indices for each run:

[[ 0,  1],
 [ 3,  6],
 [ 8,  9],
 [11, 12]]

You can get the lengths of each run directly by subtraction, or by adding an argument return_lengths=True.

The function is not complicated, and you can replicate it in a one-liner for the example that you have:

mask = np.isnan(x)
runs = np.flatnonzero(np.diff(np.r_[np.int8(0), mask.view(np.int8), np.int8(0)])).reshape(-1, 2)
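
To make the one-liner easier to follow, here is a minimal, self-contained sketch of the same diff-based idea, together with the run lengths and a conversion to the list format from the question. The name mask_to_runs is hypothetical and this is not the haggis implementation, only an illustration under the assumption that the input is a one-dimensional boolean mask.

```python
import numpy as np

def mask_to_runs(mask):
    """Return an (n, 2) array of [start, stop) index pairs, one per run of True."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with zeros so runs touching either edge still produce a transition.
    padded = np.r_[np.int8(0), mask.view(np.int8), np.int8(0)]
    # Nonzero differences mark run starts (+1) and stops (-1), alternating.
    return np.flatnonzero(np.diff(padded)).reshape(-1, 2)

x = np.array([np.nan, 11, 13, np.nan, np.nan, np.nan, 9, 3, np.nan, 3, 4, np.nan])
runs = mask_to_runs(np.isnan(x))
# Run lengths follow directly by subtraction, since stop is exclusive.
lengths = runs[:, 1] - runs[:, 0]
# Convert to the question's format: a single index, or [first, last] inclusive.
result = [int(a) if b - a == 1 else [int(a), int(b) - 1] for a, b in runs]

print(runs.tolist())     # [[0, 1], [3, 6], [8, 9], [11, 12]]
print(lengths.tolist())  # [1, 3, 1, 1]
print(result)            # [0, [3, 5], 8, 11]
```

The exclusive stop index means each row can be used directly as a slice, for example x[start:stop], which is why the conversion subtracts one to recover the inclusive end asked for in the question.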

References:

Copyright Notice: This article follows StackOverflow’s copyright notice requirements and is licensed under CC BY-SA 3.0.

Article Source: StackOverflow

[1] Corralien

[2] mozway

[3] Mad Physicist
