Capture index in multiple groups [duplicate]

re.finditer(r"(.)(.)\2\1", "ABBAABBA") yields matches with the spans (0,4) and (4,8). However, the expected results are the spans (0,4), (2,6), (4,8), as BAAB also matches …
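A minimal sketch of one way to recover the overlapping spans, assuming the lookahead technique from the answers further down applies here as well. Wrapping the whole pattern in a capturing group inside a lookahead shifts the original group numbers by one, so \2\1 becomes \3\2, and the span of interest is that of group 1. The names pattern and spans are illustrative only.

```python
import re

# Group 1 wraps the whole pattern inside a lookahead; the original
# groups 1 and 2 become groups 2 and 3, so the backreferences shift too.
pattern = re.compile(r"(?=((.)(.)\3\2))")

# m.span() would be zero-width here, so read the span of group 1 instead.
spans = [m.span(1) for m in pattern.finditer("ABBAABBA")]
print(spans)  # [(0, 4), (2, 6), (4, 8)]
```

The middle span covers BAAB, which a plain finditer skips because its first match already consumed the characters at indices 2 and 3.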

Problem description:

I’m trying to find every 10-digit series of numbers within a larger series of numbers using re in Python 2.6.

I’m easily able to grab non-overlapping matches, but I want every match in the number series. For example,

in “123456789123456789”

I should get the following list:

[1234567891,2345678912,3456789123,4567891234,5678912345,6789123456,7891234567,8912345678,9123456789]

I’ve found references to a “lookahead”, but the examples I’ve seen only show pairs of numbers rather than larger groupings and I haven’t been able to convert them beyond the two digits.

Solution 1:[1]

Piggybacking on the accepted answer, the following currently works as well:

import re
s = "123456789123456789"
matches = re.findall(r'(?=(\d{10}))',s)
results = [int(match) for match in matches]
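As a hedged sketch of why the capturing group matters, and of how the same idea extends to any width: re.findall returns the contents of the group when the pattern contains exactly one, whereas without the group each zero-width match comes back as an empty string. The helper name overlapping_digit_runs and its parameter n are hypothetical, introduced only for illustration.

```python
import re

def overlapping_digit_runs(s, n):
    """Return every run of n digits in s, including overlapping runs."""
    # The width is interpolated into the quantifier; n is assumed to be a positive int.
    return re.findall(r"(?=(\d{%d}))" % n, s)

s = "123456789123456789"
with_group = overlapping_digit_runs(s, 10)
# Without a capturing group, findall returns the zero-width matches themselves.
without_group = re.findall(r"(?=\d{10})", s)

print(with_group[:2])     # ['1234567891', '2345678912']
print(without_group[:2])  # ['', '']
```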

Solution 2 [best answer][2]

Use a capturing group inside a lookahead. The lookahead captures the text you’re interested in, but the actual match is technically the zero-width substring before the lookahead, so the matches are technically non-overlapping:

import re 
s = "123456789123456789"
matches = re.finditer(r'(?=(\d{10}))', s)
results = [int(match.group(1)) for match in matches]
# results: 
# [1234567891,
#  2345678912,
#  3456789123,
#  4567891234,
#  5678912345,
#  6789123456,
#  7891234567,
#  8912345678,
#  9123456789]
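To make the "zero-width match" point concrete, here is a small sketch that compares the span of the overall match with the span of the captured group. The variable name pairs is illustrative.

```python
import re

s = "123456789123456789"

# For each match, record (span of the whole match, span of group 1).
pairs = [(m.span(), m.span(1)) for m in re.finditer(r"(?=(\d{10}))", s)]

# The whole match is empty (start == end), while group 1 covers 10 characters.
print(pairs[:2])  # [((0, 0), (0, 10)), ((1, 1), (1, 11))]
```

Because each overall match consumes nothing, the engine advances by one character at a time, which is why the captured groups are allowed to overlap.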

Solution 3:[3]

I’m fond of regexes, but they are not needed here.

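Simply:

s = "123456789123456789"

n = 10
# Slice a window of n characters at every start position.
# range and print(...) work on both Python 2.6 and Python 3.
li = [s[i:i+n] for i in range(len(s) - n + 1)]
print('\n'.join(li))

Result: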

1234567891
2345678912
3456789123
4567891234
5678912345
6789123456
7891234567
8912345678
9123456789
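One caveat with plain slicing is that it does not check that a window consists of digits, which the \d{10} regex does. The following is a rough sketch of a reusable version with that check added; the names windows, mixed and digit_only are hypothetical. Note that str.isdigit also accepts some non-ASCII digit characters, so it is only an approximation of an ASCII-only digit test.

```python
def windows(s, n):
    """Yield every length-n substring of s, left to right."""
    # range() is empty when n > len(s), so nothing is yielded in that case.
    for i in range(len(s) - n + 1):
        yield s[i:i + n]

s = "123456789123456789"
all_windows = list(windows(s, 10))
print(len(all_windows))  # 9

# With non-digit characters in the input, filter the windows explicitly.
mixed = "12ab3456"
digit_only = [w for w in windows(mixed, 2) if w.isdigit()]
print(digit_only)  # ['12', '34', '45', '56']
```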

Solution 4:[4]

You can also try using the third-party regex module (not re), which supports overlapping matches.

>>> import regex as re
>>> s = "123456789123456789"
>>> matches = re.findall(r'\d{10}', s, overlapped=True)
>>> for match in matches: print(match)  # in Python 2: print match
...
1234567891
2345678912
3456789123
4567891234
5678912345
6789123456
7891234567
8912345678
9123456789

Solution 5:[5]

The conventional way:

import re

S = '123456789123456789'
result = []
while S:
    m = re.search(r'\d{10}', S)
    if not m:
        break
    result.append(int(m.group()))
    # Restart one character past the start of this match so overlaps are found.
    S = S[m.start() + 1:]
print(result)
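A variation on the same loop, sketched here as an assumption about what one might prefer for long inputs: a compiled pattern's search method accepts a start position, so the search can restart without re-slicing the string on every iteration. The function name overlapping_search is hypothetical.

```python
import re

def overlapping_search(pattern, s):
    """Collect overlapping matches by restarting one character after each match start."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while True:
        # search(string, pos) begins scanning at index pos without copying s.
        m = compiled.search(s, pos)
        if m is None:
            break
        results.append(m.group())
        pos = m.start() + 1
    return results

result = [int(x) for x in overlapping_search(r"\d{10}", "123456789123456789")]
print(result[:2])  # [1234567891, 2345678912]
```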

Solution 6:[6]

This is the much-referenced duplicate that is linked everywhere. It’s a fallacy to conflate the overlap handled here with matching 1, 12, 123, 1234, ... inside 12345. Those matches do overlap, but they can never all be produced by the regex (?=(\d+)) in any regex engine, because it yields only one match per starting position. In that sense the word "overlapping" is not accurate. Perhaps it’s wise not to use this duplicate, which is constantly referenced by uneducated users, since it does not pertain to overlapped matches of that kind.
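The distinction can be illustrated with a short sketch on the string 12345: the lookahead pattern produces one match per starting position, while the full set of contiguous substrings is larger and is built here with plain slicing for comparison. The variable names are illustrative.

```python
import re

s = "12345"

# One match per starting position: the greedy \d+ always runs to the end.
lookahead_matches = re.findall(r"(?=(\d+))", s)
print(lookahead_matches)  # ['12345', '2345', '345', '45', '5']

# Every contiguous substring, for comparison (plain slicing, no regex).
all_substrings = [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]
print(len(all_substrings))  # 15

# Prefixes such as '1', '12' and '123' are never produced by the lookahead.
missing = [sub for sub in all_substrings if sub not in lookahead_matches]
print(missing[:3])  # ['1', '12', '123']
```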

Reference links:

Copyright Notice: This article follows StackOverflow’s copyright notice requirements and is licensed under CC BY-SA 3.0.

Article Source: StackOverflow

[1] Michael

[2] mechanical_meat

[3] eyquem

[4] David C

[5] Avi Cohen

[6] sln
