Replacing acronyms with their full forms in Python

xiaoxingxing  2 years ago  python  234

Original title: Replacing acronyms with their full forms in Python

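I have a dictionary of acronyms, with the acronyms as keys and their full forms as values.

I want to replace the acronyms in text_list with their full forms to arrive at output_list: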

acronym_dict = {
    'QUO': 'Quotation',
    'IN': 'India',
    'SW': 'Software',
    'RE': 'Regular Expression'
}

text_list = [
    'The status QUO has changed', 'I SWEAR, This is part of IN_SW',
    'The update does not belong to the SW, version, branch',
    'This is a RE_Text'
]

output_list = [
    'The status Quotation has changed',
    'I SWEAR, This is part of India_Software',
    'The update does not belong to the Software, version, branch',
    'This is a Regular Expression_Text'
]

I wrote a method to do this:

import string
def remove_punctuations(text):
    punct_str = string.punctuation  # !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
    for punctuation in punct_str:
        text = text.replace(punctuation, ' ')
    return text.strip()

def replace_single_acronym(text, acronym, fullform):
    words = text.split()
    return_words = []
    for w in words:
        if remove_punctuations(w).lower() == acronym.lower():
            return_words.append(w.replace(acronym, fullform))
        else:
            return_words.append(w)
    return " ".join(return_words)

my_op_list = []
for text in text_list:
    for acronym in acronym_dict.keys():
        text = replace_single_acronym(text, acronym, acronym_dict[acronym])
    my_op_list.append(text)

Ideally, output_list and my_op_list should be identical. Instead, it prints the following result (it fails in 2 cases):

['The status Quotation has changed',
 'I SWEAR, This is part of IN_SW',
 'The update does not belong to the Software, version, branch',
 'This is a RE_Text']
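
The two failing cases can be traced to tokens that contain an underscore. The following is a minimal sketch that reuses remove_punctuations from the question; the helper matching_acronyms and the sample tokens are added here only for illustration. Because `_` is part of string.punctuation, 'IN_SW' is cleaned to 'IN SW', which equals neither 'in' nor 'sw' once lowercased, so the whole token is left untouched.

```python
import string

def remove_punctuations(text):
    # Same helper as in the question: each punctuation character becomes a space.
    for punctuation in string.punctuation:
        text = text.replace(punctuation, ' ')
    return text.strip()

acronyms = ['QUO', 'IN', 'SW', 'RE']

def matching_acronyms(token):
    # Hypothetical helper: which acronyms equal the cleaned, lowercased token?
    cleaned = remove_punctuations(token).lower()
    return [a for a in acronyms if a.lower() == cleaned]

for token in ['QUO', 'SW,', 'IN_SW', 'RE_Text']:
    cleaned = remove_punctuations(token).lower()
    print(repr(token), '->', repr(cleaned), 'matches:', matching_acronyms(token))
```

Tokens such as 'QUO' and 'SW,' reduce to a single acronym and are replaced, while 'IN_SW' and 'RE_Text' reduce to two words and never compare equal to any key.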

In addition, replace_single_acronym is very slow on a corpus of 1000 text_list items.

Can someone help me adjust the method so that it works correctly and efficiently?

Original link: https://stackoverflow.com//questions/71555412/replacing-acronyms-with-their-full-forms-in-python


Daweo commented
You can use re.sub for this task, passing a function as its second argument:
```
import re
acronym_dict = {
    'QUO': 'Quotation',
    'IN': 'India',
    'SW': 'Software',
    'RE': 'Regular Expression'
}

text_list = [
    'The status QUO has changed', 'I SWEAR, This is part of IN_SW',
    'The update does not belong to the SW, version, branch',
    'This is a RE_Text'
]
def get_full_name(m):
    return acronym_dict.get(m.group(1),m.group(1))
def replace_acronyms(text):
    return re.sub(r'(?<![A-Z])([A-Z]+)(?![A-Z])', get_full_name, text)
output_list = [replace_acronyms(i) for i in text_list]
print(output_list)
```
Output:
```
['The status Quotation has changed', 'I SWEAR, This is part of India_Software', 'The update does not belong to the Software, version, branch', 'This is a Regular Expression_Text']
```
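Explanation: the pattern contains two zero-length assertions and one capturing group. It finds one or more uppercase ASCII letters that are not preceded by an uppercase ASCII letter (negative lookbehind) and not followed by one (negative lookahead). get_full_name is the function passed as the second argument of re.sub, so it accepts a single argument, the match object. m.group(1) is the content of the only capturing group in the pattern, i.e. the candidate acronym. I used the dict's .get method: if the acronym is among the dict keys, the corresponding value is used; otherwise the acronym itself is returned, i.e. nothing is changed.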
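
The pattern in this answer sends every run of capital letters through the dictionary lookup, including words such as I or SWEAR that are not acronyms (they come back unchanged because of the .get default). One possible variation, sketched here as an assumption and not as part of the original answer, builds the pattern from the dictionary keys so that only known acronyms can match. The names generic, ACRONYM_PATTERN and expand_acronyms are hypothetical, and the sketch assumes a non-empty dictionary.

```python
import re

acronym_dict = {
    'QUO': 'Quotation',
    'IN': 'India',
    'SW': 'Software',
    'RE': 'Regular Expression',
}

text_list = [
    'The status QUO has changed', 'I SWEAR, This is part of IN_SW',
    'The update does not belong to the SW, version, branch',
    'This is a RE_Text',
]

# What the answer's pattern captures: every run of capitals, known or not.
generic = re.compile(r'(?<![A-Z])([A-Z]+)(?![A-Z])')
print(generic.findall('I SWEAR, This is part of IN_SW'))
# ['I', 'SWEAR', 'T', 'IN', 'SW']

# Variation: an alternation of the escaped keys, compiled once and reused.
# The lookarounds are the same as in the answer, so 'SW' inside 'SWEAR'
# is still rejected because it is followed by an uppercase letter.
ACRONYM_PATTERN = re.compile(
    r'(?<![A-Z])(?:' + '|'.join(map(re.escape, acronym_dict)) + r')(?![A-Z])'
)

def expand_acronyms(text):
    # Every match is guaranteed to be a key, so direct indexing is safe.
    return ACRONYM_PATTERN.sub(lambda m: acronym_dict[m.group(0)], text)

output_list = [expand_acronyms(t) for t in text_list]
print(output_list)
```

Either way, each text is scanned once regardless of how many acronyms the dictionary holds, instead of being split and re-joined once per acronym. Both patterns are case-sensitive, unlike the lowercase comparison in the question; passing re.IGNORECASE would need a matching change to the dictionary lookup.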
2 years ago  0 comments
