How to get a list of the two words before or after a keyword in Python?



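I have collected this data, and I am trying to find out whether a keyword appears in each description and, if it does, what the two words before and after it are.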

    import pandas as pd

    data = pd.read_csv('jobs.csv')
    print(data)

    Job       Description
    Engineer  the job requires x,y,z…..
    Driver    this job need a high-school and Communication skills

The data has roughly 10k rows.

For example, for the keyword "Communication", can I find the words before and after it so that the result looks like this?

    Job       Description                                           after   before
    Engineer  the job requires x,y,z                                NA      NA
    Driver    this job need a high-school and Communication skills  skills  high-school, and

NA means the keyword is not present in that description.

I tried pandas and regular expressions, but nothing worked for me :/
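For reference, a regex route through pandas is possible with Series.str.extract. This is a sketch under a few assumptions: the keyword must match as a whole word, "words" are whitespace-separated tokens (so punctuation stays attached, e.g. `skills,`), and the small DataFrame below stands in for jobs.csv.

```python
import re

import pandas as pd

# Stand-in for pd.read_csv('jobs.csv')
data = pd.DataFrame({
    'Job': ['Engineer', 'Driver'],
    'Description': ['the job requires x,y,z',
                    'this job need a high-school and Communication skills'],
})

keyword = 'Communication'
# Group 0: up to two whitespace-separated tokens directly before the keyword.
# Group 1: up to two tokens directly after it.
# \b on both sides keeps e.g. "Communications" from counting as a match.
pattern = r'((?:\S+\s+){0,2})\b' + re.escape(keyword) + r'\b((?:\s+\S+){0,2})'

# Rows without a match get NaN in both extracted columns
parts = data['Description'].str.extract(pattern)

# Split the captured text into words and join them with ", "; NaN stays NaN
data['before'] = parts[0].str.split().str.join(', ')
data['after'] = parts[1].str.split().str.join(', ')
print(data)
```

Because the regex runs inside pandas' string methods, the 10k-row column is handled in a single call rather than with an explicit Python loop.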

I would really appreciate your help.

Original link: https://stackoverflow.com//questions/71676195/how-to-get-a-list-of-two-words-before-or-after-a-keyword-in-python

    Answer by Stef:

    You can use Series.map to build one column from another by applying a function to each element.
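A minimal illustration of Series.map on the question's two sample descriptions (the Series below is a stand-in for `data['Description']`): the function runs once per element, and the results come back as a new Series with the same index.

```python
import pandas as pd

# Stand-in for data['Description'] from the question
descriptions = pd.Series([
    'the job requires x,y,z',
    'this job need a high-school and Communication skills',
])

# map() calls the lambda once per element and returns a new Series
# aligned on the same index
has_keyword = descriptions.map(lambda s: 'Communication' in s.split())
print(has_keyword.tolist())  # [False, True]
```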

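    Once an element is split into a list of words, you can use list.index to find the position of the keyword you are looking for, and then the slice sentence[i-2:i] to get the two words before that index. Clamp the start with max(i-2, 0): if the keyword is the second word, i-2 is negative, and Python counts a negative start from the end of the list.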
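Why the start index needs clamping is easiest to see on a short list; the sentence below is made up for illustration. Note also that list.index returns only the first occurrence and raises ValueError if the word is absent, which is why the answer checks `word in sentence` first.

```python
words = 'good Communication skills'.split()  # ['good', 'Communication', 'skills']
i = words.index('Communication')             # i == 1

# i - 2 == -1, so this is words[-1:1]: the start counts from the end
# (index 2), which lies past the stop, so the slice is empty
naive = words[i - 2:i]

# Clamping the start at 0 gives words[0:1]
safe = words[max(i - 2, 0):i]

print(naive)  # []
print(safe)   # ['good']
```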

    import pandas as pd
    
    data = pd.DataFrame({
        'Job': ['Engineer', 'Driver'],
        'Description': ['the job requires x,y,z', 'this job need a high-school and Communication skills']
    })
    
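    def get_two_words_before(sentence, word):
        # Split on whitespace; punctuation stays attached to the words
        sentence = sentence.split()
        if word in sentence:
            i = sentence.index(word)  # position of the first occurrence
            # Clamp at 0 so a keyword in the second position still works
            return sentence[max(i-2, 0):i]
        else:
            return []
    
    def get_two_words_after(sentence, word):
        sentence = sentence.split()
        if word in sentence:
            i = sentence.index(word)
            # Slicing past the end is safe; fewer than two words may come back
            return sentence[i+1:i+3]
        else:
            return []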
    
    data['before'] = data['Description'].map(lambda x: get_two_words_before(x, 'Communication'))
    
    data['after'] = data['Description'].map(lambda x: get_two_words_after(x, 'Communication'))
    
    print(data)
    

    Output:

            Job             Description              before     after
    0  Engineer  the job requires x,y,z                  []        []
    1    Driver  this job need a hig...  [high-school, and]  [skills]
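The functions above return lists and use `[]` for a missing keyword. To get the layout from the question instead (NA when the keyword is absent, the words joined by commas), a variation along these lines should work. It is a sketch: `context_words` and its `n` parameter are hypothetical names, `pd.NA` is an assumed choice of missing-value marker, and words are plain whitespace-separated tokens, so a keyword followed by punctuation (e.g. `Communication,`) will not be found.

```python
import pandas as pd

data = pd.DataFrame({
    'Job': ['Engineer', 'Driver'],
    'Description': ['the job requires x,y,z',
                    'this job need a high-school and Communication skills'],
})

def context_words(sentence, keyword, n=2):
    """Return (before, after) as comma-joined strings, or (pd.NA, pd.NA)
    when the keyword is not one of the whitespace-separated words."""
    words = sentence.split()
    if keyword not in words:
        return pd.NA, pd.NA
    i = words.index(keyword)                    # first occurrence only
    before = ', '.join(words[max(i - n, 0):i])  # clamp so i < n still works
    after = ', '.join(words[i + 1:i + 1 + n])   # may hold fewer than n words
    return before, after

# One (before, after) tuple per row, then unpack into two columns
pairs = data['Description'].map(lambda s: context_words(s, 'Communication'))
data['before'] = [before for before, _ in pairs]
data['after'] = [after for _, after in pairs]
print(data[['Job', 'before', 'after']])
```

Returning both sides from a single function means each description is split and searched once instead of twice.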
    