How to get a list of two words before or after a keyword in Python?
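I have collected this data, and I am trying to determine whether a keyword appears in it and, if so, which two words come before and after it.

    import pandas as pd

    data = pd.read_csv('jobs.csv')
    print(data)  # display the data
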
Job | Description |
---|---|
Engineer | the job requires x,y,z….. |
Driver | this job need a high-school and Communication skills |
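The data has about 10k rows.
For example, for the keyword "Communication", can I find the words before and after "Communication", so that the result looks like this?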
Job | Description | after | before |
---|---|---|---|
Engineer | the job requires x,y,z | NA | NA |
Driver | this job need a high-school and Communication skills | skills | high-school, and |
NA because the keyword is not present.
I tried pandas and regex, but nothing worked for me :/
I would really appreciate your help.
Answers
Stef commented:
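You can use `Series.map` to map one column to another by applying a function to each element. If an element is a list of words, you can use `list.index` to find the position of the keyword you are looking for, then the list slice `sentence[i-2:i]` to get the two words before that index. The start of the slice is clamped at 0 below, because a negative start would wrap around to the end of the list when the keyword is the second word.

    import pandas as pd

    data = pd.DataFrame({
        'Job': ['Engineer', 'Driver'],
        'Description': ['the job requires x,y,z',
                        'this job need a high-school and Communication skills']
    })

    def get_two_words_before(sentence, word):
        sentence = sentence.split()
        if word in sentence:
            i = sentence.index(word)
            # Clamp at 0 so that i = 1 does not yield a negative start index
            return sentence[max(i - 2, 0):i]
        else:
            return []

    def get_two_words_after(sentence, word):
        sentence = sentence.split()
        if word in sentence:
            i = sentence.index(word)
            # Slicing past the end of a list is safe; it just returns fewer words
            return sentence[i+1:i+3]
        else:
            return []

    data['before'] = data['Description'].map(lambda x: get_two_words_before(x, 'Communication'))
    data['after'] = data['Description'].map(lambda x: get_two_words_after(x, 'Communication'))
    print(data)
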
Output:

            Job             Description              before     after
    0  Engineer  the job requires x,y,z                  []        []
    1    Driver  this job need a hig...  [high-school, and]  [skills]

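
The answer above matches the keyword exactly, so a description containing "communication," (lower case, trailing comma) would be missed, and rows without the keyword get empty lists rather than the NA asked for in the question. The following is a minimal sketch of one possible variant, not part of the original answer. The function name `context_words`, its parameter `n`, and the set of punctuation characters that are stripped are all assumptions made for illustration.

```python
def context_words(sentence, keyword, n=2):
    """Return (before, after): up to n words on each side of the first
    occurrence of keyword, or (None, None) if the keyword is absent.

    Hypothetical helper: the name, the parameter n and the punctuation
    set below are illustrative assumptions, not from the original answer.
    """
    words = sentence.split()
    target = keyword.lower()
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore surrounding punctuation
        if word.strip('.,;:!?()"\'').lower() == target:
            before = words[max(i - n, 0):i]   # clamp so the slice never wraps
            after = words[i + 1:i + 1 + n]    # slicing past the end is safe
            return before, after
    return None, None


print(context_words('this job need a high-school and Communication skills',
                    'Communication'))
# (['high-school', 'and'], ['skills'])
print(context_words('the job requires x,y,z', 'Communication'))
# (None, None)
```

With pandas, the same function could be applied per row in the way the answer does, for example `data['Description'].map(lambda s: context_words(s, 'Communication'))`, and the resulting tuples split into `before` and `after` columns.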
2 years ago
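
Since the question mentions trying regular expressions, here is a hedged sketch of how the same lookup might be written with Python's standard `re` module instead of `split` and `index`. It is an alternative technique, not the method from the answer. The function name `context_regex` and the parameter `n` are assumptions, and the pattern treats any run of non-whitespace characters as a word.

```python
import re


def context_regex(text, keyword, n=2):
    """Return (before, after) word lists around the first match of keyword,
    or None if there is no match. Hypothetical helper for illustration."""
    pattern = re.compile(
        # group 1: up to n tokens, each followed by whitespace
        # then the keyword as a whole word, plus any attached punctuation
        # group 2: up to n tokens, each preceded by whitespace
        r'((?:\S+\s+){0,%d})\b%s\b\S*((?:\s+\S+){0,%d})'
        % (n, re.escape(keyword), n),
        flags=re.IGNORECASE,
    )
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1).split(), match.group(2).split()


print(context_regex('this job need a high-school and Communication skills',
                    'Communication'))
# (['high-school', 'and'], ['skills'])
print(context_regex('the job requires x,y,z', 'Communication'))
# None
```

The regex version is more compact, but the split-and-slice approach is probably easier to debug on 10k rows, because the intermediate word list can be inspected directly.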