Extract content of body from Postman using Python
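After getting the response from Postman, I am trying to retrieve a specific code from the body. I want to retrieve this ID: 00163E7B0F671EDA82E31CA5B621A4B3 and write it to a CSV file.
The body content is as follows: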
<?xml version="1.0" encoding="utf-8"?>
<feed xml:base="https://example.com">
<id>https://example/CorporateAccountCollection</id>
<title type="text">CorporateAccountCollection</title>
<updated>2022-03-11T12:54:02Z</updated>
<author>
<name/>
</author>
<link href="CorporateAccountCollection" rel="self" title="CorporateAccountCollection"/>
<entry m:etag="W/"datetimeoffset'2020-02-06T12%3A46%3A35.0949040Z'"">
<id>https://example/CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</id>
<title type="text">CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</title>
<updated>2022-03-11T12:54:02Z</updated>
<category term="c4codata.CorporateAccount" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<link href="CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')" rel="edit" title="CorporateAccount"/>
<content type="application/xml">
</content>
</entry>
</feed>
The ID is repeated in several places, but it is always the same. Please help.
Answers
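Dennis commented:
To make the HTTP request, use Python's requests library.
To parse the XML response, use the built-in xml.etree library. To query the <id> tags, you can use XPath. Request example: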
import requests

r = requests.get('https://api.github.com/events')
print(r.text)
# parse your response with xml parser
Full example:
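import csv
import xml.etree.ElementTree as ET

import requests

response = requests.get('https://example.com/your_path')
root: ET.Element = ET.fromstring(response.text)

# Collect the text of every <id> element in the document
ids = []
for id_element in root.findall('.//id'):
    ids.append(id_element.text.strip())

# Write a header row, then one <id> value per row.
# newline='' stops the csv module from emitting blank lines on Windows.
with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['id'])
    for id_value in ids:
        writer.writerow([id_value])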
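Note that the <id> elements in the sample body hold URLs, and the key the question asks for appears inside some of them more than once across the document. As a minimal sketch (the helper name `unique_ids` and the pattern are assumptions, not part of the answer above), the collected strings could be reduced to distinct bare IDs like this:

```python
import re

# Hypothetical pattern: an uppercase hex key wrapped as ('...') at the end of the URL.
ID_PATTERN = re.compile(r"\('([0-9A-F]+)'\)")


def unique_ids(id_texts):
    """Return each distinct key found in the given strings, in first-seen order."""
    seen = {}
    for text in id_texts:
        match = ID_PATTERN.search(text)
        if match:
            # dict keys preserve insertion order, so this doubles as an ordered set
            seen.setdefault(match.group(1), None)
    return list(seen)


# The two <id> values from the sample body in the question
sample = [
    "https://example/CorporateAccountCollection",
    "https://example/CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')",
]
print(unique_ids(sample))
```

Strings without a quoted key, such as the feed-level <id>, are simply skipped, so only the bare ID would end up in the CSV.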
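2 years ago
Jack Taylor commented:
You can use requests to download the XML content, xml.etree.ElementTree to parse the XML, and a regular expression to parse the ID out of the resulting URL. Finally, you can use the csv module to write the resulting ID to a CSV file. I have included some code below that does this.
As I commented on your question, the XML is invalid, so the XML parsing code includes a hack to work around this. If you receive valid XML, you can remove that line.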
Also, you should replace http://www.example.com with whatever URL you are getting the XML from.
import csv
import re
import xml.etree.ElementTree as ET

import requests


def fetch_xml(url):
    response = requests.get(url)
    response.raise_for_status()
    return response.text


def parse_xml(xml_text):
    # Hack to fix broken XML
    xml_text = xml_text.replace("<entry m:etag=", "<entry etag=", 1)

    # Get the ID child of the entry element
    root = ET.fromstring(xml_text)
    id_element = root.find("./entry/id")
    return id_element.text


def parse_url(url):
    match = re.search("'([0-9A-F]+)'", url)
    if not match:
        raise ValueError(f"Could not parse ID from URL {url}")
    return match.group(1)


def write_csv(path, collection_id):
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["collection_id"])
        writer.writerow([collection_id])


def main():
    xml_text = fetch_xml("http://www.example.com")
    url = parse_xml(xml_text)
    collection_id = parse_url(url)
    write_csv("result.csv", collection_id)


if __name__ == "__main__":
    main()
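To try the parsing steps without making a network request, here is a rough offline sketch. It assumes a trimmed, hypothetical stand-in for the response body (`SAMPLE`, with a simplified etag value) and writes the CSV to an in-memory buffer instead of a file; `extract_id` is an illustrative name that combines the parse and regex steps.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the response body. The etag value is
# simplified; only the undeclared "m:" prefix is kept from the question.
SAMPLE = """<feed xml:base="https://example.com">
<id>https://example/CorporateAccountCollection</id>
<entry m:etag="simplified">
<id>https://example/CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</id>
</entry>
</feed>"""


def extract_id(xml_text):
    """Apply the prefix workaround, find the entry's <id>, and return the bare key."""
    # Same workaround as in the answer: drop the undeclared "m:" prefix
    xml_text = xml_text.replace("<entry m:etag=", "<entry etag=", 1)
    root = ET.fromstring(xml_text)
    id_element = root.find("./entry/id")
    match = re.search("'([0-9A-F]+)'", id_element.text)
    if not match:
        raise ValueError(f"Could not parse ID from URL {id_element.text}")
    return match.group(1)


# Write to an in-memory buffer so the sketch leaves no file behind
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["collection_id"])
writer.writerow([extract_id(SAMPLE)])
print(buffer.getvalue())
```

If the real response declares a default XML namespace on the feed element, the `./entry/id` path would need to be adjusted accordingly; the sample in the question does not declare one.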
2 years ago