Extract content of body from Postman using Python

社会演员多 · 2 years ago · python · 188

Original title: Extract content of body from Postman using python

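After getting a response in Postman, I am trying to retrieve a specific code from the body. I want to retrieve this ID: 00163E7B0F671EDA82E31CA5B621A4B3 and write it to a CSV file.

The body content is as follows: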

<?xml version="1.0" encoding="utf-8"?>
<feed xml:base="https://example.com">
    <id>https://example/CorporateAccountCollection</id>
    <title type="text">CorporateAccountCollection</title>
    <updated>2022-03-11T12:54:02Z</updated>
    <author>
        <name/>
    </author>
    <link href="CorporateAccountCollection" rel="self" title="CorporateAccountCollection"/>
    <entry m:etag="W/&quot;datetimeoffset'2020-02-06T12%3A46%3A35.0949040Z'&quot;">
        <id>https://example/CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</id>
        <title type="text">CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</title>
        <updated>2022-03-11T12:54:02Z</updated>
        <category term="c4codata.CorporateAccount" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
        <link href="CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')" rel="edit" title="CorporateAccount"/>
        <content type="application/xml">
        </content>
    </entry>
</feed>

The ID is repeated in several places, but it is the same each time. Please help.

Original link: https://stackoverflow.com//questions/71464199/extract-content-of-body-from-postman-using-python


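Dennis commented

To make HTTP requests, use Python's requests library.

To parse the XML response, use the built-in xml.etree library. To query the <id> tags, you can use XPath.

Request example: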
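As a minimal sketch of the XPath querying mentioned above: ElementTree supports only a subset of XPath, so it helps to see what two common path forms match. The `SAMPLE` string is a trimmed, hypothetical stand-in for the real response (it leaves out the `m:etag` attribute), not the actual body.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the real response body.
SAMPLE = """<feed>
    <id>https://example/CorporateAccountCollection</id>
    <entry>
        <id>https://example/CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</id>
    </entry>
</feed>"""

root = ET.fromstring(SAMPLE)

# './/id' matches every <id> descendant, including the feed-level one.
all_ids = [el.text for el in root.findall(".//id")]

# './entry/id' matches only <id> elements that are direct children of <entry>.
entry_ids = [el.text for el in root.findall("./entry/id")]
```

Here `all_ids` also contains the feed-level URL, while `entry_ids` contains only the entry URL that carries the ID from the question.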

import requests

r = requests.get('https://api.github.com/events')
print(r.text)  # parse your response with xml parser

Full example:

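import csv
import xml.etree.ElementTree as ET

import requests

# Placeholder URL; replace it with the endpoint you call from Postman.
response = requests.get('https://example.com/your_path')
response.raise_for_status()  # stop early on HTTP errors
root: ET.Element = ET.fromstring(response.text)

# Collect the text of every <id> element, at any depth.
# Note: each value is the full URL from the feed, not just the bare ID.
ids = []
for id_element in root.findall('.//id'):
    if id_element.text:
        ids.append(id_element.text.strip())

# newline='' stops the csv module from writing blank rows on Windows.
with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['id'])
    for id_value in ids:
        writer.writerow([id_value])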
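Since the question asks for the bare ID rather than the full URL, and notes that the same ID appears more than once, the sketch below shows one possible way to extract and de-duplicate it. It runs offline on a trimmed, hypothetical sample (without the `m:etag` attribute). The helper names `extract_ids` and `ids_to_csv` are illustrative, and the pattern assumes the ID is an uppercase hexadecimal string wrapped in `('...')`.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the real response body.
SAMPLE = """<feed>
    <id>https://example/CorporateAccountCollection</id>
    <title type="text">CorporateAccountCollection</title>
    <entry>
        <id>https://example/CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</id>
        <link href="CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')" rel="edit"/>
    </entry>
</feed>"""

# Assumed ID shape: uppercase hexadecimal characters inside ('...').
ID_PATTERN = re.compile(r"\('([0-9A-F]+)'\)")


def extract_ids(xml_text):
    """Return the distinct bare IDs found in <id> elements, in document order."""
    root = ET.fromstring(xml_text)
    seen = []
    for element in root.findall(".//id"):
        match = ID_PATTERN.search(element.text or "")
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def ids_to_csv(ids):
    """Render the IDs as CSV text with a single 'id' header column."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["id"])
    writer.writerows([value] for value in ids)
    return buffer.getvalue()


ids = extract_ids(SAMPLE)
csv_text = ids_to_csv(ids)
```

To write a real file instead of an in-memory string, the same `csv.writer` calls can target a file opened with `open(path, "w", newline="")`.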

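2 years ago · 0 comments

Jack Taylor commented

You can use requests to download the XML content, xml.etree.ElementTree to parse the XML, and a regular expression to extract the ID from the resulting URL. Finally, you can use the csv module to write the resulting ID to a CSV file. I have included some code below that does this.

As I commented on your question, the XML is invalid, so the XML parsing code includes a hack to work around this. If you receive valid XML, you can remove that line.

Also, you should replace http://www.example.com with whatever URL you are fetching the XML from.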

import csv
import re
import xml.etree.ElementTree as ET

import requests

def fetch_xml(url):
    response = requests.get(url)
    response.raise_for_status()
    return response.text

def parse_xml(xml_text):
    # Hack to fix broken XML
    xml_text = xml_text.replace("<entry m:etag=", "<entry etag=", 1)

    # Get the ID child of the entry element
    root = ET.fromstring(xml_text)
    id_element = root.find("./entry/id")
    return id_element.text

def parse_url(url):
    match = re.search("'([0-9A-F]+)'", url)
    if not match:
        raise ValueError(f"Could not parse ID from URL {url}")
    return match.group(1)

def write_csv(path, collection_id):
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["collection_id"])
        writer.writerow([collection_id])

def main():
    xml_text = fetch_xml("http://www.example.com")
    url = parse_xml(xml_text)
    collection_id = parse_url(url)
    write_csv("result.csv", collection_id)

if __name__ == "__main__":
    main()
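The posted XML is rejected presumably because the `m:` prefix on `m:etag` is never declared in the snippet. As an alternative to renaming the attribute, the sketch below declares the prefix before parsing. The namespace URI is an assumption (the OData metadata namespace), so check it against your real response. `BROKEN` is a shortened copy of the posted body, and `parses_cleanly` and `declare_m_prefix` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Assumed URI for the "m" prefix (OData metadata namespace); verify against your response.
M_NAMESPACE = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

# Shortened copy of the posted body; the "m" prefix is never declared.
BROKEN = """<feed xml:base="https://example.com">
    <entry m:etag="W/&quot;datetimeoffset'2020-02-06T12%3A46%3A35.0949040Z'&quot;">
        <id>https://example/CorporateAccountCollection('00163E7B0F671EDA82E31CA5B621A4B3')</id>
    </entry>
</feed>"""


def parses_cleanly(xml_text):
    """Return True if ElementTree can parse the text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def declare_m_prefix(xml_text):
    """Add an xmlns:m declaration to the first <feed ...> start tag."""
    return xml_text.replace("<feed ", f'<feed xmlns:m="{M_NAMESPACE}" ', 1)


repaired = declare_m_prefix(BROKEN)
root = ET.fromstring(repaired)

# Namespaced attributes are keyed as '{namespace-uri}local-name'.
entry = root.find("./entry")
etag = entry.get(f"{{{M_NAMESPACE}}}etag")
entry_id = root.find("./entry/id").text
```

Unlike the rename hack, this keeps the attribute's namespace intact, at the cost of assuming which URI the prefix stands for. If the real response already declares `xmlns:m`, neither workaround should be needed.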

2 years ago · 0 comments
