XML example
xml = '''<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<shape>square</shape>
<degrees>360</degrees>
<sides>4.0</sides>
</row>
<row>
<shape>circle</shape>
<degrees>360</degrees>
</row>
<row>
<shape>triangle</shape>
<degrees>180</degrees>
<sides>3.0</sides>
</row>
</data>'''
Method 1: Using ElementTree (formerly cElementTree)
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
import pandas as pd
# Parse the XML string
root = ET.fromstring(xml)
# Or parse an XML file instead
# tree = ET.ElementTree(file="text.xml")
# root = tree.getroot()
data = list()
for child in root:  # each <row>
    data1 = list()
    for son in child:  # each field of the row, in document order
        data1.append(son.text)
    data.append(data1)
df = pd.DataFrame(data, columns=['shape', 'degrees', 'sides'])
print(df)
Output:
      shape degrees sides
0    square     360   4.0
1    circle     360   NaN
2  triangle     180   3.0
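If shape, degrees, and sides do not appear in the same order in every row, reading the values by position is error-prone.
For example, if the last row lists its elements in the order degrees, shape, sides,
the output becomes:
      shape   degrees sides
0    square       360   4.0
1    circle       360  None
2       180  triangle   3.0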
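One way to avoid this ordering problem is to read each value by its tag name rather than by its position. The sketch below is a minimal illustration under that assumption; `rows_by_tag`, `xml_reordered`, and `COLUMNS` are names introduced here for the example and do not come from the original article.

```python
from xml.etree import ElementTree as ET

# Same data as the example, but the last row lists degrees before shape.
xml_reordered = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees></row>
  <row><degrees>180</degrees><shape>triangle</shape><sides>3.0</sides></row>
</data>"""

COLUMNS = ["shape", "degrees", "sides"]


def rows_by_tag(root, columns):
    """Return one dict per child of root, keyed by column name."""
    rows = []
    for child in root:
        row = dict.fromkeys(columns)  # every column starts as None
        for son in child:
            if son.tag in row:  # ignore tags that are not requested columns
                row[son.tag] = son.text
        rows.append(row)
    return rows


rows = rows_by_tag(ET.fromstring(xml_reordered), COLUMNS)
for row in rows:
    print(row)
```

Because each row is a dict keyed by tag, `pd.DataFrame(rows, columns=COLUMNS)` should place every value in the right column regardless of element order.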
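Method 2: Using read_xml()
from io import StringIO
import pandas as pd
# read_xml() requires pandas 1.3 or later.
# Wrap the string in StringIO: passing literal XML text directly is deprecated since pandas 2.1.
df = pd.read_xml(StringIO(xml))
print(df)
Output:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0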
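Unlike the manual loop in Method 1, read_xml() matches elements by name and infers numeric column types. The following sketch illustrates this, assuming pandas 1.3 or later is installed; it passes `parser="etree"` so that the standard-library parser is used instead of the default lxml, and `xml_text` is a name introduced here for the example.

```python
from io import StringIO

import pandas as pd

xml_text = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# parser="etree" uses xml.etree.ElementTree, so lxml does not need to be installed.
df = pd.read_xml(StringIO(xml_text), parser="etree")

print(df)
print(df.dtypes)  # degrees is parsed as an integer column, sides as a float column
```

With Method 1 every column holds strings, so numeric work would need an explicit conversion such as `pd.to_numeric`.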
Method 3: Using pd.json_normalize()
- Convert the XML into a JSON-like structure
- Use pd.json_normalize() to load it into a DataFrame
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
import pandas as pd

def fun1(root):
    dic1 = dict()
    for child in root:
        if len(child) > 0:  # the element has children of its own
            dic2 = fun1(child)  # recurse into the nested element
            value = dic1.get(child.tag)  # existing list, or None if the tag is new
            if value:  # tag already seen: append to its list
                value.append(dic2)
                dic1[child.tag] = value
            else:
                dic1[child.tag] = [dic2]
        else:  # leaf element: store its text
            dic1[child.tag] = child.text
    return dic1

if __name__ == '__main__':
    root = ET.fromstring(xml)
    dic1 = fun1(root)
    df = pd.json_normalize(dic1['row'])
    print(df)
Output:
      shape degrees sides
0    square     360   4.0
1    circle     360   NaN
2  triangle     180   3.0
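To make the "JSON-like structure" from the first step concrete, the sketch below prints the intermediate dictionary for the example XML before it would be handed to pd.json_normalize(). It is a compact variant of the recursive approach and not the article's exact code; `xml_to_dict`, `xml_text`, and `nested` are hypothetical names used only here.

```python
import json
from xml.etree import ElementTree as ET

xml_text = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""


def xml_to_dict(element):
    """Hypothetical helper: map child tags to text, or to a list of dicts when nested."""
    result = {}
    for child in element:
        if len(child) > 0:
            # Nested element: collect repeated tags into a list of dicts.
            result.setdefault(child.tag, []).append(xml_to_dict(child))
        else:
            # Leaf element: keep its text (a string, or None if the element is empty).
            result[child.tag] = child.text
    return result


nested = xml_to_dict(ET.fromstring(xml_text))
print(json.dumps(nested, indent=2))
```

The list stored under `nested["row"]` holds one dict per row, which is the shape of input that pd.json_normalize() accepts; a key missing from a row, such as sides for the circle, becomes NaN in the resulting DataFrame.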