Problem description:
I have bond data in the form of a string:
"Bond(secid='SU26238RMFS4', shortname='ОФЗ 26238', amortization=[Amortization(date=datetime.date(2041, 5, 15), value=1000, initialfacevalue=1000)], coupons=[Coupon(date=2021-12-08, value=34.04), Coupon(date=2022-06-08, value=35.4), Coupon(date=2022-12-07, value=35.4), Coupon(date=2023-06-07, value=35.4), Coupon(date=2023-12-06, value=35.4), Coupon(date=2024-06-05, value=35.4), Coupon(date=2024-12-04, value=35.4), Coupon(date=2025-06-04, value=35.4), Coupon(date=2025-12-03, value=35.4), Coupon(date=2026-06-03, value=35.4), Coupon(date=2026-12-02, value=35.4), Coupon(date=2027-06-02, value=35.4), Coupon(date=2027-12-01, value=35.4), Coupon(date=2028-05-31, value=35.4), Coupon(date=2028-11-29, value=35.4), Coupon(date=2029-05-30, value=35.4), Coupon(date=2029-11-28, value=35.4), Coupon(date=2030-05-29, value=35.4), Coupon(date=2030-11-27, value=35.4), Coupon(date=2031-05-28, value=35.4), Coupon(date=2031-11-26, value=35.4), Coupon(date=2032-05-26, value=35.4), Coupon(date=2032-11-24, value=35.4), Coupon(date=2033-05-25, value=35.4), Coupon(date=2033-11-23, value=35.4), Coupon(date=2034-05-24, value=35.4), Coupon(date=2034-11-22, value=35.4), Coupon(date=2035-05-23, value=35.4), Coupon(date=2035-11-21, value=35.4), Coupon(date=2036-05-21, value=35.4), Coupon(date=2036-11-19, value=35.4), Coupon(date=2037-05-20, value=35.4), Coupon(date=2037-11-18, value=35.4), Coupon(date=2038-05-19, value=35.4), Coupon(date=2038-11-17, value=35.4), Coupon(date=2039-05-18, value=35.4), Coupon(date=2039-11-16, value=35.4), Coupon(date=2040-05-16, value=35.4), Coupon(date=2040-11-14, value=35.4), Coupon(date=2041-05-15, value=35.4)], offers=[])"
I need to get a DataFrame with all the coupons:
date value
2021-12-08 34.04
2022-06-08 35.4
etc
I know how to split it with split() and then merge the pieces one by one, but that takes a lot of time.
Is there a more efficient way to do this?
Solution 1: [1]
You could try using a regex find all approach:
import re
lst = re.findall(r'Coupon\(date=(.*?), value=(.*?)\)', data)
This leaves you with a list of (date, value) string tuples, which you can then easily convert to a pandas DataFrame.
print(lst)
# [('2021-12-08', '34.04'), ('2022-06-08', '35.4'), ('2022-12-07', '35.4'),
# ...]
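The conversion to a DataFrame mentioned above could look roughly like this. It is a minimal sketch, assuming the input is a string shaped like the one in the question; the `data` value here is a shortened sample with only two coupons, and the names `pairs` and `df` are illustrative.

```python
import re

import pandas as pd

# Shortened sample of the string from the question (only two coupons kept).
data = (
    "Bond(secid='SU26238RMFS4', coupons=[Coupon(date=2021-12-08, value=34.04), "
    "Coupon(date=2022-06-08, value=35.4)], offers=[])"
)

# Each match is a (date, value) tuple of strings.
pairs = re.findall(r'Coupon\(date=(.*?), value=(.*?)\)', data)

# Build the frame, then convert both columns from strings to proper dtypes.
df = pd.DataFrame(pairs, columns=["date", "value"])
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df["value"] = df["value"].astype(float)
print(df)
```

Converting the columns after construction keeps the regex simple; without that step both columns would hold plain strings.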
Solution 2: [2]
You can achieve this by splitting the string on the "Coupon(date=" prefix and building the DataFrame with pandas:
import pandas as pd
bond_data = "Bond(secid='SU26238RMFS4', shortname='ОФЗ 26238', amortization=[Amortization(date=datetime.date(2041, 5, 15), value=1000, initialfacevalue=1000)], coupons=[Coupon(date=2021-12-08, value=34.04), Coupon(date=2022-06-08, value=35.4), ...]"
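# Split on the coupon prefix, cut each chunk at its closing parenthesis, then separate date from value.
# (str.strip removes a set of characters, not a prefix, so it cannot be used to drop "Coupon(date=".)
coupons = [chunk.split(")", 1)[0].split(", value=") for chunk in bond_data.split("Coupon(date=")[1:]]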
df = pd.DataFrame(coupons, columns=["date", "value"])
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['value'] = df['value'].astype(float)  # values are parsed as strings; convert to numbers
print(df)
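If the original Bond object is still available rather than only its printed form, string parsing may not be needed at all. The sketch below assumes the coupons are dataclass instances with `date` and `value` fields; the `Coupon` class here is a hypothetical stand-in, since the question does not show how the real class is defined.

```python
import datetime as dt
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class Coupon:
    """Hypothetical stand-in for the Coupon class from the question."""
    date: dt.date
    value: float


# In real code this list would come from the bond object, e.g. bond.coupons (assumed attribute).
coupons = [
    Coupon(dt.date(2021, 12, 8), 34.04),
    Coupon(dt.date(2022, 6, 8), 35.4),
]

# asdict turns each dataclass instance into a plain dict, one row per coupon.
df_obj = pd.DataFrame([asdict(c) for c in coupons])
df_obj["date"] = pd.to_datetime(df_obj["date"])
print(df_obj)
```

If the real class is not a dataclass, a list comprehension over its attributes would serve the same purpose.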
References:
Copyright Notice: This article follows StackOverflow’s copyright notice requirements and is licensed under CC BY-SA 3.0.
Article Source: StackOverflow
[1] Tim Biegeleisen
[2] Mahboob Nur