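Analysis and Prediction of Customer Clothing Purchases
【Experiment Description】
Use decision tree algorithms to analyze a dataset recording whether customers bought clothing during the "Double Eleven" (Singles' Day, 11 November) shopping festival, and to predict purchases.
Clothing purchase dataset: contains review (product rating), discount (degree of discount), needed (whether the item is a necessity), shipping (whether shipping is free) and buy (whether the customer bought the item; the target variable).
【Requirements】
1. Read the clothing purchase dataset (path: data/data76088/3_buy.csv) and explore the data.
2. Using the ID3 and CART algorithms respectively, configure, train, predict with and evaluate a decision tree model.
3. Extension (optional): visualize the decision tree structures produced by the different algorithms.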
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
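from sklearn import tree  # decision tree module
from jupyterthemes import jtplot  # optional notebook theme package; delete this line and the next if it is not installed
jtplot.style(theme='monokai')  # choose a plotting theme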
Read the clothing purchase dataset
data = pd.read_csv("./datasets/3_buy.csv")  # local copy; the task statement gives the path data/data76088/3_buy.csv
data
| | review | discount | needed | shipping | buy |
|---|---|---|---|---|---|
| 0 | 3 | 3 | 0 | 1 | 1 |
| 1 | 3 | 3 | 0 | 0 | 1 |
| 2 | 2 | 3 | 0 | 1 | 0 |
| 3 | 1 | 2 | 0 | 1 | 0 |
| 4 | 1 | 1 | 1 | 1 | 0 |
| 5 | 1 | 1 | 1 | 0 | 1 |
| 6 | 2 | 1 | 1 | 0 | 0 |
| 7 | 3 | 2 | 0 | 1 | 1 |
| 8 | 3 | 1 | 1 | 1 | 0 |
| 9 | 1 | 2 | 1 | 1 | 0 |
| 10 | 3 | 2 | 1 | 0 | 0 |
| 11 | 2 | 2 | 0 | 0 | 1 |
| 12 | 2 | 3 | 1 | 1 | 0 |
| 13 | 1 | 2 | 0 | 0 | 1 |
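The notebook only displays `data`, while requirement 1 also asks to explore it. A minimal sketch, assuming the 14 rows printed above are the complete dataset (they are typed in by hand here so the snippet runs without the CSV), that checks size and missing values, the class balance of `buy`, and the purchase rate for each value of each feature:

```python
import pandas as pd

# The 14 rows printed above, typed in by hand so the snippet runs without 3_buy.csv
rows = [
    (3, 3, 0, 1, 1), (3, 3, 0, 0, 1), (2, 3, 0, 1, 0), (1, 2, 0, 1, 0),
    (1, 1, 1, 1, 0), (1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 2, 0, 1, 1),
    (3, 1, 1, 1, 0), (1, 2, 1, 1, 0), (3, 2, 1, 0, 0), (2, 2, 0, 0, 1),
    (2, 3, 1, 1, 0), (1, 2, 0, 0, 1),
]
data = pd.DataFrame(rows, columns=["review", "discount", "needed", "shipping", "buy"])

# Size, column types and missing values
print(data.shape)
print(data.dtypes)
print("missing values:", int(data.isna().sum().sum()))

# Class balance of the target column
buy_counts = data["buy"].value_counts().sort_index()
print(buy_counts)

# Purchase rate (mean of buy) for each value of each feature
purchase_rate = {
    col: data.groupby(col)["buy"].mean()
    for col in ["review", "discount", "needed", "shipping"]
}
for col, rate in purchase_rate.items():
    print(col, rate.round(2).to_dict())
```

On these rows 6 customers bought and 8 did not; rows with needed=0 bought in 5 of 7 cases against 1 of 7 for needed=1, a first hint that `needed` separates the classes best.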
Configure, train, predict with and evaluate decision tree models using the ID3 and CART algorithms
Split the dataset
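# Features: the first four columns; label: the buy column (kept 2-D, shape (n, 1))
x = data[["review", "discount", "needed", "shipping"]]
y = data[["buy"]]
# print(x)
# print(y)
# No random_state is set, so the split (and every result below) changes on each run
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30)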
print("x_train.shape:", x_train.shape)
print("y_train.shape:", y_train.shape)
print("x_test.shape:", x_test.shape)
print("y_test.shape:", y_test.shape)
x_train.shape: (9, 4)
y_train.shape: (9, 1)
x_test.shape: (5, 4)
y_test.shape: (5, 1)
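Because the split above uses no `random_state`, rerunning the cell draws a different 9/5 split and can change the accuracy. A minimal sketch of a reproducible, stratified split on the same 14 rows; the seed `42` is an arbitrary assumption, and `stratify=y` keeps the buy/no-buy ratio similar in both parts:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same 14 rows as the table above
rows = [
    (3, 3, 0, 1, 1), (3, 3, 0, 0, 1), (2, 3, 0, 1, 0), (1, 2, 0, 1, 0),
    (1, 1, 1, 1, 0), (1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 2, 0, 1, 1),
    (3, 1, 1, 1, 0), (1, 2, 1, 1, 0), (3, 2, 1, 0, 0), (2, 2, 0, 0, 1),
    (2, 3, 1, 1, 0), (1, 2, 0, 0, 1),
]
FEATURES = ["review", "discount", "needed", "shipping"]
data = pd.DataFrame(rows, columns=FEATURES + ["buy"])

x = data[FEATURES]
y = data["buy"]  # 1-D label Series, the shape scikit-learn estimators expect

# random_state fixes the shuffle; stratify=y preserves the class proportions
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=42, stratify=y
)
print(x_train.shape, x_test.shape)
print("train:", y_train.value_counts().to_dict(), "test:", y_test.value_counts().to_dict())
```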
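Configure the models
clf_CART = tree.DecisionTreeClassifier(criterion='gini', max_depth=4)  # CART: Gini impurity
clf_ID3 = tree.DecisionTreeClassifier(criterion='entropy', max_depth=4)  # ID3-style: information entropy (scikit-learn still grows binary CART-style trees)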
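The two `criterion` values rank candidate splits differently: ID3 uses information gain, the drop in entropy $H = -\sum_k p_k \log_2 p_k$, while CART uses the drop in Gini impurity $G = 1 - \sum_k p_k^2$. A small worked computation on all 14 rows, as a sketch; the helper names `entropy`, `gini` and `impurity_gain` are illustrative, and the gains use ID3's multi-way split on every feature value, whereas scikit-learn only makes binary threshold splits (the two coincide for a binary feature such as `needed`):

```python
import math
from collections import Counter

import pandas as pd

# Same 14 rows as the table above
rows = [
    (3, 3, 0, 1, 1), (3, 3, 0, 0, 1), (2, 3, 0, 1, 0), (1, 2, 0, 1, 0),
    (1, 1, 1, 1, 0), (1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 2, 0, 1, 1),
    (3, 1, 1, 1, 0), (1, 2, 1, 1, 0), (3, 2, 1, 0, 0), (2, 2, 0, 0, 1),
    (2, 3, 1, 1, 0), (1, 2, 0, 0, 1),
]
FEATURES = ["review", "discount", "needed", "shipping"]
data = pd.DataFrame(rows, columns=FEATURES + ["buy"])


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (ID3's impurity)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels (CART's impurity)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_gain(df, feature, impurity, target="buy"):
    """Parent impurity minus the size-weighted impurity of one child per feature value."""
    parent = impurity(list(df[target]))
    children = sum(
        len(group) / len(df) * impurity(list(group[target]))
        for _, group in df.groupby(feature)
    )
    return parent - children


info_gain = {f: impurity_gain(data, f, entropy) for f in FEATURES}
gini_gain = {f: impurity_gain(data, f, gini) for f in FEATURES}
print("root entropy:", round(entropy(list(data["buy"])), 4))
print("root gini:", round(gini(list(data["buy"])), 4))
for f in FEATURES:
    print(f"{f:9s} info gain = {info_gain[f]:.4f}   gini gain = {gini_gain[f]:.4f}")
```

Both measures rank `needed` first (information gain of about 0.258 bits, Gini reduction 8/49 ≈ 0.163, from a root entropy of about 0.985 bits and a root Gini of 24/49), so both trees can be expected to split on it at the root.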
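Train the models
clf_CART.fit(x_train, y_train.values.ravel())  # train; ravel() flattens the (n, 1) labels to the 1-D shape fit() expects
clf_ID3.fit(x_train, y_train.values.ravel())  # train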
DecisionTreeClassifier(criterion='entropy', max_depth=4)
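For the optional extension (requirement 3), scikit-learn can print a fitted tree's rules with `tree.export_text` and draw it with `tree.plot_tree`. A sketch, assuming both models are refit on all 14 rows so the structure does not depend on the random split, and assuming `buy=1` means the customer bought (hence the class names `no`/`yes`); the output file name `decision_trees.png` is hypothetical:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; remove this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import tree

# Same 14 rows as the table above
rows = [
    (3, 3, 0, 1, 1), (3, 3, 0, 0, 1), (2, 3, 0, 1, 0), (1, 2, 0, 1, 0),
    (1, 1, 1, 1, 0), (1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 2, 0, 1, 1),
    (3, 1, 1, 1, 0), (1, 2, 1, 1, 0), (3, 2, 1, 0, 0), (2, 2, 0, 0, 1),
    (2, 3, 1, 1, 0), (1, 2, 0, 0, 1),
]
FEATURES = ["review", "discount", "needed", "shipping"]
data = pd.DataFrame(rows, columns=FEATURES + ["buy"])
x, y = data[FEATURES], data["buy"]

# Same settings as the notebook; random_state only fixes tie-breaking between equal splits
models = {
    "CART (gini)": tree.DecisionTreeClassifier(
        criterion="gini", max_depth=4, random_state=0).fit(x, y),
    "ID3-style (entropy)": tree.DecisionTreeClassifier(
        criterion="entropy", max_depth=4, random_state=0).fit(x, y),
}

# Text view: one line per split or leaf
rules = {name: tree.export_text(clf, feature_names=FEATURES) for name, clf in models.items()}
for name, text in rules.items():
    print(name)
    print(text)

# Graphical view: one panel per criterion, nodes coloured by majority class
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
for ax, (name, clf) in zip(axes, models.items()):
    tree.plot_tree(clf, feature_names=FEATURES, class_names=["no", "yes"], filled=True, ax=ax)
    ax.set_title(name)
fig.savefig("decision_trees.png", dpi=100)
```

Refitting on every row trades an honest test score for a stable picture; to visualize the notebook's own models, pass `clf_CART` and `clf_ID3` to the same two functions instead.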
Make predictions
predictions_CART = clf_CART.predict(x_test)  # predict on the test set
print("predictions_CART", predictions_CART)
predictions_ID3 = clf_ID3.predict(x_test)  # predict on the test set
print("predictions_ID3", predictions_ID3)
predictions_CART [0 0 1 0 0]
predictions_ID3 [0 0 1 0 0]
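`predict` returns hard labels; `predict_proba` additionally returns, per sample, the fraction of training rows of each class in the leaf the sample lands in. A sketch for scoring new customers, assuming the ID3-style model is refit on all 14 rows; the two rows in `new_customers` are invented for illustration and use the dataset's encoding:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Same 14 rows as the table above
rows = [
    (3, 3, 0, 1, 1), (3, 3, 0, 0, 1), (2, 3, 0, 1, 0), (1, 2, 0, 1, 0),
    (1, 1, 1, 1, 0), (1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 2, 0, 1, 1),
    (3, 1, 1, 1, 0), (1, 2, 1, 1, 0), (3, 2, 1, 0, 0), (2, 2, 0, 0, 1),
    (2, 3, 1, 1, 0), (1, 2, 0, 0, 1),
]
FEATURES = ["review", "discount", "needed", "shipping"]
data = pd.DataFrame(rows, columns=FEATURES + ["buy"])

clf_ID3 = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
clf_ID3.fit(data[FEATURES], data["buy"])

# Hypothetical customers (invented values), with the same column names as the training data
new_customers = pd.DataFrame([(3, 3, 0, 0), (1, 1, 1, 1)], columns=FEATURES)

labels = clf_ID3.predict(new_customers)
proba = clf_ID3.predict_proba(new_customers)  # column order follows clf_ID3.classes_
print("classes:", clf_ID3.classes_)
print("labels:", labels)
print("P(buy=0), P(buy=1):")
print(proba.round(2))
```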
Evaluate the models
from sklearn.metrics import accuracy_score  # accuracy metric
print('Accuracy of CART: %s' % accuracy_score(y_test, predictions_CART))
print('Accuracy of ID3: %s' % accuracy_score(y_test, predictions_ID3))
Accuracy of CART: 0.8
Accuracy of ID3: 0.8
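With only 5 test rows, accuracy can only take the values 0, 0.2, 0.4, ..., 1.0, and it depends on which rows the random split happened to pick. Leave-one-out cross-validation, a swapped-in evaluation scheme rather than part of the original task, uses each of the 14 rows as the test set exactly once. A sketch that also builds a confusion matrix from the 14 held-out predictions, assuming the same `max_depth=4` settings:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Same 14 rows as the table above
rows = [
    (3, 3, 0, 1, 1), (3, 3, 0, 0, 1), (2, 3, 0, 1, 0), (1, 2, 0, 1, 0),
    (1, 1, 1, 1, 0), (1, 1, 1, 0, 1), (2, 1, 1, 0, 0), (3, 2, 0, 1, 1),
    (3, 1, 1, 1, 0), (1, 2, 1, 1, 0), (3, 2, 1, 0, 0), (2, 2, 0, 0, 1),
    (2, 3, 1, 1, 0), (1, 2, 0, 0, 1),
]
FEATURES = ["review", "discount", "needed", "shipping"]
data = pd.DataFrame(rows, columns=FEATURES + ["buy"])
x, y = data[FEATURES], data["buy"]

results = {}
for name, criterion in [("CART", "gini"), ("ID3", "entropy")]:
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=4, random_state=0)
    # 14 fits: each row is predicted by a tree trained on the other 13 rows
    held_out = cross_val_predict(clf, x, y, cv=LeaveOneOut())
    acc = accuracy_score(y, held_out)
    cm = confusion_matrix(y, held_out, labels=[0, 1])  # rows: true 0/1, columns: predicted 0/1
    results[name] = (acc, cm)
    print(f"{name}: leave-one-out accuracy = {acc:.3f}")
    print(cm)
```

The confusion matrix shows which kind of mistake a model makes (missed buyers versus false alarms), which a single accuracy figure hides.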