Python中的for循环逻辑回归
machine-learning 485
原文标题 :For loop Logistic regression in Python
我想在数据框上构建一个 for 循环,目标是创建一个具有每只股票准确度分数的 df。
一只股票的模型工作正常,但 for 循环没有做任何事情。下面是 df 的输出,这不是完整的 df。
Date Close ticker rating price returns direction long direction
2021-02-06 21.8 AD.AS 1 21.8 -0.02 -1 1
2021-02-06 21.8 AD.AS 1 21.8 -0.02 -1 1
2021-02-06 21.8 APPL 1 153 -0.02 -1 1
2021-02-06 21.8 APPL 1 153 -0.02 -1 1
stock_df['ticker'].unique()
array(['CSCO', 'IBM', 'AMZN', 'AD.AS'], dtype=object)
下面是代码,for循环抛出错误:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
以下是我现在拥有的代码:
#for loop test
#Split data into training and test sets
stock_df = stock_df.dropna()
result = {}
# loop on every type
for ticker in stock_df['ticker'].unique():
# slice
stock_slice = stock_df[stock_df['ticker'] == ticker]
X = stock_df_slice.drop(['long direction', 'BuyFlag','SellFlag', 'Date', 'ticker'], axis=1)
y = stock_df_slice['BuyFlag']
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.25, random_state = 5, shuffle=False)
#Creatig duplicate for back testing
X_test_2 = X_test
#building logistic regression model on training data
result[ticker]=model1 = LogisticRegression(random_state=0, multi_class='ovr', penalty='none', solver='newton-cg', class_weight={0:0.6, 1:0.4}).fit(X_train, y_train)
result[ticker]=preds_buy = model1.predict(X_test)
#Accuracy statistics
print('Accuracy Score:', metrics.accuracy_score(y_test, preds_buy))
#Create classification report
class_report=classification_report(y_test, preds_buy)
print(class_report)
# build dataframe with all your results
final_df = pd.DataFrame(result)