7-逻辑回归实践

1.逻辑回归是怎么防止过拟合的？为什么正则化可以防止过拟合？

防止过拟合的方法：可以通过进行特征选择；交叉验证；正则化；加大样本量

过拟合的时候，拟合函数的系数往往非常大，因为过拟合就是拟合函数需要顾忌每一个点，最终形成的拟合函数波动很大。即在某些很小的区间里，函数值的变化很剧烈，这就意味着函数在某些小区间里的导数值（绝对值）非常大，由于自变量值可大可小，所以只有系数足够大，才能保证导数值很大。而正则化是通过约束参数的范数使其不要太大，所以可以在一定程度上减少过拟合情况。

2.用logiftic回归来进行实践操作，数据不限

使用泰坦尼克号数据集进行逻辑回归分析

实现代码：

from sklearn.linear_model import LogisticRegression #从机器学习中导入逻辑回归
from sklearn.model_selection import train_test_split #拆分为训练集合测试集
from sklearn.preprocessing import StandardScaler #标准化
from sklearn.metrics import classification_report #召回率
import pandas as pd

# 读取数据
data = pd.read_csv('.\data\titanic_data.csv')

# 数值型转换，性别女为0 男为1，有缺失值的行直接删除
data.loc[data['Sex'] == 'male', 'Sex'] = 0
data.loc[data['Sex'] == 'female', 'Sex'] = 1
#去除空值
data = data.dropna()
#查看数据集
print(data)

# 数据集划分
x_train, x_test, y_train, y_test = train_test_split(data.iloc[:, 1:], data.iloc[:, 0], test_size=0.3)

# 进行标准化处理，特征值和目标值进行标准化处理（需要分别处理），实例标准化API
std = StandardScaler()
x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)

# 构建模型
lg = LogisticRegression()
#训练模型
lg.fit(x_train, y_train)
#训练系数
print("训练系数：
",lg.coef_)
#数据集准确率
train_score=lg.score(x_train,y_train)
print("训练集准确率：
",train_score)
test_score=lg.score(x_test,y_test)
print("测试集准确率：
",test_score)

#预测
lg_predict = lg.predict(x_test)
print('召回率：', classification_report(y_test, lg_predict, target_names=['存活', '遇难']))

运行结果：