Combining Streamlit and Machine Learning: A Heart Disease Prediction Model

2025-1-7

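This article shows how to combine Streamlit with machine learning to build and reproduce a heart disease classification model from scratch. The example uses Streamlit to quickly develop an interactive web app: the user enters the relevant data, and the model predicts whether the user is likely to have heart disease.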

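Environment Setup

First, make sure the following libraries are installed in your environment:

Streamlit: builds the web app.
Scikit-learn: provides the machine learning model.
NumPy: handles numerical arrays.
Pandas: loads and processes the data.

Install them with the following command: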

pip install streamlit scikit-learn numpy pandas

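Data Preparation

We will use the well-known UCI Heart Disease dataset, which contains physiological measurements and diagnostic results for heart disease patients. You can download it from the UCI Machine Learning Repository. The code in this article reads it from a CSV file named heart.csv and expects the label in a column named target.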
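
Before training, it can help to confirm that the downloaded file has the columns the rest of this article relies on. The following is a minimal sketch under stated assumptions, not part of the original workflow: the helper name `missing_columns` is hypothetical, and the expected column names are inferred from the variable names used in the app code later in this article plus the `target` label column.

```python
import csv
import io

# Column names assumed from the code in this article (13 features + label).
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

def missing_columns(header_line):
    """Return the expected columns that are absent from a CSV header line."""
    # Parse the header with the csv module so quoted names are handled.
    header = next(csv.reader(io.StringIO(header_line)), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

To use it, pass the first line of your local file, for example `missing_columns(open("heart.csv").readline())`. An empty list means every expected column is present.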

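Model Building

1. Data Preprocessing

First, load the data and preprocess it. Preprocessing covers handling missing values and encoding categorical features. The code below performs only the encoding step, so check your copy of the data for missing values before training.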

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the data
data = pd.read_csv('heart.csv')

# Encode categorical (string-typed) columns as integers, keeping each encoder for reuse
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    label_encoders[column] = LabelEncoder()
    data[column] = label_encoders[column].fit_transform(data[column])

# Separate the features from the label
X = data.drop('target', axis=1)
y = data['target']

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
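
To make the encoding step concrete: `LabelEncoder` assigns each distinct value an integer according to the sorted order of the values. The pure-Python sketch below mimics that idea without using scikit-learn. `fit_label_mapping` and `encode` are hypothetical names introduced only for illustration, and the sample values are made up.

```python
def fit_label_mapping(values):
    """Map each distinct value to its index in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

def encode(values, mapping):
    """Convert labels to integer codes; an unseen label raises KeyError."""
    return [mapping[v] for v in values]

column = ["male", "female", "male", "female"]  # illustrative values only
mapping = fit_label_mapping(column)            # "female" sorts before "male"
codes = encode(column, mapping)
```

The practical consequence is that the numeric codes depend on the values present in the training data, so any input supplied later has to be converted with the same mapping.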

2. Model Training

We will use a random forest as the classification model.

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Train the model (no random_state is set, so results vary slightly between runs)
model = RandomForestClassifier(n_estimators=10)
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy}")

# Save the model to disk for the Streamlit app
joblib.dump(model, "RFC.pkl")

Output (the exact value varies between runs because the forest is not seeded)

Model accuracy: 0.8688524590163934
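
Accuracy is the fraction of test samples predicted correctly. As a worked check, the reported value is consistent with 53 correct predictions out of 61 test samples. The figure of 61 is an assumption: it follows if the CSV has 303 rows, since `test_size=0.2` then reserves 61 of them. The helper name `accuracy` below is hypothetical and only mirrors what `accuracy_score` computes for plain label lists.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels: 61 samples, the last 8 predicted wrongly (53 correct).
y_true = [1] * 61
y_pred = [1] * 53 + [0] * 8
score = accuracy(y_true, y_pred)  # 53 / 61
```

With a test set this small, a single prediction changes the accuracy by about 1.6 percentage points, so small differences between runs are expected.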

Streamlit App Development

1. Building the Streamlit App

We will build a Streamlit app that lets the user enter data and get a prediction.

import streamlit as st
import joblib
import numpy as np

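# Load the trained model
model = joblib.load('RFC.pkl')

# Map category labels to numeric codes. In the Streamlit app the user picks a
# text label, not the encoded value, so each label must be converted.
# Assumption to verify: these codes must match the encoding in the heart.csv used for training.
cp_options = {
    "Typical angina": 0,
    "Atypical angina": 1,
    "Non-anginal pain": 2
}

restecg_options = {
    "Normal": 0,
    "ST-T wave abnormality": 1,
    "Left ventricular hypertrophy": 2
}

slope_options = {
    "Upsloping": 1,
    "Flat": 2,
    "Downsloping": 3
}

thal_options = {
    "Normal": 1,
    "Fixed defect": 2,
    "Reversible defect": 3
}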

# Streamlit app interface
st.title('Heart Disease Predictor')
st.write('Enter the following information to predict heart disease risk:')

# User inputs
age = st.number_input("Age")
sex = st.selectbox("Sex (0=female, 1=male)", options=[0, 1], format_func=lambda x: 'Female' if x == 0 else 'Male')
cp = st.selectbox("Chest pain type", options=list(cp_options.keys()))
trestbps = st.number_input("Resting blood pressure")
chol = st.number_input("Serum cholesterol")
fbs = st.selectbox("Fasting blood sugar > 120 mg/dl", options=[0, 1], format_func=lambda x: 'No' if x == 0 else 'Yes')
restecg = st.selectbox("Resting ECG result", options=list(restecg_options.keys()))
thalach = st.number_input("Maximum heart rate")
exang = st.selectbox("Exercise-induced angina", options=[0, 1], format_func=lambda x: 'No' if x == 0 else 'Yes')
oldpeak = st.number_input("ST depression induced by exercise relative to rest")
slope = st.selectbox("Slope of the peak exercise ST segment", options=list(slope_options.keys()))
ca = st.number_input("Number of major vessels shown by fluoroscopy")
thal = st.selectbox("Thal", options=list(thal_options.keys()))

# Convert the selected category labels to numeric codes
cp_value = cp_options[cp]
restecg_value = restecg_options[restecg]
slope_value = slope_options[slope]
thal_value = thal_options[thal]

# Build the feature array (one row, columns in training order)
features = np.array([[age, sex, cp_value, trestbps, chol, fbs, restecg_value, thalach, exang, oldpeak, slope_value, ca, thal_value]])

# Predict when the button is clicked
if st.button("Predict"):
    prediction = model.predict(features)
    st.write(f"Prediction: {'Heart disease' if prediction[0] == 1 else 'No heart disease'}")
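
One detail worth making explicit is that the feature array must list the values in the same column order the model was trained on. The sketch below assembles the row from a dictionary so that the order is fixed in one place. `FEATURE_ORDER` and `build_feature_row` are hypothetical names, the order shown is assumed to match the columns of `heart.csv` after dropping `target`, and the sample values are illustrative rather than taken from the dataset.

```python
# Assumed training column order (heart.csv without the target column).
FEATURE_ORDER = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

def build_feature_row(inputs):
    """Return a single-row 2D list of floats ordered by FEATURE_ORDER."""
    missing = [name for name in FEATURE_ORDER if name not in inputs]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    return [[float(inputs[name]) for name in FEATURE_ORDER]]

# Illustrative values only, not taken from the dataset.
example_inputs = {
    "age": 54, "sex": 1, "cp": 0, "trestbps": 130, "chol": 250,
    "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0,
    "oldpeak": 1.2, "slope": 2, "ca": 0, "thal": 2,
}
row = build_feature_row(example_inputs)
```

The resulting nested list has the same shape as the NumPy array above (one row, thirteen columns), so it could be passed to `model.predict(row)` in its place.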