Cardiovascular-Heart-Diseas…/main.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cardiovascular Heart Disease Prediction Model (Stacking Model)\n",
    "`Last Updated: 11th February 2023`\n",
    "\n",
    "### **THIS RESEARCH IS PUBLISHED PLEASE [CLICK HERE](https://ijrpr.com/uploads/V4ISSUE2/IJRPR9920.pdf) FOR READING THE PAPER**\n",
    "\n",
    "### About this dataset:\n",
    "This dataset contains detailed information on the risk factors for cardiovascular disease. It includes information on age, gender, height, weight, blood pressure values, cholesterol levels, glucose levels, smoking habits and alcohol consumption of over 70 thousand individuals. Additionally it outlines if the person is active or not and if he or she has any cardiovascular diseases. This dataset provides a great resource for researchers to apply modern machine learning techniques to explore the potential relations between risk factors and cardiovascular disease that can ultimately lead to improved understanding of this serious health issue and design better preventive measures\n",
    "\n",
    "### Aim:\n",
    "This dataset can be used to explore the risk factors of cardiovascular disease in adults. The aim is to understand how certain demographic factors, health behaviors and biological markers affect the development of heart disease.\n",
    "\n",
    "### Dataset Description: This dataset contains 70,000 records of patients with the following 13 features:\n",
    "\n",
    "1. Age: Age of participant (integer)\n",
    "2. Gender: Gender of participant (male/female).\n",
    "3. Height: Height measured in centimeters (integer)\n",
    "4. Weight: Weight measured in kilograms (integer)\n",
    "5. Ap_hi: Systolic blood pressure reading taken from patient (integer)\n",
    "6. Ap_lo : Diastolic blood pressure reading taken from patient (integer)\n",
    "7. Cholesterol : Total cholesterol level read as mg/dl on a scale 0 - 5+ units( integer). Each unit denoting increase/decrease by 20 mg/dL respectively.\n",
    "8. Gluc : Glucose level read as mmol/l on a scale 0 - 16+ units( integer). Each unit denoting increase Decreaseby 1 mmol/L respectively.\n",
    "9. Smoke  : Whether person smokes or not(binary; 0= No , 1=Yes).\n",
    "10. Alco  : Whether person drinks alcohol or not(binary; 0 =No ,1 =Yes ).\n",
    "11. Active : whether person physically active or not( Binary ;0 =No,1 = Yes ).\n",
    "12. Cardio  : whether person suffers from cardiovascular diseases or not(Binary ;0 – no , 1 ‑yes )."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IMPORTING LIBRARIES AND LOADING DATA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np \n",
    "import pandas as pd \n",
    "import os\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import accuracy_score\n",
    "from sklearn.metrics import matthews_corrcoef\n",
    "from sklearn.metrics import f1_score\n",
    "\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.svm import LinearSVC\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "from sklearn.ensemble import StackingClassifier\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "import joblib as joblib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>id</th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>height</th>\n",
       "      <th>weight</th>\n",
       "      <th>ap_hi</th>\n",
       "      <th>ap_lo</th>\n",
       "      <th>cholesterol</th>\n",
       "      <th>gluc</th>\n",
       "      <th>smoke</th>\n",
       "      <th>alco</th>\n",
       "      <th>active</th>\n",
       "      <th>cardio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>18393</td>\n",
       "      <td>2</td>\n",
       "      <td>168</td>\n",
       "      <td>62.0</td>\n",
       "      <td>110</td>\n",
       "      <td>80</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>20228</td>\n",
       "      <td>1</td>\n",
       "      <td>156</td>\n",
       "      <td>85.0</td>\n",
       "      <td>140</td>\n",
       "      <td>90</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>18857</td>\n",
       "      <td>1</td>\n",
       "      <td>165</td>\n",
       "      <td>64.0</td>\n",
       "      <td>130</td>\n",
       "      <td>70</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>17623</td>\n",
       "      <td>2</td>\n",
       "      <td>169</td>\n",
       "      <td>82.0</td>\n",
       "      <td>150</td>\n",
       "      <td>100</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>17474</td>\n",
       "      <td>1</td>\n",
       "      <td>156</td>\n",
       "      <td>56.0</td>\n",
       "      <td>100</td>\n",
       "      <td>60</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>8</td>\n",
       "      <td>21914</td>\n",
       "      <td>1</td>\n",
       "      <td>151</td>\n",
       "      <td>67.0</td>\n",
       "      <td>120</td>\n",
       "      <td>80</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>22113</td>\n",
       "      <td>1</td>\n",
       "      <td>157</td>\n",
       "      <td>93.0</td>\n",
       "      <td>130</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>7</td>\n",
       "      <td>12</td>\n",
       "      <td>22584</td>\n",
       "      <td>2</td>\n",
       "      <td>178</td>\n",
       "      <td>95.0</td>\n",
       "      <td>130</td>\n",
       "      <td>90</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "      <td>17668</td>\n",
       "      <td>1</td>\n",
       "      <td>158</td>\n",
       "      <td>71.0</td>\n",
       "      <td>110</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "      <td>19834</td>\n",
       "      <td>1</td>\n",
       "      <td>164</td>\n",
       "      <td>68.0</td>\n",
       "      <td>110</td>\n",
       "      <td>60</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   index  id    age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  \\\n",
       "0      0   0  18393       2     168    62.0    110     80            1     1   \n",
       "1      1   1  20228       1     156    85.0    140     90            3     1   \n",
       "2      2   2  18857       1     165    64.0    130     70            3     1   \n",
       "3      3   3  17623       2     169    82.0    150    100            1     1   \n",
       "4      4   4  17474       1     156    56.0    100     60            1     1   \n",
       "5      5   8  21914       1     151    67.0    120     80            2     2   \n",
       "6      6   9  22113       1     157    93.0    130     80            3     1   \n",
       "7      7  12  22584       2     178    95.0    130     90            3     3   \n",
       "8      8  13  17668       1     158    71.0    110     70            1     1   \n",
       "9      9  14  19834       1     164    68.0    110     60            1     1   \n",
       "\n",
       "   smoke  alco  active  cardio  \n",
       "0      0     0       1       0  \n",
       "1      0     0       1       1  \n",
       "2      0     0       0       1  \n",
       "3      0     0       1       1  \n",
       "4      0     0       0       0  \n",
       "5      0     0       0       0  \n",
       "6      0     0       1       0  \n",
       "7      0     0       1       1  \n",
       "8      0     0       1       0  \n",
       "9      0     0       0       0  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('heart_data.csv')\n",
    "df.head(10)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DATA EXPLORATION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 70000 entries, 0 to 69999\n",
      "Data columns (total 14 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   index        70000 non-null  int64  \n",
      " 1   id           70000 non-null  int64  \n",
      " 2   age          70000 non-null  int64  \n",
      " 3   gender       70000 non-null  int64  \n",
      " 4   height       70000 non-null  int64  \n",
      " 5   weight       70000 non-null  float64\n",
      " 6   ap_hi        70000 non-null  int64  \n",
      " 7   ap_lo        70000 non-null  int64  \n",
      " 8   cholesterol  70000 non-null  int64  \n",
      " 9   gluc         70000 non-null  int64  \n",
      " 10  smoke        70000 non-null  int64  \n",
      " 11  alco         70000 non-null  int64  \n",
      " 12  active       70000 non-null  int64  \n",
      " 13  cardio       70000 non-null  int64  \n",
      "dtypes: float64(1), int64(13)\n",
      "memory usage: 7.5 MB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>id</th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>height</th>\n",
       "      <th>weight</th>\n",
       "      <th>ap_hi</th>\n",
       "      <th>ap_lo</th>\n",
       "      <th>cholesterol</th>\n",
       "      <th>gluc</th>\n",
       "      <th>smoke</th>\n",
       "      <th>alco</th>\n",
       "      <th>active</th>\n",
       "      <th>cardio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "      <td>70000.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>34999.500000</td>\n",
       "      <td>49972.419900</td>\n",
       "      <td>19468.865814</td>\n",
       "      <td>1.349571</td>\n",
       "      <td>164.359229</td>\n",
       "      <td>74.205690</td>\n",
       "      <td>128.817286</td>\n",
       "      <td>96.630414</td>\n",
       "      <td>1.366871</td>\n",
       "      <td>1.226457</td>\n",
       "      <td>0.088129</td>\n",
       "      <td>0.053771</td>\n",
       "      <td>0.803729</td>\n",
       "      <td>0.499700</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>20207.403759</td>\n",
       "      <td>28851.302323</td>\n",
       "      <td>2467.251667</td>\n",
       "      <td>0.476838</td>\n",
       "      <td>8.210126</td>\n",
       "      <td>14.395757</td>\n",
       "      <td>154.011419</td>\n",
       "      <td>188.472530</td>\n",
       "      <td>0.680250</td>\n",
       "      <td>0.572270</td>\n",
       "      <td>0.283484</td>\n",
       "      <td>0.225568</td>\n",
       "      <td>0.397179</td>\n",
       "      <td>0.500003</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>10798.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>55.000000</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>-150.000000</td>\n",
       "      <td>-70.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>17499.750000</td>\n",
       "      <td>25006.750000</td>\n",
       "      <td>17664.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>159.000000</td>\n",
       "      <td>65.000000</td>\n",
       "      <td>120.000000</td>\n",
       "      <td>80.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>34999.500000</td>\n",
       "      <td>50001.500000</td>\n",
       "      <td>19703.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>165.000000</td>\n",
       "      <td>72.000000</td>\n",
       "      <td>120.000000</td>\n",
       "      <td>80.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>52499.250000</td>\n",
       "      <td>74889.250000</td>\n",
       "      <td>21327.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>170.000000</td>\n",
       "      <td>82.000000</td>\n",
       "      <td>140.000000</td>\n",
       "      <td>90.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>69999.000000</td>\n",
       "      <td>99999.000000</td>\n",
       "      <td>23713.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>250.000000</td>\n",
       "      <td>200.000000</td>\n",
       "      <td>16020.000000</td>\n",
       "      <td>11000.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              index            id           age        gender        height  \\\n",
       "count  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000   \n",
       "mean   34999.500000  49972.419900  19468.865814      1.349571    164.359229   \n",
       "std    20207.403759  28851.302323   2467.251667      0.476838      8.210126   \n",
       "min        0.000000      0.000000  10798.000000      1.000000     55.000000   \n",
       "25%    17499.750000  25006.750000  17664.000000      1.000000    159.000000   \n",
       "50%    34999.500000  50001.500000  19703.000000      1.000000    165.000000   \n",
       "75%    52499.250000  74889.250000  21327.000000      2.000000    170.000000   \n",
       "max    69999.000000  99999.000000  23713.000000      2.000000    250.000000   \n",
       "\n",
       "             weight         ap_hi         ap_lo   cholesterol          gluc  \\\n",
       "count  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000   \n",
       "mean      74.205690    128.817286     96.630414      1.366871      1.226457   \n",
       "std       14.395757    154.011419    188.472530      0.680250      0.572270   \n",
       "min       10.000000   -150.000000    -70.000000      1.000000      1.000000   \n",
       "25%       65.000000    120.000000     80.000000      1.000000      1.000000   \n",
       "50%       72.000000    120.000000     80.000000      1.000000      1.000000   \n",
       "75%       82.000000    140.000000     90.000000      2.000000      1.000000   \n",
       "max      200.000000  16020.000000  11000.000000      3.000000      3.000000   \n",
       "\n",
       "              smoke          alco        active        cardio  \n",
       "count  70000.000000  70000.000000  70000.000000  70000.000000  \n",
       "mean       0.088129      0.053771      0.803729      0.499700  \n",
       "std        0.283484      0.225568      0.397179      0.500003  \n",
       "min        0.000000      0.000000      0.000000      0.000000  \n",
       "25%        0.000000      0.000000      1.000000      0.000000  \n",
       "50%        0.000000      0.000000      1.000000      0.000000  \n",
       "75%        0.000000      0.000000      1.000000      1.000000  \n",
       "max        1.000000      1.000000      1.000000      1.000000  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "index          0\n",
       "id             0\n",
       "age            0\n",
       "gender         0\n",
       "height         0\n",
       "weight         0\n",
       "ap_hi          0\n",
       "ap_lo          0\n",
       "cholesterol    0\n",
       "gluc           0\n",
       "smoke          0\n",
       "alco           0\n",
       "active         0\n",
       "cardio         0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isnull().sum()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DATA PREPROCESSING"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df.drop(['index','id'], axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('datasetCleaned.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = df.drop(['cardio'], axis = 1)\n",
    "y = df['cardio']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "X.to_csv('splitX.csv')\n",
    "y.to_csv('splity.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>height</th>\n",
       "      <th>weight</th>\n",
       "      <th>ap_hi</th>\n",
       "      <th>ap_lo</th>\n",
       "      <th>cholesterol</th>\n",
       "      <th>gluc</th>\n",
       "      <th>smoke</th>\n",
       "      <th>alco</th>\n",
       "      <th>active</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18393</td>\n",
       "      <td>2</td>\n",
       "      <td>168</td>\n",
       "      <td>62.0</td>\n",
       "      <td>110</td>\n",
       "      <td>80</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20228</td>\n",
       "      <td>1</td>\n",
       "      <td>156</td>\n",
       "      <td>85.0</td>\n",
       "      <td>140</td>\n",
       "      <td>90</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18857</td>\n",
       "      <td>1</td>\n",
       "      <td>165</td>\n",
       "      <td>64.0</td>\n",
       "      <td>130</td>\n",
       "      <td>70</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>17623</td>\n",
       "      <td>2</td>\n",
       "      <td>169</td>\n",
       "      <td>82.0</td>\n",
       "      <td>150</td>\n",
       "      <td>100</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>17474</td>\n",
       "      <td>1</td>\n",
       "      <td>156</td>\n",
       "      <td>56.0</td>\n",
       "      <td>100</td>\n",
       "      <td>60</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>21914</td>\n",
       "      <td>1</td>\n",
       "      <td>151</td>\n",
       "      <td>67.0</td>\n",
       "      <td>120</td>\n",
       "      <td>80</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>22113</td>\n",
       "      <td>1</td>\n",
       "      <td>157</td>\n",
       "      <td>93.0</td>\n",
       "      <td>130</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>22584</td>\n",
       "      <td>2</td>\n",
       "      <td>178</td>\n",
       "      <td>95.0</td>\n",
       "      <td>130</td>\n",
       "      <td>90</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>17668</td>\n",
       "      <td>1</td>\n",
       "      <td>158</td>\n",
       "      <td>71.0</td>\n",
       "      <td>110</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>19834</td>\n",
       "      <td>1</td>\n",
       "      <td>164</td>\n",
       "      <td>68.0</td>\n",
       "      <td>110</td>\n",
       "      <td>60</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  \\\n",
       "0  18393       2     168    62.0    110     80            1     1      0   \n",
       "1  20228       1     156    85.0    140     90            3     1      0   \n",
       "2  18857       1     165    64.0    130     70            3     1      0   \n",
       "3  17623       2     169    82.0    150    100            1     1      0   \n",
       "4  17474       1     156    56.0    100     60            1     1      0   \n",
       "5  21914       1     151    67.0    120     80            2     2      0   \n",
       "6  22113       1     157    93.0    130     80            3     1      0   \n",
       "7  22584       2     178    95.0    130     90            3     3      0   \n",
       "8  17668       1     158    71.0    110     70            1     1      0   \n",
       "9  19834       1     164    68.0    110     60            1     1      0   \n",
       "\n",
       "   alco  active  \n",
       "0     0       1  \n",
       "1     0       1  \n",
       "2     0       0  \n",
       "3     0       1  \n",
       "4     0       0  \n",
       "5     0       0  \n",
       "6     0       1  \n",
       "7     0       1  \n",
       "8     0       1  \n",
       "9     0       0  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.head(10)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Target and Feature values / Train Test Split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((56000, 11), (14000, 11))"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train, X_test, y_train , y_test = train_test_split(X,y, test_size = 0.2, random_state = 42)\n",
    "X_train.shape, X_test.shape"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MODEL BUILDING"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### KNeighborsClassifier\n",
    "\n",
    "Classifier implementing the k-nearest neighbors vote.\n",
    "\n",
    "class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model performance for Training set\n",
      "- Accuracy: 0.8160535714285714\n",
      "- MCC: 0.6321696476040224\n",
      "- F1 score: 0.8160418618700654\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.6652142857142858\n",
      "- MCC: 0.33048635555172207\n",
      "- F1 score: 0.6651977900484768\n"
     ]
    }
   ],
   "source": [
    "knn = KNeighborsClassifier(3) # Define classifier\n",
    "knn.fit(X_train, y_train) # Train model\n",
    "\n",
    "# Make predictions\n",
    "y_train_pred = knn.predict(X_train)\n",
    "y_test_pred = knn.predict(X_test)\n",
    "\n",
    "# Training set performance\n",
    "knn_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "knn_train_mcc = matthews_corrcoef(y_train, y_train_pred) # Calculate MCC\n",
    "knn_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "# Test set performance\n",
    "knn_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "knn_test_mcc = matthews_corrcoef(y_test, y_test_pred) # Calculate MCC\n",
    "knn_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "print('Model performance for Training set')\n",
    "print('- Accuracy: %s' % knn_train_accuracy)\n",
    "print('- MCC: %s' % knn_train_mcc)\n",
    "print('- F1 score: %s' % knn_train_f1)\n",
    "print('----------------------------------')\n",
    "print('Model performance for Test set')\n",
    "print('- Accuracy: %s' % knn_test_accuracy)\n",
    "print('- MCC: %s' % knn_test_mcc)\n",
    "print('- F1 score: %s' % knn_test_f1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Linear Support Vector Classification (LinearSVC)\n",
    "\n",
    "Linear Support Vector Classification.\n",
    "\n",
    "Similar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.\n",
    "\n",
    "This class supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.\n",
    "\n",
    "class sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', *, dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model performance for Training set\n",
      "- Accuracy: 0.5605178571428572\n",
      "- MCC: 0.1982763778835479\n",
      "- F1 score: 0.4796980585847672\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.5639285714285714\n",
      "- MCC: 0.20614681188435108\n",
      "- F1 score: 0.4833516841135736\n"
     ]
    }
   ],
   "source": [
    "svm_rbf = LinearSVC(C=1)\n",
    "svm_rbf.fit(X_train, y_train)\n",
    "\n",
    "# Make predictions\n",
    "y_train_pred = svm_rbf.predict(X_train)\n",
    "y_test_pred = svm_rbf.predict(X_test)\n",
    "\n",
    "# Training set performance\n",
    "svm_rbf_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "svm_rbf_train_mcc = matthews_corrcoef(y_train, y_train_pred) # Calculate MCC\n",
    "svm_rbf_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "# Test set performance\n",
    "svm_rbf_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "svm_rbf_test_mcc = matthews_corrcoef(y_test, y_test_pred) # Calculate MCC\n",
    "svm_rbf_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "print('Model performance for Training set')\n",
    "print('- Accuracy: %s' % svm_rbf_train_accuracy)\n",
    "print('- MCC: %s' % svm_rbf_train_mcc)\n",
    "print('- F1 score: %s' % svm_rbf_train_f1)\n",
    "print('----------------------------------')\n",
    "print('Model performance for Test set')\n",
    "print('- Accuracy: %s' % svm_rbf_test_accuracy)\n",
    "print('- MCC: %s' % svm_rbf_test_mcc)\n",
    "print('- F1 score: %s' % svm_rbf_test_f1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Decision Tree Classifier\n",
    "\n",
    "A decision tree classifier.\n",
    "Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.\n",
    "\n",
    "class sklearn.tree.DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model performance for Training set\n",
      "- Accuracy: 0.7315178571428571\n",
      "- MCC: 0.4659467768577868\n",
      "- F1 score: 0.7306405474693207\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.7345714285714285\n",
      "- MCC: 0.4719694304599963\n",
      "- F1 score: 0.7338280385263793\n"
     ]
    }
   ],
   "source": [
    "dt = DecisionTreeClassifier(max_depth=5) # Define classifier\n",
    "dt.fit(X_train, y_train) # Train model\n",
    "\n",
    "# Make predictions\n",
    "y_train_pred = dt.predict(X_train)\n",
    "y_test_pred = dt.predict(X_test)\n",
    "\n",
    "# Training set performance\n",
    "dt_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "dt_train_mcc = matthews_corrcoef(y_train, y_train_pred) # Calculate MCC\n",
    "dt_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "# Test set performance\n",
    "dt_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "dt_test_mcc = matthews_corrcoef(y_test, y_test_pred) # Calculate MCC\n",
    "dt_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "print('Model performance for Training set')\n",
    "print('- Accuracy: %s' % dt_train_accuracy)\n",
    "print('- MCC: %s' % dt_train_mcc)\n",
    "print('- F1 score: %s' % dt_train_f1)\n",
    "print('----------------------------------')\n",
    "print('Model performance for Test set')\n",
    "print('- Accuracy: %s' % dt_test_accuracy)\n",
    "print('- MCC: %s' % dt_test_mcc)\n",
    "print('- F1 score: %s' % dt_test_f1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Random Forest Classifier\n",
    "\n",
    "A random forest classifier.\n",
    "\n",
    "A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.\n",
    "\n",
    "class sklearn.ensemble.RandomForestClassifier(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model performance for Training set\n",
      "- Accuracy: 0.9791964285714285\n",
      "- MCC: 0.9587219000239561\n",
      "- F1 score: 0.9791925222267412\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.6987142857142857\n",
      "- MCC: 0.3992035028287897\n",
      "- F1 score: 0.6981032127638641\n"
     ]
    }
   ],
   "source": [
    "rf = RandomForestClassifier(n_estimators=10) # Define classifier\n",
    "rf.fit(X_train, y_train) # Train model\n",
    "\n",
    "# Make predictions\n",
    "y_train_pred = rf.predict(X_train)\n",
    "y_test_pred = rf.predict(X_test)\n",
    "\n",
    "# Training set performance\n",
    "rf_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "rf_train_mcc = matthews_corrcoef(y_train, y_train_pred) # Calculate MCC\n",
    "rf_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "# Test set performance\n",
    "rf_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "rf_test_mcc = matthews_corrcoef(y_test, y_test_pred) # Calculate MCC\n",
    "rf_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "print('Model performance for Training set')\n",
    "print('- Accuracy: %s' % rf_train_accuracy)\n",
    "print('- MCC: %s' % rf_train_mcc)\n",
    "print('- F1 score: %s' % rf_train_f1)\n",
    "print('----------------------------------')\n",
    "print('Model performance for Test set')\n",
    "print('- Accuracy: %s' % rf_test_accuracy)\n",
    "print('- MCC: %s' % rf_test_mcc)\n",
    "print('- F1 score: %s' % rf_test_f1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multi-layer Perceptron classifier\n",
    "\n",
    "This model optimizes the log-loss function using LBFGS or stochastic gradient descent.\n",
    "\n",
    "class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100,), activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model performance for Training set\n",
      "- Accuracy: 0.687125\n",
      "- MCC: 0.4003987827689168\n",
      "- F1 score: 0.676753367823394\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.6843571428571429\n",
      "- MCC: 0.3954742763523196\n",
      "- F1 score: 0.6739382133165988\n"
     ]
    }
   ],
   "source": [
    "mlp = MLPClassifier(alpha=1, max_iter=1000)\n",
    "mlp.fit(X_train, y_train)\n",
    "\n",
    "# Make predictions\n",
    "y_train_pred = mlp.predict(X_train)\n",
    "y_test_pred = mlp.predict(X_test)\n",
    "\n",
    "# Training set performance\n",
    "mlp_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "mlp_train_mcc = matthews_corrcoef(y_train, y_train_pred) # Calculate MCC\n",
    "mlp_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "# Test set performance\n",
    "mlp_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "mlp_test_mcc = matthews_corrcoef(y_test, y_test_pred) # Calculate MCC\n",
    "mlp_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "print('Model performance for Training set')\n",
    "print('- Accuracy: %s' % mlp_train_accuracy)\n",
    "print('- MCC: %s' % mlp_train_mcc)\n",
    "print('- F1 score: %s' % mlp_train_f1)\n",
    "print('----------------------------------')\n",
    "print('Model performance for Test set')\n",
    "print('- Accuracy: %s' % mlp_test_accuracy)\n",
    "print('- MCC: %s' % mlp_test_mcc)\n",
    "print('- F1 score: %s' % mlp_test_f1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Stacking Classifier\n",
    "\n",
    "Stack of estimators with a final classifier.\n",
    "\n",
    "Stacked generalization consists in stacking the output of individual estimator and use a classifier to compute the final prediction. Stacking allows to use the strength of each individual estimator by using their output as input of a final estimator.\n",
    "\n",
    "Note that estimators_ are fitted on the full X while final_estimator_ is trained using cross-validated predictions of the base estimators using cross_val_predict.\n",
    "\n",
    "class sklearn.ensemble.StackingClassifier(estimators, final_estimator=None, *, cv=None, stack_method='auto', n_jobs=None, passthrough=False, verbose=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model performance for Training set\n",
      "- Accuracy: 0.7776785714285714\n",
      "- MCC: 0.5613383240343626\n",
      "- F1 score: 0.7764597470054544\n",
      "----------------------------------\n",
      "Model performance for Test set\n",
      "- Accuracy: 0.7359285714285714\n",
      "- MCC: 0.47849049367542906\n",
      "- F1 score: 0.7341756948223885\n"
     ]
    }
   ],
   "source": [
    "estimator_list = [\n",
    "    ('knn',knn),\n",
    "    ('svm_rbf',svm_rbf),\n",
    "    ('dt',dt),\n",
    "    ('rf',rf),\n",
    "    ('mlp',mlp) ]\n",
    "\n",
    "# Build stack model\n",
    "stack_model = StackingClassifier(\n",
    "    estimators=estimator_list, final_estimator=LogisticRegression()\n",
    ")\n",
    "\n",
    "# Train stacked model\n",
    "stack_model.fit(X_train, y_train)\n",
    "\n",
    "# Make predictions\n",
    "y_train_pred = stack_model.predict(X_train)\n",
    "y_test_pred = stack_model.predict(X_test)\n",
    "\n",
    "# Training set model performance\n",
    "stack_model_train_accuracy = accuracy_score(y_train, y_train_pred) # Calculate Accuracy\n",
    "stack_model_train_mcc = matthews_corrcoef(y_train, y_train_pred) # Calculate MCC\n",
    "stack_model_train_f1 = f1_score(y_train, y_train_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "# Test set model performance\n",
    "stack_model_test_accuracy = accuracy_score(y_test, y_test_pred) # Calculate Accuracy\n",
    "stack_model_test_mcc = matthews_corrcoef(y_test, y_test_pred) # Calculate MCC\n",
    "stack_model_test_f1 = f1_score(y_test, y_test_pred, average='weighted') # Calculate F1-score\n",
    "\n",
    "print('Model performance for Training set')\n",
    "print('- Accuracy: %s' % stack_model_train_accuracy)\n",
    "print('- MCC: %s' % stack_model_train_mcc)\n",
    "print('- F1 score: %s' % stack_model_train_f1)\n",
    "print('----------------------------------')\n",
    "print('Model performance for Test set')\n",
    "print('- Accuracy: %s' % stack_model_test_accuracy)\n",
    "print('- MCC: %s' % stack_model_test_mcc)\n",
    "print('- F1 score: %s' % stack_model_test_f1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the Line 2 only then model changed!!!\n",
    "joblib.dump(stack_model, 'Cardiovascular-prediction-model.joblib')\n",
    "\n",
    "model = joblib.load('Cardiovascular-prediction-model.joblib') # Model File"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PREDICTION TESTING"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "######################################################### PROGRAM STARTED #########################################################\n",
      "\n",
      "Array 0 = [22197     0   126   162 12137  1320     2     3     1     1     1]\n",
      "Model predicts Disease =  [1]\n",
      "######################################################### PROGRAM FINISHED #########################################################\n"
     ]
    }
   ],
   "source": [
    "import random\n",
    "\n",
    "count=0\n",
    "array=[]\n",
    "predictionOutcome=[0]\n",
    "\n",
    "print(\"######################################################### PROGRAM STARTED #########################################################\")\n",
    "\n",
    "while predictionOutcome == [0]:\n",
    "    arr1=np.random.randint(23714, size=1)\n",
    "    arr2=np.random.randint(2, size=1)\n",
    "    arr3=np.random.randint(251, size=1)\n",
    "    arr4=np.random.randint(201, size=1)\n",
    "    arr5=np.random.randint(16021, size=1)\n",
    "    arr6=np.random.randint(11001, size=1)\n",
    "    arr7=np.random.randint(6, size=1)\n",
    "    arr8=np.random.randint(17, size=1)\n",
    "    arr9=np.random.randint(2, size=1)\n",
    "    arr10=np.random.randint(2, size=1)\n",
    "    arr11=np.random.randint(2, size=1)\n",
    "    \n",
    "    array = np.concatenate((arr1, arr2, arr3, arr4, arr5, arr6, arr7, arr8, arr9, arr10, arr11), axis=0)\n",
    "    \n",
    "    print(\"\\nArray %d =\" %count, array)\n",
    "    \n",
    "    predictionOutcome = model.predict([array])\n",
    "    \n",
    "    if predictionOutcome == 0:\n",
    "        print(\"Model predicts NO Disease = \", predictionOutcome)\n",
    "        print(\"###################################################################################################################################\")\n",
    "    else:\n",
    "        print(\"Model predicts Disease = \", predictionOutcome)\n",
    "    \n",
    "    count+=1\n",
    "else:\n",
    "    print(\"######################################################### PROGRAM FINISHED #########################################################\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "CardiovascularHeartDisease",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "289ca0050d77f4c45e5b96d3b6db111060794819dc287c32d07051ebffa70bf3"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}