[UPDATE] Deprecated import to newer import

2022-08-08 16:54:18 +05:30
parent 84645739ce
commit 077a54f454


@@ -25,92 +25,87 @@
"Run the following cell to load the necessary packages. Make sure to change the Accelerator on the right to TPU."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TensorFlow is a powerful tool to develop any machine learning pipeline, and today we will go over how to load Image+CSV combined datasets, how to use Keras preprocessing layers for image augmentation, and how to use pre-trained models for image classification.\n",
"\n",
"Skeleton code for the DataGenerator Sequence subclass is credited to Xie29's NB.\n",
"\n",
"Run the following cell to import the necessary packages. We will be using the GPU accelerator to efficiently train our model. Remember to change the accelerator on the right to GPU. We won't be using a TPU for this notebook because data generators are not safe to run on multiple replicas. If a TPU is not used, change the TPU_used variable to False."
]
},
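Note that the cell below never actually references the TPU_used variable mentioned above. As a minimal sketch (the flag and control flow here are illustrative, not the notebook's code), a boolean could gate the resolver like this:

    import tensorflow as tf

    TPU_used = False  # set to True only when a TPU runtime is attached

    if TPU_used:
        # TPU path: connect to the cluster and build a TPU strategy
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.TPUStrategy(tpu)
    else:
        # Default strategy covers CPU and single-GPU training
        strategy = tf.distribute.get_strategy()

    print('Number of replicas:', strategy.num_replicas_in_sync)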
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tensorflow version : 2.9.1\n",
"Number of replicas: 1\n",
"2.9.1\n"
]
}
],
"source": [
"import re\n",
"from email.mime import image\n",
"import os\n",
"import PIL\n",
"import time\n",
"import math\n",
"import warnings\n",
"import numpy as np\n",
"import pandas as pd\n",
"import tensorflow as tf\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.model_selection import train_test_split\n",
"from keras.utils import load_img\n",
"\n",
"SEED = 1337\n",
"print('Tensorflow version : {}'.format(tf.__version__))\n",
"\n",
"try:\n",
" tpu = tf.distribute.cluster_resolver.TPUClusterResolver()\n",
" print('Device:', tpu.master())\n",
" tf.config.experimental_connect_to_cluster(tpu)\n",
" tf.tpu.experimental.initialize_tpu_system(tpu)\n",
" strategy = tf.distribute.experimental.TPUStrategy(tpu)\n",
"except:\n",
" strategy = tf.distribute.get_strategy()\n",
"print('Number of replicas:', strategy.num_replicas_in_sync)\n",
"except ValueError:\n",
" strategy = tf.distribute.get_strategy() # for CPU and single GPU\n",
" print('Number of replicas:', strategy.num_replicas_in_sync)\n",
" \n",
"print(tf.__version__)"
]
},
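For context on the commit title: load_img previously lived in keras.preprocessing.image, and that module was deprecated in favor of keras.utils, the import swapped in above. A minimal before/after sketch; the file path is illustrative:

    # Deprecated location (triggers a deprecation warning on recent Keras):
    # from keras.preprocessing.image import load_img

    # Current location, as imported in the cell above:
    from keras.utils import load_img

    img = load_img('sample_xray.jpeg', target_size=(180, 180))  # illustrative path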
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data loading"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Chest X-ray data we are using from Cell divides the data into train, val, and test files. There are only 16 files in the validation folder, and we would prefer to have a less extreme division between the training and the validation set. We will append the validation files and create a new split that resembes the standard 80:20 division instead.\n",
"\n",
"The first step is to load in our data. The original PANDA dataset contains large images and masks that specify which area of the mask led to the ISUP grade (determines the severity of the cancer). Since the original images contain a lot of white space and extraneous data that is not necessary for our model, we will be using tiles to condense the images. Basically, the tiles are small sections of the masked areas, and these tiles can be concatenated together so the only the masked sections of the original image remains."
]
},
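As a rough illustration of the tiling idea described above (not the notebook's implementation; tile count and size are assumed), N square tiles cut from the masked regions can be stitched into a single grid image:

    import numpy as np

    def concat_tiles(tiles, n_rows, n_cols):
        # tiles: list of (size, size, 3) arrays, len(tiles) == n_rows * n_cols
        rows = [np.concatenate(tiles[r * n_cols:(r + 1) * n_cols], axis=1)
                for r in range(n_rows)]
        return np.concatenate(rows, axis=0)

    # e.g. 36 tiles of 128x128 pixels -> one 768x768 training image
    tiles = [np.zeros((128, 128, 3), dtype=np.uint8) for _ in range(36)]
    image = concat_tiles(tiles, n_rows=6, n_cols=6)  # shape (768, 768, 3)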
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"AUTOTUNE = tf.data.experimental.AUTOTUNE\n",
"BATCH_SIZE = 16 * strategy.num_replicas_in_sync\n",
"IMAGE_SIZE = [180, 180]\n",
"EPOCHS = 25"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Chest X-ray data we are using from Cell divides the data into train, val, and test files. There are only 16 files in the validation folder, and we would prefer to have a less extreme division between the training and the validation set. We will append the validation files and create a new split that resembes the standard 80:20 division instead."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32mx:\\Maneesha\\GitHub\\ML Project\\PneumoniaClassificationModel\\main.ipynb Cell 8\u001b[0m in \u001b[0;36m<cell line: 4>\u001b[1;34m()\u001b[0m\n\u001b[0;32m <a href='vscode-notebook-cell:/x%3A/Maneesha/GitHub/ML%20Project/PneumoniaClassificationModel/main.ipynb#X11sZmlsZQ%3D%3D?line=0'>1</a>\u001b[0m filenames \u001b[39m=\u001b[39m tf\u001b[39m.\u001b[39mio\u001b[39m.\u001b[39mgfile\u001b[39m.\u001b[39mglob(\u001b[39mstr\u001b[39m(\u001b[39m'\u001b[39m\u001b[39mPneumoniaClassificationModel/DataFrames/train/\u001b[39m\u001b[39m'\u001b[39m))\n\u001b[0;32m <a href='vscode-notebook-cell:/x%3A/Maneesha/GitHub/ML%20Project/PneumoniaClassificationModel/main.ipynb#X11sZmlsZQ%3D%3D?line=1'>2</a>\u001b[0m filenames\u001b[39m.\u001b[39mextend(tf\u001b[39m.\u001b[39mio\u001b[39m.\u001b[39mgfile\u001b[39m.\u001b[39mglob(\u001b[39mstr\u001b[39m(\u001b[39m'\u001b[39m\u001b[39mPneumoniaClassificationModel/DataFrames/val/\u001b[39m\u001b[39m'\u001b[39m)))\n\u001b[1;32m----> <a href='vscode-notebook-cell:/x%3A/Maneesha/GitHub/ML%20Project/PneumoniaClassificationModel/main.ipynb#X11sZmlsZQ%3D%3D?line=3'>4</a>\u001b[0m train_filenames, val_filenames \u001b[39m=\u001b[39m train_test_split(filenames, test_size\u001b[39m=\u001b[39;49m\u001b[39m0.2\u001b[39;49m)\n",
"File \u001b[1;32my:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_split.py:2433\u001b[0m, in \u001b[0;36mtrain_test_split\u001b[1;34m(test_size, train_size, random_state, shuffle, stratify, *arrays)\u001b[0m\n\u001b[0;32m 2430\u001b[0m arrays \u001b[39m=\u001b[39m indexable(\u001b[39m*\u001b[39marrays)\n\u001b[0;32m 2432\u001b[0m n_samples \u001b[39m=\u001b[39m _num_samples(arrays[\u001b[39m0\u001b[39m])\n\u001b[1;32m-> 2433\u001b[0m n_train, n_test \u001b[39m=\u001b[39m _validate_shuffle_split(\n\u001b[0;32m 2434\u001b[0m n_samples, test_size, train_size, default_test_size\u001b[39m=\u001b[39;49m\u001b[39m0.25\u001b[39;49m\n\u001b[0;32m 2435\u001b[0m )\n\u001b[0;32m 2437\u001b[0m \u001b[39mif\u001b[39;00m shuffle \u001b[39mis\u001b[39;00m \u001b[39mFalse\u001b[39;00m:\n\u001b[0;32m 2438\u001b[0m \u001b[39mif\u001b[39;00m stratify \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n",
"File \u001b[1;32my:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_split.py:2111\u001b[0m, in \u001b[0;36m_validate_shuffle_split\u001b[1;34m(n_samples, test_size, train_size, default_test_size)\u001b[0m\n\u001b[0;32m 2108\u001b[0m n_train, n_test \u001b[39m=\u001b[39m \u001b[39mint\u001b[39m(n_train), \u001b[39mint\u001b[39m(n_test)\n\u001b[0;32m 2110\u001b[0m \u001b[39mif\u001b[39;00m n_train \u001b[39m==\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[1;32m-> 2111\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[0;32m 2112\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mWith n_samples=\u001b[39m\u001b[39m{}\u001b[39;00m\u001b[39m, test_size=\u001b[39m\u001b[39m{}\u001b[39;00m\u001b[39m and train_size=\u001b[39m\u001b[39m{}\u001b[39;00m\u001b[39m, the \u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 2113\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mresulting train set will be empty. Adjust any of the \u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 2114\u001b[0m \u001b[39m\"\u001b[39m\u001b[39maforementioned parameters.\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39mformat(n_samples, test_size, train_size)\n\u001b[0;32m 2115\u001b[0m )\n\u001b[0;32m 2117\u001b[0m \u001b[39mreturn\u001b[39;00m n_train, n_test\n",
"\u001b[1;31mValueError\u001b[0m: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters."
]
}
],
"source": [
"filenames = tf.io.gfile.glob(str('PneumoniaClassificationModel/DataFrames/train/'))\n",
"filenames.extend(tf.io.gfile.glob(str('PneumoniaClassificationModel/DataFrames/val/')))\n",
"\n",
"train_filenames, val_filenames = train_test_split(filenames, test_size=0.2)"
"MAIN_DIR = '../input/prostate-cancer-grade-assessment'\n",
"TRAIN_IMG_DIR = '../input/panda-tiles/train'\n",
"TRAIN_MASKS_DIR = '../input/panda-tiles/masks'\n",
"train_csv = pd.read_csv(os.path.join(MAIN_DIR, 'train.csv'))"
]
}
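The ValueError captured in the outputs above comes from the glob call: tf.io.gfile.glob is handed a bare directory path with no wildcard, so nothing matches and train_test_split receives zero samples. A sketch of a fix, assuming the Cell chest X-ray layout of class subfolders (NORMAL/, PNEUMONIA/) holding .jpeg files:

    # Match the image files themselves, not the directory
    # (the train/<CLASS>/<name>.jpeg layout is assumed)
    filenames = tf.io.gfile.glob('PneumoniaClassificationModel/DataFrames/train/*/*.jpeg')
    filenames.extend(tf.io.gfile.glob('PneumoniaClassificationModel/DataFrames/val/*/*.jpeg'))

    train_filenames, val_filenames = train_test_split(
        filenames, test_size=0.2, random_state=SEED)  # SEED is defined in the import cell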
],