Let’s Start With the Data
The largest shortcoming I foresaw from the start was the lack of data, specifically COVID-19 X-ray images. I decided to move forward with the project anyway, in the hope that larger datasets will become available in the future that I can tune this network on.
I found a great resource online that had compiled a dataset of 25 posteroanterior (PA) X-rays of COVID-19 infected lungs and 25 X-rays of healthy lungs, which I ultimately fed into my neural network. Please refer to this helpful post from pyimagesearch.com for more details.
The csv file comes from a Kaggle dataset I found that contains the same COVID-19 infected lung X-rays the author of the above pyimagesearch post used to compile the data.
Find the Kaggle dataset here → https://www.kaggle.com/bachrr/covid-chest-xray#metadata.csv
After importing the dataset, I matched each healthy/COVID image with its corresponding label, using the cv2 package (OpenCV's Python bindings) to work with the images.
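The pairing step could look roughly like the sketch below. It assumes a hypothetical folder layout, dataset/covid and dataset/normal, where the folder name serves as the label; the function name and paths are illustrative and not taken from the original project. Reading the pixels themselves (for example with cv2.imread) would happen in a later step.

```python
from pathlib import Path

# Hypothetical layout (an assumption, not from the original project):
#   dataset/covid/*.png   and   dataset/normal/*.png
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def pair_images_with_labels(dataset_dir):
    """Return a sorted list of (image_path, label) tuples.

    The label is the name of the folder that contains the image.
    """
    pairs = []
    for path in sorted(Path(dataset_dir).rglob("*")):
        # Skip directories and any non-image files such as notes or csv files.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            pairs.append((path, path.parent.name))
    return pairs
```

Calling pair_images_with_labels("dataset") would then give one (path, label) entry per image, ready to be loaded and stored in the feature and label vectors.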
I then established the hyperparameters, which I tuned throughout the project to find the best results. These are the number of epochs (the number of full passes over the training data), the learning rate (how strongly the weights are adjusted at each update as the network learns), and the batch size (the number of training samples used in one iteration).
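To make the three parameters concrete, here is a small sketch of how they relate. The values are placeholders chosen for illustration, not the tuned values from this project, and the 40-image training split is likewise an assumption.

```python
import math

# Placeholder values for illustration only (not the author's tuned settings).
EPOCHS = 25           # full passes over the training data
LEARNING_RATE = 1e-3  # step size for each weight update
BATCH_SIZE = 8        # training samples per iteration


def iterations_per_epoch(num_train_samples, batch_size):
    """Number of weight updates in one epoch (the last batch may be smaller)."""
    return math.ceil(num_train_samples / batch_size)


# Assuming, hypothetically, that 40 of the 50 images are used for training.
steps = iterations_per_epoch(40, BATCH_SIZE)
total_updates = steps * EPOCHS
print(steps, total_updates)
```

A smaller batch size means more weight updates per epoch, which is one reason batch size and learning rate are usually tuned together.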
I then initialized my feature and label vectors. The feature vectors hold the attributes (here, the image pixel values) that the network uses to make its prediction; the label vector holds the outcome the network is trained to predict.
As I built these initial vectors, I resized every X-ray image to 224×224 pixels so that the inputs were uniform in size.
To represent the covid/healthy classes numerically without implying an order, I used one-hot encoding. It adds one dimension to the dataset per unique label and fills each with a 1 or a 0 depending on whether that entry belongs to the corresponding label. Since I had only two classes (binary), I used LabelBinarizer() here.
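The encoding can be sketched with scikit-learn as follows. One detail worth noting: with exactly two classes, LabelBinarizer returns a single 0/1 column rather than two, so this sketch expands it into two columns by hand to show the full one-hot form. That expansion step and the sample labels are assumptions for illustration, not necessarily what the original code did.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

labels = ["covid", "normal", "covid", "normal"]  # illustrative sample labels

lb = LabelBinarizer()
binary = lb.fit_transform(labels)  # shape (4, 1): one 0/1 column for two classes

# Expand to two columns, one per class, ordered like lb.classes_.
one_hot = np.hstack([1 - binary, binary])

print(lb.classes_)
print(one_hot)
```

Each row of one_hot contains exactly one 1, in the column of the class that row belongs to, so neither class is treated as larger or smaller than the other.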