At the end of 2019 I caught a very bad flu that turned into acute bronchitis, from which I recovered very slowly. Since my childhood I had rarely gotten the flu, and when I did it never lasted more than 3 days and left no aftermath. After this experience I became curious to learn more about lung diseases and, since my job is in Data Science, to try to address a specific use case.

I found on Kaggle this data set of chest X-ray images to play with: I built a first simple model (a CNN) to perform binary classification (normal patient or pneumonia) on those images and used it to continue my experimentation with XAI techniques. The first release of the model, in spite of a 90% accuracy, showed through the confusion matrix and other metrics such as precision and recall that it was very good at recognizing X-ray images from normal patients as normal, but much weaker at classifying pneumonia X-ray images as such. This behavior was confirmed once I started using the model to make predictions on test images.

Before going further, I started applying some XAI techniques, such as SHAP (SHapley Additive exPlanations), and then asked for support from a friend of mine who is an experienced Radiologist. This collaboration gave me insights that I, as a Biomedical Engineer with extensive experience in Software Engineering and Data Science in other sectors (such as Biotech Manufacturing, Healthcare Insurance and Cloud Operations, just to mention a few) but neither a Medical Doctor nor an X-ray expert, couldn't have figured out myself, and that led me to find better solutions.
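To make the starting point more concrete, here is a minimal sketch of the kind of model and evaluation described above. It assumes the Kaggle data set is laid out in train/ and test/ folders with NORMAL/ and PNEUMONIA/ subfolders; the image size, layer sizes and number of epochs are illustrative choices, not necessarily the exact ones I used.

```python
# A minimal sketch (not the exact model): a small CNN for binary
# classification of chest X-rays as normal vs pneumonia.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.metrics import confusion_matrix, precision_score, recall_score

IMG_SIZE = (150, 150)  # illustrative input size

# Assumed folder layout: chest_xray/{train,test}/{NORMAL,PNEUMONIA}/
train_ds = tf.keras.utils.image_dataset_from_directory(
    "chest_xray/train", image_size=IMG_SIZE, batch_size=32,
    label_mode="binary", color_mode="grayscale")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "chest_xray/test", image_size=IMG_SIZE, batch_size=32,
    label_mode="binary", color_mode="grayscale", shuffle=False)

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=IMG_SIZE + (1,)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(pneumonia)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)

# Accuracy alone hides the problem: the confusion matrix and the
# per-class precision/recall expose the weakness on pneumonia images.
y_true = np.concatenate([y.numpy() for _, y in test_ds]).ravel()
y_pred = (model.predict(test_ds).ravel() > 0.5).astype(int)
print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
```

With pneumonia as the positive class, a low recall is exactly what flags the behavior described above: normal images classified well, pneumonia images often missed.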
Figure 1: chest X-ray of a normal patient (PA projection)
First of all, after manually reviewing the training, validation and test data sets, I realized I had started from the wrong assumption that all of the X-ray images had been taken using the same projection. Most of the normal images, such as the one in figure 1 above, are in PA (posteroanterior) projection, while many of those related to pneumonia were taken in AP (anteroposterior) projection (the patient most probably not being in a condition to stand), which results in a different contrast and a slightly different position of the lungs compared to PA views. Figure 2 shows the X-ray of a patient affected by bacterial pneumonia.
Figure 2: chest X-ray of a patient affected by bacterial pneumonia
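Figure 3 shows the SHAP explanation for this image. As a sketch of how such an overlay can be produced (reusing the hypothetical model and datasets from the snippet above; GradientExplainer and the sample sizes are illustrative choices, not necessarily what I used):

```python
# Sketch of the SHAP step: explain one test image against a small
# background sample of training images.
import numpy as np
import shap

# ~50 training images as the reference ("background") distribution.
background = np.concatenate(
    [imgs.numpy() for imgs, _ in train_ds.take(3)])[:50]
# One test image to explain (standing in for the image in figure 2).
x = next(iter(test_ds))[0].numpy()[:1]

explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(x)  # per-pixel contributions

# In the overlay, red marks pixels with positive SHAP values for the
# model output and blue marks negative ones.
shap.image_plot(shap_values, x)
```

Depending on the shap version the returned shapes can differ slightly, but the output is the kind of red/blue overlay shown in figure 3.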
Figure 3: SHAP explanation for the image in figure 2
The SHAP explanation in figure 3 consistently showed a red area (a group of features, i.e. pixels, in the input image that pushed the model away from the predicted value) below the right lung. My Radiologist friend pointed out something that my eyes, not being trained to analyze X-ray images, had missed: that red area highlights the presence of something external, such as a plastic tube.

These are just a few examples, but the continuous feedback from a Radiologist taught me a lot about this subject and helped me start achieving better results before moving on to something more complex, such as a multi-class model that could also detect COVID-19; but that's a story for another post.

This post is a reminder for Data Scientists to always try to improve their knowledge of the specific sector and problems they focus on by starting a collaboration with SMEs. This way, any ongoing effort to solve COVID-19 related problems can be really productive, rather than just a stylistic exercise or a surrogate for a Kaggle competition.