GSoC 2020 with LibreHealth : Final Report

Low Powered Models for Disease Detection and Classification for Radiology Images

Project Description - The aim of this project is to create deep learning models for the detection and classification of radiology images. The models must be compressed so that they can be deployed to low-powered devices such as ARM and Android devices. Compression techniques such as quantization and pruning can be used.

Mentors - Priyanshu Sinha, Saptarshi Purkayastha, Judy Gichoya, Geeta Priya Padmanabhan

Tech Stack - NumPy, Pandas, PyDicom, TensorFlow, TensorFlow Lite / TensorFlow Model Optimization, Docker, QEMU

Project Link - Click here
Commits - Click here
Merge Requests - Click here

Why to do this - There has been a lot of progress in developing machine learning models that predict the medical condition of a patient based upon specific inputs relevant to the diagnosis of that condition. However, these models have drawbacks when deployed in real time on edge devices. Firstly, they have been trai

Week 12 : Coding Period

This week I finally got my models to produce good results. I had been evaluating them on data in the wrong structure. Yes, the data format I had been working with for ages turned out not to be useful, but it was a building block for the generators that I finally created. After that, I re-evaluated my original model and voilà! I got an AUC-ROC score of 0.81. As these generators proved useful, I immediately parallelized my operations. I was running a total of 6 notebooks: 2 on the PLHI server, 2 locally, 1 Kaggle notebook and 1 Colab notebook. The PLHI server did not have the complete data, so I exported a subset there and re-trained my pruning model. My local system had the data but lacked processing power, so I ran model evaluations locally. Each int8 model took over 24 hours. The Kaggle notebook had data and processing power, but the data was not in the correct structure to be fed to models, so I tested inference scripts there. Colab lacked the data but had processi

Week 11 : Coding Period

This week I was finally able to get my data into the correct format to be fed to the pretrained CheXNet model. The solution to my bug had been mentioned in a GitHub issue only a few days earlier. Thanks to this, I was able to evaluate my models and finish all types of compression techniques on this dataset. However, after evaluation, the model performs very poorly: it is unable to classify the diseases and gets an AUROC score of roughly 0.5. The cause of this is dataset imbalance, which is clearly observed in the model's bias towards certain classes of the disease. I need to discuss this with my mentor. I tried metrics such as top-3 and top-5 score to get a better result, but the performance remains fairly constant. I have also started writing scripts for this dataset. This should be completed in approximately 4-5 days. Simultaneously, I have been working on methods to test my model on edge devices. As soon as I finish my scripts, I will
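To illustrate why an AUROC of roughly 0.5 means the model is not separating the classes, here is a minimal sketch that computes AUROC for one class by counting how often a positive example is scored above a negative one. The labels and scores are made-up toy values and the function is a hypothetical helper, not the evaluation code used in this project.

```python
def auroc(labels, scores):
    """AUROC for one class: fraction of (positive, negative) pairs in which
    the positive example gets the higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 2 positives out of 10 examples (an imbalanced class).
labels = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]

# A model that gives every image the same score cannot rank anything.
constant_scores = [0.1] * 10

# A model that scores both positives above every negative.
ranked_scores = [0.1, 0.2, 0.1, 0.3, 0.9, 0.2, 0.1, 0.4, 0.8, 0.3]

auroc_constant = auroc(labels, constant_scores)
auroc_ranked = auroc(labels, ranked_scores)
print(auroc_constant, auroc_ranked)
```

Because AUROC depends only on how examples are ranked, a score near 0.5 says the ranking is close to random, whatever the class balance is.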

Week 10 : Coding Period

This week I presented my work to Saptarshi and Judy and took their feedback. I have a lot to work on. I began working on the CheXNet model for the ChestX-ray14 dataset. I have finished quantizing these models into dynamic-range, float16 and int8 models. As I am using a pre-trained model, I am putting time into getting the input and output data formats correct for the model, inference, and other purposes. Currently, I am running scripts to prune these models. This should get done soon. I have also restructured my scripts for the RSNA Pneumonia Detection dataset and added the inference scripts. I will raise an MR for this after discussing with my mentor. Once I finish working on the ChestX-ray14 dataset, I have to write scripts for all the steps - preprocessing, quantizing, pruning, evaluation, inference, etc. Once this is done, I want to evaluate all my models on ARM devices using the QEMU emulator. I will check parameters such as FPS, latency and power consumption. Till then, Happy C
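For readers unfamiliar with int8 quantization, the sketch below shows the arithmetic behind it in plain Python: each float is mapped to an 8-bit integer through a scale and a zero point, so that real_value = (int8_value - zero_point) * scale. The TensorFlow Lite converter does this internally (with calibration data for full int8 models). The helper names and weights here are hypothetical and only illustrate the idea.

```python
def quantize_int8(values):
    """Affine-quantize a list of floats to int8.

    Returns (quantized_values, scale, zero_point)."""
    # The range is widened to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))
    quantized = [
        max(-128, min(127, int(round(v / scale)) + zero_point))  # clip to int8
        for v in values
    ]
    return quantized, scale, zero_point


def dequantize_int8(quantized, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]


# Made-up weights, for illustration only.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
quantized, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(quantized, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, zero_point, max_error)
```

Each weight now needs 1 byte instead of 4, at the cost of a rounding error that stays below one quantization step in this example.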

Week 9 : Coding Period

Update on the project - This week I tried converting the CheXNet PyTorch model to ONNX and then to TensorFlow again. As ONNX does not support DataParallel, I tried this conversion without using it. I wrote a regex to rename the state dictionary keys of the PyTorch model, as the model was outdated and its keys no longer matched. I was able to convert the model to ONNX format. In spite of these changes, I ran into a lot of errors while converting this new model to TensorFlow. So I dropped this model and began working on the purely TensorFlow model. This is not the standard model, but it gives considerably good results. I am currently converting the output format of this model to represent the actual class names instead of probability values and compressing these too. I added modifications to the scripts to convert pruned models to quantized ones. I need to clean this code to specify the different input formats for models - hdf5 and h5 models. I have evaluated my models for size and a
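The key renaming mentioned above can be sketched as follows. This is a hedged example, not the regex actually used in the project: it assumes the checkpoint is a DenseNet-style state dictionary whose keys carry a "module." prefix (left by DataParallel) and old-style names such as "norm.1.weight" where newer code expects "norm1.weight". The key names and the helper rename_state_dict_keys are hypothetical, and plain strings stand in for tensors so the example runs without PyTorch.

```python
import re

# Matches old-style DenseNet layer names, e.g. "...denselayer1.norm.1.weight",
# and captures the two parts that have to be joined without the extra dot.
_OLD_KEY = re.compile(
    r"^(.*denselayer\d+\.(?:norm|relu|conv))\."
    r"((?:[12])\.(?:weight|bias|running_mean|running_var))$"
)


def rename_state_dict_keys(state_dict):
    """Return a copy of state_dict with DataParallel prefixes removed and
    old-style DenseNet layer names rewritten (assumed naming scheme)."""
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith("module."):
            key = key[len("module."):]  # prefix added by DataParallel
        match = _OLD_KEY.match(key)
        if match:
            key = match.group(1) + match.group(2)  # "norm.1.weight" -> "norm1.weight"
        renamed[key] = value
    return renamed


# Made-up keys; the values would be tensors in a real checkpoint.
old_state = {
    "module.densenet121.features.denseblock1.denselayer1.norm.1.weight": "w0",
    "module.densenet121.features.denseblock1.denselayer1.conv.2.weight": "w1",
    "module.densenet121.classifier.0.weight": "w2",
}
new_state = rename_state_dict_keys(old_state)
for key in new_state:
    print(key)
```

The renamed dictionary can then be loaded into a model that is not wrapped in DataParallel before attempting the ONNX export.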

Week 8 : Coding Period

This week I focused on the ChestX-ray14 dataset and its available models. The benchmark model for this dataset is the CheXNet model. It was available on GitHub in 2 formats - PyTorch and TensorFlow. So I downloaded the PyTorch model and set up a pipeline to convert this model from PyTorch to ONNX to TensorFlow. Part 1, converting the PyTorch model to ONNX, was implemented successfully in spite of a lot of bugs (because DataParallel is not supported by ONNX). Now the part that converts the ONNX model to TensorFlow is generating a lot of errors, some of which have no solutions available on the internet! I will need to look into this part thoroughly. Meanwhile, I also searched for available Keras models of CheXNet. The one that I found did not have any concrete results because the creator did not add a threshold/classifier; the model simply outputs scores per class. Another fishy aspect of this model is that it is only 28MB in size. How can such a heavy,
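When a model only outputs one score per class, a thresholding step is needed to turn those scores into predicted disease names. Here is a minimal sketch of that step. The class list is an illustrative subset of the dataset's labels, and the 0.5 threshold and the scores are assumptions rather than values taken from the Keras model discussed above.

```python
# Illustrative subset of class names; the full dataset has 14 labels.
CLASS_NAMES = ["Atelectasis", "Cardiomegaly", "Effusion", "Pneumonia"]


def scores_to_labels(scores, class_names, threshold=0.5):
    """Return the names of all classes whose score reaches the threshold.

    The task is multi-label, so zero, one or several names may be returned."""
    if len(scores) != len(class_names):
        raise ValueError("expected one score per class")
    return [name for name, score in zip(class_names, scores) if score >= threshold]


# Made-up per-class scores for a single image.
scores = [0.12, 0.81, 0.55, 0.07]
predicted = scores_to_labels(scores, CLASS_NAMES)
print(predicted)
```

In practice a separate threshold per class, tuned on a validation set, may work better than a single global value.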

Week 7 : Coding Period

I have finished converting the pruned models to TFLite format. I have done dynamic, float16 and int8 quantization. I have also evaluated their results, which will be presented shortly. Currently I am working on the ChestX-ray14 models. I am converting benchmark PyTorch models to TensorFlow format through an intermediate ONNX format. I am also implementing knowledge distillation by writing my own code for it. The available code has issues storing the soft targets and I am finding a way to work around that. The code available on GitHub is in PyTorch and cannot be used on my models, and the available TensorFlow code is outdated and does not work correctly. So for the next week, I will be working on ChestX-ray14 and knowledge distillation simultaneously. The knowledge distillation script will be reusable for both datasets. I hope to complete these modules in the upcoming days. Happy Coding :)
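As a rough illustration of the "soft targets" used in knowledge distillation, the sketch below softens a teacher's logits with a temperature and measures how closely a student matches them. It is plain Python with made-up logits and assumes a single-label softmax classifier. A multi-label model with sigmoid outputs would soften each class separately, and a full distillation loss usually also includes a hard-label term. It is not the script written for this project.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the teacher's soft targets and the student's
    softened predictions, both computed at the same temperature."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student_probs))


# Made-up logits for a 3-class problem.
teacher_logits = [4.0, 1.0, 0.5]
hard_probs = softmax(teacher_logits, temperature=1.0)
soft_probs = softmax(teacher_logits, temperature=4.0)

loss_matching = distillation_loss(teacher_logits, teacher_logits)
loss_mismatched = distillation_loss([0.5, 1.0, 4.0], teacher_logits)
print(hard_probs, soft_probs, loss_matching, loss_mismatched)
```

The softened distribution keeps more information about the non-top classes, which is what the student learns from. The loss is lowest when the student's softened outputs match the teacher's.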