A Generic Study on Diabetic Retinopathy Detection

Diabetic Retinopathy (DR) is one of the major causes of visual impairment in the growing population; it affects the retina, the light-sensitive part of the eye. It can affect people with either type of diabetes mellitus. It occurs when high blood sugar levels damage the blood vessels in the retina, causing them to swell and leak or stopping blood flow through them. It starts with no or only mild vision problems and can eventually cause blindness if not treated. With advancements in technology, automated detection and analysis of the stage of DR can help in early detection and treatment. Almost 75% of patients with diabetes are at risk of developing this disease, and with early detection its progression can be prevented. Currently, DR detection is a traditional, manual and time-consuming process that requires a trained technician to analyze color fundus images of the retina. With the ever-growing population, demand for DR detection is very high in order to prevent blindness. In this paper, we review the existing methodologies and techniques for detection, and we propose a system for detecting the four stages of DR (in addition to the no-DR class).


Introduction
Diabetic Retinopathy (DR), commonly known as diabetic eye disease, is a condition in which diabetes mellitus damages the retina. It has become a major health issue and a leading cause of blindness in developed countries. Statistically, almost 80% of people who have had diabetes for 20 years or more develop this condition. Research by doctors and long-term monitoring indicate that at least 90% of new cases could be reduced with proper treatment and monitoring. DR accounts for about 12% of new cases of blindness and is very common among people aged 20 to 64. Its symptoms are not noticeable at first. The progression of DR is divided into non-proliferative DR (NPDR), which has three stages (mild, moderate and severe), and proliferative DR (PDR). DR can be detected using fundus photography, which provides objective documentation of the fundus findings.
Anyone with type 1 or type 2 diabetes is at risk of DR. The disease can be particularly problematic during pregnancy: the NIH recommends that all pregnant women with diabetes have a comprehensive eye examination. DR is mainly caused by damage to the small blood vessels and neurons of the retina. In people with diabetes of any kind, the first changes occur in the retina; in later stages, the neurons begin to malfunction, leading to changes in the function of the outer retina and visible changes in vision.
Elevated glucose levels in the eye are extremely harmful to the small nerve cells: poor blood sugar control causes inflammation, which damages the nerves by blocking their blood flow.
Because analyzing these images is essentially a computer vision problem, we studied many research papers, surveys and datasets, which helped us build a complete deep learning model that applies computer vision to the images and automates the identification of the disease more accurately. During the survey, we also read extensively about the techniques used for the diagnosis and treatment of DR. The clinical examination methods include ophthalmoscopy, in which a slit lamp is used so that the doctor can take a close look at the retina. It can also be performed with a head-mounted bright light, the doctor looking through a special magnifying lens to obtain a wide view of the retina.
These methods are used to check whether a patient may have the disease. If the machines and instruments fail unexpectedly, testing the patient's eyes becomes difficult or nearly impossible, and eye specialists sometimes miss lesions and fail to identify DR.
This critical issue could be addressed by reducing the time spent on analyzing the captured images, which is currently a manual process. To automate it, we used machine learning and deep learning, building a deep convolutional neural network that learns from the images captured during testing and then makes predictions on new images. This allows patients' eyes to be screened quickly and the operation to be scaled up. Since this is a computer vision task, all the algorithms we used are computer vision algorithms; we successfully used the InceptionV3 model, released by Google in 2015. We framed the task as a supervised learning problem.
In this paper, we used the standard dataset provided on the Kaggle website, which was part of a competition in 2015. The dataset is 82.23GB in size. It contains a large set of high-resolution retina images, with a left-eye and a right-eye image for every subject. The images are labelled with five categories: No Diabetic Retinopathy, Mild Retinopathy, Moderate Retinopathy, Severe Retinopathy and Proliferative Diabetic Retinopathy. This grading allowed the dataset creators to label each image accurately with one of these categories.
Throughout the paper, the different detection techniques are discussed, with the main focus on those that produce the best results.

Early detection and Multistage classification of Diabetic Retinopathy using Random Forest Classifier
In this paper, image morphological operations and thresholding techniques are used to segment retinal features such as microaneurysms and haemorrhages. Classification is then done with a random forest classifier into 4 classes: normal, Mild-NPDR, Moderate-NPDR and PDR. The paper claims an accuracy of 91.2%.

Image Pre-processing:-
Pre-processing was done to make the retinal images suitable for feature extraction. The RGB images were converted to greyscale and rescaled to 560x720 with the aspect ratio preserved. The three channels of the RGB images were also extracted, and the green channel was selected because the anomalies are most clearly visible in it.
Noise was removed with a median filter, chosen because it removes noise from the images while preserving edges.
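The green-channel selection and median filtering described above can be sketched in plain Python. This is a minimal illustration, assuming an image stored as a list of rows of (R, G, B) tuples; the helper names `green_channel` and `median_filter` are hypothetical and not taken from the reviewed paper, which does not publish code.

```python
from statistics import median_low


def green_channel(rgb):
    """Keep only the green component of an image given as rows of (R, G, B) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb]


def median_filter(img, radius=1):
    """Replace each pixel by the median of its (2*radius+1) x (2*radius+1) neighbourhood.

    Near the border only the neighbours inside the image are used; median_low
    keeps the output an actual input value (no averaging of two middle values).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out_row.append(median_low(window))
        out.append(out_row)
    return out


# Usage: a flat patch with one "salt" pixel; the median removes it.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter(noisy)
```

Unlike a mean filter, the median ignores a single outlier entirely, and on a sharp step edge the majority of each window lies on one side, so the edge is kept in place.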
The contrast was then improved using the CLAHE (Contrast Limited Adaptive Histogram Equalization) technique.
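CLAHE equalizes the histogram separately in small tiles and clips each tile's histogram before building the intensity mapping. The sketch below shows only the contrast-limiting part applied to a whole image (no tiles and no interpolation between tiles), so it is a simplified relative of CLAHE rather than CLAHE itself; the function name and the `clip_limit` default are illustrative assumptions.

```python
def clipped_equalize(img, clip_limit=0.01, levels=256):
    """Global contrast-limited histogram equalization (CLAHE without the tiling).

    img holds integer grey values in [0, levels). clip_limit is the largest
    fraction of all pixels a single histogram bin may hold; the clipped excess
    is spread evenly over all bins before the cumulative mapping is built.
    """
    pixels = [p for row in img for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    limit = max(1, int(clip_limit * n))
    excess = sum(max(0, c - limit) for c in hist)
    hist = [min(c, limit) for c in hist]
    share, remainder = divmod(excess, levels)
    # Redistribute the excess; the first `remainder` bins get one extra count.
    hist = [c + share + (1 if i < remainder else 0) for i, c in enumerate(hist)]
    # Map each grey level through the normalized cumulative histogram.
    lut, running = [], 0
    for c in hist:
        running += c
        lut.append(round((levels - 1) * running / n))
    return [[lut[p] for p in row] for row in img]
```

Clipping keeps a dominant grey level (for example a large uniform background) from being stretched over most of the output range. In practice a library implementation such as OpenCV's `cv2.createCLAHE` would be used on real fundus images.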

Segmentation-
The paper applies two segmentation steps: segmentation of the blood vessels, and segmentation of microaneurysms (MAs) and haemorrhages.
Vessel segmentation: for the detection of red lesions, separating the blood vessels is a major step. A top-hat transformation with a suitable threshold was used, producing images that contain the blood vessels along with MAs and haemorrhages. Other structures are then removed with an area threshold that discards structures smaller than the specified value.
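Because vessels are darker than the background in the green channel, one common way to realize the top-hat step is a black top-hat (closing minus the image), which is equivalent to a white top-hat on the inverted green channel. The sketch below follows the two steps described (top-hat plus intensity threshold, then removal of structures below a minimum area), assuming a square structuring element; the `threshold` and `min_area` values and all function names are hypothetical.

```python
def _window_filter(img, radius, pick):
    """Grey-level erosion (pick=min) or dilation (pick=max) with a square window."""
    h, w = len(img), len(img[0])
    return [[pick(img[yy][xx]
                  for yy in range(max(0, y - radius), min(h, y + radius + 1))
                  for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)]
            for y in range(h)]


def black_tophat(img, radius=1):
    """Closing (dilation then erosion) minus the image: thin dark structures light up."""
    closed = _window_filter(_window_filter(img, radius, max), radius, min)
    return [[c - p for c, p in zip(c_row, p_row)] for c_row, p_row in zip(closed, img)]


def remove_small_components(mask, min_area):
    """Keep only 4-connected foreground components with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    keep = [[False] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            stack, component = [(y, x)], []
            while stack:  # iterative flood fill
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) >= min_area:
                for cy, cx in component:
                    keep[cy][cx] = True
    return keep


def segment_dark_structures(green, radius=1, threshold=20, min_area=3):
    """Top-hat, intensity threshold, then removal of structures below min_area."""
    tophat = black_tophat(green, radius)
    mask = [[v > threshold for v in row] for row in tophat]
    return remove_small_components(mask, min_area)
```

The radius of the structuring element bounds the width of the dark structures that the top-hat picks up, so it would have to be chosen relative to vessel width at the working resolution.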
MA and haemorrhage segmentation: these lesions are segmented based on their circular shape and sharp texture features. The Canny edge detector was used for edge detection, and broken edges were filled using a morphological close operation with a disc-shaped structuring element.
Any remaining blood vessel fragments were removed using a morphological open operation.
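The close and open operations can be sketched on binary masks with a disc-shaped structuring element. This is an illustrative implementation, not the paper's code; pixels outside the image are treated as background, and the disc radius is a free parameter.

```python
def disk(radius):
    """Offsets (dy, dx) of a disc-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def _inside(mask, y, x):
    """True if (y, x) lies in the image and is foreground."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]


def dilate(mask, se):
    """A pixel becomes foreground if any element offset hits foreground."""
    return [[any(_inside(mask, y + dy, x + dx) for dy, dx in se)
             for x in range(len(mask[0]))] for y in range(len(mask))]


def erode(mask, se):
    """A pixel stays foreground only if every element offset is foreground."""
    return [[all(_inside(mask, y + dy, x + dx) for dy, dx in se)
             for x in range(len(mask[0]))] for y in range(len(mask))]


def binary_close(mask, se):
    """Dilation then erosion: bridges small gaps such as broken lesion edges."""
    return erode(dilate(mask, se), se)


def binary_open(mask, se):
    """Erosion then dilation: removes structures thinner than the element."""
    return dilate(erode(mask, se), se)
```

Because the disc is symmetric, no reflection of the element is needed for dilation; with a radius of 1 the element is a 3x3 cross, which removes one-pixel-wide fragments while keeping blobs at least three pixels across.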

Feature Extraction
After the segmentation phase, statistical attributes are obtained from the images: texture features were extracted with the help of the GLCM (Gray Level Co-occurrence Matrix).
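A GLCM counts how often pairs of grey levels occur at a given pixel offset, and texture features are computed from the normalized matrix. The sketch below assumes the image is already quantized to `levels` grey levels and uses a single horizontal offset; the three statistics shown (contrast, angular second moment and homogeneity) are standard GLCM features, but which features the paper actually used is not stated.

```python
def glcm(img, dx=1, dy=0, levels=8):
    """Symmetric, normalized grey-level co-occurrence matrix for one pixel offset.

    img must already be quantized to integers in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i, j = img[y][x], img[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Haralick-style statistics of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        # Angular second moment; some tools report its square root as "energy".
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }
```

A perfectly uniform region gives zero contrast and maximal homogeneity, while a fine checkerboard gives the opposite, which is why these values help separate textured lesions from smooth background.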
After feature extraction, features are selected using an attribute selection filter; the work used the Best First search method.
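Best First search explores subsets of features, always expanding the most promising subset found so far and stopping after a number of expansions that bring no improvement. The sketch below is a simplified forward-only version; the subset-scoring function is an assumption supplied by the caller (tools such as Weka pair Best First with a separate subset evaluator), and `max_stale` plays the role of the stopping criterion.

```python
import heapq
from typing import Callable, FrozenSet


def best_first_select(n_features: int,
                      score: Callable[[FrozenSet[int]], float],
                      max_stale: int = 5) -> FrozenSet[int]:
    """Forward best-first search over feature subsets; returns the best subset found."""
    start: FrozenSet[int] = frozenset()
    best, best_score = start, score(start)
    tie = 0  # tie-breaker so the heap never has to compare frozensets
    open_list = [(-best_score, tie, start)]
    visited = {start}
    stale = 0
    while open_list and stale < max_stale:
        _, _, subset = heapq.heappop(open_list)  # most promising subset so far
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if child in visited:  # also skips f already in subset
                continue
            visited.add(child)
            s = score(child)
            tie += 1
            heapq.heappush(open_list, (-s, tie, child))
            if s > best_score:
                best, best_score, improved = child, s, True
        stale = 0 if improved else stale + 1
    return best


# Usage with a made-up scoring rule: each feature has a value, minus a size penalty.
weights = [0.5, 0.1, 0.4, -0.2]
toy_score = lambda s: sum(weights[i] for i in s) - 0.15 * len(s)
```

Unlike pure greedy forward selection, the open list lets the search return to an earlier, slightly worse subset and continue from there, which is what distinguishes best-first search.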

Classification-
The features are then fed into the random forest classifier, which classifies them into the 4 stages.

Result-
The average precision and recall values obtained for the random forest classifier were 0.908 and 0.912, respectively.
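Averaged precision and recall values like these can be computed from a confusion matrix. The sketch below supports both the unweighted (macro) mean over classes and a mean weighted by class support; which averaging the paper reports is not stated, so both are shown, and the function name is hypothetical.

```python
def averaged_precision_recall(cm, weighted=False):
    """Average per-class precision and recall from a confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    weighted=False gives the plain (macro) mean over classes; weighted=True
    weights each class by its number of true samples.
    """
    k = len(cm)
    n = sum(map(sum, cm))
    precision = recall = 0.0
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column total
        actual = sum(cm[c])                           # row total
        weight = actual / n if weighted else 1 / k
        precision += weight * (tp / predicted if predicted else 0.0)
        recall += weight * (tp / actual if actual else 0.0)
    return precision, recall
```

Note that support-weighted recall always equals overall accuracy, so the weighted variant says little beyond accuracy when the classes are imbalanced.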


Future work -
The work proposes that using more images from heterogeneous databases, along with better pre-processing techniques, could improve accuracy.

Automated Identification of Diabetic Retinopathy Using Deep Learning
This paper develops a robust diagnostic technology for the detection of DR. The authors propose a deep learning algorithm that processes color fundus images and classifies them into two categories: DR and No DR.
For this, they used supervised learning with a decision tree classification model, trained on a publicly available dataset of 75,137 fundus images covering all stages of DR.
The images were pre-processed to make them suitable for training: they were cropped to the inner retinal circle and scaled to 512x512 pixels.
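The cropping and scaling step can be sketched as follows. Cropping to the bounding box of non-dark pixels is an assumption about how the inner retinal circle is isolated, and nearest-neighbour resizing is used only to keep the example dependency-free; both helper names are hypothetical.

```python
def crop_to_retina(gray, threshold=10):
    """Crop a greyscale fundus image to the bounding box of pixels brighter than threshold.

    Assumes the retina appears as a bright disc on a near-black background.
    """
    rows = [y for y, row in enumerate(gray) if any(v > threshold for v in row)]
    cols = [x for x in range(len(gray[0])) if any(row[x] > threshold for row in gray)]
    if not rows:  # nothing above the threshold: leave the image unchanged
        return gray
    return [row[cols[0]:cols[-1] + 1] for row in gray[rows[0]:rows[-1] + 1]]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to out_h x out_w (for example 512 x 512)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

Cropping before resizing matters: it removes the black border, so the 512x512 budget is spent on retinal pixels rather than background.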
In addition, properties such as rotational invariance and contrast were encoded.
To train the classifier, features were extracted from the global average pooling layer, giving a total of 1024 features.
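Global average pooling collapses each channel of the last convolutional feature map to its mean, so a map with 1024 channels yields a 1024-dimensional feature vector regardless of its spatial size. A minimal sketch, assuming the feature map is stored as nested lists indexed `[y][x][channel]`:

```python
def global_average_pool(feature_map):
    """Average an H x W x C feature map (indexed [y][x][c]) over space, giving C values."""
    h, w = len(feature_map), len(feature_map[0])
    sums = [0.0] * len(feature_map[0][0])
    for row in feature_map:
        for vector in row:
            for c, value in enumerate(vector):
                sums[c] += value
    return [s / (h * w) for s in sums]
```

Because the output length depends only on the number of channels, the same classifier can consume features from images of any resolution.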
As mentioned earlier, the authors used decision tree classification to classify the images into two categories, DR and No DR. The stated reasons for choosing decision tree classification were its speed of implementation and reduced overfitting.
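At the core of decision-tree learning is the choice of a split at each node. The sketch below finds the single (feature, threshold) split that minimizes weighted Gini impurity; Gini impurity is one common criterion and is an assumption here, since the paper's exact tree settings are not given. A full tree applies this search recursively to each child.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best single split.

    Samples with X[i][f] <= threshold go to the left child, the rest to the right.
    Returns (None, None, gini(y)) if no split lowers the impurity.
    """
    n = len(X)
    best = (None, None, gini(y))
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (f, t, impurity)
    return best
```

In the usage check, labels 0 and 1 stand for No DR and DR, and the two toy features are made up; the first feature separates the classes perfectly, so its split has zero impurity.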

Result-
Their algorithm's results show the potential of automated feature-learning systems.

Future Work-
The paper suggests that future work should incorporate other factors, such as genetic factors, patient history, duration of diabetes and haemoglobin A1c (HbA1c) value, to improve accuracy.

Algorithms for digital image processing in diabetic retinopathy
The authors reviewed methods for processing the images of different patients in five areas: pre-processing, localization and segmentation of the optic disk, segmentation of the retinal vasculature, localization of the macula and fovea, and localization and segmentation of retinopathy. They intended to set up a framework for better algorithm design.
Diabetic retinopathy is a major cause of blindness among people of working age, and it is asymptomatic until the late stages of vision loss. For this reason, the UK National Screening Committee (NSC) organizes annual screening; however, as the diabetic population is expected to grow, the classical manual method will not remain viable later in this century.
Using computer vision in place of the classical method is therefore an attractive option. When detecting the disease, doctors look for red lesions, such as small intraretinal dot haemorrhages and larger blot haemorrhages, and for whitish lesions such as cotton wool spots, which are microinfarcts of the nerve fibre layer.
Using these lesions as the features of a machine learning model, and training the model on images with the selected fields, would produce more accurate results and allow the computer to learn patterns hidden in the images, enabling further improvements.
The methods described at the beginning are used to prepare the dataset and the model: image brightness is adjusted, noise is reduced, and image segmentation is performed for feature extraction. The complete report surveyed almost 125 research papers, and the authors succeeded in creating a complete framework for classifying images to recognize the eyes of diabetic patients.
Pre-processing covered correction of non-uniform illumination, colour normalization and contrast enhancement. Localization and segmentation of the optic disk covered the characteristics of the optic disk, optic disk localization and optic disk segmentation.
Localization of the macula and fovea was based on the characteristics of the macula and fovea. Localization and segmentation of retinopathy covered microaneurysms and haemorrhages, and exudates and cotton wool spots.
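Non-uniform illumination, the first pre-processing topic above, is often corrected by estimating a smooth background and subtracting it. The sketch below uses a large mean filter as the background estimate and then restores the global mean brightness; this is one common approach chosen for illustration, not necessarily one of the methods covered by the survey, and the window radius is an assumed parameter.

```python
def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, using only pixels inside the image."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def correct_illumination(img, radius=15):
    """Subtract a smooth background estimate, then add back the global mean."""
    background = box_mean(img, radius)
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return [[p - b + mean for p, b in zip(p_row, b_row)]
            for p_row, b_row in zip(img, background)]
```

The window must be much larger than the lesions of interest; otherwise the background estimate absorbs the lesions and subtracting it erases them.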
An ensemble-based system for automatic screening of diabetic retinopathy
Diabetes has become a common problem among people of all ages: more than 300 million people are affected, and it can lead to partial blindness due to the growth of cells over the eye that form a layer obscuring sight.
In this report, the authors screened patients for DR using, among other features, the distance between the macula centre and the optic disk centre as a novel component.
Automatic grading of colour fundus images for DR was proposed. An ensemble-based learning system proved suitable for this use case and was used for both image processing and decision making: all the features are classified by an ensemble of classifiers, which turns out to be very efficient for the task. As in state-of-the-art methods, image-level, lesion-specific and anatomical components are used together. The method was validated on the publicly available Messidor dataset, achieving an outstanding area under the ROC curve.
The system is highly sensitive and accurate, and its results are almost comparable to those of large industry-level models, given the amount of data used.
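One simple way to fuse the outputs of an ensemble is majority voting. The sketch below assumes each classifier is a callable returning a label; the threshold rules in the usage example are hypothetical stand-ins for trained classifiers, and the reviewed system's own fusion and ensemble-selection scheme may differ.

```python
from collections import Counter


def majority_vote(votes):
    """Most frequent label; ties go to the tied label that appears first in votes."""
    counts = Counter(votes)
    top = max(counts.values())
    return next(label for label in votes if counts[label] == top)


def ensemble_predict(classifiers, features):
    """Ask every classifier for a label and fuse the answers by majority vote."""
    return majority_vote([classify(features) for classify in classifiers])


# Usage with hypothetical threshold rules standing in for trained classifiers
# (1 = DR, 0 = no DR); the feature names are illustrative only.
classifiers = [
    lambda f: int(f["ma_count"] > 2),
    lambda f: int(f["haemorrhage_count"] > 0),
    lambda f: int(f["exudate_area"] > 0.1),
]
features = {"ma_count": 3, "haemorrhage_count": 0, "exudate_area": 0.2}
```

Voting helps most when the members make different kinds of errors, which is why ensembles typically combine classifiers built on different feature groups.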

Proposed International Clinical Diabetic Retinopathy and Diabetic Macular Edema Disease Severity Scales
The authors aimed to create a consensus on disease severity classification systems for diabetic retinopathy and diabetic macular edema that could be used anywhere around the globe.
The starting point was the classification system of the Early Treatment Diabetic Retinopathy Study. The report defines a five-stage disease severity scale for diabetic retinopathy, while diabetic macular edema is first classified as apparently present or absent.
Where training and equipment allow further assessment, macular edema is also graded as a function of its distance from the central macula. The aim was a common international classification system for diabetic retinopathy and diabetic macular edema that is supported by clinical evidence.
The resulting system provides a means of appropriately categorizing diabetic retinopathy, improving both the screening of individuals with diabetes and communication among those caring for these patients.
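The DR part of the international scale can be written as a small rule-based function. This sketch follows the commonly cited definitions of the five levels, including the "4-2-1" rule for severe NPDR; the `Findings` fields are hypothetical names for the clinical findings, and the code is an illustration, not a clinical tool. The 0 to 4 grades follow the same ordering as the labels of the Kaggle dataset described earlier.

```python
from dataclasses import dataclass

ICDR_DR_LEVELS = [
    "No apparent retinopathy",
    "Mild NPDR",
    "Moderate NPDR",
    "Severe NPDR",
    "Proliferative DR",
]


@dataclass
class Findings:
    """Hypothetical container for graded fundus findings."""
    microaneurysms: bool = False
    lesions_beyond_microaneurysms: bool = False
    quadrants_over_20_haemorrhages: int = 0  # quadrants with > 20 intraretinal haemorrhages
    quadrants_venous_beading: int = 0        # quadrants with definite venous beading
    quadrants_prominent_irma: int = 0        # quadrants with prominent IRMA
    neovascularization: bool = False
    vitreous_or_preretinal_haemorrhage: bool = False


def icdr_dr_grade(f: Findings) -> int:
    """Map findings to a DR level 0..4, checking the most severe level first."""
    if f.neovascularization or f.vitreous_or_preretinal_haemorrhage:
        return 4
    # "4-2-1 rule": any one of these indicates severe NPDR.
    if (f.quadrants_over_20_haemorrhages >= 4
            or f.quadrants_venous_beading >= 2
            or f.quadrants_prominent_irma >= 1):
        return 3
    if f.lesions_beyond_microaneurysms:
        return 2
    if f.microaneurysms:
        return 1
    return 0
```

Checking the levels from most to least severe mirrors how the scale is defined: each level applies only when the criteria of every higher level are absent.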

DREAM: Diabetic Retinopathy Analysis using Machine Learning
DREAM is a computer-aided screening system that analyses images for diabetic retinopathy using common machine learning algorithms: SVM (support vector machine), kNN (k-nearest neighbour) and GMM (Gaussian Mixture Model), with AdaBoost also used, to distinguish lesions from non-lesions in the images.
The authors identified GMM and kNN as the best-performing classifiers for lesion classification. Using these techniques, the number of features required was reduced substantially. After 30 of the 78 features were selected, non-lesions (false positives) were rejected in a first step; in a second step, bright lesions were classified as hard exudates or cotton wool spots, and red lesions as haemorrhages or microaneurysms.
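The k-nearest-neighbour rule used for lesion classification can be sketched in a few lines: a candidate region is labelled by majority vote among the k training samples closest to it in feature space. The feature vectors and labels in the usage check are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

kNN needs no training beyond storing the samples, but it is sensitive to feature scaling, so features of very different ranges would normally be standardized first.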
Lesion classification in this setting involves unbalanced datasets and relies on SVM. The system was tested on more than 1200 publicly available images. It achieved 100% sensitivity, 53.16% specificity and an AUC of 0.904 for classifying images as with or without DR, compared with the best previously reported 96% sensitivity, 51% specificity and AUC of 0.875.
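The reported figures rest on three standard metrics. The sketch below computes sensitivity and specificity from binary predictions (1 = DR, 0 = no DR) and the area under the ROC curve as the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, counting ties as one half. The function names and toy values are illustrative.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The toy case in the check mirrors the trade-off in the reported results: a low decision threshold catches every DR image (100% sensitivity) at the cost of flagging some healthy ones (lower specificity), while the AUC summarizes ranking quality independently of the threshold.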
For the sensitivity/specificity test of the two-step hierarchical bright lesion classification on the DIARETDB1 dataset, the authors obtained statistics for all the categories discussed above with a varying number of features. With 5 features, kNN and SVM gave the best results; the classifiers were then tested with 10, 15, 20, 25 and 30 features in turn, and when the results were averaged, SVM performed best with the maximum number of features, at 98% accuracy.
Both datasets were tested, and after the first test it was evident that SVM gave the best outcome, so the researchers combined pairs of algorithms, such as SVM+kNN and SVM+GMM. The combined classifiers changed the results substantially, and the authors achieved good results on the DIARETDB1 dataset, which had not been possible earlier: with the combined algorithms, prediction accuracy reached 95.33% on DIARETDB1 and 94% on MESSIDOR.
Finally, the authors enabled their DREAM system to use 30 of the 78 features, scored using AdaBoost, and to classify bright and red lesions with an AUC greater than 0.83 using algorithms such as k-nearest neighbour, Gaussian mixture modelling and a Bayesian combination of probabilistic classifiers. The feature reduction also reduced the time complexity of the code by almost a factor of 2, or 94%.

Diabetic Retinopathy Detection Using Prognosis of Microaneurysm and Early Diagnosis System for Non-Proliferative Diabetic Retinopathy Based on Deep Learning Algorithms
Diabetic Retinopathy results from high blood glucose levels, which eventually cause microvascular complications and permanent vision loss. These complications could be prevented if identified early, but the disease is asymptomatic and often cannot be identified until its later stages. Advances in computer vision have made it a promising solution to such medical problems.
The system presented in the report analyzed the presence of microaneurysms (MAs) in fundus images using a convolutional neural network, with deep learning as the core component accelerated by a graphics processing unit (GPU), which reduced latency and improved performance for medical image detection and segmentation. Semantic segmentation labels the images pixel by pixel, and MA features are identified from the semantics shared across images. The report presents an analysis of MAs together with a deep convolutional neural network for the segmentation of fundus images, supporting the diagnosis of non-proliferative DR with high efficiency and accuracy.
Introducing Laplacian of Gaussian (LoG) and matched filter (MF) filtering and combining them optimally in post-processing increased the accuracy, and the use of curvelets enabled the recognition of dark lesions. Because non-MA data vary widely, building a training set of non-MA samples is a challenge in itself: a large dataset consumes heavy computational resources and considerable capital. A sparse PCA-based unsupervised classification method for identifying MAs was therefore developed: a model of standard MAs is built, deviations from it are detected without supervision, and sparse PCA is used to find the latent structure of the MA data.
The results were tested against the following approaches:
Sparse principal component analysis based unsupervised classification approach.
Early Treatment Diabetic Retinopathy Study.
Deep Convolutional Neural Network.
Multiscale AM-FM Methods for Diabetic Retinopathy Lesion Detection.
Ensemble-based system.
Prognosis of Microaneurysm and early diagnosis system for non-proliferative diabetic retinopathy.
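The LoG filter mentioned in the description of this system responds strongly to small, roughly circular blobs such as microaneurysms. The sketch below builds a sampled LoG kernel, shifts it to sum to zero so that flat regions give no response, and evaluates the response at one pixel; `sigma`, the kernel radius and the helper names are illustrative assumptions. With this sign convention, a dark blob on a brighter background gives a positive response.

```python
import math


def log_kernel(sigma=1.0, radius=None):
    """Sampled Laplacian-of-Gaussian kernel, shifted so its entries sum to zero."""
    if radius is None:
        radius = math.ceil(3 * sigma)
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = (x * x + y * y) / (2 * sigma * sigma)
            row.append(-1 / (math.pi * sigma ** 4) * (1 - r2) * math.exp(-r2))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (len(kernel) ** 2)
    return [[v - mean for v in row] for row in kernel]


def response_at(img, kernel, cy, cx):
    """Filter response at (cy, cx); the kernel is symmetric, so correlation = convolution."""
    r = len(kernel) // 2
    return sum(kernel[dy + r][dx + r] * img[cy + dy][cx + dx]
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))
```

The blob size the filter prefers is set by `sigma` (the response peaks for blobs of radius about sigma times the square root of 2), so detectors usually evaluate several scales.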

Our Proposed Method
Dataset Overview-
For all supervised learning projects, the most important factor for success is the dataset. For the proposed method, we used the dataset provided on the Kaggle website for the Diabetic Retinopathy Detection competition. The dataset is 82.23GB in size.
It contains a large set of high-resolution retina images, with a left-eye and a right-eye image for every subject. The dataset is unbalanced, and Graph 1 shows the extent of this problem.
Graph 1: Graph showing the imbalanced nature of the dataset, with the number of images in each of the 5 categories.
As Graph 1 shows, category 0 (No DR) has the most images compared with the rest of the categories.
To overcome this problem, we had to increase the number of images in the other categories. To achieve this, the images were rotated and mirrored to create new images from the existing ones, producing a balanced dataset with almost the same number of images in each category.
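The rotation and mirroring used for balancing can be sketched as follows, assuming images stored as lists of rows (each pixel may be a grey value or an RGB tuple, since the transforms only move pixels around). The helper `copies_needed` is a hypothetical illustration of how many extra images each class would need to match the largest class.

```python
def rotate90(img):
    """Rotate an image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def mirror(img):
    """Flip an image horizontally."""
    return [row[::-1] for row in img]


def augment(img):
    """Return the 90, 180 and 270 degree rotations and the mirror image."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [r90, r180, r270, mirror(img)]


def copies_needed(class_counts):
    """Extra images per class required to match the largest class."""
    target = max(class_counts.values())
    return {label: target - n for label, n in class_counts.items()}
```

Each original yields at most four new images with these transforms, so a class that needs more copies than that would have to be combined with other augmentations or with undersampling of the majority class.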
After this, we obtained a dataset with an almost balanced number of images in all 5 categories. From the sample images above, it can clearly be seen that the images were not of good quality: they had black spots on them. Also, since the images were not all the same size, they were cropped to 256x256 to make them equal in size.
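The images were also blurred and noisy. To reduce noise while restoring the texture of the image, non-local means denoising was applied. Rather than simply averaging neighbouring pixels, it replaces each pixel with a weighted average of the pixels in a search window, giving more weight to pixels whose surrounding patches resemble the patch around the pixel being denoised, which helps preserve texture.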
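The patch-similarity weighting of non-local means can be illustrated with a deliberately naive sketch; the patch radius, search radius and filtering strength `h` are assumed values, and the function name is hypothetical.

```python
import math


def nl_means(img, patch_radius=1, search_radius=3, h=10.0):
    """Naive non-local means: weighted average over a search window, where each
    weight decays with the mean squared difference between the two pixels' patches."""
    height, width = len(img), len(img[0])

    def px(y, x):  # clamp coordinates so patches at the border stay defined
        return img[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    def patch_distance(y1, x1, y2, x2):
        d = 0.0
        for dy in range(-patch_radius, patch_radius + 1):
            for dx in range(-patch_radius, patch_radius + 1):
                diff = px(y1 + dy, x1 + dx) - px(y2 + dy, x2 + dx)
                d += diff * diff
        return d / (2 * patch_radius + 1) ** 2

    out = []
    for y in range(height):
        row = []
        for x in range(width):
            num = den = 0.0
            for yy in range(max(0, y - search_radius), min(height, y + search_radius + 1)):
                for xx in range(max(0, x - search_radius), min(width, x + search_radius + 1)):
                    weight = math.exp(-patch_distance(y, x, yy, xx) / (h * h))
                    num += weight * img[yy][xx]
                    den += weight
            row.append(num / den)
        out.append(row)
    return out
```

On a sharp step edge, pixels on the far side have very different patches and receive almost no weight, so the edge survives, while an isolated noisy pixel is pulled toward its surroundings. For real images, an optimized implementation such as OpenCV's `cv2.fastNlMeansDenoising` would be used instead; the naive version is far too slow for full-resolution fundus images.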
The three channels of the RGB images were also extracted, and the green channel was selected because the anomalies are most clearly visible in it.
The resulting image is shown below. It is clear that the images have been processed and can now be used for training: the noise has been reduced, and the main features to be used for training are clearly visible.

Making the model
To build the model that would be trained on the images, we used the transfer learning technique.
Transfer learning is an approach in deep learning in which a model pre-trained on one task is re-purposed for another task.
One of its biggest advantages is that no separate feature extraction step is needed, because these are very deep neural networks whose initial layers act as feature extractors.
In our case, we used the Inception V3 pre-trained neural network. It belongs to the Inception family of neural networks and incorporates the features of Inception v1 and v2 along with label smoothing, factorized 7x7 convolutions, the RMSProp optimizer and BatchNorm.
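Label smoothing, one of the Inception v3 additions listed above, replaces the one-hot training target with a mixture of the one-hot vector and a uniform distribution over the K classes: q'(k) = (1 - ε)·δ(k, y) + ε/K. A minimal sketch for the 5 categories used here; ε = 0.1 is an assumed value (it is the one used in the original Inception v3 paper), and the function name is hypothetical.

```python
def smooth_labels(label, num_classes=5, epsilon=0.1):
    """Label-smoothed target: (1 - epsilon) * one_hot + epsilon / num_classes."""
    return [(1 - epsilon) * (1.0 if k == label else 0.0) + epsilon / num_classes
            for k in range(num_classes)]
```

For class 2, this gives approximately [0.02, 0.02, 0.92, 0.02, 0.02]: the target still sums to one, but the model is discouraged from becoming over-confident in a single class.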
Also, compared with counterparts such as VGGNet, Inception networks perform better and are more computationally efficient, both in terms of the number of parameters in the network and the cost in memory and other resources.
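Part of this efficiency comes from factorization, which can be checked with simple arithmetic. A kh x kw convolution from C_in to C_out channels has kh·kw·C_in·C_out weights; replacing a 7x7 convolution with a 1x7 followed by a 7x1 convolution (assuming the intermediate layer also has C channels) needs 14·C² weights instead of 49·C², about 71% fewer. The channel count below is an arbitrary example value.

```python
def conv_weights(kh, kw, c_in, c_out):
    """Number of weights in a kh x kw convolution layer (biases ignored)."""
    return kh * kw * c_in * c_out


channels = 192  # example value; any channel count gives the same ratio
full = conv_weights(7, 7, channels, channels)
factorized = conv_weights(1, 7, channels, channels) + conv_weights(7, 1, channels, channels)
saving = 1 - factorized / full  # 1 - 14/49, about 0.71
```

The two stacked layers keep the same 7x7 receptive field, so the saving in parameters and computation comes without shrinking the area each output unit sees.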
In this model, we added further layers at the end to obtain predictions for the 5 categories we required. After compiling the model, we trained it on our large dataset for 25 epochs with a batch size of 16.

Results & Conclusion
With our model, we achieved an accuracy of 82%, and we also computed the confusion matrix of its predictions. We believe that the model achieved good accuracy, considering the large dataset on which it was tested.
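An accuracy figure of this kind is the trace of the confusion matrix divided by the number of test images. A minimal sketch, assuming integer labels 0 to 4 for the five categories; the labels in the usage check are made up for illustration and are not our results.

```python
def confusion_matrix(y_true, y_pred, num_classes=5):
    """cm[t][p] counts images with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Fraction of images on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))
```

Because the stages are ordered, the off-diagonal cells also show whether errors fall between neighbouring stages (for example Mild predicted as Moderate) or between distant ones, which accuracy alone does not reveal.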
We believe that DR can be treated if detected early, and our model can indicate which stage of DR an image shows.
During our literature survey, we also came across various other techniques for developing the model, which gave us insightful knowledge and helped us build this model.

Future Work
In our work, the dataset was highly imbalanced. To overcome this problem, we mirrored the images and rotated them by 90, 180 and 270 degrees to increase the size of the classes with comparatively fewer images.
The dataset used in this paper was very noisy, so a better dataset with less noise could be used. Pre-processing techniques that remove the noise from the datasets could also help improve the accuracy significantly.
In the future, we believe there will be instruments that classify the stage of the disease instantaneously, by taking scans from fundus cameras and testing the images with deep learning models to predict the stage. Since the prediction can never be 100% correct, minimal human expertise and supervision would still be needed, but the process of disease detection could be sped up to meet the growing demand.