Abstract
This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time. It addresses two challenges: the subtle appearance changes that make SCC difficult to diagnose, and the processing speed required for video. Two deep learning models are implemented: the first identifies video frames that contain artefacts, and the second detects, segments and classifies the artefact-free frames. For SCC detection, both mask-RCNN and YOLOv3 architectures are implemented. In addition, a fast non-maximum suppression (NMS) technique is applied to the predictions, so that a single bounding box is reported for each region of interest rather than multiple duplicate boxes. As a result, the developed system can process videos at 16-20 frames per second. Three SCC classes are distinguished: ‘suspicious’, ‘high grade’ and ‘cancer’. For videos with a resolution of 1920x1080 pixels, the average processing time when applying YOLOv3 is 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system on a single GPU (GeForce GTX 1060). The average accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 provides only bounding boxes, mask-RCNN is also evaluated in order to delineate the lesioned regions. Mask-RCNN achieves a better detection accuracy of 77% and a classification accuracy of 84%, similar to that of YOLOv3. However, it is more than 10 times slower, averaging 1.2 seconds per frame, because of mask generation. The segmentation accuracy of mask-RCNN is 63%. These results are based on a dataset of 350 images. Further improvement will therefore require collecting, annotating or augmenting more data.
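The two-stage design in the abstract has an artefact model that screens each frame before the lesion model runs. It can be pictured as a per-frame gating loop. The following is a minimal Python sketch under that reading, not the authors' implementation. `process_video`, `is_artefact`, `detect`, `Detection` and `FrameResult` are hypothetical names, and the two trained models are stood in for by plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Class names from the abstract; their order here is an assumption.
CLASSES = ("suspicious", "high grade", "cancer")


@dataclass
class Detection:
    box: tuple    # (x1, y1, x2, y2) in pixels
    score: float  # detector confidence
    label: str    # one of CLASSES


@dataclass
class FrameResult:
    index: int
    has_artefact: bool
    detections: List[Detection] = field(default_factory=list)


def process_video(frames: Sequence,
                  is_artefact: Callable[[object], bool],
                  detect: Callable[[object], List[Detection]]) -> List[FrameResult]:
    """Hypothetical two-stage loop: stage 1 screens each frame for artefacts,
    stage 2 runs the lesion detector only on artefact-free frames."""
    results: List[FrameResult] = []
    for i, frame in enumerate(frames):
        if is_artefact(frame):
            # Skip the detector entirely for artefact frames.
            results.append(FrameResult(i, True))
        else:
            results.append(FrameResult(i, False, detect(frame)))
    return results
```

Gating in this way means the more expensive detector is never run on frames the first model rejects. That is one plausible reason to keep the two models separate.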
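The abstract applies NMS to the detector's predictions so that each region of interest keeps a single box. The authors describe a faster NMS variant whose details are not given here. The sketch below is instead the standard greedy, IoU-based NMS, shown only to illustrate the idea. The function names and the default IoU threshold of 0.5 are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, discard remaining boxes that
    overlap it by more than iou_threshold, and repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

Consider boxes (10, 10, 50, 50), (12, 12, 52, 52) and (100, 100, 140, 140) with scores 0.9, 0.8 and 0.7. The first two overlap with an IoU of about 0.82, so `nms` keeps indices 0 and 2.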
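The per-frame times reported in the abstract convert to frame rates by taking reciprocals. The same figures also check the "more than 10 times slower" comparison between mask-RCNN and YOLOv3. The helper names below are hypothetical; the numbers are the ones stated in the abstract, and only the per-frame timings are converted.

```python
def fps(seconds_per_frame: float) -> float:
    """Frames per second corresponding to a per-frame processing time."""
    return 1.0 / seconds_per_frame


# Per-frame times reported in the abstract (1920x1080 video, GTX 1060).
YOLO_RANGE = (0.064, 0.101)  # seconds per frame, YOLOv3 (min, max)
MASK_RCNN = 1.2              # seconds per frame, mask-RCNN (average)


def summary() -> dict:
    t_min, t_max = YOLO_RANGE
    return {
        # Slowest and fastest YOLOv3 frame rates.
        "yolo_fps": (fps(t_max), fps(t_min)),
        "mask_rcnn_fps": fps(MASK_RCNN),
        # How many times slower mask-RCNN is than the slowest/fastest YOLOv3 frame.
        "slowdown": (MASK_RCNN / t_max, MASK_RCNN / t_min),
    }
```

For YOLOv3 this gives roughly 9.9 to 15.6 frames per second. Mask-RCNN, at 1.2 seconds per frame, runs at about 0.83 frames per second, which is about 11.9 to 18.75 times slower than YOLOv3.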
Original language | English |
---|---|
Publication status | Accepted/In press - 14 Oct 2019 |
Event | The Eighteenth International Conference on Machine Learning and Applications - Boca Raton, United States. Duration: 16 Dec 2019 → 19 Dec 2019 |
Conference
Conference | The Eighteenth International Conference on Machine Learning and Applications |
---|---|
Country/Territory | United States |
City | Boca Raton |
Period | 16/12/19 → 19/12/19 |
Keywords
- oesophagus endoscopy
- pre-cancer detection
- deep learning
- segmentation
- real-time video processing