Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos

Xiaohong Gao, Barbara Braden, Stephen Taylor, Wei Pang

Research output: Contribution to conference › Paper

Abstract

This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing both the subtle appearance changes that make SCC difficult to diagnose and the demands of video processing speed. Two deep learning models are implemented: one determines whether video frames contain artefacts, and the other detects, segments and classifies the artefact-free frames. For detection of SCC, both Mask R-CNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a fast non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: 'suspicious', 'high grade' and 'cancer'. For videos with a resolution of 1920x1080 pixels, the average processing time with YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The averaged accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, Mask R-CNN is also evaluated in order to delineate lesioned regions. While it achieves a better detection result, with 77% accuracy, its classification accuracy of 84% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, averaging 1.2 seconds per frame, due to the creation of masks. The segmentation accuracy of Mask R-CNN is 63%. These results are based on a data set of 350 images; further improvement through collecting, annotating or augmenting more data is therefore needed in future work.
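The abstract describes applying a fast non-maximum suppression step so that each region of interest receives a single bounding box rather than multiple overlapping duplicates. A minimal greedy NMS sketch of that idea (not the authors' implementation; the box format, scores and IoU threshold here are illustrative assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop every remaining
    box that overlaps it above iou_thresh; repeat until none are left.
    Returns the indices of the surviving boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

For example, two heavily overlapping detections of the same lesion collapse to the higher-confidence one, while a detection elsewhere in the frame is unaffected.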
Original language: English
Publication status: Accepted/In press - 14 Oct 2019
Event: The Eighteenth International Conference on Machine Learning and Applications - Boca Raton, United States
Duration: 16 Dec 2019 - 19 Dec 2019

Conference

Conference: The Eighteenth International Conference on Machine Learning and Applications
Country: United States
City: Boca Raton
Period: 16/12/19 - 19/12/19

Keywords

  • oesophagus endoscopy
  • pre-cancer detection
  • deep learning
  • segmentation
  • real-time video processing

Cite this

Gao, X., Braden, B., Taylor, S., & Pang, W. (Accepted/In press). Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos. Paper presented at The Eighteenth International Conference on Machine Learning and Applications, Boca Raton, United States.

Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos. / Gao, Xiaohong; Braden, Barbara; Taylor, Stephen; Pang, Wei.

2019. Paper presented at The Eighteenth International Conference on Machine Learning and Applications, Boca Raton, United States.

Research output: Contribution to conference › Paper

Gao, X, Braden, B, Taylor, S & Pang, W 2019, 'Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos'. Paper presented at The Eighteenth International Conference on Machine Learning and Applications, Boca Raton, United States, 16/12/19 - 19/12/19.
Gao X, Braden B, Taylor S, Pang W. Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos. 2019. Paper presented at The Eighteenth International Conference on Machine Learning and Applications, Boca Raton, United States.
Gao, Xiaohong; Braden, Barbara; Taylor, Stephen; Pang, Wei. / Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos. Paper presented at The Eighteenth International Conference on Machine Learning and Applications, Boca Raton, United States.
@conference{6ed31a0bc64542348678501cb149c424,
title = "Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos",
abstract = "This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing both the subtle appearance changes that make SCC difficult to diagnose and the demands of video processing speed. Two deep learning models are implemented: one determines whether video frames contain artefacts, and the other detects, segments and classifies the artefact-free frames. For detection of SCC, both Mask R-CNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a fast non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: 'suspicious', 'high grade' and 'cancer'. For videos with a resolution of 1920x1080 pixels, the average processing time with YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The averaged accuracies for classification and detection are 85{\%} and 74{\%} respectively. Since YOLOv3 only provides bounding boxes, Mask R-CNN is also evaluated in order to delineate lesioned regions. While it achieves a better detection result, with 77{\%} accuracy, its classification accuracy of 84{\%} is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, averaging 1.2 seconds per frame, due to the creation of masks. The segmentation accuracy of Mask R-CNN is 63{\%}. These results are based on a data set of 350 images; further improvement through collecting, annotating or augmenting more data is therefore needed in future work.",
keywords = "oesophagus endoscopy, pre-cancer detection, deep learning, segmentation, real-time video processing",
author = "Xiaohong Gao and Barbara Braden and Stephen Taylor and Wei Pang",
note = "This project is funded by Cancer Research UK (CRUK). Their financial support is gratefully acknowledged. The Eighteenth International Conference on Machine Learning and Applications; Conference date: 16-12-2019 through 19-12-2019",
year = "2019",
month = "10",
day = "14",
language = "English",

}

TY - CONF

T1 - Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos

AU - Gao, Xiaohong

AU - Braden, Barbara

AU - Taylor, Stephen

AU - Pang, Wei

N1 - This project is funded by Cancer Research UK (CRUK). Their financial support is gratefully acknowledged.

PY - 2019/10/14

Y1 - 2019/10/14

N2 - This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing both the subtle appearance changes that make SCC difficult to diagnose and the demands of video processing speed. Two deep learning models are implemented: one determines whether video frames contain artefacts, and the other detects, segments and classifies the artefact-free frames. For detection of SCC, both Mask R-CNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a fast non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: 'suspicious', 'high grade' and 'cancer'. For videos with a resolution of 1920x1080 pixels, the average processing time with YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The averaged accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, Mask R-CNN is also evaluated in order to delineate lesioned regions. While it achieves a better detection result, with 77% accuracy, its classification accuracy of 84% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, averaging 1.2 seconds per frame, due to the creation of masks. The segmentation accuracy of Mask R-CNN is 63%. These results are based on a data set of 350 images; further improvement through collecting, annotating or augmenting more data is therefore needed in future work.

AB - This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing both the subtle appearance changes that make SCC difficult to diagnose and the demands of video processing speed. Two deep learning models are implemented: one determines whether video frames contain artefacts, and the other detects, segments and classifies the artefact-free frames. For detection of SCC, both Mask R-CNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a fast non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: 'suspicious', 'high grade' and 'cancer'. For videos with a resolution of 1920x1080 pixels, the average processing time with YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The averaged accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, Mask R-CNN is also evaluated in order to delineate lesioned regions. While it achieves a better detection result, with 77% accuracy, its classification accuracy of 84% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, averaging 1.2 seconds per frame, due to the creation of masks. The segmentation accuracy of Mask R-CNN is 63%. These results are based on a data set of 350 images; further improvement through collecting, annotating or augmenting more data is therefore needed in future work.

KW - oesophagus endoscopy

KW - pre-cancer detection

KW - deep learning

KW - segmentation

KW - real-time video processing

M3 - Paper

ER -