Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos

Xiaohong Gao, Barbara Braden, Stephen Taylor, Wei Pang

Research output: Contribution to conference, Unpublished paper, peer-review



This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing two challenges: the subtle appearance changes that make SCC hard to diagnose, and video processing speed. Two deep learning models are implemented: the first determines whether a video frame contains artefacts, and the second detects, segments and classifies the artefact-free frames. For detection of SCC, both Mask R-CNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a faster non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: ‘suspicious’, ‘high grade’ and ‘cancer’. For videos with a resolution of 1920x1080 pixels, the average processing time when applying YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The average accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, Mask R-CNN is also evaluated in order to delineate lesioned regions. It achieves a better detection result, with 77% accuracy, while its classification accuracy of 84% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, at an average of 1.2 seconds per frame, due to the creation of masks. The segmentation accuracy of Mask R-CNN is 63%. These results are based on data sets of 350 images. Further improvement is hence needed in the future by collecting, annotating or augmenting more
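
The abstract mentions applying non-maximum suppression so that each region of interest ends up with one bounding box. It does not specify which "faster" variant was used, so the following is only a minimal sketch of plain greedy NMS for illustration. The function names, the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are assumptions, not details taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold of 0.5 is an assumed default, not a value from the paper.
    """
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicate boxes and one separate box.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this made-up example the second box overlaps the first almost entirely and is suppressed, so `kept` holds the indices of the first and third boxes. A production pipeline would more likely rely on a vectorised or GPU implementation to meet the per-frame time budget reported above.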
Original language: English
Publication status: Accepted/In press - 14 Oct 2019
Event: The Eighteenth International Conference on Machine Learning and Applications - Boca Raton, United States
Duration: 16 Dec 2019 to 19 Dec 2019


Conference: The Eighteenth International Conference on Machine Learning and Applications
Country/Territory: United States
City: Boca Raton


  • oesophagus endoscopy
  • pre-cancer detection
  • deep learning
  • segmentation
  • real-time video processing

