Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos

Xiaohong Gao; Barbara Braden; Stephen Taylor; Wei Pang

Research output: Contribution to conference › Unpublished paper › peer-review

Abstract

This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing two challenges in diagnosing SCC: subtle changes in appearance and video processing speed. Two deep learning models are implemented: the first determines whether a video frame contains artefacts, and the second detects, segments and classifies the artefact-free frames. For detection of SCC, both mask-RCNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a faster non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: ‘suspicious’, ‘high grade’ and ‘cancer’. For videos with a resolution of 1920x1080 pixels, the average processing time with YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The average accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, mask-RCNN is also evaluated in order to delineate lesion regions. While it achieves a better detection result, with 77% accuracy, its classification accuracy of 84% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, averaging 1.2 seconds per frame, owing to the creation of masks. The segmentation accuracy of mask-RCNN is 63%. These results are based on a data set of 350 images. Further improvement is therefore needed in future work, by collecting, annotating or augmenting more data.
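The abstract says that non-maximum suppression is used to keep one bounding box per region of interest, but it does not describe the faster variant that was applied. As an illustration only, the following is a minimal sketch of standard greedy NMS, which is not necessarily the authors' method. The box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, the 0.5 IoU threshold and the example boxes are all assumptions made for this sketch.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections on a 1920x1080 frame: two near-duplicate boxes
# around one region and a third box elsewhere.
boxes = [(100, 100, 300, 300), (110, 105, 310, 295), (600, 400, 800, 600)]
scores = [0.90, 0.75, 0.60]
print(nms(boxes, scores))  # [0, 2]
```

In this example the second box overlaps the first with an IoU of about 0.86, so it is suppressed, while the non-overlapping third box is kept. A lower threshold suppresses more aggressively; a suitable value for endoscopic lesions would have to be tuned on annotated data.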

Original language	English
Publication status	Accepted/In press - 14 Oct 2019
Event	The Eighteenth International Conference on Machine Learning and Applications - Boca Raton, United States
Duration	16 Dec 2019 → 19 Dec 2019

Conference

Conference	The Eighteenth International Conference on Machine Learning and Applications
Country/Territory	United States
City	Boca Raton
Period	16/12/19 → 19/12/19

Bibliographical note

This project is funded by Cancer Research UK (CRUK). Their financial support is gratefully acknowledged.

Keywords

oesophagus endoscopy
pre-cancer detection
deep learning
segmentation
real-time video processing

Access to Document

Conference_Paper_AAM
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Accepted author manuscript, 823 KB
Licence: Other

Cite this

@conference{6ed31a0bc64542348678501cb149c424,
title = "Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos",
abstract = "This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing two challenges in diagnosing SCC: subtle changes in appearance and video processing speed. Two deep learning models are implemented: the first determines whether a video frame contains artefacts, and the second detects, segments and classifies the artefact-free frames. For detection of SCC, both mask-RCNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a faster non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: {\textquoteleft}suspicious{\textquoteright}, {\textquoteleft}high grade{\textquoteright} and {\textquoteleft}cancer{\textquoteright}. For videos with a resolution of 1920x1080 pixels, the average processing time with YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The average accuracies for classification and detection are 85\% and 74\% respectively. Since YOLOv3 only provides bounding boxes, mask-RCNN is also evaluated in order to delineate lesion regions. While it achieves a better detection result, with 77\% accuracy, its classification accuracy of 84\% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, averaging 1.2 seconds per frame, owing to the creation of masks. The segmentation accuracy of mask-RCNN is 63\%. These results are based on a data set of 350 images. Further improvement is therefore needed in future work, by collecting, annotating or augmenting more data.",
keywords = "oesophagus endoscopy, pre-cancer detection, deep learning, segmentation, real-time video processing",
author = "Xiaohong Gao and Barbara Braden and Stephen Taylor and Wei Pang",
note = "This project is funded by Cancer Research UK (CRUK). Their financial support is gratefully acknowledged. The Eighteenth International Conference on Machine Learning and Applications; Conference date: 16-12-2019 through 19-12-2019",
year = "2019",
month = oct,
day = "14",
language = "English",
}

TY - CONF
T1 - Towards Real-Time Detection of Squamous PreCancers from Oesophageal Endoscopic Videos
AU - Gao, Xiaohong
AU - Braden, Barbara
AU - Taylor, Stephen
AU - Pang, Wei
N1 - This project is funded by Cancer Research UK (CRUK). Their financial support is gratefully acknowledged.
PY - 2019/10/14
Y1 - 2019/10/14
N2 - This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing two challenges in diagnosing SCC: subtle changes in appearance and video processing speed. Two deep learning models are implemented: the first determines whether a video frame contains artefacts, and the second detects, segments and classifies the artefact-free frames. For detection of SCC, both mask-RCNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a faster non-maximum suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are distinguished: ‘suspicious’, ‘high grade’ and ‘cancer’. For videos with a resolution of 1920x1080 pixels, the average processing time with YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The average accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, mask-RCNN is also evaluated in order to delineate lesion regions. While it achieves a better detection result, with 77% accuracy, its classification accuracy of 84% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, averaging 1.2 seconds per frame, owing to the creation of masks. The segmentation accuracy of mask-RCNN is 63%. These results are based on a data set of 350 images. Further improvement is therefore needed in future work, by collecting, annotating or augmenting more data.
KW - oesophagus endoscopy
KW - pre-cancer detection
KW - deep learning
KW - segmentation
KW - real-time video processing
M3 - Unpublished paper
T2 - The Eighteenth International Conference on Machine Learning and Applications
Y2 - 16 December 2019 through 19 December 2019
ER -
