Natural Language Processing to Extract Meaningful Information from a Corpus of Written Knowledge in Breast Cancer: Transforming Books into Data

G. Catanuto, Nicola Rocco* (Corresponding Author), Konstantina Balafa, Yazan Masannat, Andreas Karakatsanis, Anna Maglia, Peter Barry, Francesco Pappalardo, Maurizio B. Nava, Francesco Caruso

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Books and papers are the most relevant source of theoretical knowledge for medical education. New technologies of artificial intelligence can be designed to assist in selected educational tasks, such as reading a corpus made up of multiple documents and extracting relevant information in a quantitative way.
Methods: Thirty experts were selected transparently through an online public call on the website of the sponsor organization and on its social media. Six books edited or co-edited by members of this panel, covering general knowledge of breast cancer or specific surgical knowledge, were acquired. This collection was used by a team of computer scientists to train an artificial neural network based on a technique called Word2Vec.
Results: The corpus of six books contained about 2.2 billion words, represented as 300-dimensional (300d) vectors. A few tests were performed, in which we evaluated the cosine similarity between different words.
Discussion: This work represents an initial attempt to derive formal information from a textual corpus. It can be used to perform an augmented reading of the corpus of knowledge available in books and papers as part of a discipline. This can generate new hypotheses and provide an actual estimate of their association within the expert opinions. Word embedding can also be a good tool for accruing narrative information from clinical notes, reports, etc., and for producing predictions about outcomes. More work is expected in this promising field to generate “real-world evidence.”
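The cosine similarity evaluated in the Results can be illustrated with a minimal sketch. The code below is not the authors' implementation: the four-dimensional vectors in `toy_embeddings` are hypothetical values chosen for illustration (the study used 300-dimensional Word2Vec vectors learned from the books), and the helper names `cosine_similarity` and `most_similar` are assumptions introduced here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (NOT taken from the study); a trained
# Word2Vec model would supply one learned vector per vocabulary word.
toy_embeddings = {
    "mastectomy": [0.9, 0.1, 0.3, 0.0],
    "lumpectomy": [0.8, 0.2, 0.4, 0.1],
    "tamoxifen": [0.1, 0.9, 0.0, 0.3],
}


def most_similar(word, embeddings):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("mastectomy", toy_embeddings):
        print(f"mastectomy vs {other}: {score:.3f}")
```

Values close to 1 indicate that two words point in nearly the same direction in the embedding space, which is how an association between terms in the corpus would be quantified under this sketch.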

Original language: English
Pages (from-to): 209-212
Number of pages: 4
Journal: Breast Care
Volume: 18
Issue number: 3
Early online date: 10 May 2023
Publication status: Published - Jun 2023

Keywords

  • breast cancer
  • artificial intelligence
  • medical education
