Natural Language Processing for clinical research

April 30, 2021

Many hospitals, both nationally and internationally, now have an Electronic Health Record (EHR) system, which stores the set of documents, both written and graphic, that refer to a person's episodes of health and illness and to the healthcare activity generated as a result of those episodes [1].

Although a great deal of information is collected in the EHR, much of it is held as free, unstructured text: clinical course notes, observations, discharge reports, diagnostic test reports, surgical reports, etc. The report "Cognitive computing and the future of health care" (2017) [2] projected that by 2020 the amount of medical data would double every 73 days, with an estimated 80% of it unstructured.

This brings us to a paradox: we have an information system that holds a very large volume of data, yet offers a much smaller volume that can actually be exploited and analyzed.

In addition, clinical studies commonly require manual analysis of medical records to search for and identify information that is not structured (and of which, as already noted, there is a lot). Let's look at a small example. When a patient who comes to the emergency room is given a discharge report, the discharge diagnosis for the visit will probably be coded in it, at least in some cases. What is often not coded almost anywhere in the system are the symptoms with which the patient came to the emergency room (fever, muscle pain, skin irritation, dizziness, etc.). If those symptoms are relevant to the study, this is where the medical records have to be reviewed manually.

Furthermore, narrative text is difficult to access reliably because the variety of expressions is enormous: many different words can denote a single concept, and a huge variety of grammatical structures can convey equivalent information [3].
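To make the "many words, one concept" problem concrete, here is a minimal Python sketch that maps several surface forms to a single concept label. The term list, the labels and the function name `find_concepts` are assumptions made for illustration; this is not a clinical vocabulary and not a description of how any particular product works.

```python
import re

# Hypothetical lexicon (illustrative only): surface form -> concept label.
SYNONYMS = {
    "fever": "FEVER",
    "febrile": "FEVER",
    "pyrexia": "FEVER",
    "high temperature": "FEVER",
    "muscle pain": "MYALGIA",
    "muscle aches": "MYALGIA",
    "myalgia": "MYALGIA",
}

# Longest terms first so multi-word expressions win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_concepts(text):
    """Return the set of concept labels mentioned in a free-text note."""
    return {SYNONYMS[m.group(1).lower()] for m in _PATTERN.finditer(text)}


print(find_concepts("Patient is febrile with muscle aches"))
```

A lookup like this ignores negation ("no fever"), misspellings and abbreviations, which is part of why simple keyword search falls short on real clinical notes.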

Add to all this a concept as basic, yet as complex, as temporality, and things get quite complicated. In the previous example about the patient's symptoms, "the patient has a fever", "the patient claims to have had a fever 2 days ago" and "the patient had a fever" do not mean the same thing.
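The difference between those three phrasings can be illustrated with a small rule-based Python sketch. The cue patterns, the labels and the function name `classify_temporality` are hypothetical and deliberately simplistic; real systems need far richer handling of tense, dates and context.

```python
import re

# Hypothetical cue patterns, checked in order; the first match wins.
_RULES = [
    # An explicit offset such as "2 days ago".
    ("past_dated", re.compile(r"\b\d+\s+(?:day|week|month)s?\s+ago\b", re.I)),
    # Past-tense verbs with no explicit date.
    ("past_undated", re.compile(r"\b(?:had|was|were)\b", re.I)),
    # Present-tense verbs.
    ("current", re.compile(r"\b(?:has|is|presents with)\b", re.I)),
]


def classify_temporality(sentence):
    """Return a coarse temporal label for a symptom mention."""
    for label, pattern in _RULES:
        if pattern.search(sentence):
            return label
    return "unknown"


for s in (
    "the patient has a fever",
    "the patient claims to have had a fever 2 days ago",
    "the patient had a fever",
):
    print(s, "->", classify_temporality(s))
```

The three sentences mention the same symptom but receive three different labels, which is the distinction a study may need to capture.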

It is in these cases that NLP (Natural Language Processing) and AI (Artificial Intelligence) techniques can significantly help in searching for and identifying information that is not initially structured in the EHR. One possible advantage of applying NLP is that the researcher does not have to interpret the texts: it is the algorithms that, by learning from the data, generate the results [4].

At IOMED, a large group of data analysts, artificial intelligence experts, doctors and healthcare professionals has been working hand in hand for several years to make navigating the unstructured data of the medical record a reality, and in this way to convert all the information that until now could not be exploited into structured information that can be used for studies and analysis.

Today, thanks to IOMED technology, researchers already have a tool that allows them to make the most of their time in studies and trials, focusing on their own research and drastically reducing the time they must spend browsing through medical records in search of the information they need.

With IOMED, the future of clinical research is today.

[1] Carnicero, J.: De la Historia Clínica a la Historia de Salud Electrónica. V Informe Seis 2003. SEIS. 2003.

[2] Mohamed Nooman Ahmed, Andeep S. Toor, Kelsey O'Neil and Dawson Friedland: Cognitive computing and the future of health care. 2017.

[3] Carol Friedman and Stephen B. Johnson: Natural Language and Text Processing in Biomedicine. p. 312. 2006.

[4] José Vicente Sancho Escrivá: Revista de Comunicación y Salud, Vol. 10, no. 1, p. 22. 2020.

Antoni Mallol

Hospital Engagement Manager