EHR Unstructured Data Mining


This morning, Shahid Shah of The Healthcare IT Guy blog published an article outlining why medical device data is the best way to fill meaningful use EHRs and conduct comparative effectiveness research (CER). What particularly interested me was how elegantly Shahid broke down the ways unstructured and structured data are “sourced” today (scroll down in the blog article for the graphic).

As is evident by the table above, many of the existing MU incentives in Phase 1 (patient reported and healthcare professional entered especially) promote the wrong kinds of collection: unreliable, slow, and error prone. Accurate, real-time, data is only available from connected medical devices and labs / diagnostics equipment.

Given that meaningful Use and CER advocates are promoting (structured) data collection for reduction of medical errors, analysis of treatments and procedures, and research for new methods it’s important to see that we’re not going to get real gains until the medical device vendors are fully connected and providing data directly into EHRs or clinical data warehouses.

Shahid’s article highlights a larger issue within the industry: a lot of meaningful data is captured in an unstructured fashion. Dr. John Halamka raised the same point earlier this year in a blog article on “Freeing the data.” In that article, Dr. Halamka suggests that businesses will always have a combination of structured and unstructured data, and that they must find ways to leverage the unstructured portion:

In healthcare, the HITECH Act/Meaningful Use requires that clinicians document the smoking status of 50% of their patients. In the past, many EHRs did not have structured data elements to support this activity. Today’s certified EHRs provided structured vocabularies and specific pulldowns/checkboxes for data entry, but what do we do about past data? Ideally, we’d use natural language processing, probability, and search to examine unstructured text in the patient record and figure out smoking status including the context of the word smoking such as “former”, “active”, “heavy”, “never” etc.
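To make the context-word idea concrete, here is a minimal rule-based sketch in Python. It is not Dr. Halamka's method or any vendor's NLP engine. The phrase list, the status labels, and the `smoking_status` function are all assumptions chosen for illustration, and a real system would also need negation handling, section awareness, and some measure of confidence.

```python
import re

# Hypothetical, deliberately small phrase lists (assumptions for illustration).
# Order matters: more specific statuses are checked before the generic "active".
PATTERNS = [
    ("never", re.compile(
        r"\b(never smoked|never a smoker|non-?smoker|denies (smoking|tobacco))\b",
        re.IGNORECASE)),
    ("former", re.compile(
        r"\b(former smoker|ex-?smoker|quit smoking|stopped smoking)\b",
        re.IGNORECASE)),
    ("heavy", re.compile(r"\bheavy smoker\b", re.IGNORECASE)),
    ("active", re.compile(
        r"\b(current smoker|active smoker|smokes|smoking)\b",
        re.IGNORECASE)),
]


def smoking_status(note):
    """Return the first status whose pattern appears in the note text."""
    for status, pattern in PATTERNS:
        if pattern.search(note):
            return status
    return "unknown"


if __name__ == "__main__":
    for text in [
        "Patient is a former smoker, quit smoking in 2005.",
        "Pt denies smoking or alcohol use.",
        "Heavy smoker, 2 packs per day.",
        "No complaints today.",
    ]:
        print(smoking_status(text), "<-", text)
```

A keyword approach like this breaks down quickly. A note that says "counseled on smoking cessation", for example, would be labeled active, which is exactly why the quote above calls for probability and context rather than simple search.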

The value of unstructured patient narratives was addressed in detail in one of last year’s Health Management Technology articles – specifically the section which addressed Mining unstructured data:

As EHRs become increasingly widespread due to the billions of dollars in federal stimulus incentives, harnessing unstructured clinicians’ notes gives us the power to yield valuable patient data. With each year of data, more information will be gathered that could be used to find predictors for diseases or adverse effects of treatment that would otherwise have gone unnoticed by most traditional research studies. Though challenging, capturing and delving into this data will be worth the effort, and could potentially help healthcare institutions meet requirements for CMS reporting and for meaningful use, access funding and, most importantly, improve the health of entire populations.

At Galen, we have developed a solution that addresses a current limitation of the built-in Allscripts Enterprise EHR functionality: extracting structured note data. Galen’s NoteXML solution is designed to facilitate querying the data contained within Allscripts Enterprise EHR v11 Structured Notes. These notes are not stored inside the EEHR as discrete data, but rather as XML documents that aren’t easily queryable. The solution has helped our clients extract pertinent MU-reportable data that otherwise would not be discretely available.
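As a rough illustration of what turning an XML note into discrete, queryable rows can look like, here is a short Python sketch. It does not show how NoteXML works, and it does not use the actual Allscripts note schema. The `note`, `section`, and `finding` elements, their attributes, and the `extract_findings` helper are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical note document; the real structured-note XML will look different.
SAMPLE_NOTE = """
<note patientId="12345">
  <section name="Social History">
    <finding name="Smoking Status" value="Former smoker"/>
    <finding name="Alcohol Use" value="Occasional"/>
  </section>
  <section name="Vitals">
    <finding name="BMI" value="27.4"/>
  </section>
</note>
"""


def extract_findings(note_xml):
    """Flatten one XML note into a list of discrete rows (dicts)."""
    root = ET.fromstring(note_xml.strip())
    rows = []
    for section in root.iter("section"):
        for finding in section.iter("finding"):
            rows.append({
                "patient_id": root.get("patientId"),
                "section": section.get("name"),
                "name": finding.get("name"),
                "value": finding.get("value"),
            })
    return rows


if __name__ == "__main__":
    for row in extract_findings(SAMPLE_NOTE):
        print(row)
```

Once the findings are flattened into rows like these, they can be loaded into a reporting table and queried the same way as any other discrete data element.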

To be clear, NoteXML does not facilitate data mining of unstructured note data. However, companies such as Nuance are engaged in “‘unlocking’ unstructured clinical documentation, sometimes referred to as the ‘narrative blob.’” Nuance’s NLP solutions assist in collecting and reporting on various diagnostic, quality and safety measures. I have yet to see this integrate directly with the Allscripts product line, but I anticipate that it may in the coming months.

I’m curious how other groups and organizations are addressing the gap between unstructured data capture and discrete data extraction for MU and quality reporting. Are organizations relying on third-party solutions such as those offered by Nuance?
