Legacy Data Conversion: Fuzzy Patient Matching to the EHR

One of the many challenges in interfacing to the Electronic Healthcare Record is patient identification and matching. Results and documents from outside systems need to link to the correct patient record. This is especially profound in data conversion initiatives. Given the scenario of an organization converting to utilize an EHR, aside from the plethora of documents being scanned in and associated with the chart as well as “bulk loads”  from the practice management system, there are could also be several data silos which need to feed data into the EHR.

We encountered one such scenario with one of our clients. Our client had been processing and loading flat-files from its legacy systems into the EHR. The client loaded approximately 15 years of legacy data (equating to millions of records). In the import process, the client had followed a strict patient matching criteria and received a patient matching error rate of approximately 5% which may be considered a reasonable matching rate.

However, the client’s help desk was getting a multitude of calls reporting missing legacy system records in the EHR (suspected to be in the 5% that did not make the conversion). The issue working against the client was a drop-dead date upon which these legacy systems were being deprecated and thus the clinicians would no-longer have a “fall-back” plan to access the records – the repercussions of which were potential patient care issues.

As such, Galen was engaged to analyze the records that did not meet the strict patient matching criteria , determine which records could be successfully loaded to the EHR under relaxed patient matching rules, and describe the impact of relaxing the patient matching. In the analysis that followed, it was recognized that in the data set that erred due to patient matching errors, identifier fields (namely first name, last name, DOB, MRN, Other Number1 and Other Number2) exhibited typos and inconsistencies. Enter Microsoft SQL Server Integration Studio’s (SSIS) Fuzzy Lookup Transformation. For those unfamiliar with fuzzy logic, it is “the process of reaching conclusions based on information and facts that are not 100 percent certain.”

SSIS Fuzzy Lookup

The underlying algorithm to the Fuzzy Matching transformation is the SOUNDEX function:

• In the late nineteenth century, United States census officials faced a dilemma. During the process of counting the huddled masses, our public servants created a huge paperwork trail that the law required them to preserve for future historians. With amazing forethought, they realized that people searching for records might not know the exact spelling of their ancestor’s name. Was it Smith or Smythe? Chapple, Chapel or Chapelle?

• To ease these searches, census officials turned to the Soundex phonetic filing system. This system uses a simple phonetic algorithm to reduce each name to a four character alphanumeric code. The first letter of the code corresponds to the first letter of the last name. The remainder of the code consists of three digits derived from the syllables of the word.

• Largely unused outside of the halls of government and genealogy, the Soundex system is making a comeback in modern databases. Database developers have long struggled with the problem of matching words that might not look alike, but actually sound alike.

Thus to reclaim some of the records that erred in matching to a patient chart in the EHR, the Fuzzy Matching transformation was utilized. Flat-files output from legacy data silos were input, pre-processed and then fed to the transformation. Given previous studies, the matching criterion utilized was as follows:  Match on LastName and FirstName Similarity Threshold >.8 AND DOB matches exactly AND one of three (MRN, OtherNumber, OtherNumber2) cross-referenced match exactly. The end result was reclamation of close to 25% of those legacy system patient records that originally failed patient matching.

If your organization is looking for assistance in data conversion, please contact sales@galenhealthcare.com and visit our website for more information regarding our technical service offerings.

Facebook Twitter Email


Add yours
  1. 1
    John Lynn

    Very cool stuff. I love the concept of Fuzzy Lookup. A lot of stuff can be improved using this type of process.

    Data conversion is such a pain in the butt though. Sadly most EMR systems make it even harder than it should be.

  2. 2
    Justin Campbell

    John – thanks for the comment. As a side note, we also assisted the same client in another issue with this legacy conversion. The flat-file export from the legacy system included a process in which a manual operator would go through a series of menus, selecting options for format of the file output. Inconsistencies in the process used by these operators led to unformatted textual result, thus causing the text to render as a blog after import into the EHR. We were able to write a script to update these records inserting carriage return and line-feeds, keying off of labels (Patient Name, Accession Number, etc) to properly format the results. Just a seemingly small programmatical update improved the perception of the client’s IT staff with end-users and clinicians.

+ Leave a Comment