Monday, August 28, 2017

GSoC 2017 - Final Report


Patient Matching 2.0



Student               -    Lahiru Jayathilake 
Primary Mentor   -    Burke Mamlin
Backup Mentor   -    Shaun Grannis

Project Wiki        -    Patient Matching 2.0
TALK Thread      -    OpenMRS TALK Thread Patient Matching 2.0
GitHub  Fork       -    Lahiru-J/patient-matching


What is Patient Matching 2.0

The Patient Matching module is an application that tries to identify records belonging to the same patient across a number of data sources. It matters because real-world data is often erroneous; for example, a patient’s name can be misspelled. By linking records, the module makes it easier for a patient to visit a hospital without carrying hard copies of test results, whether or not the tests were done at the same hospital.

Summary of the Work

✔︎ Incremental Patient Matching

As a brief introduction, incremental patient matching is a method of identifying duplicate patients efficiently. 
In a real-world scenario, thousands of patients have to be matched against each other. Repeating that full match on every run is very time consuming, because every patient is compared with every other patient even when only two or three patients have been added or updated. Incremental patient matching is proposed as a way to save that time.

This task was my primary goal and I have successfully completed it.
The commits and pull requests related to this task are listed below. Note that the commits on each branch are squashed, following the OpenMRS convention.

Commits

PTM-82 -  https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/b3ffd77b394cf021b3b4552bda726100e39b6190  

PTM-83 - https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/7704534457797845ecf6896072a3c441494d8299

PTM-84 - https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/c81024533508ac91f57a48c9b41beb2ebdc74595

PTM-85 - https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/35172a90f9ec061b9027d37f6757f53463ea1df6

PTM-86 - https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/b624286e1ad362ecf283a942f6b4d4c1b9429b75

PTM-89 - https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/be4f39153b976dc3493f7541ff3e590ba1ecb73c


Pull Requests

PTM-82: Functionality to load patients considering the date created and date changed

PTM-83: Save and update incremental patient matching report to the database

PTM-84: Functionality to support two datasources

PTM-85: Functionality to select or deselect incremental match

PTM-86 : Remove matching pairs from the report when patients are updated

PTM-89 : Ignore voided patients when running a patient match



✔︎ Merge Patients

If some of the patients in a group are judged to be the same person, the user can merge them so that they no longer appear in a patient matching report. 

Commit
PTM-87 - https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/6ee57f19ef62eea642b18d1642c938cd2033f8ab

Pull Request
PTM-87 - Functionality to Merge Patients in the report



✔︎ Exclude Non-Matching Patients

In some scenarios the module reports patients as the same even though, in real life, the records belong to totally different people. When this happens, Patient Matching 2.0 provides a way to exclude such records so that they do not keep reappearing in the patient matching report.

Commit
PTM-88 -  https://github.com/Lahiru-J/openmrs-module-patientmatching/commit/b1f3fccf5151ee10f4d2282e62c88fa8b93dcca4

Pull Request
PTM-88 : Functionality to exclude non-matching patients


Blog Posts



Week - Blog Post
Week 12 - http://www.lahirujayathilake.com/2017/08/week-12-exclude-non-matching-patients.html
Week 11 - http://www.lahirujayathilake.com/2017/08/week-11-merge-patients.html
Week 10 - http://www.lahirujayathilake.com/2017/08/week-10-nice-report.html
Week 9 - http://www.lahirujayathilake.com/2017/07/week-9-incremental-patient-match.html
Week 8 - http://www.lahirujayathilake.com/2017/07/week-8-game-is-almost-done.html
Week 7 - http://www.lahirujayathilake.com/2017/07/week-7-game-of-codes.html
Week 6 - http://www.lahirujayathilake.com/2017/07/week-6-one-more-datasource.html
Week 5 - http://www.lahirujayathilake.com/2017/07/week-5-important-work.html
Week 4 - http://www.lahirujayathilake.com/2017/06/week-4-more-work.html
Week 3 - http://www.lahirujayathilake.com/2017/06/week-3-finally-some-relief.html
Week 2 - http://www.lahirujayathilake.com/2017/06/week-2-struggling-times.html
Week 1 - http://www.lahirujayathilake.com/2017/06/week-1-match-begins.html


OpenMRS

Throughout this summer I learned a lot. Whenever there were problems related to the project, I had great support from my mentors and the community. Participating in the daily scrums showed me how good the OpenMRS community is. 
My mentors are the best. They guided me from the very beginning and held many video conferences, and because of that I could carry out this project with sound knowledge. Burke, Shaun, thank you very much for your support throughout this summer. I am really grateful to both of you.


Monday, August 21, 2017

Week 12 : Exclude Non Matching Patients

This week I have been working on PTM-88, which is to exclude a set of non-matching patient records from the report generation process. 

The main reason for this task is as follows. The match process generates a list of highly likely matches (probable duplicates) for human review, and the human reviewer declares true matches and non-matches from this list. Currently, the Patient Matching module presents all likely matches again each time it runs, without taking into account what the human reviewer decided in prior runs. 
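The idea of remembering reviewed non-matches can be sketched in a few lines of Java. This is only an illustration under assumptions, not the module's code: the class ExcludedPairs and its methods are hypothetical, patients are identified by plain integer ids, and the rejected pairs are kept in memory rather than in the database.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Patient Matching module.
final class ExcludedPairs {
    private final Set<String> excluded = new HashSet<>();

    // Order-independent key, so (12, 13) and (13, 12) name the same pair.
    private static String key(int a, int b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Called when the reviewer marks two patients as a non-match.
    void exclude(int patientA, int patientB) {
        excluded.add(key(patientA, patientB));
    }

    boolean isExcluded(int patientA, int patientB) {
        return excluded.contains(key(patientA, patientB));
    }

    // Drops every candidate pair that a reviewer has already rejected.
    List<int[]> withoutExcluded(List<int[]> candidatePairs) {
        List<int[]> kept = new ArrayList<>();
        for (int[] pair : candidatePairs) {
            if (!isExcluded(pair[0], pair[1])) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        ExcludedPairs reviewed = new ExcludedPairs();
        reviewed.exclude(13, 12);
        List<int[]> candidates = new ArrayList<>();
        candidates.add(new int[] {12, 13});
        candidates.add(new int[] {12, 14});
        System.out.println(reviewed.withoutExcluded(candidates).size()); // prints 1
    }
}
```

Filtering the candidate pairs on every run is what keeps a rejected pair from coming back to the reviewer.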

User Interface,



You can exclude the patients by selecting them and clicking the button Exclude Patients.


Sunday, August 13, 2017

Week 11 : Merge Patients

After generating the report, if the user thinks that some patients in a particular group are the same person, there should be a way to merge them. This week I completed that functionality successfully. 

You can merge the patients by selecting them and clicking the button merge.

Once the set of patients is merged, the report will look as follows, 

This task was carried out under the PTM-87 and the pull request can be found here.

Sunday, August 6, 2017

Week 10 : Nice Report :)

This week I have been making some enhancements to the patient matching report. The reason is that when patients are updated in a way that affects the patient matching report, the report should be updated too. 

Consider the following patient matching report.

If patient 13 (Unique ID) is updated in such a way that it no longer exhibits any matching properties with patient 12, then group 4 should be removed from the report. But if patient 23 is updated, then only patient 23 should be removed from the report, not every record in group 3.
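The rule described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the class ReportGroups, the method removeUpdatedPatient, and the member ids 21 and 22 used for group 3 are made up for the example. It assumes that an updated patient is taken out of every group it belongs to, and that a group left with fewer than two patients no longer represents a match and is dropped.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: report groups are modelled as group id -> patient ids.
final class ReportGroups {
    static void removeUpdatedPatient(Map<Integer, Set<Integer>> groups, int patientId) {
        Iterator<Map.Entry<Integer, Set<Integer>>> it = groups.entrySet().iterator();
        while (it.hasNext()) {
            Set<Integer> members = it.next().getValue();
            // Remove the patient; a group with a single record left is not a match any more.
            if (members.remove(patientId) && members.size() < 2) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> groups = new TreeMap<>();
        groups.put(3, new HashSet<>(Arrays.asList(21, 22, 23))); // ids 21 and 22 are invented
        groups.put(4, new HashSet<>(Arrays.asList(12, 13)));

        removeUpdatedPatient(groups, 13); // group 4 disappears entirely
        removeUpdatedPatient(groups, 23); // group 3 keeps its other two records
        System.out.println(groups.keySet()); // prints [3]
    }
}
```

Whether the updated patient matches anything after the change is left to the next incremental run, which will pick the patient up again because its record has changed.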

The code that I wrote for the above purpose looks as follows,


Sunday, July 30, 2017

Week 9 : Incremental Patient Match

Good news: everything needed to perform an incremental match has been completed. This week I completed PTM-85, which is mainly about letting the user decide whether to run the patient match as an incremental match.


If the user selects an incremental match, the match takes less time than a normal patient match. This performance gain comes from the smaller data set to be compared: it contains only the patients added or updated after the last execution date of the report.


What if the user selects an incremental match and no patient match has been run yet under that particular strategy? 
In that case, the first time, every patient record is compared with every other record. This process might take considerable time, depending on the number of patient records.

Regardless of whether a report already exists, the user can still perform a full patient match for a particular strategy. 
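The decision described here can be written down as a small function. The sketch below is an assumption-laden illustration and not the module's code: MatchModeChooser, MatchMode and choose are hypothetical names, and the single-strategy condition is taken from the earlier posts, which say incremental matching applies only when one strategy is selected.

```java
// Hypothetical sketch of how a run could be classified as full or incremental.
final class MatchModeChooser {
    enum MatchMode { FULL, INCREMENTAL }

    static MatchMode choose(boolean incrementalRequested,
                            int selectedStrategies,
                            boolean previousReportExists) {
        // Incremental only makes sense when the user asked for it, exactly one
        // strategy is selected, and an earlier report exists to build on.
        if (incrementalRequested && selectedStrategies == 1 && previousReportExists) {
            return MatchMode.INCREMENTAL;
        }
        // First run, combined strategies, or an explicit full match: compare everything.
        return MatchMode.FULL;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 1, false));  // FULL (first run for the strategy)
        System.out.println(choose(true, 1, true));   // INCREMENTAL
        System.out.println(choose(false, 1, true));  // FULL (user asked for a full match)
    }
}
```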

Pull request for PTM-85 : https://github.com/openmrs/openmrs-module-patientmatching/pull/37 

The web page looks as follows,



Incremental Patient Match

After all of these changes, this is how it looks.

1. Run a report with the configuration name "test1"



A report will be created adding the incremental-report text to the configuration name.


2. This is how it looks when a patient match is performed for the first time



3. Added a patient which shows a match with an existing record



4. Run the report again (Incremental patient matching)



The same incremental-report-test1 report will be updated.

5. Add a new patient that does not exhibit any matching property with existing records



Then there will be no changes in the report.

6. Add another patient



7. Run the patient match; the report will be updated as in the following image




8. Update an existing patient so that it shows matching properties with existing records


The updated patient will be added to the same group in the report.


Here is the link for my mid term presentation : https://www.youtube.com/watch?v=j-m9kDQmdz0&t=5s 

Sunday, July 23, 2017

Week 8: Game is almost done

I have completed almost all of the tasks included in the project plan and made the third pull request. The pull request is for PTM-84; with this task done, the patient matching module supports matching patients incrementally.    

To complete this task I had to change the MatchingReportUtils.java class in the patient matching module. Those changes can be found here.

What I have done up to now?
So far I have completed,

  • PTM-82 - Load patients for the incremental matching
  • PTM-83 - Generate and save reports in incremental patient matching process
  • PTM-84 - Perform patient match with two datasources


Sunday, July 16, 2017

Week 7: Game of Codes ;)

The tasks that I mentioned in my 5th week's blog post have been completed. The main target of the PTM-84 task is to make the patient matching module work with two datasources. The importance of this is explained in my 5th week's blog post.

I had to change 6 methods in MatchingReportUtils.java. Methods are listed below. 
  1. InitScratchTable
  2. CreRanSamAnalyzer
  3. CreAnalFormPairs
  4. CrePairdataSourAnalyzer
  5. ScoringData
  6. CreatingReport
Every method listed above should support both deduplication and two datasources. Deduplication is needed the first time an incremental patient match is run, since every patient record is matched with every other record. In addition, if the user explicitly asks to run the patient match for all the records, deduplication is the process that carries out the task.

The power of incremental patient matching comes from the two datasources. One datasource comprises all the patients, while the other contains only the patients who were added or changed after the last execution date of the report.
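To make the difference between the two modes concrete, here is a small sketch of how candidate pairs could be formed in each case. It is not the code from MatchingReportUtils.java: CandidatePairs, dedup and twoSources are hypothetical names, patients are reduced to unique integer ids, and blocking and scoring are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of pair generation; ids are assumed to be unique.
final class CandidatePairs {
    // Deduplication: every record in one datasource is paired with every other.
    static List<int[]> dedup(List<Integer> ids) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                pairs.add(new int[] {ids.get(i), ids.get(j)});
            }
        }
        return pairs;
    }

    // Two datasources: each added/changed record is paired with each record
    // of the full datasource.
    static List<int[]> twoSources(List<Integer> all, List<Integer> changed) {
        Set<Integer> changedIds = new HashSet<>(changed);
        List<int[]> pairs = new ArrayList<>();
        for (int c : changed) {
            for (int a : all) {
                if (a == c) {
                    continue; // never pair a record with itself
                }
                if (changedIds.contains(a) && a < c) {
                    continue; // two changed records meet twice; keep the pair once
                }
                pairs.add(new int[] {c, a});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Integer> all = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> changed = Arrays.asList(4, 5);
        System.out.println(dedup(all).size());               // prints 10
        System.out.println(twoSources(all, changed).size()); // prints 7
    }
}
```

With five patients of which two changed, the full match forms 10 pairs while the two-datasource match forms 7: one pair among the changed records plus 2 x 3 pairs against the unchanged ones.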

What I have done ?

The changes that I have done for the patient matching module to get this work done can be found here.


Monday, July 10, 2017

Week 6: One more datasource!

According to my last week's post, I need to change a few methods in the patient matching module in order to run a patient match with two datasources. So I started with the following methods in MatchingReportUtils.java
  1. InitScratchTable
  2. CreRanSamAnalyzer
  3. CreAnalFormPairs
  4. CrePairdataSourAnalyzer
  5. ScoringData
  6. CreatingReport
The following code shows how I changed the code to support both deduplication and two datasources.

This code segment was added inside the InitScratchTable,

CreRanSamAnalyzer method was changed as follows,

Saturday, July 1, 2017

Week 5: Important Work

This week I have been working on PTM-84. As I mentioned on OpenMRS Talk, the current version does not support matching patients across two datasources. That means it only supports matching patients by deduplication. 

What is required to do?
According to my ultimate goal, the Patient Matching 2.0 module must support incremental patient matching. For that it is required to,
  • Fetch the patients whose date created or date changed is later than the report date
  • Fetch all the patients (except the patients that have been already fetched)
and perform the match with two data sources.

Why is this necessary?
The current version is not efficient for implementations with a huge set of records.

For instance, suppose we have 10,000 patients in our system and we need to match them using first name and date of birth, with the goal of finding the duplicates among them. If we compare all patients to all the others, that is roughly 50 million comparisons ( 10,000 x (10,000 - 1) / 2 ). If we run the same match a couple of days later, after 90 patients have been added and 10 updated, the current version would still carry out the same method of comparison, and this time it would be about 51 million comparisons!
The current goal is to perform comparisons only for the added and updated records for that particular match. With this sort of method, instead of 51 million comparisons there would be only about 1 million [(100 x 99 / 2) + (100 x 9990)]. 
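The arithmetic above is easy to check with a few lines of Java. The class and method names below are made up for the illustration; the formulas are the ones quoted in the text.

```java
// Hypothetical helper that reproduces the comparison counts quoted above.
final class ComparisonCount {
    // Pairs compared when each of n records is matched against every other one.
    static long fullMatch(long n) {
        return n * (n - 1) / 2;
    }

    // Pairs compared when the changed records are matched among themselves
    // and against the unchanged records only.
    static long incrementalMatch(long changed, long unchanged) {
        return fullMatch(changed) + changed * unchanged;
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(10_000));            // 49995000, roughly 50 million
        System.out.println(fullMatch(10_090));            // 50899005, roughly 51 million
        System.out.println(incrementalMatch(100, 9_990)); // 1003950, roughly 1 million
    }
}
```

Here 10,090 is the total after 90 patients were added, 100 is the number of added or updated patients, and 9,990 is the number of untouched ones.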

To do that we must consider two datasources; without them, we would just be matching the updated or added patients with themselves (deduplication).

What I have done ?
I had to change several methods in MatchingReportUtils.java.
The methods are
  1. InitScratchTable
  2. CreRanSamAnalyzer
  3. CreAnalFormPairs
  4. CrePairdataSourAnalyzer
  5. ScoringData
  6. CreatingReport
These changes are needed in order to support two datasources rather than only deduplication.

Sunday, June 25, 2017

Week 4: More Work!

According to my project plan, the tasks planned for the first phase have already been completed.
This week I completed the task PTM-82 and have already made the pull request.

Saving the report to persistent storage is all the more important in this incremental process, as every later run depends on the previous report's properties. Task PTM-83 was created to address that very important requirement. I have almost completed this task this week.

Things I did to get the work done in PTM-83

Given the previous report, there was a need to create a method that updates the properties of the new report. This is not just updating an object. What makes me say this? Consider the following database table,

patientmatching_matchingset

In the above table, all the pending matches are shown until the user marks them as accepted or rejected. So what happens if the user runs a strategy which already has a report and a set of records in the above table? 

Let me give you an example. Suppose the user runs a strategy which already has a report (take the report ID as 48); according to the above table it has 13 records (set_id 125 to 137). Incremental matching considers only the patients whose records have changed, or who were newly added to the system, after the last run date of that report. Let's say the match has completed; there can be two cases,

1. A patient record in a matching pair might already exist in the above table. (Take the new patient as PatientA, and say it showed a match with the record at set_id 126.)

2. None of the patient records in the matching pairs are in the above table.

If the match process comes across the 2nd case, there is nothing more to worry about; it is just an update of the table. The issue comes with the 1st case. There we have to insert the patient record (PatientA) under the same group id, which in this scenario is group_id 9. Not only that, all other patient records which exhibited a matching property with PatientA should also go into the same group id 9 under report_id 48.
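The two cases can be illustrated with a small sketch. It is not the code written for PTM-83: GroupAssigner and its methods are hypothetical, the table is reduced to an in-memory map from patient id to group id, and the patient ids (200, 301, 302, 400, 401) are invented. The case where both patients of a pair already sit in different groups is not covered by the description above, and the sketch simply leaves both where they are.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: which group a newly found matching pair is stored under.
final class GroupAssigner {
    private final Map<Integer, Integer> groupOfPatient = new HashMap<>();
    private int nextGroupId;

    GroupAssigner(Map<Integer, Integer> existingRows, int nextGroupId) {
        groupOfPatient.putAll(existingRows);
        this.nextGroupId = nextGroupId;
    }

    // Records a matching pair and returns the group id it was placed in.
    int addPair(int a, int b) {
        Integer groupA = groupOfPatient.get(a);
        Integer groupB = groupOfPatient.get(b);
        int group;
        if (groupA != null) {
            group = groupA;        // case 1: reuse the group already in the table
        } else if (groupB != null) {
            group = groupB;        // case 1, other side of the pair
        } else {
            group = nextGroupId++; // case 2: neither record is known, new group
        }
        groupOfPatient.putIfAbsent(a, group);
        groupOfPatient.putIfAbsent(b, group);
        return group;
    }

    Integer groupOf(int patientId) {
        return groupOfPatient.get(patientId);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> existing = new HashMap<>();
        existing.put(200, 9); // an existing row of the report, stored in group 9
        GroupAssigner assigner = new GroupAssigner(existing, 10);

        System.out.println(assigner.addPair(301, 200)); // PatientA (301) joins group 9
        System.out.println(assigner.addPair(301, 302)); // another match of PatientA: 9
        System.out.println(assigner.addPair(400, 401)); // unrelated pair: new group 10
    }
}
```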

This is the code that I wrote for this task.


More details about the changes can be found here.

This week I also decided to lay the foundation for the next task. A ticket has been created for the task under the name PTM-84. The main target of this task is to perform the match against all the records. The current version only allows matching within the same set of records, i.e. deduplication. 

Sunday, June 18, 2017

Week 3: Finally Some Relief

I am very happy to say that the problem I mentioned in my 2nd week's blog post has been solved. 

In my 1st week's blog post I mentioned the tasks that I should complete within this summer. The task I completed this week is,
Once the user has selected the old report, we must find the date it was run and fetch the patients' records which were either added or updated after that date. This is done by comparing patientmatching_report.created_on with the patient.date_created and patient.date_changed table fields.
 This is how it was mentioned in the post.

There were two classes that I had to change to get the job done,
Changes related to those classes can be found here and here respectively.


Things I want to highlight


According to my target I should maintain a single report per strategy, therefore it is necessary to have a unique name per strategy. For example, for a strategy named "family_name block" the name will be "dedup-incremental-report-family_name block". 

How did I Fetch the Patient Records ?
It is straightforward. As I have to fetch the patients' records which were either added or updated after the date the report was generated, I had to add a restriction to the criteria created by Hibernate's createCriteria(Patient.class).
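The effect of that restriction can be shown without Hibernate. The sketch below is a plain-Java stand-in, not the Criteria code from the module: ChangedSince, PatientRecord and addedOrUpdatedAfter are hypothetical names, and "after" is assumed to mean strictly later than the report's creation time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical plain-Java equivalent of the date restriction.
final class ChangedSince {
    // Stand-in for a patient row; dateChanged is null if the record was never updated.
    static final class PatientRecord {
        final int id;
        final Date dateCreated;
        final Date dateChanged;

        PatientRecord(int id, Date dateCreated, Date dateChanged) {
            this.id = id;
            this.dateCreated = dateCreated;
            this.dateChanged = dateChanged;
        }
    }

    // Keeps the patients created or changed after the report was generated.
    static List<PatientRecord> addedOrUpdatedAfter(List<PatientRecord> patients,
                                                   Date reportCreatedOn) {
        List<PatientRecord> result = new ArrayList<>();
        for (PatientRecord p : patients) {
            boolean added = p.dateCreated.after(reportCreatedOn);
            boolean updated = p.dateChanged != null && p.dateChanged.after(reportCreatedOn);
            if (added || updated) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Date report = new Date(1_000L);
        List<PatientRecord> patients = Arrays.asList(
                new PatientRecord(1, new Date(500L), null),              // untouched
                new PatientRecord(2, new Date(500L), new Date(1_500L)),  // updated later
                new PatientRecord(3, new Date(2_000L), null));           // added later
        System.out.println(addedOrUpdatedAfter(patients, report).size()); // prints 2
    }
}
```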


Only for a Single Strategy
To achieve the above, it must be ensured that this only happens when the user selects a single strategy. This part was done based on the user's selection: if there is only a single strategy, the above part gets activated. Below I have shown how the code appears in the class MatchingReportUtils.java, inside the method InitScratchTable


Sunday, June 11, 2017

Week 2: Struggling Times

As the 2nd week is almost over, my primary goal is still to bring incremental patient matching to life, as mentioned in my 1st week’s report. I have split it into sub goals in order to get a clearer picture of it. 

Here are the sub goals,
  1. The strategy that the user is using should be identified.
  2. Is the strategy a combination of a set of strategies? If so, we ignore it. Otherwise we consider that particular strategy and follow the steps listed below.
  3. We should find out whether there are any reports stored in the database for the strategy. If there are no reports, we are good to go: nothing has been matched earlier, so all the patient records should be matched with each other.
  4. Once the user has selected the old report, we must find the date it was run and fetch the patients' records which were either added or updated after that date. This is done by comparing patientmatching_report.created_on with the patient.date_created and patient.date_changed table fields.
 My primary work consists of sub goals 1 and 2, and it is almost done. I have started laying the foundation for sub goal 3. Since I was curious about working with patients, I turned my attention to the 4th sub goal at the beginning of the second week. Believe me, I have been struggling for the past 5 days just to find the code segment that retrieves patients from the database. It was only a matter of time before I found it, because I had already worked out a way to retrieve the patient records in line with the incremental matching process.

The following two tables are the data sources which are meant to help in completing this task, 

1. patient

2. patientmatching_report

By comparing date_changed or date_created in the patient table with the created_on field in patientmatching_report, I can simply pick out the necessary records for the next match.

I have figured out the place where the patients are retrieved,

This is the method which loads all the patients regardless of the date created or date changed.


This was a tough week for me as my end-of-semester exams have started, but somehow I managed to set aside time for this wonderful Patient Matching module. 
My next goal is to complete the 4th sub goal.







Monday, June 5, 2017

Week 1: The Match Begins!

The objectives of the Patient Matching 2.0 project are,

1.  Perform the patient match taking the previous run into consideration (incremental patient matching). 
  • This method will avoid unnecessary record comparisons.
  • A match will be performed upon,
          addition of new patients or
          changes in patients’ records
  • After an incremental match a separate output will be created.
  • This incremental matching is specific to a strategy.
  • An output from a previous run can be selected as the memory for the next run (e.g. if strategy “A” has 50 runs in the last 3 months, it should be possible to pick one output from the 50 as the memory for the next match)
2.  Avoid repeated manual review of matches that have already been manually reviewed
3.  UI changes to reflect above functionalities

According to my project plan, my first goal is to perform the patient match taking the previous run into consideration (incremental patient matching).
I have already identified a way to get the work done. For my first goal, the relevant tables in the OpenMRS database are,

  • patient - has the information about when each patient was added and last updated
  • patientmatching_configuration - has the details of the saved configurations
  • patientmatching_report - has the stored strategies
  • patientmatching_report_configuration - association table which links patientmatching_configuration and patientmatching_report
  • patientmatching_report_generation_step - has the properties which should be updated, since in the incremental matching process the report is updated instead of creating a bunch of new reports over and over again.
The images below are screenshots of the above-mentioned tables.

patient


patientmatching_configuration


patientmatching_report


patientmatching_report_configuration


patientmatching_report_generation_step



I have identified a few steps to solve this problem. 
In the code, 
  1. The strategy that the user is using should be identified.
  2. Is the strategy a combination of a set of strategies? If it is, we ignore it. If not, we consider that particular strategy and follow the steps listed below.
  3. We should find out whether there are any reports stored in the database for the strategy. If there are no reports, we are good to go: nothing has been matched earlier, so all the patient records should be matched with each other.
  4. If there are a couple of old reports for the strategy, the user should be prompted to choose which report is used as the memory for the next run.
  5. Once the user has selected the old report, we must find the date it was run and fetch the patients' records which were either added or updated after that date. This is done by comparing patientmatching_report.created_on with the patient.date_created and patient.date_changed table fields.
So what I am currently working on is steps 1, 2, and 3. My GitHub repo can be found here.

Tuesday, May 23, 2017

OpenMRS Patient Matching Module

First of all, I would like to start with a brief introduction to OpenMRS, even though you might already know about it.

What is OpenMRS?

"Write Code, Save Lives"


OpenMRS is an electronic medical record system with the essential functionalities. It is also a community of developers and users working toward a shared and open foundation for managing health information in developing countries. You can find more details here. As a developer I love their motto "Write Code, Save Lives"

What is Patient Matching Module?

The Patient Matching module is an application that tries to identify records belonging to the same patient across a number of data sources. It matters because real-world data is often erroneous; for example, a patient’s name can be misspelled. By linking records, the module makes it easier for a patient to visit a hospital without carrying hard copies of test results, whether or not the tests were done at the same hospital. 

If you would like you can download the module here and the source code is available here.

Currently it has a standalone version, which is a Java Swing application, and the patient matching module can also be operated through the OpenMRS reference application.

This is an outstanding module, as I have mentioned earlier, but the current version has some limitations, such as, 
  • Patient Matching Module scans all the records each time it runs.
This is not an efficient method for implementations with a huge set of records. 
For instance, suppose we have 10,000 patients in our system and we need to match them using first name and date of birth, with the goal of finding the duplicates among them. If we compare all patients to all the others, that is roughly 50 million comparisons ( 10,000 x (10,000 - 1) / 2 ). If we run the same match a couple of days later, after 90 patients have been added and 10 updated, the current version would still carry out the same method of comparison, and this time it would be about 51 million comparisons!
What if we had a method where the module performs comparisons only for the added and updated records for that particular match? With this sort of method, instead of 51 million comparisons there would be only about 1 million [(100 x 99 / 2) + (100 x 9990)]. Congratulations, we have saved the module valuable time by skipping about 50 million comparisons. This power will be available in Patient Matching 2.0
  • The match process generates a list of closely corresponding matches (probable duplicates) for human review.
The current version presents all similar matches each time it runs. This is not a positive experience for the user, because after every comparison the user has to review the same set of records.

Patient Matching is an amazing module which needs some modifications to perform well in the real world. OpenMRS has offered this problem as a GSoC 2017 project, and I will be solving these problems this summer. I hope you will read the upcoming articles to learn more about Patient Matching 2.0. 

Cheers! 😀