Sunday, July 23, 2017

Week 8: Game is almost done

I have completed almost all of the tasks in my project plan and made my third pull request. The pull request is for PTM-84; with this task done, the patient matching module supports matching patients incrementally.

To complete this task I had to change the MatchingReportUtils.java class in the patient matching module. Those changes can be found here.

What have I done up to now?
So far I have completed:

  • PTM-82 - Load patients for the incremental matching
  • PTM-83 - Generate and save reports in incremental patient matching process
  • PTM-84 - Perform patient match with two datasources


Sunday, July 16, 2017

Week 7: Game of Codes ;)

The tasks that I mentioned in my 5th week's blog post have already been completed. The main target of the PTM-84 task is to make the patient matching module work with two datasources. The importance of this process was explained in my 5th week's blog post.

I had to change six methods in MatchingReportUtils.java. They are listed below.
  1. InitScratchTable
  2. CreRanSamAnalyzer
  3. CreAnalFormPairs
  4. CrePairdataSourAnalyzer
  5. ScoringData
  6. CreatingReport
Every method listed above should support both deduplication and two datasources. Deduplication is needed the first time the incremental patient matching process runs, since every patient record has to be matched with every other record. In addition, if the user specifically asks to run the patient match for all the records, deduplication is the process that should carry out the task.

The power of incremental patient matching comes from the two datasources. One datasource comprises all the patients, while the other contains only the patients who were added or changed after the last execution date of the report.
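To make the difference between the two modes concrete, here is a minimal sketch of how the candidate pairs could be formed. It is not the module's code: the class PairFormation, its methods, and the use of plain integer patient IDs are all assumptions made for illustration. The two-datasource version pairs each changed patient with every patient in the full datasource, skipping self-pairs and emitting a pair of two changed patients only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and types are not taken from the module.
class PairFormation {

    // Deduplication: every record of a single datasource is paired with every other record.
    static List<int[]> dedupPairs(List<Integer> ids) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                pairs.add(new int[] { ids.get(i), ids.get(j) });
            }
        }
        return pairs;
    }

    // Two datasources: each changed record is paired with every record of the full datasource.
    static List<int[]> twoSourcePairs(List<Integer> all, List<Integer> changed) {
        Set<Integer> changedSet = new HashSet<>(changed);
        List<int[]> pairs = new ArrayList<>();
        for (Integer c : changed) {
            for (Integer a : all) {
                if (a.equals(c)) {
                    continue; // never pair a record with itself
                }
                if (changedSet.contains(a) && a < c) {
                    continue; // a pair of two changed records is emitted once, from the smaller ID
                }
                pairs.add(new int[] { c, a });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Integer> all = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> changed = Arrays.asList(4, 5);
        System.out.println(dedupPairs(all).size());              // 10
        System.out.println(twoSourcePairs(all, changed).size()); // 7
    }
}
```

With five patients of which two changed, deduplication forms 10 pairs while the two-datasource run forms only 7: one pair between the two changed patients plus 2 x 3 pairs against the unchanged ones.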

What have I done?

The changes that I made to the patient matching module to get this work done can be found here.


Monday, July 10, 2017

Week 6: One more datasource!

As I said in last week's post, I need to change a few methods in the patient matching module in order to run a patient match with two datasources. So I started with the following methods in MatchingReportUtils.java:
  1. InitScratchTable
  2. CreRanSamAnalyzer
  3. CreAnalFormPairs
  4. CrePairdataSourAnalyzer
  5. ScoringData
  6. CreatingReport
The following code shows how I changed the methods to support both deduplication and two datasources.

This code segment was added inside the InitScratchTable,

CreRanSamAnalyzer method was changed as follows,

Saturday, July 1, 2017

Week 5: Important Work

This week I have been working on PTM-84. As I mentioned on OpenMRS Talk, the current version does not support matching patients across two datasources. That means it only supports matching patients by deduplication.

What is required?
According to my ultimate goal, the Patient Matching 2.0 module must support incremental patient matching. For that it is required to:
  • Fetch patients by comparing their date created or date changed with the report date
  • Fetch all the patients (except the patients that have already been fetched)
and perform the match with the two datasources.

Why is this necessary?
The current version is not efficient for implementations that have a huge set of records.

For instance, suppose we have 10,000 patients in our system and we need to match them using first name and date of birth, with the goal of finding duplicates among them. If we compare every patient with every other patient, that is roughly 50 million comparisons (10,000 x (10,000 - 1) / 2). If we run the same match a couple of days later, after 90 patients have been added and 10 updated, the current version would still carry out the same method of comparison, and this time it would be about 51 million comparisons!
The current goal is to perform comparisons only for the added and updated records in that particular match. With this approach, instead of 51 million comparisons there would be only about 1 million: (100 x 99 / 2) + (100 x 9,990).
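The arithmetic above can be checked with a few lines of Java. This is only a worked example of the counts in the text; the class and method names are made up for illustration.

```java
// Hypothetical helper: counts pairwise comparisons for full and incremental matching.
class ComparisonCount {

    // Number of unordered pairs among n records: n * (n - 1) / 2.
    static long pairsWithin(long n) {
        return n * (n - 1) / 2;
    }

    // Incremental run: changed records against each other,
    // plus every changed record against every unchanged record.
    static long incrementalPairs(long changed, long unchanged) {
        return pairsWithin(changed) + changed * unchanged;
    }

    public static void main(String[] args) {
        System.out.println(pairsWithin(10_000));          // 49995000
        System.out.println(pairsWithin(10_090));          // 50899005
        System.out.println(incrementalPairs(100, 9_990)); // 1003950
    }
}
```

The 100 changed records are the 90 added plus the 10 updated patients, and 9,990 is the number of patients left untouched out of the 10,090 in the system.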

In order to do that we need two datasources. Without them, we would only be matching the updated or added patients with themselves (deduplication).

What have I done?
I had to change several methods in MatchingReportUtils.java so that they support two datasources rather than only deduplication.
The methods are:
  1. InitScratchTable
  2. CreRanSamAnalyzer
  3. CreAnalFormPairs
  4. CrePairdataSourAnalyzer
  5. ScoringData
  6. CreatingReport

Sunday, June 25, 2017

Week 4: More Work!

The tasks that my project plan scheduled for the first phase have already been completed.
This week I completed the task PTM-82 and have already made the pull request.

Saving the report to persistent storage is especially important in this incremental process, as every later run depends on the previous report's properties. Task PTM-83 was created to meet that very important requirement. I have almost completed this task this week.

Things I did to get the work done in PTM-83

Because of the previous report, I needed to create a method that updates the properties of the new report. This is not just updating an object. What makes me say this? Consider the following database table,

patientmatching_matchingset

In the above table all the pending matches are shown until the user marks them as accepted or rejected. So what happens if the user runs a strategy which already has a report and a set of records in the above table?

Let me give you an example. Suppose the user runs a strategy which already has a report (take the report ID as 48), and according to the above table it has 13 records (from set_id 125 to 137). Incremental matching considers only the patients whose records have changed, or who were newly added to the system, after the last run date of that report. Let's say the match has completed. There can be two cases:

1. A patient record in a matching pair might already exist in the above table. (Say the new patient is PatientA, and it matched the record with set_id 126.)

2. None of the patient records in the matching pairs is in the above table.

If the match process comes across the 2nd case, there is nothing more to worry about: it is just an update of the table. The issue comes with the 1st case. There we have to insert the patient record (PatientA) under the same group id, which in this scenario is 9. Not only that, all the other patient records that matched PatientA should also go under the same group id 9 and report_id 48.

This is the code that I wrote for this task.


More details about the changes can be found here.
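The two cases described above can also be sketched in plain Java. This is an illustration under assumptions, not the code from the pull request: the class MatchingSetUpdate, the method mergeIntoReport, and the representation of one report's rows as a map from patient ID to group_id are all hypothetical. It also simplifies one point: if the new patients matched records from more than one existing group, it takes the first group it finds and does not merge groups.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real table has more columns than patient and group_id.
class MatchingSetUpdate {

    // groupByPatient: existing rows of one report, patient ID -> group_id.
    // newGroup: patients that matched each other in the incremental run.
    // Returns the group_id under which the new patients were stored.
    static int mergeIntoReport(Map<Integer, Integer> groupByPatient, List<Integer> newGroup) {
        Integer target = null;
        // Case 1: a matched patient is already in the table, so reuse its group_id.
        for (Integer patient : newGroup) {
            Integer existing = groupByPatient.get(patient);
            if (existing != null) {
                target = existing;
                break;
            }
        }
        // Case 2: nobody is in the table yet, so open a new group.
        if (target == null) {
            int max = 0;
            for (int g : groupByPatient.values()) {
                max = Math.max(max, g);
            }
            target = max + 1;
        }
        // Rows that already exist are left where they are.
        for (Integer patient : newGroup) {
            groupByPatient.putIfAbsent(patient, target);
        }
        return target;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> rows = new HashMap<>();
        rows.put(101, 9);
        rows.put(102, 9);
        rows.put(200, 10);
        System.out.println(mergeIntoReport(rows, Arrays.asList(101, 555, 556))); // 9
        System.out.println(mergeIntoReport(rows, Arrays.asList(700, 701)));      // 11
    }
}
```

In the first call patient 101 is already stored under group 9, so the newly matched patients 555 and 556 join group 9. In the second call nobody is in the table, so a new group is opened.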

This week I also decided to lay the foundation for the next task. A ticket named PTM-84 has been created for it. The main target of this task is to perform the match against all the records. The current version only allows matching within the same set of records, by deduplication.

Sunday, June 18, 2017

Week 3: Finally Some Relief

I am very happy to say that the problem I mentioned in my 2nd week's blog post has been solved.

In my 1st week's blog post I mentioned the tasks that I should complete within this summer. The task I completed this week is:
Once the user has selected the old report, we must find the date on which it was run and fetch the patient records which were either added or updated after that date. This is done by comparing patientmatching_report.created_on with the patient.date_added or patient.date_changed table fields.
This is how it was stated in that post.

There were two classes that I had to change to get the job done.
Changes related to those classes can be found here and here respectively.


Things I want to highlight


According to my target, I should maintain a single report per strategy, so each strategy's report needs a unique name. For example, the report for a strategy named "family_name block" will be named "dedup-incremental-report-family_name block".

How did I Fetch the Patient Records?
It is straightforward. As I have to fetch the patient records which were either added or updated after the date the report was generated, I had to add a restriction to the criteria created by Hibernate's createCriteria(Patient.class).
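The condition behind that restriction is simple: keep a patient if it was created after the report date, or changed after the report date. The sketch below mirrors only this condition in plain Java, without Hibernate, so it is not the module's code. The classes IncrementalPatientFilter and PatientRow and their field names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical sketch of the "added or changed after the report date" condition.
class IncrementalPatientFilter {

    // Minimal stand-in for a patient record; not the OpenMRS Patient class.
    static class PatientRow {
        final int id;
        final Date dateCreated;
        final Date dateChanged; // null when the record was never changed

        PatientRow(int id, Date dateCreated, Date dateChanged) {
            this.id = id;
            this.dateCreated = dateCreated;
            this.dateChanged = dateChanged;
        }
    }

    // Keeps the patients created or changed strictly after the report date.
    static List<PatientRow> addedOrChangedAfter(List<PatientRow> patients, Date reportDate) {
        List<PatientRow> result = new ArrayList<>();
        for (PatientRow p : patients) {
            boolean created = p.dateCreated.after(reportDate);
            boolean changed = p.dateChanged != null && p.dateChanged.after(reportDate);
            if (created || changed) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Date reportDate = new Date(1000L);
        List<PatientRow> patients = Arrays.asList(
                new PatientRow(1, new Date(500L), null),             // old and untouched
                new PatientRow(2, new Date(500L), new Date(1500L)),  // changed after the report
                new PatientRow(3, new Date(2000L), null));           // added after the report
        System.out.println(addedOrChangedAfter(patients, reportDate).size()); // 2
    }
}
```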


Only for a Single Strategy
The fetching described above must only happen when the user selects a single strategy, so this part depends on the user's selection: it is activated only if exactly one strategy is selected. The check is in the class MatchingReportUtils.java, inside the method InitScratchTable.


Sunday, June 11, 2017

Week 2: Struggling Times

The 2nd week is almost over. My primary goal, as stated in my 1st week's report, is to bring incremental patient matching to life. I have split it into sub goals in order to get a clearer picture of it.

Here are the sub goals,
  1. The strategy that the user is using should be identified.
  2. Is the strategy a combination of several strategies? If so, we ignore it. Otherwise we consider that particular strategy and follow the steps listed below.
  3. We should find out whether any reports related to the strategy are stored in the database. If there are no reports, we are good to go: nothing has been matched earlier, so all the patient records should be matched with each other.
  4. Once the user has selected the old report, we must find the date on which it was run and fetch the patient records which were either added or updated after that date. This is done by comparing patientmatching_report.created_on with the patient.date_added or patient.date_changed table fields.
My primary work consists of sub goals 1 and 2, and it is almost done. I have started laying the foundation for sub goal 3. Since I was curious about working with patients, I turned my attention to the 4th sub goal at the beginning of the second week. Believe me, I have been struggling for the past 5 days just to find the code segment that retrieves patients from the database. It is just a matter of time before I find it, because I have already worked out a way to retrieve the patient records that the incremental matching process needs.
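The decision described by the sub goals above can be summarised in a small sketch. It is only an illustration: the class MatchModeDecision, the enum, and the method decide are hypothetical names, and falling back to plain deduplication when several strategies are selected is my reading of "we ignore it", not something confirmed by the module.

```java
// Hypothetical sketch of the decision flow; not the module's code.
class MatchModeDecision {

    enum Mode { DEDUPLICATION, INCREMENTAL }

    // strategyCount: number of strategies the user selected.
    // hasPreviousReport: whether a report for that single strategy is already stored.
    static Mode decide(int strategyCount, boolean hasPreviousReport) {
        if (strategyCount != 1) {
            return Mode.DEDUPLICATION; // assumption: combined strategies skip the incremental flow
        }
        if (!hasPreviousReport) {
            return Mode.DEDUPLICATION; // nothing matched earlier, so match all records with each other
        }
        return Mode.INCREMENTAL;       // fetch only patients added or changed after the report date
    }

    public static void main(String[] args) {
        System.out.println(decide(1, true));  // INCREMENTAL
        System.out.println(decide(1, false)); // DEDUPLICATION
        System.out.println(decide(3, true));  // DEDUPLICATION
    }
}
```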

The following two tables are the data sources that will help in completing this task:

1. patient

2. patientmatching_report

By comparing the patient table's date_changed or date_created with the created_on field in patientmatching_report, I can simply pick out the records needed for the next match.

I have figured out the place where the patients are retrieved,

This is the method which loads all the patients regardless of the date created or date changed.


This was a tough week for me, as my end-of-semester exams have started, but somehow I managed to set aside time for this wonderful Patient Matching module.
My next goal is to complete the 4th sub goal.






