I think I need the expression of interest from more than one person before tackling this potentially large task!
In thinking about how this might be done: we need to specify areas on the form where the type of form can be identified, where the patient's name is, where the date of the test is (if it's a lab test), and where the patient's NHI or other identifying code is. Preferably this should be done through a GUI of some sort.
[quote user="Graham"] In thinking on how this might be done, we need to specify areas on the form where the type of form can be identified [/quote] Probably the best method is to OCR the document first with a preliminary OCR pass. Search for identifying text; when a match occurs, re-OCR the page with the zonal OCR template that matches that document. Hopefully ... these items (the patient's name, the date of the test if it's a lab test, and the patient's NHI or other identifying code) can reliably be retrieved once zonal OCR is performed. Preferably it's rarely used because the zonal OCR is so good.
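To make the two-pass idea concrete, here is a rough sketch of the "search for identifying text" step. The phrases and template names are made up for illustration, and the preliminary OCR text is assumed to arrive as a plain string; none of this is Synapse's actual code.

```python
from typing import Optional

# Hypothetical mapping: identifying phrase (lower case) -> zonal template name.
TEMPLATES = {
    "example lab a": "lab_a_template",
    "example lab b": "lab_b_template",
}


def choose_template(preliminary_text: str) -> Optional[str]:
    """Return the zonal template whose identifying phrase appears in the text."""
    # Collapse whitespace and case so OCR line breaks do not block a match.
    normalized = " ".join(preliminary_text.lower().split())
    for phrase, template_name in TEMPLATES.items():
        if phrase in normalized:
            return template_name
    # No identifying text found: the page cannot be matched to a template.
    return None
```

A page with no match returns `None`, which is the case where the document would have to be filed by hand.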
That doesn't seem to work for me, because the lab identifying stuff is often in a different font and color and so often is not OCR'd correctly. Using zonal OCR gives much better results.
In this example, can you post the image ... and how you are going to choose the zonal template, so I can see if this notion will work with my documents?
OK, given a definition for a form, this is what Synapse will be able to do: capture the lab name, the patient name, the NHI number and the test date. That is enough to auto-file this document.
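For anyone wondering what a "definition for a form" might look like as data, here is a minimal sketch. The class name, field names and pixel-box layout are my assumptions, not Synapse's real format, and `ocr_zone` is a hypothetical callback standing in for whatever does the zonal OCR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A zone is a pixel box: (left, top, right, bottom). Assumed layout.
Box = Tuple[int, int, int, int]

# The four fields the post says are enough to auto-file a document.
REQUIRED_FIELDS = ("lab_name", "patient_name", "nhi", "test_date")


@dataclass
class FormDefinition:
    name: str
    zones: Dict[str, Box]  # field name -> where to look on the page


def extract_fields(form: FormDefinition, ocr_zone: Callable[[Box], str]) -> Dict[str, str]:
    """Run the (hypothetical) zonal OCR callback over each required zone."""
    return {field: ocr_zone(form.zones[field]).strip() for field in REQUIRED_FIELDS}


def can_autofile(fields: Dict[str, str]) -> bool:
    """Auto-file only when every required field came back non-empty."""
    return all(fields.get(field) for field in REQUIRED_FIELDS)
```

If any zone comes back empty, `can_autofile` returns `False` and the scan would be left for manual filing.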
I've written a little tutorial on how to create the rules that will be used by the form recognizer engine.
The next step is to OCR the fields at the form creation time so that Synapse can fill in the fields for you. I've got that partially working now.
[quote user="Graham"] The next step is to OCR the fields at the [b]form[/b] creation time so that Synapse can fill in the fields for you. I've got that partially working now. [/quote] What form?
I could use this to automate faxed prescription renewals: which pharmacy it is from, which drugs, and the patient name.
Explain! I'm thinking of caching the last active rule so that if you scan in a bunch of documents all from the same source, they will be faster to process.
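A rough sketch of the caching idea, assuming a rule is just a name plus an identifying phrase (both hypothetical; the real rules are richer). The last rule that matched is tried first, so a batch from one source needs only one check per page after the first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    name: str
    phrase: str  # identifying text, lower case (assumed rule shape)

    def matches(self, page_text: str) -> bool:
        return self.phrase in page_text.lower()


class RuleMatcher:
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = list(rules)
        self.last_rule: Optional[Rule] = None
        self.checks = 0  # counts rule evaluations, to show the saving

    def match(self, page_text: str) -> Optional[Rule]:
        # Try the cached rule first, then the rest in their usual order.
        candidates = self.rules
        if self.last_rule is not None:
            rest = [r for r in self.rules if r is not self.last_rule]
            candidates = [self.last_rule] + rest
        for rule in candidates:
            self.checks += 1
            if rule.matches(page_text):
                self.last_rule = rule
                return rule
        return None
```

With three rules where the third one matches, the first page costs three checks and each following page from the same source costs one.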
The autofiling utility will use its own directory. Any scan it can't process will be moved to the default scan directory. That's to stop it endlessly processing the same scans.
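In outline, the directory handling could look like the sketch below. The directory arguments and the `try_autofile` callback are placeholders of mine; I'm assuming the callback files the scan itself and returns `True` on success.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def process_inbox(
    inbox: Path,
    default_scan_dir: Path,
    try_autofile: Callable[[Path], bool],
) -> List[Path]:
    """Auto-file what we can; move everything else out of the inbox."""
    default_scan_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() takes a snapshot, so moving files mid-loop is safe.
    for scan in sorted(inbox.iterdir()):
        if not scan.is_file():
            continue
        if try_autofile(scan):
            continue  # assumed: the callback has already filed this scan
        # Unrecognized scan: move it away so it is never retried.
        target = default_scan_dir / scan.name
        shutil.move(str(scan), str(target))
        moved.append(target)
    return moved
```

Because failed scans leave the inbox, a second run over the same directory has nothing left to retry.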
It looks like I need to add an orientation to the text as well. I found one lab form that has the name of the lab vertically along the left-hand edge of the paper. So I need to scan that region, rotate it, and then submit it for OCR. It takes about 4 mins for Synapse OCR to recognize a form on initial testing ...
Looking good ... if orientation is set to vertical, Synapse OCR will now rotate the specified region 90 deg clockwise before submitting for OCR.
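For the curious, the crop-then-rotate step boils down to something like this. It is only an illustration: the image is a plain nested list of pixel values rather than a real bitmap, and the function names and the `orientation` values are my own labels, not Synapse's.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values (stand-in for a bitmap)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def crop(image: Image, box: Box) -> Image:
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def rotate_90_clockwise(region: Image) -> Image:
    # Reverse the row order, then transpose: the old left column,
    # read bottom to top, becomes the new top row.
    return [list(row) for row in zip(*region[::-1])]


def prepare_region(image: Image, box: Box, orientation: str = "horizontal") -> Image:
    """Crop the zone and, for vertical text, rotate it before OCR."""
    region = crop(image, box)
    if orientation == "vertical":
        region = rotate_90_clockwise(region)
    return region
```

A strip one pixel wide and three pixels tall comes out as a single row three pixels wide, which is the shape an OCR engine expects for a line of text.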
First run ... 5/5. It correctly identified the forms, and then OCR'd the regions where the name, ID, and test date were specified.