Automated OCR for Synapse

Discussion in 'Feature: Requests and Planning' started by Graham, Nov 10, 2007.

  1. Graham

    Graham Developer Staff Member

    I did some timing: it takes 10 minutes to correctly recognize 5 forms, including grabbing patient names, test dates and IDs.

  2. Jason

    Jason Developer / Handyman Staff Member

    Were they all from the same lab?

  3. Graham

    Graham Developer Staff Member

    No, four different sources.

  4. Graham

    Graham Developer Staff Member

    Actually, rather than giving up when a scan can't be allocated correctly, it might be better to attach it to a dummy patient and then move it later on. That would be quicker.

  5. Jason

    Jason Developer / Handyman Staff Member

    Except if the end user forgets about the dummy patient: then it's a lost scan.

    I'd pass on the dummy patient idea.
  6. Graham

    Graham Developer Staff Member

    No, because the dummy patient's results still end up in your inbox.

  7. Jason

    Jason Developer / Handyman Staff Member

    What a handy little inbox we have.



    In your example, the ID is the name of the lab. Is that the plan? ID = lab name?

  8. Graham

    Graham Developer Staff Member

    The ID field refers to a unique text string found on the form that will identify that form.

    In some cases where the lab uses a graphic, I have used a PO Box number, or a fax number on their form to identify them.
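
    The ID lookup described above could be sketched roughly like this in Python. This is only an illustration, not the utility's actual code: `FormRule`, `identify_form` and the sample ID strings are all made-up names and values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormRule:
    name: str     # hypothetical label for the form/lab
    id_text: str  # unique string expected somewhere in the OCR text


def _squash(text: str) -> str:
    """Collapse whitespace runs and lower-case, so OCR spacing noise matters less."""
    return " ".join(text.split()).lower()


def identify_form(ocr_text: str, rules: list[FormRule]) -> Optional[FormRule]:
    """Return the first rule whose ID string occurs in the OCR text, else None."""
    haystack = _squash(ocr_text)
    for rule in rules:
        if _squash(rule.id_text) in haystack:
            return rule
    return None


# Made-up examples: a PO Box number and a fax number used as form IDs.
rules = [
    FormRule("Lab A", "PO Box 1234"),
    FormRule("Lab B", "Fax 555 0100"),
]
```

    With these sample rules, a page whose text contains "PO  Box 1234" would be matched to "Lab A", and a page with neither string would return `None`.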

  9. Graham

    Graham Developer Staff Member

    Here's a screen capture of the auto-filing utility in progress. The fields below the status area show the details for the last patient identified.



    [​IMG]
  10. Jason

    Jason Developer / Handyman Staff Member

    [quote user="Graham"]

    Here's a screen capture of the auto-filing utility in progress. The fields below the status area show the details for the last patient identified.



    [​IMG]

    [/quote]



    Cool GUI. I love the Trying Rule updates :)
  11. Graham

    Graham Developer Staff Member

    Updated screenshot with data masked out

    [​IMG]
  12. Graham

    Graham Developer Staff Member

    I wonder if it might be too difficult to allow users to specify bitsets for the SSN/NHI.

    Eg: [#"0" - #"9" #"A" - #"Z" #"a" - #"z"]

    specifies that the SSN/NHI can only contain alphanumeric characters.

    or: [#"0" - #"9"]

    is numeric only.
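
    For anyone reading along outside Synapse, the same restriction could be sketched in Python as below. The names `CHARSETS` and `matches_charset` are hypothetical, chosen only for this example.

```python
import string

# Allowed characters per field type; these mirror the two bitsets above.
CHARSETS = {
    "numeric": set(string.digits),
    "alphanumeric": set(string.digits + string.ascii_letters),
}


def matches_charset(value: str, kind: str) -> bool:
    """True if value is non-empty and every character is allowed for this kind."""
    allowed = CHARSETS[kind]
    return bool(value) and all(ch in allowed for ch in value)
```

    A candidate such as "ABC1234" passes the alphanumeric check, while "12-34" fails the numeric one because of the hyphen.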



  13. Jason

    Jason Developer / Handyman Staff Member

    [quote user="Graham"]

    I wonder if it might be too difficult to allow users to specify bitsets for the SSN/NHI.

    Eg: [#"0" - #"9" #"A" - #"Z" #"a" - #"z"]

    specifies that the SSN/NHI can only contain alphanumeric characters.

    [/quote]

    OCR tends to make letter/number errors a lot: 1 vs l and 0 vs O. (See! Hard to tell.)

    I think it is a good idea to do the restriction if it will yield better results.
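
    One way a numeric-only restriction could help with those look-alikes is to map them back to digits before validating. A rough Python sketch follows. The substitution table and the name `coerce_numeric` are assumptions for illustration, not something the utility is known to do.

```python
# Assumed look-alike table: letters that OCR commonly confuses with digits.
LOOKALIKE_TO_DIGIT = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})


def coerce_numeric(value: str):
    """Replace look-alike letters with digits; return None if non-digits remain."""
    fixed = value.translate(LOOKALIKE_TO_DIGIT)
    if fixed.isascii() and fixed.isdigit():
        return fixed
    return None
```

    This only makes sense for fields known to be purely numeric. On an alphanumeric field the same substitution would corrupt genuine letters.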



  14. Graham

    Graham Developer Staff Member

    Latest version allows you to specify alpha, alphanumeric etc.

    This has improved the results.

  15. Graham

    Graham Developer Staff Member

    Added a new rule for one of my major lab vendors today - not really necessary since I get their labs via HL7.

    But .. there were two problems:
    1. There was no text I could use to identify the lab, because the text I would otherwise use is printed in a small blue font, and this doesn't OCR. In the end I used the year string of the test date, as it always appears in the same position. I just have to change it for next year.
    2. The first name and surname are on different parts of the form. I don't really want to add another field for the first name, so I am just using the surname. This, together with the NHI number, is sufficient for me to ID the patient.
  16. Jason

    Jason Developer / Handyman Staff Member

    One method of identification of the scans is to OCR the scan ... and with the IDENTIFIED extracted text .. you can probably figure out what it is.

    For me, I think this would be superior to the current proposed method.

    But theories are just theories ... testing is the key.



  17. Graham

    Graham Developer Staff Member

    The flaw in this is that the keywords I want can't be OCR'd, because they're in some tiny colored font and produce garbage in the OCR'd text.

    I don't know why our labs here want to produce printed results in what looks like an 8-point font!

    I am currently rewriting the OCR to use asynchronous TCP, as it currently uses synchronous TCP, which leaves the GUI unresponsive during the OCR process.
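
    To illustrate the general idea (not the actual rewrite), here is a small Python `asyncio` sketch. Everything in it is a stand-in: `fake_ocr_server` plays the role of the OCR service, the one-line "protocol" is invented, and `ui_heartbeat` is a counter standing in for GUI event handling. The point is only that the counter keeps advancing while the request is waiting on the socket.

```python
import asyncio


async def fake_ocr_server(reader, writer):
    """Hypothetical OCR service: read one line, reply with a prefixed copy."""
    data = await reader.readline()
    writer.write(b"TEXT:" + data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def request_ocr(host: str, port: int, payload: bytes) -> bytes:
    """Send one request and await the reply without blocking the event loop."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(payload + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.rstrip(b"\n")


async def demo():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(fake_ocr_server, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    ticks = 0

    async def ui_heartbeat():
        # Stand-in for GUI work that must keep running during OCR.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    ui = asyncio.create_task(ui_heartbeat())
    async with server:
        reply = await request_ocr("127.0.0.1", port, b"scan-001")
    ui.cancel()
    return reply, ticks
```

    Calling `asyncio.run(demo())` returns the reply together with the number of heartbeat ticks that ran while the request was in flight. With a blocking socket call in the same thread, that count would stay at zero until the reply arrived.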
