Autofile

Discussion in 'New Release and Beta Release Information' started by Graham, Apr 10, 2008.

  1. Graham

    Graham Developer Staff Member

    Autofile is a utility that attempts to recognize scanned and faxed documents, and rename them accordingly.

    It uses a commercial web OCR service.

    The latest build, 7, now offers a larger preview area for selecting the regions for zonal OCR.

    If you wish to test this, you will need to send me your IP address.

  2. Graham

    Graham Developer Staff Member

    Updated to build 8, which now also uses Tesseract to speed up the recognition.

    IP address no longer required for the testing period.



  3. Jason

    Jason Developer / Handyman Staff Member

  4. Graham

    Graham Developer Staff Member

    Regrettably not.

    Open source OCR software is still a long way behind commercial OCR software.
  5. Jason

    Jason Developer / Handyman Staff Member

    bug

    Attached Files:

  6. Jason

    Jason Developer / Handyman Staff Member

    I have gotten this error message 2 times out of 10.

    Attached Files:

  7. Graham

    Graham Developer Staff Member

    I haven't seen this message recently but it means that the web service is too busy to process the request.

  8. Jason

    Jason Developer / Handyman Staff Member

    How about a "process now" option?

    It'd be nice to save the scans directory as well, without having to re-select it each time.

    Attached Files:

  9. Jason

    Jason Developer / Handyman Staff Member

    In one of my dates, the year was recognized as "Of" (a capital letter O and a lowercase letter f). The year should be restricted to digits.
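
    A minimal sketch of that kind of restriction, applied as a post-processing step on the OCR text. This is not how Autofile does it; the function name `clean_year` and the table of confusable letters are assumptions made up for illustration.

```python
import re

# Hypothetical table of letters that OCR output often confuses with digits.
CONFUSABLE = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def clean_year(raw):
    """Return the OCR'd year as a digit string, or None if it cannot be trusted."""
    candidate = raw.strip().translate(CONFUSABLE)
    # Accept only two- or four-digit years; anything else is rejected
    # so the document can be flagged for manual review.
    if re.fullmatch(r"\d{2}|\d{4}", candidate):
        return candidate
    return None
```

    With this sketch, a reading such as "2OO8" is repaired to "2008", while "Of" is rejected rather than written into a file name.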

  10. Graham

    Graham Developer Staff Member

    >How about a process now ?

    It's supposed to be left running 24/7 ...
  11. Jason

    Jason Developer / Handyman Staff Member

    Sometimes my staff notice that a patient has come in with new, unscanned results, so they want to get them into the computer ASAP, before the patient is seen.
  12. Graham

    Graham Developer Staff Member

    Pressing the "Stop" button and then the "Go" button should do the trick.

    New version 10 is at
    http://www.compkarori.com/autofile/autofile.exe

    It does more network error checking, waits 30 seconds on a timeout, and saves the scan directory.
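
    The wait-on-timeout behaviour can be sketched roughly as follows. This is only an illustration, not Autofile's code: `call_with_retry`, the `request` callable, and the limit of three attempts are all assumptions; only the 30 second wait comes from the post above.

```python
import time


def call_with_retry(request, attempts=3, wait_seconds=30, sleep=time.sleep):
    """Call request(); on a timeout or connection error, wait and try again."""
    last_error = None
    for attempt in range(attempts):
        try:
            return request()
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(wait_seconds)
    # Every attempt failed: surface the last network error to the caller.
    raise last_error
```

    Passing `sleep` as a parameter is a design choice that lets the wait be replaced with a stub, so the retry logic can be exercised without actually pausing for 30 seconds.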

  13. Jason

    Jason Developer / Handyman Staff Member

    error message

    Attached Files:

  14. Jason

    Jason Developer / Handyman Staff Member

    Getting this error with some files, and a slightly different error when it does it on the date.
  15. Jason

    Jason Developer / Handyman Staff Member

    The problem was the area of the scan I chose to "detect" the right rule, i.e., the ID area. I have since chosen a better area, and it's working well.

    However, after reviewing my test documents, it seems that the failures come from vertical variability in the scans.

    Attached Files:

  16. Jason

    Jason Developer / Handyman Staff Member

    Another factor behind the variable vertical dimension: the way my lab prints on paper creates variability.

    Attached Files:

  17. Jason

    Jason Developer / Handyman Staff Member

    Idea: the small selection areas are what introduce errors. With a bigger selection area, it would be easier to make sure the important text falls within the image.

    Q: Are there different algorithms for "finding" the text to be OCR'd?
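
    One way to picture the bigger-selection idea is to pad each zone before it is sent for OCR, with a larger margin in the vertical direction where the scans vary most. This is a rough sketch under assumptions: the `(left, top, right, bottom)` pixel layout and the name `expand_zone` are hypothetical, not taken from Autofile.

```python
def expand_zone(zone, margin_x, margin_y, page_width, page_height):
    """Grow a (left, top, right, bottom) zone by the given margins, clamped to the page."""
    left, top, right, bottom = zone
    return (
        max(0, left - margin_x),
        max(0, top - margin_y),
        min(page_width, right + margin_x),
        min(page_height, bottom + margin_y),
    )
```

    For example, a zone of (100, 20, 300, 60) padded by 40 pixels vertically on an 800 by 1000 page becomes (100, 0, 300, 100), so text that drifts up or down between scans still lands inside the region.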

    Attached Files:

  18. Graham

    Graham Developer Staff Member

    Build 15 fixes all these errors.

  19. Graham

    Graham Developer Staff Member

    I ran another test using build 15, with 25 of my own documents, 8 of Jason's, and a total of 13 rules. It managed 100% recognition.

  20. Graham

    Graham Developer Staff Member

    Tesseract seems to do a fairly decent job on some documents and poorly on others.

    Another optimization might be to allow the user to specify which OCR engine to use.
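
    As a rough illustration of what a per-rule engine choice could look like, here is a small dispatch sketch. Everything in it is an assumption for the sake of example: rules are modelled as dictionaries with an optional "engine" key, and each engine is any callable that takes an image and returns text.

```python
def make_recognizer(engines, default):
    """Build a function that runs the OCR engine named by each rule."""

    def recognize(rule, image):
        # Fall back to the default engine when the rule does not name one.
        name = rule.get("engine", default)
        if name not in engines:
            raise ValueError(f"unknown OCR engine: {name}")
        return engines[name](image)

    return recognize
```

    A rule for a document type that Tesseract handles poorly could then name the web service, while the other rules keep using the faster local engine.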
