Unidecode

Discussion in 'Rebol' started by MaxV, May 13, 2010.

  1. MaxV

    MaxV Member

    Hi all,
    I have a problem: I need to convert file names containing a lot of Unicode and accented characters to plain ASCII. Examples:

    350_Mother_Node™_Communication.pdf
    Perchè.pdf

    and so on...

    I would use parse to look at each character of the file name and, if it isn't an ASCII character, substitute it with an underscore (_).
    I am thinking of something like:
    Code:
    ascii: charset [ "_-."  #"0" - #"9" #"A" - #"Z" #"a" - #"z"]
    
    Have you any ideas?
    Thank you
    Max
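
    The parse idea above can be sketched as follows. This is only a minimal sketch, assuming Rebol 2 (where each char! is one byte, so a multi-byte character may become several underscores). The names `other` and `sanitize` are made up for illustration.

    ```rebol
    ; Allowed characters, as in the post above
    ascii: charset ["_-." #"0" - #"9" #"A" - #"Z" #"a" - #"z"]
    ; Everything else (illustrative name, not from the thread)
    other: complement ascii

    ; Hypothetical helper: return a copy of name with every
    ; non-allowed character replaced by an underscore.
    sanitize: func [name [string!] /local mark] [
        name: copy name
        ; /all stops parse from silently skipping whitespace
        parse/all name [
            any [
                some ascii
                | mark: other (change mark #"_")
            ]
        ]
        name
    ]

    print sanitize "a b.pdf"    ; prints a_b.pdf
    ```

    Each replacement swaps one character for one character, so the string length never changes and the parse position stays valid.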
  2. Graham

    Graham Developer Staff Member

    Rebol2 isn't Unicode-aware, so I'd use Rebol3 for this.
  3. Sunanda

    Sunanda New Member

    Here's one way to get close -- it replaces all non-acceptable chars in the target with "_"
    Code:
    file-name: {350_Mother_Node™_Communication.pdf}
    ascii: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_"
    
    foreach char unique trim/with copy file-name ascii [
        replace/all file-name char "_"
    ]
    
    == "350_Mother_Node__Communication_pdf"
    
  4. Graham

    Graham Developer Staff Member

    I think you're allowed spaces in file names :)
  5. MaxV

    MaxV Member

    Thank you very much, it works very well! It's better than any tool I found in other languages (Perl and Python). I added a check to speed up the process, and added "/" to the allowed characters to avoid putting a "_" at the end of every directory name:

    Code:
    ascii: "0123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-_./" 
    ascii2: charset [ "-_./" #"a" - #"z" #"A" - #"Z"  #"0" - #"9" ]
    
    ;function to correct file names
    funz_ascii: func [ /local lista a ] [
    	lista: read %. 
    	foreach nomefile lista [
    		; parse/all: without /all, parse skips spaces and would accept names containing them
    		if not (parse/all to-string nomefile  [any ascii2] ) [
    			a: copy to-string nomefile
    			foreach char unique trim/with copy a ascii [ replace/all a char "_"  ]
    			rename nomefile to-file a
    			]
    		]	
    	]
    
    I'm so happy! :D
    P.S. I'm not allowed spaces in file names: I have to pipe the files into Linux programs, so spaces make a mess...
  6. Graham

    Graham Developer Staff Member

    Your script will fail if you rename to an existing file name.
  7. MaxV

    MaxV Member

    Nobody is perfect. :p
  8. Graham

    Graham Developer Staff Member

    What you do is to add a digit to the filename if there is a name clash, and keep incrementing the digit(s) until there is no clash.
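
    That approach could look roughly like this. It is a sketch under assumptions, not code from the thread: `unique-name` is a hypothetical helper, it expects a plain file name (not a directory), and it puts the number just before the last dot.

    ```rebol
    ; Hypothetical helper: return name unchanged if it is free,
    ; otherwise name1, name2, ... (number inserted before the extension).
    unique-name: func [name [file!] /local base ext pos n candidate] [
        if not exists? name [return name]
        ; split %report.pdf into %report and %.pdf
        either pos: find/last name "." [
            base: copy/part name pos
            ext: copy pos
        ] [
            base: copy name
            ext: ""
        ]
        n: 1
        until [
            candidate: join base [n ext]
            n: n + 1
            ; stop as soon as the candidate is not taken
            not exists? candidate
        ]
        candidate
    ]
    ```

    In the renaming loop it would be used as `rename nomefile unique-name to-file a`, so the script never needs to ask the user.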
  9. MaxV

    MaxV Member

    Better and quicker:
    Code:
    ; ask for a new name only when the cleaned name is already taken
    if exists? to-file a [
       view layout [
          text "The following file already exists:"
          text a
          text "please rename it:"
          newname: field 
          button "OK" [ unview]
       ]
       a: newname/text
    ]
    rename nomefile to-file a
    
    ;)
  10. Sunanda

    Sunanda New Member

    Glad my sample code was of some use.

    Bullet-proofing code for operational usage is a never-ending task.

    One other area you might want to look at is ensuring ascii and ascii2 remain equivalent. With hurried maintenance they may drift apart in value, and that could cause subtle problems.

    Avoid that by deriving one from the other:

    Code:
    ascii: "0123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-_./"
    ascii2: charset ascii
    
    Sadly, there is no easy way of doing it the other way around: deriving the string! that is equivalent to the bitset!
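
    A possible workaround, as a sketch that assumes Rebol 2 (where a char! is a single byte): test each of the 256 possible values against the bitset and collect the ones that match. `bitset-to-string` is a hypothetical name, not a built-in.

    ```rebol
    ; Hypothetical helper: build the string of all chars contained in a bitset.
    ; Assumes 8-bit chars (Rebol 2), so only values 0..255 are tested.
    bitset-to-string: func [bs [bitset!] /local out c] [
        out: copy ""
        repeat i 256 [
            c: to-char i - 1
            if find bs c [append out c]
        ]
        out
    ]

    print bitset-to-string charset "cba"    ; prints abc
    ```

    The string comes back in byte order rather than in the order originally typed, which makes no difference to trim/with.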
  11. Graham

    Graham Developer Staff Member

    I'd let the script rename it, as asking the user to rename a file can still lead to a name clash, unless you also check whether the suggested name exists.
  12. MaxV

    MaxV Member

    Ok, ok:
    Code:
    ascii: "0123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-_./"
    ascii2: charset ascii
    funz_ascii: func [ /local lista a ] [
    	lista: read %. 
    	foreach nomefile lista [
    		; parse/all: without /all, parse skips spaces and would accept names containing them
    		if not (parse/all to-string nomefile  [any ascii2] ) [
    			a: copy to-string nomefile
    			foreach char unique trim/with copy a ascii [ replace/all a char "_"  ]
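    			; pass the original name too: nomefile is only visible inside this foreach
    			funz_testname nomefile a
    			]
    		]	
    	]
    ; old = current file name, name = proposed new name
    funz_testname: func [old name /local b] [
       b: copy name
       either exists? to-file b [ 
           view layout [
              text "The following file already exists:"
              text b
              text "please rename it:"
              newname: field 
              button "OK" [ unview]
              ] 
           ; check the user's suggestion again before renaming
           funz_testname old newname/text
           ] [ rename old to-file b]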
    ]
    ;here we go
    funz_ascii
    
    Now either the user gives a valid name or he'll never exit the loop. ;)
    Max
