Unidecode

MaxV · May 13, 2010

Hi all,
I have a problem: I need to convert file names containing many Unicode and accented characters to standard ASCII. Examples:

350_Mother_Node™_Communication.pdf
Perchè.pdf

and so on...

I would use parse to look at each character of the file name and, if it isn't an ASCII character, replace it with an underscore (_).
I am thinking of something like:
Code:
ascii: charset [ "_-."  #"0" - #"9" #"A" - #"Z" #"a" - #"z"]
Do you have any ideas?
Thank you
Max
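
A minimal sketch of the parse idea described in the question, assuming Rebol 2 (where strings are byte-based, so a multi-byte character becomes several underscores). The function name to-ascii-name is hypothetical; the charset is the one from the question.

```rebol
; Characters that are allowed to stay in a file name (from the question)
ascii: charset [ "_-."  #"0" - #"9" #"A" - #"Z" #"a" - #"z"]

; to-ascii-name is a hypothetical helper, not part of Rebol:
; it returns a copy of name with every character outside ascii replaced by "_"
to-ascii-name: func [name [string!] /local out p] [
    out: copy name
    ; /all makes parse treat spaces as ordinary characters
    parse/all out [
        any [
            some ascii                     ; skip over a run of allowed characters
            | p: skip (change p #"_")     ; otherwise overwrite one character in place
        ]
    ]
    out
]

print to-ascii-name "my file (1).pdf"   ; prints my_file__1_.pdf
```

The replacement is done in place on a copy, so the original string is left untouched.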

Graham · May 13, 2010

Rebol 2 isn't Unicode-aware, so I'd use Rebol 3 for this.

Sunanda · May 13, 2010

Here's one way to get close -- it replaces all non-acceptable chars in the target with "_"

Code:

 file-name: {350_Mother_Node™_Communication.pdf}
 ascii: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_"

foreach char unique trim/with copy file-name ascii [
    replace/all file-name char "_"
]

== "350_Mother_Node__Communication_pdf"

Graham · May 13, 2010

I think you're allowed spaces in file names

MaxV · May 14, 2010

Thank you very much, it works very well! It's better than any other tool in other languages (Perl and Python). I added a check to speed up the process, and added "/" to the allowed characters to avoid appending a "_" to the end of every directory name:
Code:
ascii: "0123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-_./" 
ascii2: charset [ "-_./" #"a" - #"z" #"A" - #"Z"  #"0" - #"9" ]

;function to correct file names
funz_ascii: func [ /local lista a ] [
	lista: read %. 
	foreach nomefile lista [
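		; parse/all: without /all, Rebol 2 parse skips whitespace, so names with spaces would pass unchanged
		if not (parse/all to-string nomefile  [any ascii2] ) [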
			a: copy to-string nomefile
			foreach char unique trim/with copy a ascii [ replace/all a char "_"  ]
			rename nomefile to-file a
			]
		]	
	]
I'm so happy!
P.S. I can't allow spaces in file names: I have to pipe the files into Linux programs, so spaces make a mess.

Graham · May 14, 2010

Your script will fail if you rename to an existing filename

MaxV · May 14, 2010

Nobody is perfect.

Graham · May 14, 2010

What you do is to add a digit to the filename if there is a name clash, and keep incrementing the digit(s) until there is no clash.
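
A minimal sketch of that suggestion, assuming Rebol 2 and plain file names (not directories with a trailing "/"). The function name unique-name and the "-1", "-2" suffix format are hypothetical choices, not something from the thread.

```rebol
; unique-name is a hypothetical helper: it returns name itself if no such file
; exists, otherwise the first of name-1.ext, name-2.ext, ... that is still free
unique-name: func [name [file!] /local base ext candidate n] [
    if not exists? name [return name]
    ; split at the last dot; with no dot, ext is the empty tail of the name
    ext: any [find/last name "." tail name]
    base: copy/part name ext
    n: 1
    until [
        candidate: rejoin [base "-" n ext]   ; e.g. %report-1.pdf
        n: n + 1
        not exists? candidate                ; stop at the first free name
    ]
    candidate
]
```

In the renaming loop this could be used as `rename nomefile unique-name to-file a`, so the script never needs to ask the user.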

MaxV · May 14, 2010

Better and quicker:

Code:

either exists? to-file a [
   view layout [
      text "The following file already exists:"
      text a
      text "please rename it:"
      newname: field
      button "OK" [ unview]
   ]
   rename nomefile to-file newname/text
] [
   rename nomefile to-file a
]

Sunanda · May 14, 2010

Glad my sample code was of some use.

Bullet-proofing code for operational usage is a never-ending task.

One other area you might want to look at is ensuring ascii and ascii2 remain equivalent. With hurried maintenance they may drift apart in value, and that could cause subtle problems.

Avoid that by deriving one from the other:
Code:
     ascii: "0123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-_./"
    ascii2: charset ascii
Sadly, there is no easy way of doing it the other way around: deriving the string! that is equivalent to the bitset!
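
There is no built-in conversion, but a short loop can rebuild the string. The following is a sketch only, assuming Rebol 2, where characters are limited to the values 0 to 255; bitset-to-string is a hypothetical name.

```rebol
; bitset-to-string is a hypothetical helper: it collects, in code order,
; every 8-bit character that is a member of the given bitset
bitset-to-string: func [bits [bitset!] /local out ch] [
    out: copy ""
    repeat i 256 [
        ch: to-char i - 1            ; i runs 1..256, characters run 0..255
        if find bits ch [append out ch]
    ]
    out
]

print bitset-to-string charset "az_AZ09/.-"   ; prints -./09AZ_az
```

The result is ordered by character code rather than in the order the characters were originally typed, which does not matter for trim/with or charset.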

Graham · May 14, 2010

I'd let the script rename it, as asking the user to rename a file can still lead to a name clash, unless you also want to check whether the suggested name exists.

MaxV · May 18, 2010

Ok, ok:

Code:

ascii: "0123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-_./"
ascii2: charset ascii
funz_ascii: func [ /local lista a ] [
	lista: read %. 
	foreach nomefile lista [
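		; parse/all: without /all, Rebol 2 parse skips whitespace, so names with spaces would pass unchanged
		if not (parse/all to-string nomefile  [any ascii2] ) [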
			a: copy to-string nomefile
			foreach char unique trim/with copy a ascii [ replace/all a char "_"  ]
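			; pass the original name too: nomefile is local to this foreach
			funz_testname nomefile a
			]
		]	
	]
; old: the file's current name, name: the proposed new name
funz_testname: func [old name /local b] [
   b: copy name
   either exists? to-file b [ 
       view layout [
          text "The following file already exists:"
          text b
          text "please rename it:"
          newname: field 
          button "OK" [ unview]
          ] 
       ; check the user's suggestion again before renaming
       funz_testname old newname/text
       ] [ rename old to-file b]
]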
;here we go
funz_ascii

Now either the user gives an unused name, or he'll never exit the loop.

Max
