Did you know that Dutch uses the following list of special (diacritic) characters?
áéíóúàèëïöüÁÉÍÓÚÀÈËÏÖÜ
While I was working on a Python script, I figured I could simply use Python’s translate() method to “translate” them to their base characters. It is probably the worst idea ever, but hey, it is what it is.
According to tutorialspoint.com you need to know the following about this function:
The method translate() returns a copy of the string in which all characters have been translated using table (constructed with the maketrans() function in the string module), optionally deleting all characters found in the string deletechars.
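Great, that sounds like the perfect solution, except that it isn’t. The characters above are Unicode characters, and in Python 2.x a plain byte string stores each of them as more than one byte (two, in UTF-8). The number of characters you see might be the same, but Python sees a different string length, and maketrans() requires both tables to have the same length. So how do we address this?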
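Before getting to the fix, here is a minimal sketch of that length mismatch. It assumes the text is encoded as UTF-8 (which is what a Python 2 byte string typed into a UTF-8 source file holds); the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Compare the length in characters with the length in UTF-8 bytes.
text = u"\u00e9\u00e9n"          # the Dutch word "één" as a unicode string
encoded = text.encode("utf-8")   # the same word as UTF-8 bytes

print(len(text))     # 3 characters
print(len(encoded))  # 5 bytes: each "é" takes two bytes, "n" takes one
```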
intab = u"áéíóúàèëïöüÁÉÍÓÚÀÈËÏÖÜ"
outtab = u"aeiouaeeiouAEIOUAEEIOU"
trantab = dict((ord(a), b) for a, b in zip(intab, outtab))
translated = originalString.translate(trantab)
Works a charm! Here’s what you need to know:
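- Ensure that you have # -*- coding: utf-8 -*- as the first or second line of your script, so that Python 2 accepts the non-ASCII characters in the source
- Ensure that you place a u before the string literals which you define for intab and outtab, so that they are unicode objects
- Ensure that you do not use maketrans() but the dict-based table shown above, and that originalString is itself a unicode object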
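Putting the three points together, a self-contained version could look like the sketch below. The function name strip_dutch_diacritics and the choice to decode byte strings as UTF-8 are my own assumptions for illustration, not part of the original script.

```python
# -*- coding: utf-8 -*-
# Map each accented character's code point to its base character.
INTAB = u"áéíóúàèëïöüÁÉÍÓÚÀÈËÏÖÜ"
OUTTAB = u"aeiouaeeiouAEIOUAEEIOU"
TRANTAB = dict((ord(a), b) for a, b in zip(INTAB, OUTTAB))


def strip_dutch_diacritics(text):
    """Return text with the Dutch accented vowels replaced by base vowels.

    Hypothetical helper: byte strings are assumed to be UTF-8 and are
    decoded first, because the dict-based table only works on unicode.
    """
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return text.translate(TRANTAB)


print(strip_dutch_diacritics(u"coördinatie"))
```

The sketch should behave the same on Python 2 and Python 3, since both accept a dict keyed by code points in translate() on unicode strings. On Python 3 only, str.maketrans(INTAB, OUTTAB) builds an equivalent table.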