Hi. This is a little off topic. I am not actually using mod_rewrite, but I am using regular expressions to make pretty URLs.
At the moment I am trying to write a bulletproof string-to-URL function to create nice URLs. All was well until I realised that any accented characters are removed entirely instead of being reduced to their base letters. I have found some instructions on how to strip just the accents, but they are in Perl and I don't know whether it's possible to port them over to PHP.
The instructions are:
1. we take some data with diacritics;
2. convert it to Unicode;
3. put it through Canonical Decomposition, also known as Normalization Form D;
4. remove all characters that belong to the Unicode General Category “Mark” (non-spacing, spacing combining, enclosing) — thus removing the diacritics (accent marks);
5. prepare the data for output to an ASCII stream.
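To make steps 3 and 4 concrete, here is a minimal sketch of what they do to a single accented letter. It assumes PHP 7 or later (for the "\u{...}" escape), UTF-8 input, the intl extension for the Normalizer class, and a PCRE build with Unicode property support; none of these are confirmed by the instructions above.

```php
<?php
// Assumption: the intl extension is loaded and strings are UTF-8.
$precomposed = "\u{00E9}"; // "é" stored as one code point

// Step 3: Normalization Form D splits it into a base letter plus a mark.
$decomposed = Normalizer::normalize($precomposed, Normalizer::FORM_D);
var_dump($decomposed === "e\u{0301}"); // "e" + COMBINING ACUTE ACCENT

// Step 4: remove every character in the Unicode General Category "Mark".
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
$stripped = preg_replace('/\p{M}/u', '', $decomposed);
var_dump($stripped === 'e');
```

Both comparisons should come out true: the accent becomes a separate code point after decomposition, so it can be deleted without touching the base letter.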
I have completed them up to step 3, but am stuck on step 4. The following Perl is given for the whole procedure (the s/\pM//g line is step 4), but I am not sure whether this can be done in PHP.
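use Encode;              # provides Encode::decode
use Unicode::Normalize;  # exports NFD

for ( $str ) { # the variable we work on (aliased to $_ inside the block)
    ## convert to Unicode first
    ## if your data comes in Latin-1, then uncomment:
    #$_ = Encode::decode( 'iso-8859-1', $_ );
    $_ = NFD( $_ );   ## decompose
    s/\pM//g;         ## strip combining characters
    s/[^\0-\x7F]//g;  ## clear everything else (anything outside 7-bit ASCII)
}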
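For comparison, a rough PHP counterpart to the Perl loop might look like the sketch below. It is not a confirmed port. It assumes PHP 7 or later, the intl extension (for Normalizer), the mbstring extension (for mb_convert_encoding) and PCRE with Unicode property support. The function name strip_diacritics and its $sourceEncoding parameter are made up for illustration.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function strip_diacritics(string $str, string $sourceEncoding = 'UTF-8'): string
{
    // Step 2: convert to Unicode (UTF-8) if the data arrives in,
    // for example, ISO-8859-1. Requires mbstring.
    if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
        $str = mb_convert_encoding($str, 'UTF-8', $sourceEncoding);
    }

    // Step 3: canonical decomposition (NFD). Requires intl.
    $decomposed = Normalizer::normalize($str, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Step 4: strip all characters in General Category "Mark",
    // the equivalent of Perl's s/\pM//g.
    $noMarks = preg_replace('/\p{M}/u', '', $decomposed) ?? '';

    // Step 5: drop whatever is still outside 7-bit ASCII.
    return (string) preg_replace('/[^\x00-\x7F]/u', '', $noMarks);
}

echo strip_diacritics("Cr\u{00E8}me Br\u{00FB}l\u{00E9}e"), "\n";
```

With those assumptions met, the call above should print "Creme Brulee". Letters that have no canonical decomposition, such as "ø" or "ß", are not split in step 3 and are therefore deleted in step 5, so they would need a separate replacement table if they matter for the URLs.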
Any ideas?