Why does this PHP construct:
normalizer_normalize( $search_string, \Normalizer::FORM_D );
convert ÖÖÖ to OOO, but keep ÅÅÅ as ÅÅÅ ... WTF?!
@joho Wasn't there something with locale? Or the underlying ICU version? Something knocks from deep down in my mind....
@joho And what do the hex values of the characters actually say?
@heiglandreas I didn't do that part, I'm just looking at the output, which is what I need to be correct.
But @thanius and @lpwaterhouse may be onto something here.
Maybe I'll just stick to transliteration then. I'm probably overworking the code, but I hate to leave things to "chance" when I develop.
@joho Stupid question perhaps: Why are you using normalization when the output just needs to look correct?
What problem are you trying to solve?
/cc @thanius @lpwaterhouse
The data is stored in an SQL database. I've started to encrypt the (sensitive parts of) data at rest. So I need to do in-memory comparisons and sorting.
Normally, I would compare w/all umlauts, etc, but in this particular case, I want to get a match on "vårsol" when I'm searching for "vårsol" or "varsol". And this matching is, after decryption, done in the application layer.
(And I don't want to use specific database functionality to handle all this.)
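A minimal sketch of that kind of application-layer matching and sorting, assuming the intl extension is available. The helper name `fold_for_search` and the sample words are made up for illustration; the transliterator ID is the compound ICU ID used in the PHP manual's example.

```php
<?php
// Hypothetical helper: fold a string to lowercase ASCII so that
// "vårsol" and "varsol" end up with the same comparison key.
// Requires ext-intl.
function fold_for_search(string $text): string
{
    $folded = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    if ($folded === false) {
        throw new RuntimeException('Transliteration failed');
    }
    return $folded;
}

// Matching: compare the folded forms, not the raw (decrypted) strings.
$stored = 'vårsol';
$isMatch = fold_for_search($stored) === fold_for_search('varsol');

// Sorting: order the decrypted values in memory by their folded key.
$words = ['zebra', 'Vårsol', 'apa'];
usort($words, fn (string $a, string $b): int =>
    strcmp(fold_for_search($a), fold_for_search($b)));
```

The original strings stay untouched; only the comparison keys are folded, so the output can still show "vårsol" with its ring.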
@joho But wouldn't transliteration be more what you are looking for?
'Cause normalization just handles how the Unicode character is stored internally. So an 'Ä' should always look the same, but the hex code might be different.
But transliteration converts from something into something else. And in your case you want to compare kind of based on ASCII if I see that correctly.
Feel free to check out https://andreas.heigl.org/2021/06/23/transliter-what/
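To make the difference concrete, here is a small sketch (assuming the intl extension is loaded; the variable names are just for illustration). It dumps the UTF-8 bytes of 'Ä' before and after NFD normalization, then transliterates the same character to ASCII.

```php
<?php
// Requires ext-intl (Normalizer and transliterator functions).

// 'Ä' as a single precomposed code point, U+00C4.
$composed = "\u{00C4}";

// NFD splits it into 'A' (U+0041) plus COMBINING DIAERESIS (U+0308).
// It still renders as 'Ä'; only the stored bytes differ.
$decomposed = normalizer_normalize($composed, Normalizer::FORM_D);

$composedHex   = bin2hex($composed);    // UTF-8 bytes of U+00C4
$decomposedHex = bin2hex($decomposed);  // UTF-8 bytes of U+0041 U+0308

// Transliteration, by contrast, turns the character into something else.
$ascii = transliterator_transliterate('Latin-ASCII', $composed);

echo $composedHex, "\n";   // c384
echo $decomposedHex, "\n"; // 41cc88
echo $ascii, "\n";         // A
```

So normalization alone never removes the diaeresis or the ring. If a mark disappears, some later step (stripping combining marks, a charset conversion, or similar) is doing it.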
@joho As an alternative you could move the searching to something like Elasticsearch? @thanius @lpwaterhouse
Thanks, but that won't work in this case as I want to depend on as few components / stacks / whatever as possible.
Yes, transliteration is the way to go in this case, which is what I'm doing now.
Thanks for all the advice, and pointers in the right direction.