phpc.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A server for PHP programmers & friends. Join us for discussions on the PHP programming language, frameworks, packages, tools, open source, tech, life, and more.

Why does this PHP construct:

normalizer_normalize( $search_string, \Normalizer::FORM_D );

convert ÖÖÖ to OOO, but keep ÅÅÅ as ÅÅÅ ... WTF?! 🤔

@joho Wasn't there something with locale? Or the underlying ICU version? Something knocks from deep down in my mind....


@joho And what do the hex values of the characters actually say?
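A minimal sketch of how the bytes could be inspected, assuming ext-intl is loaded and the input holds the precomposed code points U+00D6 (Ö) and U+00C5 (Å). The helper name hexAfterNfd is made up for this example. Under Unicode normalization both letters decompose into a base letter plus a combining mark, so if only one of them ends up looking "plain", the difference probably comes from a later step or from how the output is rendered.

```php
<?php
// Assumes ext-intl is loaded (it provides normalizer_normalize()).
// hexAfterNfd() is a hypothetical helper, not part of any library.
function hexAfterNfd(string $text): string
{
    $decomposed = normalizer_normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    // bin2hex() shows the raw UTF-8 bytes, independent of font rendering.
    return bin2hex($decomposed);
}

foreach (["\u{00D6}", "\u{00C5}"] as $char) {
    printf("%s  before: %s  after NFD: %s\n", $char, bin2hex($char), hexAfterNfd($char));
}
// Ö: c396 becomes 4fcc88 (O followed by U+0308, combining diaeresis)
// Å: c385 becomes 41cc8a (A followed by U+030A, combining ring above)
```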

@heiglandreas I didn't do that part, I'm just looking at the output, which is what I need to be correct.

But @thanius and @lpwaterhouse may be onto something here.

Maybe I'll just stick to transliteration then. I'm probably overworking the code, but I hate to leave things to "chance" when I develop.

@joho Stupid question perhaps: Why are you using normalization when the output just needs to look correct?

What problem are you trying to solve?
/cc @thanius @lpwaterhouse

@heiglandreas

The data is stored in an SQL database. I've started to encrypt the (sensitive parts of) data at rest. So I need to do in-memory comparisons and sorting.

Normally, I would compare w/all umlauts, etc, but in this particular case, I want to get a match on "vårsol" when I'm searching for "vårsol" or "varsol". And this matching is, after decryption, done in the application layer.

(And I don't want to use specific database functionality to handle all this.)
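For the in-memory comparison and sorting described here, one option that does not come up in this thread is ICU's Collator at primary strength. This is only a sketch under assumptions: ext-intl is loaded, the values are already decrypted, and looselyEqual is a made-up helper name. The 'root' locale is chosen on purpose, because a Swedish locale treats å as a letter of its own rather than as an accented a.

```php
<?php
// Assumes ext-intl is loaded. PRIMARY strength ignores accents and case.
$collator = new Collator('root');
$collator->setStrength(Collator::PRIMARY);

// looselyEqual() is a hypothetical helper for this example.
function looselyEqual(Collator $collator, string $a, string $b): bool
{
    return $collator->compare($a, $b) === 0;
}

// Stand-in for values that were decrypted in the application layer.
$decrypted = ['vårsol', 'zebra', 'Åker', 'apa'];

// Exact-match search that treats "varsol" and "vårsol" as the same word.
$hits = array_values(array_filter(
    $decrypted,
    fn (string $value): bool => looselyEqual($collator, $value, 'varsol')
));

// In-place sort using the same accent-insensitive rules.
$collator->sort($decrypted);

print_r($hits);
print_r($decrypted);
```

This only covers whole-value equality and ordering; substring search would need a different approach.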

@thanius @lpwaterhouse

@joho But wouldn't transliteration be more what you are looking for?

'Cause normalization only changes how a Unicode character is stored internally. So an 'Ä' should always 'look' the same, but the underlying hex code might be different.

But transliteration converts from one thing into something else. And in your case you want to compare more or less on an ASCII basis, if I understand that correctly.
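A small sketch of that difference, assuming ext-intl is loaded and that ICU's 'Latin-ASCII' transform is available. The helper name toAscii is made up for this example. Unlike normalization, this actually replaces the accented letters with plain ASCII ones, so Ö and Å are treated the same way.

```php
<?php
// Assumes ext-intl is loaded; toAscii() is a hypothetical helper.
function toAscii(string $text): string
{
    // 'Any-Latin' first maps other scripts to Latin, 'Latin-ASCII' then
    // folds accented Latin letters to their closest ASCII equivalents.
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
    if ($result === false) {
        throw new RuntimeException('Transliteration failed');
    }
    return $result;
}

echo toAscii("\u{00D6}\u{00D6}\u{00D6}"), "\n"; // ÖÖÖ
echo toAscii("\u{00C5}\u{00C5}\u{00C5}"), "\n"; // ÅÅÅ

// Both the stored value and the search term go through the same folding.
var_dump(toAscii("v\u{00E5}rsol") === toAscii('varsol'));
```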

Feel free to check out andreas.heigl.org/2021/06/23/t

/cc @thanius @lpwaterhouse

andreas.heigl.org · Transliter... what? » andreas.heigl.org

@joho As an alternative you could move the searching to something like Elasticsearch? @thanius @lpwaterhouse

@heiglandreas

Thanks, but that won't work in this case as I want to depend on as few components / stacks / whatever as possible.

@heiglandreas

Yes, transliteration is the way to go in this case, which is what I'm doing now.

Thanks for all the advice, and pointers in the right direction.

@thanius @lpwaterhouse
@nafmo