[Users] How to filter utf8 messages

Slavko linux at slavino.sk
Thu Jul 27 11:36:37 UTC 2023


Hi,

On Thu, 27 Jul 2023 07:22:16 -0400 Pierre Fortin <pf at pfortin.com>
wrote:

> Wow!  I've never seen that regex syntax and my ancient O'Reilly
> (7/1997) "Mastering Regular Expressions" book does not cover it.
> Therein, {min:max} is the "interval" quantifier, and no mention of
> "\p". I tried the suggested regex in QuickSearch; but no hits.  The
> messages I'm trying to filter are mostly in Chinese & Japanese; are
> these covered by the above suggestion?

I am neither a regex nor a CJK guru, but I'm afraid that in 1997 nobody
cared about Unicode ;-) From my notes:

Japan:

    [\u3040-\u30ff]
    [\p{Han}\p{Hiragana}\p{Katakana}]

Chinese:

    [\u4e00-\u9FFF]
    \p{Han}

Korean:

    [\uac00-\ud7a3]
    \p{Hangul}

Arabic:

    [\u0621-\u064A]
    \p{Arabic}
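As a rough illustration of how the range-based classes behave, here is a small Python sketch. It is only an assumption-laden example, not how any mail client implements filtering: the names `SCRIPT_PATTERNS`, `detect_scripts` and `is_cjk_subject` are hypothetical. Python's built-in `re` accepts `\uXXXX` escapes but not `\p{Han}`-style script properties; those need an engine with Unicode property support (for example PCRE, Perl, or the third-party Python `regex` module). Whether your mail client's search accepts either syntax is a separate question.

```python
from __future__ import annotations

import re

# Hypothetical table built from the character ranges in the notes above.
# Note: the "Japan" range covers only Hiragana and Katakana, not kanji.
SCRIPT_PATTERNS = {
    "japanese_kana": re.compile(r"[\u3040-\u30ff]"),
    "chinese_han": re.compile(r"[\u4e00-\u9fff]"),
    "korean_hangul": re.compile(r"[\uac00-\ud7a3]"),
    "arabic": re.compile(r"[\u0621-\u064a]"),
}


def detect_scripts(text: str) -> set[str]:
    """Return the names of all script ranges that occur anywhere in text."""
    return {name for name, pat in SCRIPT_PATTERNS.items() if pat.search(text)}


def is_cjk_subject(subject: str) -> bool:
    """True if the subject contains kana, Han ideographs or Hangul syllables."""
    cjk = {"japanese_kana", "chinese_han", "korean_hangul"}
    return bool(detect_scripts(subject) & cjk)


if __name__ == "__main__":
    for subject in ["Meeting notes", "会議のお知らせ", "안녕하세요", "مرحبا"]:
        print(subject, sorted(detect_scripts(subject)))
```

Japanese text usually matches both the kana and the Han range, because kanji are Han ideographs, so these ranges can tell CJK text from Latin text but cannot reliably tell Chinese from Japanese.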

For more details, see https://www.regular-expressions.info/unicode.html

regards

-- 
Slavko
https://www.slavino.sk

