[Users] How to filter utf8 messages

Pierre Fortin pf at pfortin.com
Thu Jul 27 11:22:16 UTC 2023


On Thu, 27 Jul 2023 08:38:18 +0000 Colin Leroy-Mira via Users wrote:

>July 27, 2023 at 9:50 AM, "Slavko" <linux at slavino.sk> wrote:
>
>
>> And another question: I use this regex to score Chinese etc. chars
>> (scripts) in the subject in rspamd (perhaps it can be useful for OP):
>>  [\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}\p{Arabic}]+
>> 
>> Will that work in CM filter regexes?  
>
>I'm unsure; you can test regexps in the QuickSearch in extended mode:
>subject regexpcase "..."
>

Wow!  I've never seen that regex syntax, and my ancient O'Reilly (7/1997)
"Mastering Regular Expressions" book does not cover it.  Therein,
{min,max} is the "interval" quantifier, and there is no mention of "\p".
I tried the suggested regex in QuickSearch, but got no hits.  The messages
I'm trying to filter are mostly in Chinese & Japanese; are these covered
by the above suggestion?

Thanks,
Pierre 
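
For reference, \p{...} is the Unicode property syntax found in engines such
as PCRE, Perl and Java; \p{Han} covers Chinese characters (including Japanese
kanji), and \p{Hiragana} and \p{Katakana} cover the Japanese kana. Where an
engine lacks \p, explicit code point ranges can approximate it. Below is a
minimal sketch in Python, whose built-in "re" module has no \p support. The
names SCRIPT_RANGES, decode_subject and has_filtered_script are made up for
this illustration, and the ranges are Unicode blocks, which only roughly
match the scripts. It is not a statement about how Claws Mail or rspamd
match internally.

```python
import re
from email.header import decode_header, make_header

# Approximate block ranges standing in for \p{Han}, \p{Hiragana}, etc.
# Blocks and scripts do not line up exactly, so treat this as a rough filter.
SCRIPT_RANGES = {
    "Han": "\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff",
    "Hiragana": "\u3040-\u309f",
    "Katakana": "\u30a0-\u30ff",
    "Hangul": "\u1100-\u11ff\uac00-\ud7af",
    "Arabic": "\u0600-\u06ff",
}

# One character class built from all ranges, like the rspamd expression.
SCRIPT_RE = re.compile("[" + "".join(SCRIPT_RANGES.values()) + "]+")


def decode_subject(raw: str) -> str:
    """Decode a MIME encoded-word Subject header into plain text."""
    return str(make_header(decode_header(raw)))


def has_filtered_script(subject: str) -> bool:
    """Return True if the text contains a character from the listed ranges."""
    return SCRIPT_RE.search(subject) is not None
```

Note that the pattern has to see decoded text: a raw header such as
"=?UTF-8?B?5L2g5aW9?=" is plain ASCII and matches only after decode_subject()
has turned it back into the original characters.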

