[Users] How to filter utf8 messages

Thu Jul 27 11:58:54 UTC 2023

On Thu, 27 Jul 2023 13:37:18 +0200, Slavko <linux at slavino.sk> wrote:

> Hi,
> 
> On Thu, 27 Jul 2023 08:38:18 +0000, Colin Leroy-Mira via Users
> <users at lists.claws-mail.org> wrote:
> 
> > July 27, 2023 at 9:50 AM, "Slavko" <linux at slavino.sk> wrote:
> > 
> >   
> > > And another question: I use this regex to score Chinese etc. characters
> > > (scripts) in the subject in rspamd (perhaps it can be useful for the OP):
> > >  [\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}\p{Arabic}]+
> > > 
> > > Will that work in CM filter regexes?    
> > 
> > I'm unsure, you can test regexps in the QuickSearch in extended mode:
> > subject regexpcase "..."
> >   
> 
> does not work :-(

\p{Any} was added in pcre2-10.33 in April 2019;
current is pcre2-10.39 from October 2021

[\p{Han}\p{Hiragana}\p{Katakana}] (inside a character class) works in
perl since version 5.10.0 (December 2007)

The Claws Mail source code, however, reveals that it does not (yet)
support PCRE/PCRE2 but uses the standard regex/regexp library, at least
on Linux.

I would applaud PCRE support! I'd switch to it the moment it is available!
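
Until then, an engine without \p{...} support can sometimes be given
explicit code point ranges instead. The following is only a minimal
sketch in Python's standard `re` module (which also lacks \p{...}); the
ranges are approximate assumptions covering the main blocks, not full
equivalents of the script properties, and whether Claws Mail's matcher
accepts such ranges is untested here. The names CJK_RANGES and has_cjk
are made up for the illustration.

```python
import re

# Approximate code point ranges, assumed for illustration only.
# They cover the main blocks, not everything \p{Han} etc. would match.
CJK_RANGES = (
    "\u4e00-\u9fff"   # CJK Unified Ideographs (core Han block)
    "\u3040-\u309f"   # Hiragana
    "\u30a0-\u30ff"   # Katakana
    "\uac00-\ud7af"   # Hangul Syllables
)

# One character class built from the ranges above.
CJK_RE = re.compile("[" + CJK_RANGES + "]+")


def has_cjk(subject: str) -> bool:
    """Return True if the subject contains a character from the ranges."""
    return CJK_RE.search(subject) is not None
```

Calling has_cjk() on a decoded subject string gives a yes/no answer that
could feed a score; the subject must already be decoded from its MIME
encoding for this to work.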

> Or, more precisely, it seems to work in the opposite direction: it finds
> everything except Chinese (or so) subjects. I tried some escaping, but
> that doesn't help at all.
> 
> I used the dialog to create the rule; the result:
> 
>     subject regexp "[\\p{Han}\\p{Hiragana}\\p{Katakana}\\p{Hangul}]"
> 
> Don't worry, I am not interested in that, I was just curious...
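
A possible explanation for the inverted result, assuming the matcher
really is a POSIX-style one: inside a POSIX bracket expression the
backslash is an ordinary character, so [\p{Han}...] is just a set of the
literal characters \ p { } H a n and so on. Any subject containing one
of those Latin letters matches, and a purely Chinese subject does not.
The sketch below only emulates that reading with Python's `re`; it does
not call Claws Mail or the C library, and the helper name is
hypothetical.

```python
import re

# The rule as the regex engine would see it after config unescaping.
RULE = r"[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]"


def posix_style_literal_class(bracket_expr: str) -> re.Pattern:
    """Emulate a POSIX bracket expression in which every character
    between the outer [ and ] is a literal member (backslash included)."""
    members = sorted(set(bracket_expr[1:-1]))
    # Escape each member so Python's re also treats it as a literal.
    return re.compile("[" + "".join(re.escape(ch) for ch in members) + "]")


emulated = posix_style_literal_class(RULE)

# Latin text contains letters such as 't', 'i', 'n', 'g' from the set.
latin_hit = emulated.search("Meeting tomorrow") is not None
# A purely Chinese subject contains none of the literal members.
chinese_hit = emulated.search("你好世界") is not None
```

Under this assumption latin_hit is true and chinese_hit is false, which
is the "everything except Chinese" behaviour described above.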

-- 
H.Merijn Brand  https://tux.nl   Perl Monger   http://amsterdam.pm.org/
using perl5.00307 .. 5.37        porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org
