[Users] How to filter utf8 messages
H.Merijn Brand
linux at tux.freedom.nl
Thu Jul 27 11:58:54 UTC 2023
On Thu, 27 Jul 2023 13:37:18 +0200, Slavko <linux at slavino.sk> wrote:
> Hi,
>
> On Thu, 27 Jul 2023 08:38:18 +0000, Colin Leroy-Mira via Users
> <users at lists.claws-mail.org> wrote:
>
> > July 27, 2023 at 9:50 AM, "Slavko" <linux at slavino.sk> wrote:
> >
> >
> > > And another question: I use this regex to score Chinese and similar
> > > scripts in the subject in rspamd (perhaps useful for the OP):
> > > [\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}\p{Arabic}]+
> > >
> > > Will that work in CM filter regexes?
> >
> > I'm unsure, you can test regexps in the QuickSearch in extended mode:
> > subject regexpcase "..."
> >
>
> does not work :-(
\p{Any} was added in pcre2-10.33 in April 2019
current is pcre2-10.39 from October 2021

[\p{Han}\p{Hiragana}\p{Katakana}] (inside a character class) works in
perl since version 5.10.0 (December 2007)
The Claws Mail source code, however, reveals that it does not (yet)
support PCRE/PCRE2 but uses the standard regex/regexp functions, at
least on Linux.

I would applaud PCRE support! I'd switch to it the moment it is available!
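
Until then, one possible workaround for engines without \p{...} support
is to spell out code point ranges. A minimal sketch in Python (whose
standard re module also has no \p{...} support). The ranges are Unicode
blocks that only approximate the script properties, the name
has_listed_script is made up for illustration, and whether the same
ranges work in a Claws Mail filter is untested:

```python
import re

# Assumption: Unicode *blocks*, which only approximate the \p{...} scripts.
SCRIPT_RANGES = (
    "\u4e00-\u9fff"   # CJK Unified Ideographs (most of Han)
    "\u3040-\u309f"   # Hiragana
    "\u30a0-\u30ff"   # Katakana
    "\uac00-\ud7af"   # Hangul Syllables
    "\u0600-\u06ff"   # Arabic
)
SCRIPT_RE = re.compile("[" + SCRIPT_RANGES + "]+")


def has_listed_script(subject: str) -> bool:
    """Return True if the subject contains a character from the listed blocks."""
    return SCRIPT_RE.search(subject) is not None
```

A subject such as "Hello world" is not matched, while one containing
Han, Kana, Hangul or Arabic characters is.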
> Or, more precisely, it seems to work in the opposite direction: it finds
> everything except Chinese (or similar) subjects. I tried some escaping, but
> that did not help at all.
>
> I used the dialog to create the rule; the result:
>
> subject regexp "[\\p{Han}\\p{Hiragana}\\p{Katakana}\\p{Hangul}]"
>
> Don't worry, I am not interested in that; I was just curious...
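
A plausible explanation for the "opposite direction", assuming the
engine treats a backslash inside [...] as a literal character (as POSIX
bracket expressions do): the class then simply lists the characters
\ p { H a n } and so on, so almost any Latin-script subject matches and
a purely Chinese one does not. The sketch below emulates that reading
in Python by escaping the backslashes; it is an illustration of the
effect, not the code Claws Mail runs:

```python
import re

# Emulation: a class holding the literal characters of "\p{Han}\p{Hiragana}..."
# (backslash, braces and plain letters), not Unicode script properties.
LITERAL_CLASS = re.compile(r"[\\p{Han}\\p{Hiragana}\\p{Katakana}\\p{Hangul}]")


def literal_class_matches(subject: str) -> bool:
    """Return True if the subject contains any of the literal class characters."""
    return LITERAL_CLASS.search(subject) is not None
```

With this reading, "Meeting tomorrow" matches (it contains n, i, t, r),
whereas a subject consisting only of Chinese characters does not.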
--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org