[Users] How to filter utf8 messages
Leon Fisk
lfiskgr at gmail.com
Thu Jul 27 20:08:58 UTC 2023
On Thu, 27 Jul 2023 07:22:16 -0400
Pierre Fortin <pf at pfortin.com> wrote:
>On Thu, 27 Jul 2023 08:38:18 +0000 Colin Leroy-Mira via Users wrote:
>
>>July 27, 2023 at 9:50 AM, "Slavko" <linux at slavino.sk> wrote:
>>
>>
>>> And another question, I use this regex to score Chinese etc. chars
>>> (scripts) in subject in rspamd (perhaps can be useful for OP):
>>> [\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}\p{Arabic}]+
>>>
>>> Will that work in CM filter regexes?
>>
>>I'm unsure, you can test regexps in the QuickSearch in extended mode:
>>subject regexpcase "..."
>>
>
>Wow! I've never seen that regex syntax and my ancient O'Reilly (7/1997)
>"Mastering Regular Expressions" book does not cover it. Therein,
>{min,max} is the "interval" quantifier, and no mention of "\p".
>I tried the suggested regex in QuickSearch; but no hits. The messages
>I'm trying to filter are mostly in Chinese & Japanese; are these covered
>by the above suggestion?
This works, but needs more testing and probably a few more characters
for other languages:

subject regexp "[えの理]+"

I was testing it with Quick Search in my Claws List directory with:

body_part regexp "[えの理]+"

"\p{Han}" is Unicode property syntax, as found in Perl, PCRE and Java
regexes. Some info on this Wiki page:
https://en.wikipedia.org/wiki/Regular_expression#Character_classes
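
For trying out such a character class outside Claws, here is a minimal
Python sketch. Python's built-in "re" module does not understand
"\p{...}", so the sketch spells the scripts out as explicit code point
ranges instead. The ranges are an illustrative, incomplete selection
(assumption: only the main Unicode block of each script), and the
function name has_listed_script is made up for this example. Whether
the same ranges behave identically in a Claws filter is untested here.

```python
import re

# Explicit code point ranges standing in for \p{Hiragana}, \p{Katakana},
# \p{Han}, \p{Hangul} and \p{Arabic}. Assumption: only the main block of
# each script is listed; extension blocks are left out.
LISTED_SCRIPTS = re.compile(
    "["
    "\u3040-\u309f"  # Hiragana
    "\u30a0-\u30ff"  # Katakana
    "\u4e00-\u9fff"  # CJK Unified Ideographs (core of Han)
    "\uac00-\ud7af"  # Hangul Syllables
    "\u0600-\u06ff"  # Arabic
    "]+"
)


def has_listed_script(subject: str) -> bool:
    """Return True if the subject contains a character from the ranges above."""
    return LISTED_SCRIPTS.search(subject) is not None


if __name__ == "__main__":
    for subject in ("How to filter utf8 messages", "Re: えの理"):
        print(repr(subject), has_listed_script(subject))
```

The three characters from the Quick Search test above all fall inside
these ranges, so a subject containing any of them is flagged, while a
plain ASCII subject is not.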
--
Leon
Claws 3.19.0, Debian