[Users] [Bug 4428] New: punctuation stripping from URLs in text emails strips $ signs from end of URL

noreply at thewildbeast.co.uk noreply at thewildbeast.co.uk
Thu Dec 31 15:59:21 CET 2020


https://www.thewildbeast.co.uk/claws-mail/bugzilla/show_bug.cgi?id=4428

            Bug ID: 4428
           Summary: punctuation stripping from URLs in text emails strips
                    $ signs from end of URL
           Product: Claws Mail
           Version: GIT
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: UI/Message View
          Assignee: users at lists.claws-mail.org
          Reporter: rhaas at illinois.edu

When encountering a URL like:

https://urldefense.com/v3/__https://eff.org/r.o9g6__;!!DZ3fjg!srHW_CI4QzWk_Et7SAcZwTL_6C2bVOKE-ZLz9eesJB6afpP_kdt4-QeDMY9WyOMX$

in an email, the punctuation-stripping code in get_uri_part removes the
trailing "$" sign, since "$" counts as real punctuation according to the
IS_REAL_PUNCT macro defined in that function (src/common/utils.c):

#define IS_REAL_PUNCT(ch)       (g_ascii_ispunct(ch) && !strchr("/?=-_~)", ch))

Unfortunately, urldefense, which my institution uses, has recently started to
construct its redirection URLs so that they all end in "$". As a result, I have
to copy each URL into a browser address bar manually and append the "$",
which renders the URL detection useless.

A simple fix would be to add "$" to the list of characters in the macro's
strchr call. A better one might be to use an explicit list of punctuation
characters to strip instead, and to make that list user-configurable for
non-English languages, where the Claws Mail authors might not know which
characters are likely to be punctuation (e.g. « and » in French).

Note that this is apparently a known(ish) issue, judging by the
comment just above the macro:

/* FIXME: this stripping of trailing punctuations may bite with other URIs.
 * should pass some URI type to this function and decide on that whether
 * to perform punctuation stripping */

Given that punctuation stripping seems to be based on a heuristic about which
characters are likely to end a URL in an email, I do not, of course, know how
common emails are in which "$" really should not be considered part of
the link, e.g.:

$$$Earn money now https://example.com/earn$$$

would likely indicate that one would want "$" to mark the end
of the URL here (in particular if a poor HTML-to-text conversion was at work
on the sender's side).


