[Users] RSSyl mistakes items for fresh with non-ASCII authors

wwp wwp at claws-mail.org
Sun Mar 16 11:36:48 UTC 2025


Hello Ivan,


On Sun, 16 Mar 2025 14:27:29 +0300 Ivan Krylov via Users <users at lists.claws-mail.org> wrote:

> Here's a mildly annoying problem that I have investigated but am not
> sure how to fix.
> 
> Consider a feed item with non-ASCII characters in the <dc:creator>
> field. For example, the U+00A0 NO-BREAK SPACE on line 6944 at [1]:
> 
> >> <dc:creator>Jeffrey S.<c2><a0>Racine, Zhenghua Nie</dc:creator>  
> 
> RSSyl stores it as:
> 
> >> From: Jeffrey =?UTF-8?B?Uy7CoFJhY2luZSw=?= Zhenghua Nie  
> 
> When the feed is loaded again, the function rssyl_feed_item_changed()
> in rssyl_add_item.c compares the new copy of the feed item against the
> old, including the authors:
> 
> >> if( old_item->author && new_item->author ) {
> >> 	gchar *old = conv_unmime_header(old_item->author, CS_UTF_8,
> >>                                      TRUE);
> >> 	gchar *new = conv_unmime_header(new_item->author, CS_UTF_8,
> >>                                      TRUE);
> >> 	if( strcmp(old, new) ) { /* ... compare "unmimed" authors */
> >>		g_free(old);
> >> 		g_free(new);
> >> 		debug_print("RSSyl:\t\titem authors differ\n");
> >> 		return ITEM_CHANGED;  
> 
> Since the decoded string contains a comma, unmime_header() quotes it
> and returns (with the U+00A0 after the "S."):
> 
> >> Jeffrey "S. Racine," Zhenghua Nie  
> 
> (This is how it looks in the "From" column of the item list but not the
> message view.)
> 
> Since the strings now differ, the item is marked as unread.
> 
> How to fix this discrepancy? Should rssyl_feed_item_changed() call
> conv_encode_header_full() (the same function that is used by
> rssyl_add_item() to store new entries) and compare the encoded string
> representations instead? Should it use a different MIME-decoding
> function?
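
To make the mismatch above concrete, here is a minimal sketch. Note
that toy_quote_commas() is a hypothetical stand-in written for this
mail, not the real unmime_header(): it only mimics the one effect
described above, wrapping a space-separated word in double quotes when
that word contains a comma.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the quoting step (NOT Claws Mail code):
 * copy src to dest, putting double quotes around every space-separated
 * word that contains a comma. Returns 0 on success, -1 if dest is too
 * small.
 */
int toy_quote_commas(const char *src, char *dest, size_t len)
{
	size_t out = 0;

	assert(src != NULL && dest != NULL);
	if (len == 0)
		return -1;

	while (*src != '\0') {
		size_t n = strcspn(src, " ");	/* length of this word */
		int quote = memchr(src, ',', n) != NULL;

		/* room for the word, two quotes, one space and the NUL */
		if (out + n + 2 * (size_t)quote + 2 > len)
			return -1;
		if (quote)
			dest[out++] = '"';
		memcpy(dest + out, src, n);
		out += n;
		if (quote)
			dest[out++] = '"';
		src += n;
		if (*src == ' ') {
			dest[out++] = ' ';
			src++;
		}
	}
	dest[out] = '\0';
	return 0;
}

/* Same test as the quoted strcmp(): nonzero means "authors differ". */
int toy_authors_differ(const char *old_author, const char *new_author)
{
	return strcmp(old_author, new_author) != 0;
}
```

U+00A0 is the two bytes 0xC2 0xA0 in UTF-8, not an ASCII space, so
"S.<c2><a0>Racine," counts as one word here and gets quoted as a whole.
Feeding the raw feed value through toy_quote_commas() and then
comparing the result against the raw value with toy_authors_differ()
reports a difference, which is the false "changed" result described
above.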

Right, using the encoding function for the comparison makes sure we
compare like with like; otherwise the comparison is useless (and a
bug). That's the minimum fix, and it should go into the repository as
soon as possible (I think I frequently see items falsely marked as new
myself, but wasn't sure of the reason behind it). I can't say yet
whether another encoding is needed; that will be discussed after
looking at the code.
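
A minimal sketch of the idea, not a patch: encode the fresh feed value
the same way the stored header was encoded, then compare the two
encoded strings. Everything named toy_* below is hypothetical and made
up for this mail. The real fix would presumably call
conv_encode_header_full() as proposed; the toy encoder only
base64-encodes space-separated words that contain non-ASCII bytes,
which happens to reproduce the stored header from the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char B64[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode n bytes of src into dest; returns chars written, no NUL. */
static size_t toy_b64(const unsigned char *src, size_t n, char *dest)
{
	size_t i, out = 0;

	for (i = 0; i < n; i += 3) {
		unsigned long v = (unsigned long)src[i] << 16;

		if (i + 1 < n)
			v |= (unsigned long)src[i + 1] << 8;
		if (i + 2 < n)
			v |= src[i + 2];
		dest[out++] = B64[(v >> 18) & 63];
		dest[out++] = B64[(v >> 12) & 63];
		dest[out++] = (i + 1 < n) ? B64[(v >> 6) & 63] : '=';
		dest[out++] = (i + 2 < n) ? B64[v & 63] : '=';
	}
	return out;
}

static int toy_has_non_ascii(const char *s, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((unsigned char)s[i] >= 0x80)
			return 1;
	return 0;
}

/*
 * Toy stand-in for the header encoder (NOT conv_encode_header_full()):
 * words containing non-ASCII bytes become =?UTF-8?B?...?=, other words
 * are copied unchanged. Returns 0 on success, -1 if dest is too small.
 */
int toy_encode_author(const char *src, char *dest, size_t len)
{
	size_t out = 0;

	assert(src != NULL && dest != NULL);
	if (len == 0)
		return -1;

	while (*src != '\0') {
		size_t n = strcspn(src, " ");	/* length of this word */

		if (toy_has_non_ascii(src, n)) {
			/* "=?UTF-8?B?" + base64 + "?=", plus space and NUL */
			size_t need = 10 + 4 * ((n + 2) / 3) + 2;

			if (out + need + 2 > len)
				return -1;
			memcpy(dest + out, "=?UTF-8?B?", 10);
			out += 10;
			out += toy_b64((const unsigned char *)src, n, dest + out);
			memcpy(dest + out, "?=", 2);
			out += 2;
		} else {
			if (out + n + 2 > len)
				return -1;
			memcpy(dest + out, src, n);
			out += n;
		}
		src += n;
		if (*src == ' ') {
			dest[out++] = ' ';
			src++;
		}
	}
	dest[out] = '\0';
	return 0;
}

/* Compare encoded forms: stored header vs. freshly encoded feed value. */
int toy_author_changed(const char *stored_header, const char *feed_author)
{
	char buf[512];

	if (toy_encode_author(feed_author, buf, sizeof buf) != 0)
		return 1;	/* cannot encode: treat as changed */
	return strcmp(stored_header, buf) != 0;
}
```

With the stored header from the example and the raw <dc:creator> value,
toy_author_changed() returns 0, so the item would stay read. The point
is only that both sides go through the same transformation before the
strcmp(); a real encoder may also fold lines or pick a different
encoding, which the toy ignores.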


Regards,

-- 
wwp
https://useplaintext.email/

