[Users] RSSyl mistakes items for fresh with non-ASCII authors
Ivan Krylov
ikrylov at disroot.org
Sun Mar 16 11:27:29 UTC 2025
Hello!
Here's a mildly annoying problem that I have investigated but am not
sure how to fix.
Consider a feed item with non-ASCII characters in the <dc:creator>
field. For example, the U+00A0 NO-BREAK SPACE on line 6944 at [1]:
>> <dc:creator>Jeffrey S.<c2><a0>Racine, Zhenghua Nie</dc:creator>
RSSyl stores it as:
>> From: Jeffrey =?UTF-8?B?Uy7CoFJhY2luZSw=?= Zhenghua Nie
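To confirm what the encoded word in that header contains, here is a minimal sketch of a base64 decoder in plain C. It is not Claws Mail code; b64_value() and b64_decode() are hypothetical helper names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not
 * part of the standard alphabet. */
int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode base64 text up to the first '=' or the end of the string.
 * Writes the bytes to out, NUL-terminates them, and returns the byte
 * count, or -1 on an invalid character or insufficient capacity. */
int b64_decode(const char *in, unsigned char *out, size_t cap)
{
    unsigned int acc = 0;   /* bit accumulator */
    int bits = 0;           /* number of valid bits in acc */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (cap == 0)
        return -1;

    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n + 1 >= cap)   /* keep room for the terminating NUL */
                return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFFu);
            acc &= (1u << bits) - 1u;   /* drop the bits just emitted */
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Calling b64_decode("Uy7CoFJhY2luZSw=", buf, sizeof buf) yields the bytes of "S.", then 0xC2 0xA0 (U+00A0 in UTF-8), then "Racine,". So the encoded word covers the no-break space together with the trailing comma.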
When the feed is loaded again, the function rssyl_feed_item_changed()
in rssyl_add_item.c compares the new copy of the feed item against the
old, including the authors:
>> if( old_item->author && new_item->author ) {
>>     gchar *old = conv_unmime_header(old_item->author, CS_UTF_8, TRUE);
>>     gchar *new = conv_unmime_header(new_item->author, CS_UTF_8, TRUE);
>>     if( strcmp(old, new) ) { /* ... compare "unmimed" authors */
>>         g_free(old);
>>         g_free(new);
>>         debug_print("RSSyl:\t\titem authors differ\n");
>>         return ITEM_CHANGED;
Since the decoded string contains a comma, unmime_header() quotes it
and returns (with the U+00A0 after the "S."):
>> Jeffrey "S. Racine," Zhenghua Nie
(This is how it looks in the "From" column of the item list but not the
message view.)
Since the strings now differ, the item is marked as unread.
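The mismatch can be reproduced in isolation with a toy model. This is a sketch under two assumptions: the old copy is rebuilt by quoting a decoded word that contains a comma, as described above, and the new copy reaches the comparison as plain UTF-8 with no encoded words and is left unchanged. quote_if_comma() and join_author() are hypothetical stand-ins, not the Claws Mail functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the quoting step: wrap the decoded word in double
 * quotes if it contains a comma, otherwise copy it unchanged.
 * Returns 0 on success, -1 if the result does not fit in out. */
int quote_if_comma(const char *word, char *out, size_t cap)
{
    int n;

    assert(word != NULL && out != NULL);
    n = strchr(word, ',') != NULL
        ? snprintf(out, cap, "\"%s\"", word)
        : snprintf(out, cap, "%s", word);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Rebuild an author string from the plain text before the encoded
 * word, the decoded word itself, and the plain text after it. */
int join_author(const char *before, const char *word, const char *after,
                char *out, size_t cap)
{
    char quoted[128];
    int n;

    if (quote_if_comma(word, quoted, sizeof quoted) != 0)
        return -1;
    n = snprintf(out, cap, "%s %s %s", before, quoted, after);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

With before = "Jeffrey", word = "S.\xC2\xA0Racine," and after = "Zhenghua Nie", the rebuilt old author gains a pair of double quotes that the unmodified new author lacks, so strcmp() on the two returns non-zero even though the feed content has not changed.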
How should this discrepancy be fixed? Should rssyl_feed_item_changed() call
conv_encode_header_full() (the same function that is used by
rssyl_add_item() to store new entries) and compare the encoded string
representations instead? Should it use a different MIME-decoding
function?
--
Best regards,
Ivan
[1] https://journal.r-project.org/articles.xml