[Users] Questions regarding character encoding of text/plain attachments

Michael Gmelin freebsd at grem.de
Tue Oct 2 14:47:44 CEST 2012



On Tue, 2 Oct 2012 14:28:25 +0200
Ricardo Mones <ricardo at mones.org> wrote:

> On Mon, Oct 01, 2012 at 12:25:23AM +0200, Michael Gmelin wrote:
> > Hi,
> > 
> > I noticed the following issue:
> > 
> > When sending a text file attachment, claws uses content-type
> > text/plain, even if it is encoded in UTF-8 and ends up base64-encoded.
> > 
> > So the mime header of the attachment looks like this:
> > 
> > Content-Type: text/plain
> > Content-Transfer-Encoding: base64
> > Content-Disposition: attachment; filename=china.txt
> > 
> > In some cases it would be preferable to have a header like this:
> > 
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: base64
> > Content-Disposition: attachment; filename=china.txt
> 
>   Maybe it should be set to UTF-8 always for text/plain in the cases
> where the current code does not add a charset. ASCII only attachments
> would work anyway as that's a subset of UTF-8.
>  

I just realized that this is more like a follow-up/duplicate of my
earlier request, sorry for that.

> > Questions:
> > 1. Is there a reasonable way to auto-detect and set the encoding?
> 
>   http://code.google.com/p/uchardet/
> 
> > 2. If not, is there a way to make this happen on user request (like,
> >    selecting the encoding)?
> 
>   Not currently, but a patch is welcome :)

As I learned by checking the Properties dialog (which I never did
before on an attachment, thanks for that), you can actually just add
the encoding after the mime-type, like "text/plain; charset=UTF-8". Not
nice, but it actually works.
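
To illustrate what the resulting MIME part looks like, here is a minimal
sketch using Python's standard email package. This is not Claws Mail
code; the function name build_message and the placeholder subject are
made up for the example.

```python
from email.message import EmailMessage


def build_message(filename: str, text: str) -> EmailMessage:
    """Build a message carrying `text` as a base64 text/plain attachment."""
    msg = EmailMessage()
    msg["Subject"] = "charset example"  # placeholder subject (assumption)
    msg.set_content("See the attached file.")
    # Passing a str makes the email package emit text/plain with an
    # explicit charset parameter; cte="base64" forces base64 encoding.
    msg.add_attachment(
        text,
        subtype="plain",
        charset="utf-8",
        cte="base64",
        filename=filename,
    )
    return msg


if __name__ == "__main__":
    # Prints the full message, including the attachment headers.
    print(build_message("china.txt", "中国\n"))
```

The attachment part then carries a charset parameter on its Content-Type
header, which is what the manual edit in the Properties dialog achieves.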

I would be willing to contribute a patch though, that allows:
- Selecting the encoding of an attachment
- Specifying a default based on various conditions

Not sure about a timeline though :)

> 
> > 3. If not, what is the rationale for not doing this. I could imagine
> >    something like "the receiving system should assume UTF-8 in the
> >    absence of a character encoding specification" or "the receiving
> >    system should handle encodings transparently". In this case it
> >    would be good to get some reference supporting one or both of these
> >    arguments (RFC anyone?)
> 
>   My response to your other mail has several RFC references, and UTF-8
> is not the default assumption for text/plain without charset. Anyway,
> since automatic charset detection is equally flawed on any side, I
> don't think such transparent handling is possible.

My argument would be: If it's ASCII, don't try to change it - this way
UTF-8 can pass through transparently. But that can be hard on legacy
systems. That's definitely not Claws' responsibility though.
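
A rough sketch of the kind of default I have in mind, in Python rather
than in Claws' own code. The function name guess_text_charset is
hypothetical, and this is only a conservative heuristic (strict decoding
attempts), not real charset detection like uchardet does.

```python
from typing import Optional


def guess_text_charset(data: bytes) -> Optional[str]:
    """Return a charset label for `data`, or None if undecided."""
    try:
        # Pure 7-bit content: nothing needs to change.
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict UTF-8 decoding rejects most legacy 8-bit encodings.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Neither ASCII nor valid UTF-8: leave the choice to the user.
        return None
```

Anything that returns None here would still need the user to pick an
encoding by hand.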

> At most all the
> MUAs can do would be a) suggest some encoding for sending, and b)
> allow changing it, and c) suggest some encoding for reading, and d)
> allow changing it.
> 
>   Claws Mail currently lacks b), and a) probably can be improved.
>   AFAIK c) and d) are fully covered.

AFAIK b) is there as well (via the Properties dialog mentioned above),
but could be improved.
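
For c) and d), the reading side could be sketched like this with
Python's email package (again not Claws code; decode_text_part and its
override parameter are hypothetical names). It uses the declared
charset, falls back to US-ASCII when none is given (the RFC 2046 default
for text/plain), and lets the user override the choice.

```python
from email.message import Message
from typing import Optional


def decode_text_part(part: Message, override: Optional[str] = None) -> str:
    """Decode a non-multipart text part to str."""
    # Undo the Content-Transfer-Encoding (base64, quoted-printable, ...).
    raw = part.get_payload(decode=True)
    # Precedence: user override, declared charset, then the US-ASCII default.
    charset = override or part.get_content_charset() or "us-ascii"
    # Undecodable bytes become U+FFFD instead of raising.
    return raw.decode(charset, errors="replace")
```

With a part that has no charset parameter, calling it with
override="utf-8" corresponds to the user changing the encoding manually.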

> 
>   regards,

Cheers,


-- 
Michael Gmelin


