Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#1010 closed defect (wontfix)

i18n: non-ASCII encoding broken on Konqueror, Safari

Reported by: Adam Peller
Owned by: Adam Peller
Priority: high
Milestone:
Component: General
Version: 0.3
Keywords:
Cc:
Blocked By:
Blocking:

Description

Konqueror apparently does not honor UTF-8 by default in XHR

Change History (10)

comment:1 Changed 13 years ago by Adam Peller

Summary: changed from "i18n: non-ASCII encoding broken on Konqueror" to "i18n: non-ASCII encoding broken on Konqueror, Safari"

comment:2 Changed 13 years ago by Adam Peller

http://twistedmatrix.com/pipermail/twisted-web/2005-February/001165.html

It's unclear exactly when this was fixed, and I'm not sure it covers the case where no encoding is specified. Either way, we may be stuck doing the encoding ourselves for old browsers.

comment:4 Changed 13 years ago by mumme

I can confirm that this appears in the latest KDE 3.5 branch, so it isn't resolved yet, at least not for Konqueror.

This is cache related; it seems the Content-Type header isn't stored in the cache.

While debugging a related issue with KDevelop (I was trying to find out why a cached XHR got a 200 status when async but a 304 when sync), I found that the decoder looks for a UTF-8 BOM in the first 3 bytes. If that isn't found, it looks for tags with charset info (<?xml or <meta).

It seems like you would be able to get away with:

/* <?xml version="1.0" encoding="UTF-8" ?> */

at the top of your translation files,

or use a cache buster,

or save as UTF-16, iso-10646-ucs2.

That is, if you would still like to do the workaround.

/ Fredrik
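
The detection order described in this comment can be sketched roughly as follows. This is a hypothetical illustration in JavaScript, not khtml's actual code: the function name sniffCharset, the 512-byte scan window, and the regular expressions are all assumptions made for the example.

```javascript
// Hypothetical sketch of the sniffing order described above; NOT khtml's code.
// `bytes` is an array-like of byte values from the response body.
function sniffCharset(bytes) {
  // 1. UTF-8 byte order mark: EF BB BF in the first three bytes.
  if (bytes.length >= 3 &&
      bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }

  // 2. Otherwise read the head of the body as ASCII and look for a hint.
  //    The 512-byte window is an assumption for this sketch.
  let head = "";
  for (let i = 0; i < Math.min(bytes.length, 512); i++) {
    head += String.fromCharCode(bytes[i] & 0x7f);
  }

  // An XML declaration, even one wrapped in a JavaScript comment.
  const xmlDecl = /<\?xml[^>]*encoding\s*=\s*["']([^"']+)["']/i.exec(head);
  if (xmlDecl) return xmlDecl[1];

  // A <meta ... charset=...> tag.
  const meta = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (meta) return meta[1];

  // No hint found; the browser would fall back to its own default.
  return null;
}
```

Because the scan ignores JavaScript syntax, a declaration inside /* ... */ is still picked up, which is what makes the commented XML declaration usable as a workaround.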

comment:5 Changed 13 years ago by Adam Peller

Interesting. I'm not sure this is a Content-Type header bug, as I was relying on the default encoding being UTF-8 without any Content-Type header... but I was hoping that Content-Type could be used as a server-based workaround. Guess that's out.

Unfortunately, all of these workarounds would break the current code, which assumes the contents of the file can be eval'd as a JS expression. I suppose we could introduce code to optionally eat an XML declaration, but I'd hate to do this. Other browsers assume UTF-8, so encoding in UTF-16 or iso would break them, I think.

For now, I think the only workaround is to encode with JavaScript \uxxxx escapes or the single-byte equivalent for high ASCII. A build script could do this.
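
A build step along these lines might look like the following minimal sketch. The function name escapeNonAscii is hypothetical, and the sketch assumes the file has already been read and correctly decoded into a string; it is not the project's actual build script.

```javascript
// Hypothetical build-step helper: rewrite every non-ASCII UTF-16 code unit
// as a \uXXXX escape so the file is pure ASCII and no longer depends on the
// browser guessing the right encoding.
function escapeNonAscii(source) {
  let out = "";
  for (let i = 0; i < source.length; i++) {
    const code = source.charCodeAt(i);
    if (code < 0x80) {
      out += source[i]; // plain ASCII passes through unchanged
    } else {
      // Pad to four hex digits, e.g. U+00E9 becomes \u00e9
      out += "\\u" + code.toString(16).padStart(4, "0");
    }
  }
  return out;
}
```

Since \uXXXX escapes are valid inside JavaScript string literals, the escaped file evaluates to the same strings as the original.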

comment:6 Changed 13 years ago by mumme

Well, I looked some more into the khtml code and there are a number of issues, not just the Content-Type cache. I wrote a patch that seems to be working. I'm going to test it a bit more before I send it to Kfm devel.

However, the reason I put the ugly <?xml ... encoding="UTF-8" inside a JavaScript comment was that it keeps the file eval'able. The auto-detection decoder used in khtml XHR is the same as for any other HTML/XML page, and it doesn't care about a JavaScript comment.

I know it's ugly, but it must be cleaner than doing a build script replacement.

/ Fredrik
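
To illustrate why the commented declaration keeps the file eval'able, here is a minimal sketch. The bundle contents and the greeting key are made up for the example; only the leading comment comes from this ticket.

```javascript
// Contents of a hypothetical translation file, as the loader would receive
// them. The first line is the workaround: an XML declaration wrapped in a
// JavaScript comment.
const fileContents =
  '/* <?xml version="1.0" encoding="UTF-8" ?> */\n' +
  '({ greeting: "Gr\u00fc\u00dfe" })';

// The JavaScript parser skips the comment, so the file still evaluates as a
// single expression and yields the bundle object.
const bundle = eval(fileContents);
console.log(bundle.greeting);
```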

comment:7 Changed 13 years ago by Adam Peller

ah... I missed the /* comments */ around the xml declaration. Clever workaround, even if it's a real kludge.

That's wonderful if you can help get a patch into khtml!

comment:8 Changed 13 years ago by dylan

Milestone: 0.4

comment:9 Changed 13 years ago by Adam Peller

Resolution: wontfix
Status: changed from new to closed

OK, so we have a workaround (thanks, Fredrik) and a bug filed against KDE. Not sure there's much more we can do. I checked an example of the workaround into the tests and will write it up when we have more detailed how-to documentation.

comment:10 Changed 13 years ago by (none)

Milestone: 0.4

Milestone 0.4 deleted
