Opened 6 years ago
Last modified 2 years ago
#18072 assigned defect
dojo/request/node response encoding
Reported by: | tsofist | Owned by: | dylan |
---|---|---|---|
Priority: | high | Milestone: | 1.14 |
Component: | IO | Version: | 1.10.0-beta1 |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Description
I found that the data coming from the server is decoded incorrectly: the text does not match its encoding. The request is made like this:
    var _bind = function (url, content) {
        return request(url, {
            headers: {
                "Content-Length": Buffer.byteLength(content),
                "Content-Type": "text/plain;charset=UTF-8",
                "Accept-Encoding": "UTF-8",
                "User-Agent": "E",
                "Accept-Charset": "UTF-8"
            },
            formData: undefined,
            data: content,
            query: "",
            handleAs: "text",
            method: "POST"
        });
    };
Response:
    <?xml version="1.0"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="..." xmlns:xsi="..." xmlns:SOAP-ENC="...">
      <SOAP-ENV:Body SOAP-ENC:encodingStyle="...">
        <NS1:GetAvailableHotelListResponse xmlns:NS1="urn:webservice-electrasoft-ru:types-twsReservationServiceIntf-ItwsReservationService">
          <return xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="xsd:string[2119]">
            <item>��лексик(Трускавец)</item>
            <item>Ате��ика Гамма курорт</item>
            <item>Горная Грузия 12 дней</item>
            <item>Georgia 3_5дней</item>
            <item>Новотель Шереметьево</item>
          </return>
        </NS1:GetAvailableHotelListResponse>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
What is this � character?
I thought about it a little: http://nodejs.org/api/http.html#http_http_request_options_callback
In the end, I made a small change in dojo/request/node:
    req.on('response', function(clientResponse){
        clientResponse.setEncoding("utf8");
        ...
After that I got the desired result:
    <?xml version="1.0"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="..." xmlns:xsi="..." xmlns:SOAP-ENC="...">
      <SOAP-ENV:Body SOAP-ENC:encodingStyle="...">
        <NS1:GetAvailableHotelListResponse xmlns:NS1="urn:webservice-electrasoft-ru:types-twsReservationServiceIntf-ItwsReservationService">
          <return xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="xsd:string[2119]">
            <item>Алексик(Трускавец)</item>
            <item>Ателика Гамма курорт</item>
            <item>Горная Грузия 12 дней</item>
            <item>Georgia 3_5дней</item>
            <item>Новотель Шереметьево</item>
          </return>
        </NS1:GetAvailableHotelListResponse>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
I'm not sure this is the right fix, but I would love to see this issue resolved in the upcoming version.
Thank you very much! Sorry for my poor English.
Change History (15)
comment:1 Changed 6 years ago by
Component: | Core → IO |
---|---|
Milestone: | tbd → 1.10 |
Owner: | set to dylan |
Priority: | undecided → high |
Status: | new → assigned |
comment:2 Changed 6 years ago by
Technically since we've already cut the 1.10 RC, this shouldn't go into 1.10 unless it's a regression.
comment:3 Changed 6 years ago by
Milestone: | 1.10 → 1.11 |
---|
Fair enough, it's probably a bug more than a regression.
comment:4 Changed 6 years ago by
I've investigated this issue a little, and this is what I found:
"clientResponse", the argument of the "response" event handler, is a readable stream (an http.IncomingMessage instance) whose "data" event handler takes one argument (it appears as "chunk" in the code). Here we run into a known stream behavior: if no encoding is specified for the stream, the data arrives as Buffer instances. If the stream encoding is set up front, there are no problems with the concatenation that happens on the "data" event.
But you take a different approach: you gather the Buffer instances into an array and then execute body.join(), which is equivalent to looping through the items with an implicit call to item.toString() (toString without arguments is equivalent to toString("utf8")). For some reason, concatenating the data this way leads to the issues I mentioned before.
Taking all of the above into account, it seems to me that this code has never worked correctly and requires the change I suggested above.
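One likely explanation for the corruption described in this comment, offered as an assumption rather than something confirmed in the ticket: a multi-byte UTF-8 character can be split across two chunks, and decoding each Buffer separately turns the two halves into U+FFFD replacement characters. The sketch below simulates such a split without any network access. The function names are hypothetical; `Buffer.concat` and `string_decoder` are standard Node.js APIs.

```javascript
var StringDecoder = require("string_decoder").StringDecoder;

// "Алексик" is 7 Cyrillic letters, each encoded as 2 bytes in UTF-8.
var whole = Buffer.from("Алексик", "utf8");

// Simulate a chunk boundary that falls in the middle of the first letter.
var chunks = [whole.subarray(0, 1), whole.subarray(1)];

// Decode every chunk on its own, then join the strings.
// The split letter cannot be decoded from either half.
function decodePerChunk(buffers) {
    return buffers.map(function (b) { return b.toString("utf8"); }).join("");
}

// Join the raw bytes first, then decode once.
function decodeAfterConcat(buffers) {
    return Buffer.concat(buffers).toString("utf8");
}

// Decode incrementally; StringDecoder holds back incomplete byte
// sequences until the rest of the character arrives.
function decodeStreaming(buffers) {
    var decoder = new StringDecoder("utf8");
    var text = "";
    buffers.forEach(function (b) { text += decoder.write(b); });
    return text + decoder.end();
}

console.log(decodePerChunk(chunks));    // contains U+FFFD characters
console.log(decodeAfterConcat(chunks)); // intact text
console.log(decodeStreaming(chunks));   // intact text
```

Calling `setEncoding("utf8")` on the response makes the stream decode incrementally in the same spirit as `decodeStreaming`, which would explain why the suggested change removes the stray characters.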
comment:5 Changed 6 years ago by
A simple example with a Russian site, as a demonstration of the problem:
    var http = require("http");

    var options = {
        host: "habrahabr.ru",
        port: 80,
        path: "/company/knopka/blog/225027/",
        method: "POST"
    };
    // U+FFFD replacement character, written as an escape so it survives code highlighting
    var mCharPattern = /\uFFFD/g;

    // Bad: each Buffer chunk is converted to a string separately
    var reqBad = http.request(options, function(res) {
        console.log("\n*** [BAD REQUEST] ***\n", "\nSTATUS: " + res.statusCode, "\nHEADERS: " + JSON.stringify(res.headers));
        var data = [];
        res.on("data", function (chunk) { data.push(chunk); });
        res.on("end", function () {
            var text = "";
            data.forEach(function (item) { text += item.toString("utf8"); });
            console.log("CHUNKS: ", data.length, "\nDATA LEN: ", text.length);
            var matchRes = text.match(mCharPattern);
            if (matchRes) console.log("Mystical symbols count: ", matchRes.length);
            else console.log("No mystical symbols!");
        });
    });
    // write data to request body
    reqBad.write("data\n");
    reqBad.write("data\n");
    reqBad.end();

    // Good: the stream encoding is set, so chunks arrive as strings
    var reqOk = http.request(options, function(res) {
        console.log("\n*** [GOOD REQUEST] ***\n", "\nSTATUS: " + res.statusCode, "\nHEADERS: " + JSON.stringify(res.headers));
        res.setEncoding("utf8");
        var data = [];
        res.on("data", function (chunk) { data.push(chunk); });
        res.on("end", function () {
            var text = data.join("");
            console.log("CHUNKS: ", data.length, "\nDATA LEN: ", text.length);
            var matchRes = text.match(mCharPattern);
            if (matchRes) console.log("Mystical symbols count: ", matchRes.length);
            else console.log("No mystical symbols!");
        });
    });
    // write data to request body
    reqOk.write("data\n");
    reqOk.write("data\n");
    reqOk.end();
Well, what about 1.10? =)
comment:6 Changed 6 years ago by
The code highlighter has a problem with the mystical symbol in the pattern. Perhaps it displays correctly this way: var mCharPattern = /�/g;
comment:7 Changed 6 years ago by
1.10 is past the release candidate stage and scheduled to be released next week.
We will fix this for 1.11, and then consider backporting to 1.10.x.
Thanks for the detailed explanation @tsofist.
comment:9 Changed 4 years ago by
Priority: | high → blocker |
---|
comment:10 Changed 4 years ago by
The proposed patch would always force it to be UTF8 encoded. I think we probably instead need to check the headers passed in and set the encoding based on that, when they are specified.
comment:11 Changed 4 years ago by
I agree. But what if the encoding cannot be matched? Should it return a Buffer?
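The idea from the last two comments could be sketched as follows. This is only an illustration, not the dojo/request/node implementation: it assumes the charset is read from the response's Content-Type header, and both helper names (`charsetFromContentType`, `decodeBody`) are hypothetical. When no charset is given, or Node.js does not support it, the raw Buffer is returned so the caller can decode it some other way.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// value such as "text/xml; charset=UTF-8". Returns null when it is absent.
function charsetFromContentType(contentType) {
    var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
    return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: join the collected chunks and decode them only if
// Node.js knows the declared encoding; otherwise hand back the raw bytes.
function decodeBody(buffers, contentType) {
    var body = Buffer.concat(buffers);
    var charset = charsetFromContentType(contentType);
    if (charset && Buffer.isEncoding(charset)) {
        return body.toString(charset);
    }
    return body; // unknown or missing charset: return a Buffer
}

var chunks = [Buffer.from("Горная "), Buffer.from("Грузия")];
console.log(decodeBody(chunks, "text/xml; charset=UTF-8"));
console.log(Buffer.isBuffer(decodeBody(chunks, "text/xml; charset=windows-1251")));
```

Node.js only decodes a small set of encodings natively (for example utf8, utf16le and latin1), so a charset such as windows-1251 falls through to the Buffer branch in this sketch.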
comment:12 Changed 4 years ago by
Priority: | blocker → high |
---|
comment:13 Changed 4 years ago by
Milestone: | 1.11 → 1.12 |
---|
Ok, after massive triage, we ended up with about 80 tickets for 1.11 and 400 or so for 1.12. That's a bit unrealistic, so first I changed all 1.12 to 1.13 (with the plan to move some forward to the new 1.12). Now, I'm moving some of the 1.11 tickets that are less likely to get done this month without help to 1.12. Feel free to help out in January if you want to see this ticket land in 1.11.
comment:14 Changed 3 years ago by
Milestone: | 1.12 → 1.13 |
---|
Ticket planning... move current 1.12 tickets out to 1.13 that likely won't get fixed in 1.12.
comment:15 Changed 2 years ago by
Milestone: | 1.13 → 1.14 |
---|
Not sure I can get to this for 1.10, but I'll see if this can be easily fixed.