Opened 5 years ago

Last modified 20 months ago

#18072 assigned defect

dojo/request/node response encoding

Reported by: tsofist Owned by: dylan
Priority: high Milestone: 1.14
Component: IO Version: 1.10.0-beta1
Keywords: Cc:
Blocked By: Blocking:

Description

I found that the data coming from the server is not decoded with the correct encoding. The request is executed as follows:

// `request` is dojo/request/node
var _bind = function (url, content) {
    return request(url, {
        headers:         {
            "Content-Length": Buffer.byteLength(content),
            "Content-Type": "text/plain;charset=UTF-8",
            // Accept-Encoding lists content codings (gzip, deflate), not
            // character sets, so the charset is requested via Accept-Charset only
            "User-Agent": "E",
            "Accept-Charset": "UTF-8"
        },
        formData:        undefined,
        data:            content,
        query:           "",
        handleAs:        "text",
        method:          "POST"
    });
};

Response:

<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="..." xmlns:xsi="..." xmlns:SOAP-ENC="...">
        <SOAP-ENV:Body SOAP-ENC:encodingStyle="...">
                <NS1:GetAvailableHotelListResponse xmlns:NS1="urn:webservice-electrasoft-ru:types-twsReservationServiceIntf-ItwsReservationService">
                        <return xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="xsd:string[2119]">
                                <item>��лексик(Трускавец)</item>
                                <item>Ате��ика Гамма курорт</item>
                                <item>Горная Грузия 12 дней</item>
                                <item>Georgia 3_5дней</item>
                                <item>Новотель Шереметьево</item>
                        </return>
                </NS1:GetAvailableHotelListResponse>
        </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Where does this � character come from?

I looked into it a bit (http://nodejs.org/api/http.html#http_http_request_options_callback) and in the end made a small change to dojo/request/node:

      req.on('response', function(clientResponse){
          clientResponse.setEncoding("utf8");
      ...

After that I got the desired result:

<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="..." xmlns:xsi="..." xmlns:SOAP-ENC="...">
        <SOAP-ENV:Body SOAP-ENC:encodingStyle="...">
                <NS1:GetAvailableHotelListResponse xmlns:NS1="urn:webservice-electrasoft-ru:types-twsReservationServiceIntf-ItwsReservationService">
                        <return xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="xsd:string[2119]">
                                <item>Алексик(Трускавец)</item>
                                <item>Ателика Гамма курорт</item>
                                <item>Горная Грузия 12 дней</item>
                                <item>Georgia 3_5дней</item>
                                <item>Новотель Шереметьево</item>
                        </return>
                </NS1:GetAvailableHotelListResponse>
        </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I'm not sure this is the right solution, but I would love to see this issue resolved in the upcoming version.

Thank you very much! Sorry for my poor English.

Change History (15)

comment:1 Changed 5 years ago by dylan

Component: Core → IO
Milestone: tbd → 1.10
Owner: set to dylan
Priority: undecided → high
Status: new → assigned

Not sure I can get to this for 1.10, but I'll see if this can be easily fixed.

comment:2 Changed 5 years ago by bill

Technically since we've already cut the 1.10 RC, this shouldn't go into 1.10 unless it's a regression.

comment:3 Changed 5 years ago by dylan

Milestone: 1.10 → 1.11

Fair enough, it's probably a bug more than a regression.

comment:4 Changed 5 years ago by tsofist

I've investigated this issue a little and here is what I found:

"clientResponse" as the argument of the "response" event handler is an WritableStream? instance, "data" event handler of which has one argument (it appears as "chunk" in the code). Here we are faced with known streaming feature (WritableStream?) - if the encoding is not specified for it the data are represented as an instance of Buffer. If the stream encoding is initially set there won't be any problems with concatenation, which occurs on "data" event.

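But you are using a different approach: the Buffer instances are gathered into an array and then body.join() is executed, which is equivalent to looping through the items and implicitly calling item.toString() on each one (toString() without arguments is equivalent to toString("utf8")). Concatenating the data this way leads to the issues I mentioned before. The reason is that a multi-byte UTF-8 character (every Cyrillic letter takes two bytes) can be split across two chunks, and decoding each chunk separately turns both halves into the replacement character �.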
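The chunk-boundary effect can be shown with plain Buffers. This sketch splits the UTF-8 bytes of one of the hotel names from the response above inside the letter "л" (a split point chosen for illustration) and compares per-chunk decoding with two alternatives: concatenating the bytes before decoding, and a StringDecoder, which is what Readable.setEncoding() uses internally.

```javascript
const { StringDecoder } = require("string_decoder");

const original = "Ателика Гамма курорт";
const bytes = Buffer.from(original, "utf8");

// "Ате" is 6 bytes, so offset 7 cuts the fourth letter "л" (0xD0 0xBB) in half.
const chunks = [bytes.subarray(0, 7), bytes.subarray(7)];

// What body.join() effectively does: decode every Buffer on its own.
const perChunk = chunks.join("");

// Alternative 1: join the raw bytes first, then decode once.
const concatenated = Buffer.concat(chunks).toString("utf8");

// Alternative 2: StringDecoder carries incomplete sequences across writes.
const decoder = new StringDecoder("utf8");
const streamed = chunks.map((c) => decoder.write(c)).join("") + decoder.end();

console.log(perChunk);     // contains U+FFFD
console.log(concatenated); // Ателика Гамма курорт
console.log(streamed);     // Ателика Гамма курорт
```

Buffer.concat() keeps the raw bytes until the end, which helps if the charset is only decided later; setEncoding()/StringDecoder decodes incrementally instead.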

Given all of the above, it seems to me this code has never worked correctly and needs the change I suggested above.

Last edited 5 years ago by tsofist

comment:5 Changed 5 years ago by tsofist

A simple example using a Russian site to demonstrate the problem:

      var http = require("http");

      var options = {
          host: "habrahabr.ru",
          port: 80,
          path: "/company/knopka/blog/225027/",
          method: "POST"
      };

      // U+FFFD is the replacement character produced for undecodable bytes
      var mCharPattern = /\uFFFD/g;

      // Print chunk count, decoded length and the number of U+FFFD characters
      function report(body, chunkCount) {
          console.log("CHUNKS: ", chunkCount, "\nDATA LEN: ", body.length);
          var matchRes = body.match(mCharPattern);
          if (matchRes)
              console.log("Mystical symbols count: ", matchRes.length);
          else
              console.log("No mystical symbols!");
      }

      // BAD: chunks arrive as Buffers and each one is decoded separately
      var reqBad = http.request(options, function (res) {
          console.log("\n*** [BAD REQUEST] ***\n", "\nSTATUS: " + res.statusCode, "\nHEADERS: " + JSON.stringify(res.headers));
          var data = [];
          res.on("data", function (chunk) {
              data.push(chunk);
          });
          res.on("end", function () {
              var body = "";
              data.forEach(function (item) {
                  body += item.toString("utf8");
              });
              report(body, data.length);
          });
      });
      // write data to request body
      reqBad.write("data\n");
      reqBad.write("data\n");
      reqBad.end();

      // GOOD: setEncoding makes the stream decode across chunk boundaries
      var reqOk = http.request(options, function (res) {
          console.log("\n*** [GOOD REQUEST] ***\n", "\nSTATUS: " + res.statusCode, "\nHEADERS: " + JSON.stringify(res.headers));
          res.setEncoding("utf8");
          var data = [];
          res.on("data", function (chunk) {
              data.push(chunk);
          });
          res.on("end", function () {
              var body = data.join("");
              report(body, data.length);
          });
      });
      // write data to request body
      reqOk.write("data\n");
      reqOk.write("data\n");
      reqOk.end();

Well, what about 1.10? =)

Last edited 5 years ago by tsofist

comment:6 Changed 5 years ago by tsofist

The mystical symbol was mangled by the code highlighter. The line should read: var mCharPattern = /�/g;

comment:7 Changed 5 years ago by dylan

1.10 is past the release candidate stage and scheduled to be released next week.

We will fix this for 1.11 and then consider backporting it to 1.10.x.

Thanks for the detailed explanation @tsofist.

comment:8 Changed 5 years ago by tsofist

Let's wait for new versions. Thank you very much, Dylan!

comment:9 Changed 3 years ago by dylan

Priority: high → blocker

comment:10 Changed 3 years ago by dylan

The proposed patch would always force the response to be decoded as UTF-8. I think we probably need to check the headers instead and set the encoding based on them when they are specified.

comment:11 Changed 3 years ago by tsofist

I agree. But what if the encoding cannot be matched? Return a Buffer?
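A rough sketch of the header-driven approach discussed in the last two comments, assuming the charset is taken from the response's Content-Type header (Node lowercases incoming header names) and that a raw Buffer is returned whenever Node cannot decode that charset. `charsetFromContentType`, `nodeEncodingFor` and `decodeBody` are hypothetical helpers, not part of dojo/request/node.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: return a Node Buffer encoding for the response,
// or null when the charset is missing or Node cannot decode it.
function nodeEncodingFor(headers) {
  const charset = charsetFromContentType(headers["content-type"]);
  return charset && Buffer.isEncoding(charset) ? charset : null;
}

// Hypothetical helper: decode the collected chunks once, or fall back to
// handing back the raw Buffer so the caller can decode it some other way.
function decodeBody(headers, chunks) {
  const body = Buffer.concat(chunks);
  const encoding = nodeEncodingFor(headers);
  return encoding ? body.toString(encoding) : body;
}

console.log(decodeBody({ "content-type": "text/xml; charset=UTF-8" },
                       [Buffer.from("Алексик", "utf8")]));
```

For example, `charset=windows-1251` (common on Russian sites) is not a Node Buffer encoding, so the caller would get the Buffer back and need a separate decoder. Alternatively, a non-null `nodeEncodingFor()` result could be passed to `clientResponse.setEncoding()` instead of decoding at the end.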

comment:12 Changed 3 years ago by dylan

Priority: blocker → high

comment:13 Changed 3 years ago by dylan

Milestone: 1.11 → 1.12

OK, after massive triage, I ended up with about 80 tickets for 1.11 and 400 or so for 1.12. That's a bit unrealistic, so first I changed all 1.12 tickets to 1.13 (with the plan to move some forward to the new 1.12). Now I'm moving some of the 1.11 tickets that are less likely to get done this month without help to 1.12. Feel free to help out in January if you want to see this ticket land in 1.12.

comment:14 Changed 3 years ago by dylan

Milestone: 1.12 → 1.13

Ticket planning... move current 1.12 tickets out to 1.13 that likely won't get fixed in 1.12.

comment:15 Changed 20 months ago by dylan

Milestone: 1.13 → 1.14