Opened 8 years ago

Closed 3 years ago

Last modified 3 years ago

#8995 closed enhancement (fixed)

[patch] [cla] reusable html escape function

Reported by: matthw Owned by: bpayton
Priority: high Milestone: 1.10
Component: Core Version: 1.3.0rc2
Keywords: escaping, xss, innerText, html, escape, dijit, attributeMap Cc:
Blocked by: Blocking:

Description

Two things:

Dojo seems to lack a reusable core implementation of html escaping. This is something pretty basic when doing client-side templating and html work, and seems to be re-implemented in various dijit and dojox modules.

In 1.2 the following all implement html escaping themselves:

dijit/_editor/html.js
dijit/_editor/RichText.js
dijit/form/ComboBox.js
dijit/form/TextArea.js
dijit/InlineEditBox.js
dojox/xmpp/util.js

And the story doesn't seem much better in 1.3.

The other (connected) thing: It's actually rather annoying when you have slots in a dijit template which you want to fill with XSS-safe, correctly-escaped text, rather than HTML.

At present we have to do, eg

<span dojoAttachPoint="titleNode"></span>

this.titleNode.innerHTML = '';
// or maybe dojo.html.set(this.title '')

this.titleNode.appendChild(dojo.doc.createTextNode(title))

it would be nice if you could do, eg

dojo.html.set(this.titleNode, title, {asText: true})

or

dojo.setInnerText(this.titleNode, title)
dojo.html.setInnerText(this.titleNode, title)

It would also be nice if you could use dijit's attributeMap magic with text-based setters as well as html-based. eg:

attributeMap: {
  title: { node: "titleNode", type: "innerText" }
}

rather than the example given where the title attribute gets set as innerHTML. When you're dealing with dynamic or user-input content you'll almost always want to be treating it as text rather than as html... or at least so I've found. So the option would be nice.

Attachments (1)

[PATCH][CLA]string.js.diff (890 bytes) - added by Paul Christopher 4 years ago.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 8 years ago by peller

  • Component changed from General to Dijit
  • Owner anonymous deleted

You might wish to file unrelated comments separately or they'll be hard to track here.

The escape XML problem has been so noted in the code, though I don't know if there's a ticket for it. The code is a one-liner, so I think we were hesitant to place it in a module for fear of code bloat (include the module and you get unrelated stuff) OTOH, it's commonly used and isn't entirely straightforward.

As for the template substitution, you can substitute variables using the following syntax: ${!foo} and that will avoid the escapes. Otherwise, what you propose only seems like a few lines of code.

comment:2 Changed 8 years ago by bill

I agree this would be nice to add to attributeMap... actually I think it should be the default.

One problem I have though is that the spec wasn't well defined when all this was added, so for something like Button, I think of myButton.attr('label', 'hello world') as taking plain-text but I'm sure there's *somebody* that's using it to insert HTML and will complain vehemently if I fix the "bug" inserting special characters.

comment:3 in reply to: ↑ description Changed 8 years ago by matthw

Yup, good point, have made a separate ticket #9000 for the attributeMap stuff (and setInnerText, which would be used to implement the innerText attribute type).

peller: I understand re code bloat, but I feel like it wouldn't hurt to put it in dojo.html or dojo.string, if not dojo._base.html. It seems of equivalent usefulness to a lot of the code already in dojo._base.html. If it goes in an optional module and widgets really care a lot about avoiding dependencies they'd still be free to re-implement it if they want to. It's just nice to have a canonical implementation somewhere in the library which one can use if one wishes.

About the exclamation mark syntax in template substitutions, wasn't aware of that but just spotted it in dijit._Templated source. Perhaps could use documenting somewhere, I always assumed it was using vanilla dojo.string.substitute.

That said it is not performed full HTML escaping by default, just escaping quotes:

value.toString().replace(/"/g,"&quot;"); //TODO: add &amp? use encodeXML method?

Which is not really much use unless you're only substituting into a HTML attribute, and even then should (as the comment notes) be escaping & -> &amp; beforehand.

bill: re attributeMap, I agree it should probably be the default too, but I don't mind if innerHTML remains the default on existing dijits for now, provided I can have innerText as an option when writing my own widgets. Happy to make a patch adding this, see #9000

comment:4 Changed 8 years ago by dante

plugd has this, FYI. It gets requested alot. perhaps it should be dojo.string.escape module.

http://code.google.com/p/plugd/source/browse/trunk/escape.js

comment:5 Changed 8 years ago by jburke

  • Component changed from Dijit to Core
  • Milestone changed from tbd to 1.4
  • Owner set to jburke

I have needed this regularly too, it should go in core, probably around dojo.string somewhere. I"ll convert this ticket to be a core ticket to track that change since the attributeMap stuff is now a separate ticket.

comment:6 Changed 8 years ago by jburke

  • Owner changed from jburke to phiggins

I am open to something based on phiggin's plugd escape code, but we just need escape, no decode, and the solution should have some benefit over just doing:

text.replace(/</g, "lt;").replace(/>/g, "gt;").replace(/\&/g, "amp;").replace(/\"/g, "quot;")

dojo.string seems like a good place for it. Passing to phiggins for him to play with.

comment:7 Changed 8 years ago by nic

  • Owner changed from phiggins to dante

comment:8 Changed 8 years ago by bill

  • Milestone changed from 1.4 to 1.5
  • Summary changed from No reusable html escape function, and no easy way to safely insert text (not html) into a dijit template via attributeMap or dojo.html to No reusable html escape function

Updating title since half of this was split to #9000. And updating milestone since we passed 1.4 beta deadline.

comment:9 Changed 7 years ago by ceefour

It's really strange. Dojo has it in version 0.4 but removed it in 0.9:

http://wolfram.kriesing.de/blog/index.php/2007/dojo-migration-04-to-09

<blockquote>dojo.string.escape() is gone too. I just copied the old 0.4 function (dojo.string.escapeXml() see here for the source) into my local project namespace, I didnt find a replacement in dojo0.9 yet :-(. Probably also a simple thing like the removeNode thing, but I don’t know yet.</blockquote>

Looking at the difference between:

http://api.dojotoolkit.org/jsdoc/1.2/dojo.string.substitute

and

http://api.dojotoolkit.org/jsdoc/HEAD/dojo.string.substitute

it seems this is intentional, that people need to use their own escaping function.

I disagree, this (html escaping) should be part of Dojo core.

comment:10 Changed 7 years ago by ceefour

I like the implementation here:

http://snipplr.com/view.php?codeview&id=13954

My version of it (dojo-ized, is this correct?):

dojo.provide('soluvas.string');

soluvas.string.escapeHTML = function(str) {
    var div  = document.createElement('div');
    var text = document.createTextNode(str);
    div.appendChild(text);
    return div.innerHTML;
};
	
soluvas.string.unescapeHTML = function(str) {
    var div       = document.createElement('div');
    div.innerHTML = str.replace(/<\/?[^>]+>/gi, '');
    return div.childNodes[0] ? div.childNodes[0].nodeValue : '';
};

comment:11 Changed 7 years ago by peller

  • Milestone changed from 1.5 to 1.6

comment:12 Changed 6 years ago by dante

  • Milestone changed from 1.6 to 1.7
  • Status changed from new to assigned

i mean it this time.

comment:13 Changed 6 years ago by chrism

  • Milestone changed from 1.7 to 1.8

missed 1.7 cutoff, 1.8 will open soon.

comment:15 Changed 5 years ago by Paul Christopher

http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js

Cannot really judge this: But I would have expected that a full-blown js framework has some built in methods to escape and/or sanitize user input. I do it now on the server with https://github.com/theSmaw/Caja-HTML-Sanitizer (which wraps the above mentioned project for node.js).

Last edited 5 years ago by Paul Christopher (previous) (diff)

comment:16 Changed 5 years ago by csnover

  • Milestone changed from 1.8 to 2.0

1.8 is frozen. Move all enhancements to next release. If you need an exemption from the freeze for this ticket, contact me immediately.

comment:17 Changed 4 years ago by Mark Hug

I am a spammer. I hate myself.

Last edited 4 years ago by csnover (previous) (diff)

comment:18 Changed 4 years ago by dylan

  • Resolution set to patchwelcome
  • Status changed from assigned to closed

A patch would be welcome to fix this. Otherwise I don't think it will get fixed given the lack of activity.

comment:19 Changed 4 years ago by Paul Christopher

Added a patch for a dojo.string.escape function as well as for a dojo.string.isNullOrEmpty function.

The HTML escape function is based on underscore.js and prototype.js, see http://stackoverflow.com/questions/6020714/escape-html-using-jquery

Changed 4 years ago by Paul Christopher

comment:20 Changed 4 years ago by bill

  • Resolution patchwelcome deleted
  • Status changed from closed to reopened
  • Summary changed from No reusable html escape function to [patch] [cla] reusable html escape function

comment:21 Changed 4 years ago by dylan

  • Milestone changed from 2.0 to 1.10

Another fundamental thing that we should have fixed a few years ago. Let's target this for 1.10 please.

comment:22 Changed 3 years ago by ido

Note: Using a regexp looks faster than using textNode: http://jsperf.com/escaping-html-ido

comment:23 Changed 3 years ago by ido

Pull request available here : https://github.com/dojo/dojo/pull/60

comment:24 Changed 3 years ago by dylan

  • Owner changed from dante to bpayton
  • Status changed from reopened to assigned

comment:25 Changed 3 years ago by mm

And please, think in codePoints not chars. again evaluate simple and quite fast enough a.in.the.k solution.

comment:26 Changed 3 years ago by ben hockey <neonstalwart@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In a361d74ebec562ccc22e59ebfbc8bfd94e2008b4/dojo:

re-usable text escaping

fixes #8995

comment:27 Changed 3 years ago by neonstalwart

mm - you're welcome to open a PR with your approach and some additional tests to show the advantages of your approach. for the sake of having something rather than nothing, i've merged what was provided so far.

Note: See TracTickets for help on using tickets.