#8995 closed enhancement (fixed)
[patch] [cla] reusable html escape function
Reported by: | matthw | Owned by: | bpayton |
---|---|---|---|
Priority: | high | Milestone: | 1.10 |
Component: | Core | Version: | 1.3.0rc2 |
Keywords: | escaping, xss, innerText, html, escape, dijit, attributeMap | Cc: | |
Blocked By: | Blocking: |
Description
Two things:
Dojo seems to lack a reusable core implementation of html escaping. This is something pretty basic when doing client-side templating and html work, and seems to be re-implemented in various dijit and dojox modules.
In 1.2 the following all implement html escaping themselves:
dijit/_editor/html.js dijit/_editor/RichText.js dijit/form/ComboBox.js dijit/form/TextArea.js dijit/InlineEditBox.js dojox/xmpp/util.js
And the story doesn't seem much better in 1.3.
The other (connected) thing: It's actually rather annoying when you have slots in a dijit template which you want to fill with XSS-safe, correctly-escaped text, rather than HTML.
At present we have to do, eg
<span dojoAttachPoint="titleNode"></span> this.titleNode.innerHTML = ''; // or maybe dojo.html.set(this.title '') this.titleNode.appendChild(dojo.doc.createTextNode(title))
it would be nice if you could do, eg
dojo.html.set(this.titleNode, title, {asText: true})
or
dojo.setInnerText(this.titleNode, title) dojo.html.setInnerText(this.titleNode, title)
It would also be nice if you could use dijit's attributeMap magic with text-based setters as well as html-based. eg:
attributeMap: { title: { node: "titleNode", type: "innerText" } }
rather than the example given where the title attribute gets set as innerHTML. When you're dealing with dynamic or user-input content you'll almost always want to be treating it as text rather than as html... or at least so I've found. So the option would be nice.
Attachments (1)
Change History (28)
comment:1 Changed 12 years ago by
Component: | General → Dijit |
---|---|
Owner: | anonymous deleted |
comment:2 Changed 12 years ago by
I agree this would be nice to add to attributeMap... actually I think it should be the default.
One problem I have though is that the spec wasn't well defined when all this was added, so for something like Button, I think of myButton.attr('label', 'hello world') as taking plain-text but I'm sure there's *somebody* that's using it to insert HTML and will complain vehemently if I fix the "bug" inserting special characters.
comment:3 Changed 12 years ago by
Yup, good point, have made a separate ticket #9000 for the attributeMap stuff (and setInnerText, which would be used to implement the innerText attribute type).
peller: I understand re code bloat, but I feel like it wouldn't hurt to put it in dojo.html or dojo.string, if not dojo._base.html. It seems of equivalent usefulness to a lot of the code already in dojo._base.html. If it goes in an optional module and widgets really care a lot about avoiding dependencies they'd still be free to re-implement it if they want to. It's just nice to have a canonical implementation somewhere in the library which one can use if one wishes.
About the exclamation mark syntax in template substitutions, wasn't aware of that but just spotted it in dijit._Templated source. Perhaps could use documenting somewhere, I always assumed it was using vanilla dojo.string.substitute.
That said it is not performed full HTML escaping by default, just escaping quotes:
value.toString().replace(/"/g,"""); //TODO: add &? use encodeXML method?
Which is not really much use unless you're only substituting into a HTML attribute, and even then should (as the comment notes) be escaping & -> & beforehand.
bill: re attributeMap, I agree it should probably be the default too, but I don't mind if innerHTML remains the default on existing dijits for now, provided I can have innerText as an option when writing my own widgets. Happy to make a patch adding this, see #9000
comment:4 Changed 12 years ago by
plugd has this, FYI. It gets requested alot. perhaps it should be dojo.string.escape module.
http://code.google.com/p/plugd/source/browse/trunk/escape.js
comment:5 Changed 12 years ago by
Component: | Dijit → Core |
---|---|
Milestone: | tbd → 1.4 |
Owner: | set to James Burke |
I have needed this regularly too, it should go in core, probably around dojo.string somewhere. I"ll convert this ticket to be a core ticket to track that change since the attributeMap stuff is now a separate ticket.
comment:6 Changed 11 years ago by
Owner: | changed from James Burke to phiggins |
---|
I am open to something based on phiggin's plugd escape code, but we just need escape, no decode, and the solution should have some benefit over just doing:
text.replace(/</g, "lt;").replace(/>/g, "gt;").replace(/\&/g, "amp;").replace(/\"/g, "quot;")
dojo.string seems like a good place for it. Passing to phiggins for him to play with.
comment:7 Changed 11 years ago by
Owner: | changed from phiggins to dante |
---|
comment:8 Changed 11 years ago by
Milestone: | 1.4 → 1.5 |
---|---|
Summary: | No reusable html escape function, and no easy way to safely insert text (not html) into a dijit template via attributeMap or dojo.html → No reusable html escape function |
Updating title since half of this was split to #9000. And updating milestone since we passed 1.4 beta deadline.
comment:9 Changed 11 years ago by
It's really strange. Dojo has it in version 0.4 but removed it in 0.9:
http://wolfram.kriesing.de/blog/index.php/2007/dojo-migration-04-to-09
<blockquote>dojo.string.escape() is gone too. I just copied the old 0.4 function (dojo.string.escapeXml() see here for the source) into my local project namespace, I didnt find a replacement in dojo0.9 yet :-(. Probably also a simple thing like the removeNode thing, but I don’t know yet.</blockquote>
Looking at the difference between:
http://api.dojotoolkit.org/jsdoc/1.2/dojo.string.substitute
and
http://api.dojotoolkit.org/jsdoc/HEAD/dojo.string.substitute
it seems this is intentional, that people need to use their own escaping function.
I disagree, this (html escaping) should be part of Dojo core.
comment:10 Changed 11 years ago by
I like the implementation here:
http://snipplr.com/view.php?codeview&id=13954
My version of it (dojo-ized, is this correct?):
dojo.provide('soluvas.string'); soluvas.string.escapeHTML = function(str) { var div = document.createElement('div'); var text = document.createTextNode(str); div.appendChild(text); return div.innerHTML; }; soluvas.string.unescapeHTML = function(str) { var div = document.createElement('div'); div.innerHTML = str.replace(/<\/?[^>]+>/gi, ''); return div.childNodes[0] ? div.childNodes[0].nodeValue : ''; };
comment:11 Changed 11 years ago by
Milestone: | 1.5 → 1.6 |
---|
comment:14 Changed 9 years ago by
I like this one: http://ainthek.blogspot.com/2010/10/fast-and-correct-htmlencoding-for.html (it is mine ;-)))
comment:15 Changed 9 years ago by
Cannot really judge this: But I would have expected that a full-blown js framework has some built in methods to escape and/or sanitize user input. I do it now on the server with https://github.com/theSmaw/Caja-HTML-Sanitizer (which wraps the above mentioned project for node.js).
comment:16 Changed 9 years ago by
Milestone: | 1.8 → 2.0 |
---|
1.8 is frozen. Move all enhancements to next release. If you need an exemption from the freeze for this ticket, contact me immediately.
comment:18 Changed 8 years ago by
Resolution: | → patchwelcome |
---|---|
Status: | assigned → closed |
A patch would be welcome to fix this. Otherwise I don't think it will get fixed given the lack of activity.
comment:19 Changed 8 years ago by
Added a patch for a dojo.string.escape function as well as for a dojo.string.isNullOrEmpty function.
The HTML escape function is based on underscore.js and prototype.js, see http://stackoverflow.com/questions/6020714/escape-html-using-jquery
Changed 8 years ago by
Attachment: | [PATCH][CLA]string.js.diff added |
---|
comment:20 Changed 8 years ago by
Resolution: | patchwelcome |
---|---|
Status: | closed → reopened |
Summary: | No reusable html escape function → [patch] [cla] reusable html escape function |
comment:21 Changed 8 years ago by
Milestone: | 2.0 → 1.10 |
---|
Another fundamental thing that we should have fixed a few years ago. Let's target this for 1.10 please.
comment:22 Changed 7 years ago by
Note: Using a regexp looks faster than using textNode: http://jsperf.com/escaping-html-ido
comment:23 Changed 7 years ago by
Pull request available here : https://github.com/dojo/dojo/pull/60
comment:24 Changed 7 years ago by
Owner: | changed from dante to bpayton |
---|---|
Status: | reopened → assigned |
comment:25 Changed 7 years ago by
And please, think in codePoints not chars. again evaluate simple and quite fast enough a.in.the.k solution.
comment:26 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:27 Changed 7 years ago by
mm - you're welcome to open a PR with your approach and some additional tests to show the advantages of your approach. for the sake of having something rather than nothing, i've merged what was provided so far.
You might wish to file unrelated comments separately or they'll be hard to track here.
The escape XML problem has been so noted in the code, though I don't know if there's a ticket for it. The code is a one-liner, so I think we were hesitant to place it in a module for fear of code bloat (include the module and you get unrelated stuff) OTOH, it's commonly used and isn't entirely straightforward.
As for the template substitution, you can substitute variables using the following syntax: ${!foo} and that will avoid the escapes. Otherwise, what you propose only seems like a few lines of code.