Opened 10 years ago

Closed 5 years ago

#10291 closed enhancement (fixed)

Provide a cross-browser equivalent for textContent

Reported by: wolfram Owned by: dylan
Priority: high Milestone: 1.10
Component: HTML Version: 1.4.0b
Keywords: attr Cc:
Blocked By: Blocking:

Description

Make the following work

// dojo.create uses dojo.attr
n = dojo.create("h1", {textContent:"Whatever Headline"}, dojo.body()) 
dojo.attr(n, "textContent");

before the patch it was resulting in

<h1 textcontent="Whatever Headline"></h1>

now after the patch

<h1>Whatever Headline</h1>

Attachments (1)

attr_textContent.diff (2.2 KB) - added by wolfram 10 years ago.

Download all attachments as: .zip

Change History (34)

Changed 10 years ago by wolfram

Attachment: attr_textContent.diff added

comment:1 Changed 10 years ago by wolfram

Summary: Make dojo.attr() work with "textContent" properly tooMake dojo.attr() understand "textContent" properly

comment:2 Changed 10 years ago by dante

Milestone: tbd1.5
Owner: changed from anonymous to Eugene Lazutkin
Type: defectenhancement

not sure what this gains over doing your example with innerHTML?

dojo.create("h1", { innerHTML:"Expected" });

should generate

<h1>Expected</h1>

is there a valid use case for setting textContent over innerHTML (and couldn't you just .replace("<", "&lt;") before setting innerHTML to get the escaped version?

either way, too late for 1.4. Will decide for 1.5

comment:3 Changed 10 years ago by wolfram

I think the basic use case is the compliance to DOM Level3, and the code

dojo.create("h1", {innerHTML:"Whatever"})

makes me clearly expect that

dojo.create("h1", {textContent:"Whatever"})

should work too, because it kinda says "use any DOM attribute there, we map it properly for you". And "textContent" might be faster, but I didn't measure it. So it boils down to an API look+feel and maybe speed. My 2 cents.

comment:4 Changed 10 years ago by davidaurelio

textContent is important for escaping of input retrieved from “untrusted sources” like user input or data coming from other servers. Using innerHTML creates the same XSS vulnerabilities in JavaScript? that other languages have when not escaping strings in HTML.

The following example should illustrate the point:

dojo.create("h1", {innerHTML: "<script>do.something.evil()</script>"});

/* vs. */

dojo.create("h1", {textContent: "<script>do.something.evil()</script>"});

comment:5 Changed 10 years ago by wolfram

And I guess mentioning that those who want to break in find various ways, so just converting "<" and ">" would not really solve it, but using the native "textContent" attr makes it as secure as the browser allows. my 2 cents ...

comment:6 Changed 10 years ago by James Burke

Note: dojo.NodeList?-manipulate's text() function uses a recursive function getting node.nodeValue for get operations and createTextNode() for set operations to work better across browsers.

comment:7 Changed 9 years ago by Eugene Lazutkin

Component: GeneralCore
Status: newassigned

comment:8 in reply to:  6 Changed 9 years ago by Eugene Lazutkin

Replying to jburke:

Note: dojo.NodeList-manipulate's text() function uses a recursive function getting node.nodeValue for get operations and createTextNode() for set operations to work better across browsers.

I can explain the tricky things that James mentioned.

According to the official MS documentation on innerText (http://msdn.microsoft.com/en-us/library/ms533899%28v=VS.85%29.aspx) it differs from textContent in following ways:

  • The innerText property is valid for block elements only.
  • The innerText property is read-only on the html, table, tBody, tFoot, tHead, and tr objects.
  • You can set this property only after the onload event fires on the window.

The patch should be improved.

comment:9 Changed 9 years ago by Eugene Lazutkin

Milestone: 1.51.6

Rescheduling to 1.6. There is still a possibility to recast this ticket as 1.5 and include in 1.5.x release.

comment:10 Changed 9 years ago by Eugene Lazutkin

Milestone: 1.61.7

Given all troubles with IE, especially that it works only after onload fired, I want to postpone this ticket for now. We sanitize the incoming HTML in templates, so I suspect that this is the right way to do it rather than trying to support disparate IE and non-IE facilities.

I suggest to expose a sanitizing function and reuse it + innerHTML to set a sanitized text.

comment:11 Changed 9 years ago by Eugene Lazutkin

Dup of #7921.

comment:12 Changed 8 years ago by Chris Mitchell

Milestone: 1.71.8

comment:13 Changed 7 years ago by bill

Component: CoreHTML

comment:14 Changed 7 years ago by Colin Snover

Milestone: 1.82.0

1.8 is frozen. Move all enhancements to next release. If you need an exemption from the freeze for this ticket, contact me immediately.

comment:15 Changed 7 years ago by Mark Macdonald

comment:16 Changed 7 years ago by nosuchluke

Guys, this really needs some attention.

Although the title suggests using attr, and that's probably no longer the right place, there are underlying issues (see above and in #7921) that the toolkit really ought to address.

Basically, we should provide a cross-browser equivalent for textContent, which all modern browsers now have. That's actually two distinct features: a getter and a setter.

Setter. A convenient and reliable method to set element text is sorely needed. It's too tempting to just say node.innerHTML = myText and forget that the text is not escaped. We all know what happens when we forget to escape things, right? A cursory check finds some hundred cases in the toolkit where createTextNode workarounds have been re-implemented in various ways (not to mention all those that should have but didn't).

A suitable setter is pretty straightforward to implement, I think. You can assign to textContent if it exists, which naturally is the most efficient way, otherwise fall back to using createTextNode and such.

#8995 brings up the need for a reusable HTML-escape function. One of the best ways of implementing that (see ticket:8995#comment:10) is actually to create a detached div, add a text node, then get its innerHTML. Ironically, though, the most common reason people do HTML-escaping is just so they can push the result into innerHTML, for lack of any convenient direct way to set the text.

Getter. For browsers that lack the standard textContent, mainly IE8 and earlier, it's tempting to use innerText, but the behavior is somewhat different (besides the above, see http://clubajax.org/plain-text-vs-innertext-vs-textcontent/). NodeList-manipulate has a text() function that may already do the right thing (though I haven't examined all the edge cases) and just needs to be extracted.

I would guess dom-prop is where this belongs, so that we can do things like domProp.get(node, "textContent") or domConstruct.create("a", { textContent: "I <3 Dojo" }).

Once this is done, I certainly suggest updating all the tutorials (e.g. http://dojotoolkit.org/documentation/tutorials/promises/) so that users don't pick up the bad habit of using innerHTML for setting text.

comment:17 Changed 6 years ago by dylan

Milestone: 2.01.10

Agreed, we should fix this for 1.10.

comment:18 in reply to:  17 Changed 6 years ago by bjoern.burger

Replying to dylan:

Agreed, we should fix this for 1.10.

My company would really be pleased, if you can fix this for 1.10. Thanks!

comment:19 Changed 6 years ago by dylan

Owner: changed from Eugene Lazutkin to dylan
Summary: Make dojo.attr() understand "textContent" properlyProvide a cross-browser equivalent for textContent

comment:20 Changed 5 years ago by ido

I am working on a patch for this ticket. However I have a question for you guys. According to the tests I made: textContent provides the exact same result on IE10, Chrome 33, Firefox 27.

However according to Mark Wilcox's post (http://clubajax.org/plain-text-vs-innertext-vs-textcontent/) on oldest versions of Chrome and Firefox the result was not consistent. And IE8 also provides another kind of result.

So, my question is: for get('textContent'), should we rely on browser's textContent property when it exists and use a custom function when it does not?

If so, the result may not be consistent across browsers... In other hand, a custom function always used will be consistent across browser but not standards compliant (and far slower...)

My current changes are available here: https://github.com/ben8p/dojo/compare/textContent

Basically, I moved text() from Nodelist-manipulate to dom-attr. And I use it for IE <= 8, for any other browser I use native textContent

Last edited 5 years ago by ido (previous) (diff)

comment:21 Changed 5 years ago by Colin Snover

I would not rely on the veracity of a report from 2010, on that site.

comment:22 Changed 5 years ago by ido

@csnover I agree. But the fact is: there is no way to have the same value between IE8 and other browsers (because IE8 does not return the indentation of the HTML, just the new lines). So I wonder if that's an issue IE8 (and lower) provides a different value than any other browsers... if so, we can only go for a custom getter...

Last edited 5 years ago by ido (previous) (diff)

comment:23 Changed 5 years ago by Colin Snover

Yes, old IE has a habit of mangling data passed into the DOM, but it does that with innerHTML too, so if people are OK with how that has worked for years even though it’s not consistent, I would think this to be OK, too, especially if it is in documentation. Use textContent for all browsers that support it, and traverse elements for old IE.

comment:24 Changed 5 years ago by ido

I think so too.

Pull request is available here : https://github.com/dojo/dojo/pull/61

(The title needs to be updated with [CLA][PATCH])

Last edited 5 years ago by ido (previous) (diff)

comment:25 Changed 5 years ago by Brandon Payton

@ido, thanks for tackling this. This issue has been a pain for me in the past.

Regarding the pull request, it seems like this change should be made to dom-prop instead of dom-attr because textContent is a DOM property, not an attribute. The difficulty here is that dom-construct uses dom-attr which means adding textContent support to dom-prop won't help dom-construct.

A counterpoint to my concern is that dom-attr already supports innerHTML.

Any thoughts on this?

comment:26 Changed 5 years ago by ido

@brandonp, indeed, dom-prop is a better place... I've updated the pull request. The code is now in dom-prop but dom-attr is still mapping it. dom-attr just need a small if() in the getter.

Let me know if you think I should revert and let the code in dom-attr

Last edited 5 years ago by ido (previous) (diff)

comment:27 Changed 5 years ago by Brandon Payton

@ido, I think that is the right move.

comment:28 Changed 5 years ago by ido

@brandonp I still have an little worry... Let's consider the following content:

<div>first line
	<div>second line</div>
	<div>third line</div>
	fourth line
</div>

If you do a console.log(domProp.get(node, 'textContent'));

  • New browsers will spit:
    first line
    	second line
    	third line
    	fourth line
    
    

And that's what we expect.

  • IE8- will return:
    first line second line third line fourth line
    

This consistent with the current value returned by Nodelist-manipulate::text() However, we could (IMO we should) consider using IE innerText property.

  • innerText will return:
    first line
    second line
    third line
    fourth line
    

It's still not the same behavior as other browser but, it's closer to what the user may expect.

innerText cannot be used for setting the value because of some restrictions. However, it can be used for reading values and I think we should use it.

The drawback of using innerText is the backward compatibility. The result of Nodelist-manipulate::text() will be slightly different in Dojo 1.9- compared to Dojo 1.10+. Is that acceptable ?

Note: the value returned by innerText change according to the node. For instance:

<div>
	first line
		second line
			third line
	fourth line
</div>

return

first line second line third line fourth line

While

<pre>
	first line
		second line
			third line
	fourth line
</pre>

return

	first line
		second line
			third line
	fourth line
Last edited 5 years ago by ido (previous) (diff)

comment:29 Changed 5 years ago by Brandon Payton

@ido, given the choice between different whitespace (textContent) and different content (innerText), I'd prefer different whitespace. Do others have thoughts on this?

comment:30 in reply to:  29 Changed 5 years ago by Brandon Payton

@ido, after talking about this with others, I think we should continue with your current implementation and not use innerText.

comment:31 Changed 5 years ago by ido

@brandonp: thanks, the pull request is ready. I applied latest comments.

comment:32 Changed 5 years ago by dylans <dylan@…>

In 5646569b5d936834a063e6abaf2ab262ca704f8a/dojo:

Error: Processor CommitTicketReference failed
Unsupported version control system "git": Can't find an appropriate component, maybe the corresponding plugin was not enabled? 

comment:33 Changed 5 years ago by dylan

Resolution: fixed
Status: assignedclosed

Committed to master, thanks for contributing an updated patch for this long-requested enhancement.

Note: See TracTickets for help on using tickets.