Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#7890 closed enhancement (fixed)

add support for inserting html fragments in DOM ("dojo.html.insert")

Reported by: Mike Wilson Owned by: Eugene Lazutkin
Priority: high Milestone: 1.3
Component: HTML Version: 1.2.0
Keywords: Cc: Eugene Lazutkin, dante
Blocked By: Blocking:

Description

There is currently no support in Dojo for inserting HTML fragments at flexible locations like IE's insertAdjacentHtml does. Syntax is

elem.insertAdjacentHTML(placement, html)
placement = {beforeBegin|afterBegin|beforeEnd|afterEnd}

See http://msdn.microsoft.com/en-us/library/ms536452(VS.85).aspx

Prototype implements this feature cross-browser as Element.insert() with f ex the following syntax:

Element.insert(elem, {placement: html})

See http://www.prototypejs.org/api/element/insert

As suggested by jburke it is probably a good idea to line up with dojo.place syntax (and possibly the withdrawn-just-before-1.2 dojo.elem) and get something like:

dojo.html.insert(html, elem, placement)
placement = {before|after|first|last}

(placement uses constants from dojo.place)

The feature is great to use for working with templating of different kinds so it would be really nice to have it implemented in Dojo.

Attachments (3)

makeNodes1.js (4.1 KB) - added by James Burke 10 years ago.
A makeNodes version that does not use innerHTML
makeNodes2.js (1.8 KB) - added by James Burke 10 years ago.
makeNodes that uses innerHTML and returns document fragment
toDom.js (1.6 KB) - added by James Burke 10 years ago.
Supercedes makeNodes1 and 2. Based on makeNodes2, but with slight reductions and changed name to dojo.toDom

Download all attachments as: .zip

Change History (77)

comment:1 Changed 11 years ago by Eugene Lazutkin

Cc: Eugene Lazutkin added

comment:2 Changed 11 years ago by James Burke

Milestone: tbd1.3
Owner: changed from sjmiles to James Burke

I wonder if this can/should be included into dojo.place() directly: if the first argument starts with a "<", then we create the nodes and then insert them.

Would like to consider for 1.3, but will have to weight it with the element code, and perhaps any updates to dojo.attr.

elazutkin, any thoughts you have would be appreciated.

comment:3 Changed 11 years ago by Eugene Lazutkin

My thoughts are simple. :-)

First on the innerHTML treatment:

  • innerHTML is an attribute of the node => logically dojo.attr() should deal with it.
  • innerHTML has one known caveat (not counting inserting bad HTML): some IE elements (e.g., almost all table-related nodes) fail on assignment to innerHTML because it is read-only.

In order to remedy the last one we need to do the following (on IE): check the root node tag and guess the container element. If the snippet has several root nodes, let's assume that they all "compatible" with the first one. After that we instantiate the snippet in the proper context. Example: our snippet is

<td>1</td>

The correct context for it is <tr>, so we'll add a prefix and a suffix to make it "correct":

<table><tbody><tr>
<td>1</td>
</tr></tbody></table>

After instantiating the augmented snippet we extract our <td> element and appendChild() it to the target. The rest should be cleared away.

Regarding placing it similar to dojo.place(). It makes sense, and clearly it is a more generic mechanism than the innerHTML assignment that replaces all children. We have two possible ways to implement it:

  • Implement the above trick in all browsers keeping track of more possible contents. Example: <li> should be instantiated in the context of <ul> or <ol>, and so on. When the snippet is instantiated it is "placed" under the parent as required.
  • Temporarily remove and save all children, assign innerHTML (or do the trick on IE, if needed), and move old children back in proper places to simulate the right placement.

It is doable too, but I feel that the "placement" functionality may have a relatively large codebase keeping it out of Dojo Base.

PS: insertAdjacentHTML might help on IE, but I didn't play with it yet but from the documentation it cannot be applied to elements with the read-only innerHTML.

comment:4 Changed 11 years ago by Mike Wilson

I have to disagree with both of you! (no worries though :-)

Naming
I think the appropriate place is dojo.html as we have dojo.htmt.set() there, which is an abstraction of setting innerHTML. "Overwriting" and "inserting" on the same attribute would belong in the same place, right?

Wrapping tables etc on IE
Yes, the solution you mention is standard procedure for updating tables with HTML on IE. AFAIK this is the only browser issue so wouldn't need to workaround any other browsers or elements, specifically I don't think <LI> needs fixing as it is up to the user to create valid HTML and I know of no browser that has read-only innerHTML on UL/OL.
Note that the code for doing "table wrapping" is already in dojo.html.set().

Implementation
I don't think it is a good idea to temporarily move all children out. Prototype has managed to do this without that, and so have I in some previous work. I used insertAdjacentHTML() directly on IE and createContextualFragment() on standard browsers, and it worked great.

Placement
The codebase for placement should be no problem sharing with dojo.place() after the HTML is converted to a fragment using createContextualFragment().

insertAdjacentHTML on IE
Yes, it is my understanding too that the table limitations are in this function too.

comment:5 Changed 11 years ago by Eugene Lazutkin

createContextualFragment() is a good idea, but is it supported by all our non-IE browsers? I know for sure that Opera used to have a crippling bug exactly with table fragments. Either better compliance information is needed, or we have to get it by experimenting with different fragments on all supported browsers.

Regarding dojo.html.set(). Actually it needs to be fixed: while it support setting a fragment on <tr>, <tbody>, and <table> (this part is commendable) it leaves in the dark completely <thead>, <tfoot>, <col>, <colGroup>, and more exotic <style>, <frameset>, <head>, <title>, and <html>. IMHO, its code should be amended and, hopefully, reduced. Interesting that dojo.html.set() does its node wrapping regardless of the type of the browser.

To sum up mikewse's proposal --- we have to support at least 3 separate branches:

  1. Wrapping nodes on IE for read-only innerHTML nodes.
  2. insertAdjacentHTML() on IE for everything else.
  3. createContextualFragment() on non-IE browsers that support it.

The potential problem is the code size --- the conditional wrapping node solution can do the task on all browsers regardless of the need. Actually this is the way dojo.html.set() had decided to take. The 2nd and the 3rd branches would just increase the code without providing any additional functionality.

Please correct me if I missed anything.

comment:6 Changed 11 years ago by Mike Wilson

Hi elazutkin, most of what you say makes sense and below are a few follow-ups!

createContextualFragment()
It sounds like we may better stay away from it.

Alternative to moving out children and then assign innerHTML
Actually, couldn't just the opposite be done? Ie, assigning the new html to a temporary element (just like dojo.html.set does) and then dojo.place the new elements into the right position?

Implementation
With the above change I fully agree with your suggestion to do one code path for all browsers. It would both be a great design, and save code lines, if the new insert function could reuse the "element wrapping" code in dojo.html.set and the placing code in dojo.place. Then the only new functionality needed is to expand the html string into the corresponding child elements.
(It also sounds good to make improvements to dojo.html.set.)

Looking at Prototype
I had a look at the Prototype source, and incidentally they are doing things as described above. They have a quite elegant design for wrapping table and other elements that is completely lookup based (lines 1-4 in func below), so adding wrapping of a new element is just about adding a new entry in the lookup map.

As you can also see in the code below they are creating a temporary div for expanding the html into. I'm a bit sceptic about creating a new element for every call as this decreases performance, but maybe there is a problem with reusing the same div for all calls that I am not aware of? (I can see the same is done in dojo.html.set)

A final note is that Prototype applies the element wrapping workaround for all browsers and not just IE, right or wrong...

Element._getContentFromAnonymousElement = function(tagName, html) {
  var div = new Element('div'), t = Element._insertionTranslations.tags[tagName];
  if (t) {
    div.innerHTML = t[0] + html + t[1];
    t[2].times(function() { div = div.firstChild });
  } else div.innerHTML = html;
  return $A(div.childNodes);
}

Best regards
Mike

comment:7 Changed 11 years ago by Eugene Lazutkin

I didn't propose moving out children, just listed it as a possible 2nd alternative to do the task.

Even without looking at other implementations it is clear that dojo.html.set can be reduced. I still don't understand the rationale behind the "extended" version of dojo.html.set() --- the one that requires a special object to implement it. It looks over-engineered, and it is poorly documented. For example, try to guess the difference between cleanContent and extractContent just from reading the inline documentation.

But I guess we stuck with this API until 2.0. We can add more accurate treatment of the HTML fragments, and possible rely more on other facilities while carefully watching the size of Dojo Base.

In general I favor small functions with a limited well-defined behavior. In this particular case we can have a function similar to the standard createContextualFragment (takes a fragment, returns a node). Later on it can be reused in dojo.attr() and other places, like dojo.html.set, and the proposed here dojo.html.insert.

To sum it up (and including previous proposals): I advocate that this function (based on createContextualFragment) + dojo.html._emptyNode + dojo._destroyElement should be moved to the base, and reused consistently. The last two functions should be renamed to dojo.clearElement and dojo.destroyElement respectively (we'll keep the alias dojo._destroyElement for awhile).

I can prepare a patch with these changes, if we have an agreement.

comment:8 Changed 11 years ago by Mike Wilson

I think this sounds good but I'd like to emphasize that I don't regard myself as having a saying in how you guys implement this :-) My opinions from "outside the box" is that this stuff should have a nice API (which we seem to have agreed on from the start) and having good performance characteristics.
For the latter my worries are that creating new temporary elements on every call may take unnecessary cpu cycles and possibly may cause GC problems. Just for the record it would also be interesting to compare the performance of the new "dojo.createContextualFragment" with the native functions of IE and Firefox, but I am not so worried here as most of the work is done by an innerHTML call anyway.

It sounds nice to have many small functions, and it is also nice if they are public so they can then be reused by client code.

I'm wondering what would be the best name for the "dojo.createContextualFragment" method? "Contextual fragment" should probably be avoided as this is a mozilla extension and also has dependencies on ranges. Something along the lines "creating nodes out of a html fragment" or maybe something with "DocumentFragment?" as this is what is [could be] returned by the method. (Sadly, there seems to be no support for innerHTML on DocumentFragments? in f ex Firefox, otherwise the dojo method could have been a thin wrapper around this functionality on supporting browsers.)

Cleaning up set() and integrating it better with other base functions already available sounds good, and it would also be nice to see the return of a public destroyElement, there was something like it in 0.4 right?

A final minor note is that "dojo.createContextuaFragment" would return nodes (as in plural) as the fragment may have multiple roots. An even smaller note is that I happened to see that _base/html.js does its require before provide so you may want to fix that at the same time :-).

Best regards Mike

comment:9 Changed 10 years ago by Mike Wilson

FWIW, the HTML5 spec currently seems to standardize insertAdjacentHTML():
http://www.w3.org/html/wg/html5/#dynamic-markup-insertion
so one option could be to make a similar function that acts as a thin wrapper on future browsers that implement it by themselves.

Best regards Mike

comment:10 Changed 10 years ago by Eugene Lazutkin

NodeList defines addContent() method, which is not available for stand-alone nodes. It should be factored out and integrated with the upcoming solution in this ticket. This is mentioned in #7802.

comment:11 Changed 10 years ago by Mike Wilson

Some stuff about insertAdjacentHTML() on John Resig's blog: http://ejohn.org/blog/dom-insertadjacenthtml/

comment:12 Changed 10 years ago by Eugene Lazutkin

Owner: changed from James Burke to Eugene Lazutkin
Status: newassigned

Changed 10 years ago by James Burke

Attachment: makeNodes1.js added

A makeNodes version that does not use innerHTML

comment:13 Changed 10 years ago by James Burke

Eugene, I just attached two js files: makeNodes1.js and makeNodes2.js: the function focuses on converting an HTML string into a DOM node that can be inserted. It is not a full "insert" API method, just the nasty bit of DOM node conversion.

Both return a DocumentFragment?, so it makes it easier to insert into the rendered DOM: you can use appendChild() on the document fragment and it just appends all the the children, so no need to cycle through the childNodes and attach each one individually.

makeNodes1.js tries not to use innerHTML -- since there seemed to be lots of special cases for innerHTML, I wanted to see if we could avoid it. Unfortunately it feels too big, and it relies on regexp parsing.

makeNodes2.js uses innerHTML but uses a wrapNode to do proper insertion of the innerHTML for the weird cases. The wrapNode listing needs to be expanded to all the special cases, but hopefully it gives an idea.

I generally prefer makeNodes2.js, but I did find a case where it does not behave as expected:

dojo.byId("select").appendChild(dojo.makeNodes('<option value="3" selected>Three</option>  <option value="4">Four</option>'));
});

When these options were added to the SELECT tag, "Three" did not come up as selected (at least in FF3). The SELECT already had two existing options in it and "Three" and "Four" were appended to the SELECT. However in the makeNodes1.js version it did become selected. Not sure if this is a case we need to support, maybe it is fringe case.

As far as the API for this ticket, I was preferring perhaps overloading dojo.create() to do this work. So, if the string passed to dojo.create starts with a "<", then do this makeNodes route and inject it in the DOM:

dojo.create('<div></div>', refNode, "after");

So basically, for the dojo.create branch that makes nodes, the second argument would not be for attributes, but for the refNode to do the relative insertion, and the third arg is the insertion location.

comment:14 Changed 10 years ago by James Burke

Just updated makeNodes2.js after doing more jquery compat testing:

  • Added a toLowerCase on the tag name regexp match
  • Convert <tag/> into <tag></tag> for some tags, since some of the jquery tests use stuff like "<div/><b/>".
  • More tagWrap entries. Not exhaustive yet.

Changed 10 years ago by James Burke

Attachment: makeNodes2.js added

makeNodes that uses innerHTML and returns document fragment

comment:15 Changed 10 years ago by James Burke

I am incredibly lame. The makeNodes2.js did not work in IE precisely because it was not including wrapping HTML in the innerHTML call. I just updated makeNodes2.js now that I have run the code in IE 7. Still would like to make it smaller.

comment:16 Changed 10 years ago by Mike Wilson

Just a note on makeNodes1.js: while it is an interesting exercise to parse the HTML and build a copy through DOM calls, I think it should generally be avoided.
One reason is performance (DOM vs innerHTML), another reason is simplicity (letting the browser parse the HTML instead of us) and the third reason is IE. Building generic DOM trees with IE's DOM API is a world full of pain and will f ex have you resort to IE's custom createElement syntax:

document.createElement("<div attr1=... attr2=...>");

So, stick with the wrapHtml solution, or any native API that accepts HTML. (It's a pity DocumentFragment? doesn't support innerHTML.)

Changed 10 years ago by James Burke

Attachment: toDom.js added

Supercedes makeNodes1 and 2. Based on makeNodes2, but with slight reductions and changed name to dojo.toDom

comment:17 Changed 10 years ago by James Burke

Just attached toDom.js: it is based on makeNodes2.js, but has some slight reductions and I like the name for it now: dojo.toDom("<div></div>"). It has a similar feel to dojo.toJson().

So perhaps the API works like this: dojo.toDom() does the basic conversion into a DocumentFragment?!. Then dojo.place() is changed to do a switch on its first argument: if dojo.place's first arg is a string and starts with "<" then call dojo.toDom() to generate the node (really a DocumentFragment? that acts like a node), then proceed as normal in the dojo.place function.

This avoids any weird overloading with dojo.create() and having dojo.toDom() as a public API allows it to be used in other situations (perhaps dijit templateString processing?) where the dojo.toDom() may want to be cached and cloned in other uses not directly related to placement in the document via dojo.place().

comment:18 Changed 10 years ago by Mike Wilson

As I said earlier you dev guys know better than me how to best make the method name blend in with the rest of Dojo. toDom is a nice idea, but I am also concerned that it should blend with whatever the current dojo.html.set() is renamed to.
I think there should be an "IE DOM restrictions safe" one-liner method to just set innerHTML, as it is such a common operation.

(BTW: I didn't realize until now what a "lucky" number this issue has! The "7890" keys are in a very nice line on the keyboard :-)

comment:19 Changed 10 years ago by James Burke

mikewse: I am hoping we can get dojo.toDom() and the dojo.place() changes into Dojo Base (dojo.js), then change dojo.html.set() to use dojo.toDom() and possibly dojo.place() where appropriate. I think that works out since dojo.html.set() tries to do more than dojo.place and dojo.toDom(). But other thoughts on it (particularly Eugene's) would be appreciated.

I share Eugene's feeling that the current dojo.html module (that contains dojo.html.set) is not structured optimally, but I do not think we can change it appreciably in the Dojo 1.x timeframe for API compatibility reasons.

As for a one liner "IE DOM restriction safe", I think just doing node.innerHTML = "" is sufficient -- no need for a dojo library function. But that does bring up something new I think dojo.place might want to suppport:

It would be nice if dojo.place() also allowed a "position" argument that meant "wipe out other child nodes", similar to doing a refNode.innerHTML = "". So, perhaps support dojo.place("<div></div>", refNode, "only"), and the "only" arg would remove refNode's children and replace it with the nodes in the HTML string passed to dojo.place.

Not sure if "only" is the best word for this behavior. I do not want to use "replace" since that could be confusing: it could mean "replace refNode with these contents", but what we want to convey is "replace refNode's children with these contents".

comment:20 Changed 10 years ago by Mike Wilson

Due to IE's DOM restrictions the following will throw an exception:

tr.innerHTML = "<td></td>"

so no, innerHTML isn't sufficient. This is something that apps need to workaround all the time, and handling of this is currently offered in dojo.html.set().

I am interpreting part of what you are saying as this is not an interesting use case, which I would strongly object to, but then you are also suggesting the dojo.place "only" parameter which would do exactly what I ask for, so I guess we agree on that an "IE safe set innerHTML" method is desired.

I think this method belongs in Dojo Base so no problem breaking with dojo.html.set() for me after 2.0, but I might think that dojo.place(myhtml, refnode, "only-or-whatever") feels a bit unintuitive for an innerHTML setter.

And I think it is important that this operation is done in one method call so it can be optimized to doing a straight innerHTML assignment for safe cases, instead of always going the DocumentFragment? route.

comment:21 Changed 10 years ago by James Burke

mikewse: sorry I misread, I read "IE DOM restriction safe" as meaning the code you wanted to insert was safe from IE's DOM restrictions. So disregard my comment on that, innerHTML will not be sufficient for that case.

As for an alias, something like dojo.innerHTML(refNode, myhtml), I think underneath it would just map to the dojo.place() call, since we want to support innerHTML operations where the nodes go adjacent to the refNode ("before" or "after"). So, in essence, dojo.place is our innerHTML/insertAdjacentHTML call, but more flexible since place can accept an integer position.

We could consider some dojo.innerHTML/dojo.insertAdjacentHTML aliases for the dojo.place equivalents, but I would probably want to evaluate that after more feedback, after the base functionality is in the code via dojo.place. Having those aliases will take up more space though in _base.html and _base.NodeList?, so I like keeping it as small as we can for now.

Note that dojo.html.set() would still be usable as it is today.

As far as the speed concerns for a fast path for innerHTML like operations:

I believe the number of calls to either attach the created nodes to a DocumentFragment? (DF) to be nearly the same as the number to insert those same nodes into the document (DF path has a constant 2 extra calls), but the DF insert should be faster since it is not attached to the visible DOM. Plus, since the DF appears like one node, it makes other code that consumes the DF easier: you do not have to do NodeList? cycling, and figuring out how best to place those nodes so they are together in the document (mostly a concern when the insert is "before" or "after" or a certain position relative to the refNode).

The DF route does have two extra calls, the creation of the fragment and the insertion of that fragment in the visible document. We could do a small optimization in dojo.toDom: do not create a DF if the the HTML to be inserted only had one top level node. I can see about adding that, but otherwise I do not see an appreciable speed advantage to having some other fast path, particularly given the code size cost with dealing with node collections vs. one node.

comment:22 Changed 10 years ago by bill

Hi guys,

Looks like you are doing good work on this.

It would be nice if dojo.place() also allowed a "position" argument that meant "wipe out other child nodes"

That sounds confusing to me because, as Eugene said, that's what dojo.attr(node, "innerHTML", ...) is for. Well, I guess it's a bit different since dojo.place() takes a DOMNode as it's argument, rather than a String, but I still wonder if dojo.attr() should be the interface.

Just attached toDom.js...

Looks good. Note that dijit's _Templated.js also has code to do this, in addition to dojo.html.set(), so you should check that out too.

Is your new code handling iframe issues? I had to add special code to _Templated.js for that, and looking at your code now ISTM that we aren't guaranteed that the dojo.doc referenced in:

tagWrap.node = dojo.doc.createElement("div");

will match the dojo.doc in:

dojo.doc.createDocumentFragment()

Also, a few possible optimizations to your code might be:

  • pre-compile the regexps
  • cache the <table>, <tr>, <select> helper nodes. and reuse them.

Of course those optimizations need to be weighed against code size/complexity, and also tested; maybe they won't speed things up at all.

And speaking of testing, of course you will need good unit tests for the new functionality.

Anyway, looking good!

comment:23 Changed 10 years ago by Mike Wilson

james: innerHTML "fast path"

The type of case I'm thinking about is:

dojo.innerHTML(myNode, "<div></div>");

This could be done as a straight myNode.innerHTML= assignment in all browsers. I suspect that it would be much faster than going the route over DocumentFragment?. If that is true, it would be good to special case it in the code. (But maybe testing reveals it's not true due to disconnected DOM nodes etc.)

bill/james: iframe issues

Yes, I think Bill has a good point there. In DWR I have tried to use the parent element's document reference in cases like this, ie myNode.document above.

My impression is though, that Dojo doesn't seem to have the ambition to handle multi-doc scenarios in general? Take f ex all the use of instanceof throughout the codebase, even in the public dojo.is<Type> methods. They will not work in multi-doc scenarios. (I found a WONTFIX report #5334 on isArray that I recently reopened.)

If the desire is to support multi-doc I think you need to put up a big sign on both your roadmap and in the coding guidelines.

james/eugene/bill: naming and plan for API

It seems there are several different plans for what functionality belongs under what name going on in paralell. And it seems some of them are not being fully implemented either, like what Eugene mentioned about dojo.html.set and I can also see some cases not being handled by dojo.attr.

I think all the methods we've mentioned in this report, and probably some more from Dojo Base, would benefit from being organized in a "User Task" matrix to try to map out the desired functionality to methods in a natural way. Together with this there should be global goals like "we should [not] support multi-doc scenarios". Then it would be much easier creating a coherent API that will be easier for newcomers to learn/understand. The User Task matrix would be an invaluable resource for docs as well.

comment:24 in reply to:  22 Changed 10 years ago by James Burke

I forgot about dojo.attr(node, "innerHTML"). I suppose we could live with that for doing innerHTML operations (the attr() code could use toDom() then do the insertion). But for the other insert operations, dojo.place seems like it still needs to change, since it already supports "before", "after", "first", "last". But with dojo.attr picking up the innerHTML case, dojo.place just needs to change to support getting "<div>" HTML strings and using dojo.toDom() first.

I do think though that dojo.attr needs more work: it seems like it is morphing into a general node property and attribute getter/setter. I ran into this with the jquery compat module: it needs more work. Seems like it should favor a DOM property over a DOM attribute if the property exists. But that is a discussion for another ticket.

Mike: if we feel like the dojo.attr innerHTML path could use a speed adjustment, we can look at that as a secondary ticket. To me, those are internal details, and at least we have the robust functionality in place and can optimize from there after doing performance profiling.

Of course, Eugene might have something else up his sleeve, so I defer to him if he has worked out something else.

Mike: do you have an example of a "User Task matrix" that could guide us in what we do for Dojo? I am interested in using it for a Dojo 2.0 exercise. See below for more info on multidoc support.

Replying to bill:

Looks good. Note that dijit's _Templated.js also has code to do this, in addition to dojo.html.set(), so you should check that out too.

OK, from what I can see, the toDom() stuff should be compatible with dijit's _Templated method. One question, have you seen where doing node.normalize() makes a difference in the output? I would have assumed that since we are doing an innerHTML operation, the browser would not be creating extraneous text nodes that need to be merged.

Is your new code handling iframe issues? I had to add special code to _Templated.js for that, and looking at your code now ISTM that we aren't guaranteed that the dojo.doc referenced in:

Yes, I was concerned about that, but it has not come up in the jquery tests I have been using to test toDom() so far. However, I will change toDom to accept an optional doc parameter to allow for specifying the document to use. I'll see about stealing how _Templated only uses a cached temp node if the doc matches the cached node's doc.

Mike, your question on multiframe support: generally we do not support it. However, there are some operations that we need to support for something like the RichText? area that may use an iframe. In particular this sort of HTML to DOM stuff seems like it should support multiple docs.

Mike, the isX language methods are trickier since things do get weird across frames. I liked the idea of using the "Miller Device" for isArray, but was not sure if it would work for Array subclasses or Array tricks we do in dojo.NodeList?. It would be interesting if you wanted to do more tests to confirm the impact.

Also, a few possible optimizations to your code might be:

  • pre-compile the regexps

I was under the impression this was less important in modern browsers, but we do still support IE 6. I can pull those regexps out as vars inside the anonymous function.

  • cache the <table>, <tr>, <select> helper nodes. and reuse them.

I did try that at first, in a previous makeNodes2.js variant, but the issue is that the innerHTML did not even work unless the whole table or select stuff was innerHTML'd into the temp node. I am trying to cache the temp node though, and I like _Templated's "use a cached temp node except if doc changes" idea.

And speaking of testing, of course you will need good unit tests for the new functionality.

Yes, right now I have been using the jQuery tests, but definitely, need to get the unit tests in, but I did not want to anything with that until we have heard with Eugene and any work he has done. As I mentioned before, I just did this stuff so far because I needed it for the jquery compat stuff.

comment:25 Changed 10 years ago by Mike Wilson

Performance:
No probs waiting as long as the API allows the optimization to be done. Making it a one-liner guarantees that.

User task matrix:
I guess I can whip something up. Maybe we take that discussion offline, outside Trac?

is<Type>:
"Miller Device"; that's funny I didn't know it was called that. From looking on the net it seems this was presented in October 2008 so it should be called the "Mike Wilson device" instead :-) I've been using it for ages, most recently on a DWR checkin mentioned in June: http://www.nabble.com/RE:-Parameter-marshalling-between-iframes-%2B-DWR-p18179220.html[[BR]] But I'm quite sure lots of other people have used it before me, funny though that Crockford didn't know about it.
I'm not sure how much effort to put into tweaking isArray further than what I suggest. The "Mike Wilson device" ;-) will return a true answer for any object that delegates directly to the native Array class, which all "real" Arrays need to do to have a live .length property. This includes the dojo.NodeList?, which is not an Array subclass (as it is not possible to subclass Array) but an Array decorator.
Anything that doesn't directly delegate to Array is usually an "ArrayLike?" which dojo (and DWR) has another method for.

comment:26 Changed 10 years ago by Mike Wilson

comment:27 Changed 10 years ago by bill

OK, from what I can see, the toDom() stuff should be compatible with dijit's _Templated method. One question, have you seen where doing node.normalize() makes a difference in the output? I would have assumed that since we are doing an innerHTML operation, the browser would not be creating extraneous text nodes that need to be merged.

All I can say there is that IE is famous for leaving in text nodes that affect the spacing of a widget, which is why even with the normalize code a lot of our widget templates make sure to eliminate all spaces between the ">" of end tags (like "</div>") and the "<" in start tags (like "<div>")

However, I will change toDom to accept an optional doc parameter to allow for specifying the document to use.

Up to you but ISTM that dojo core does everything using dojo.doc as a sort of global variable, rather than having "doc" as a function parameter. For example, how do I create a node that I will later place in document myDoc? It isn't

dojo.create("div", myDoc);

but rather

dojo.withDoc(myDoc, function(){ dojo.create("div"); })

right?

(Incidentally, maybe dojo.create("div", attr, refNode) should be smart enough to realize that it needs to create the div using the refNode's document. Currently it isn't.)

comment:28 Changed 10 years ago by bill

Oh, one more thing. Starting from the original code back in [700], we always attach the wrapper <span> to the document. Presumably this is/was necessary for some case, but I'm not sure what. I'd suggest removing that line from _Templated.js and seeing what happens.

comment:29 Changed 10 years ago by Mike Wilson

I've had that kind of issues in Firefox when working with Ranges. FF throws when not having the elements connected to the document. Maybe you experienced something similar?

comment:30 Changed 10 years ago by Eugene Lazutkin

(In [16230]) Adding dojo.toDom() with unit tests, the code by James Burke (thank you!) with minor optimizations, and an extra argument to specify the document explicitly, !strict, refs #7890.

comment:31 Changed 10 years ago by Eugene Lazutkin

The major difference is the optional second argument for a document. If it is specified, it is used to create the master node (cached) and the document fragment, otherwise dojo.doc is used.

comment:32 Changed 10 years ago by Eugene Lazutkin

All interested parties: please review if I missed anything, and let's close the ticket.

One obvious use for dojo.toDom() is to use it in dojo.attr() to implement the innerHTML setter. Are we ready for it?

comment:33 Changed 10 years ago by liucougar

why not use dojo.withDoc instead of introducing the extra parameter?

dojo.withDoc is designed to aid in this situation

comment:34 Changed 10 years ago by James Burke

1) dojo.withDoc: I would prefer not to have to depend on withDoc in the future: I would like to move withDoc/withGlobal/setContext out of Base at some point (maybe Dojo 2.0). In general, we do not support multiple frame use of Dojo very extensively, so I think moving them out and not requiring all functions to accept an optional doc is fine. However, for something like DOM insertion it seems easy enough to support the doc parameter, and it is an operation that is safe to do across frames and may be one of the more frequent cross frame actions.

dojo.withDoc can still be used with dojo.toDom(), but requiring extra 2 function calls with a try/catch block (the cost of withDoc()) does not seem worth it in this situation, particularly when it just saves an
d.doc and the doc argument on the toDom() function (as far as code size). But this is not something that I believe in strongly. I think the only use case we have for dojo.toDom() in another frame may be some RichText? module cases, so liucougar may know better there.

2) I would prefer to use doc+ dojo._scopeName + "ToDomId"? (or make that property name a var we construct once) for multiversion support. I think in reality it most likely would not cause a problem if multiple versions shared the same node, but who knows in the future what we might do to the cached node.

3) It seems like "input" would also qualify for the selfClosedTags section?

4) With 2 and 3 addressed, I'm ready to see this used in dojo.attr("innerHTML") as well as for dojo.place() to allow placing the nodes around/inside another node.

comment:35 Changed 10 years ago by Eugene Lazutkin

I agree with all three points:

  1. Let's keep the code as is without relying on withDoc() --- it doesn't cost us much.
  2. Using dojo._scopeName is the right call.
  3. I'll add <input> to the list.

After that I'll patch up attr() and see what I can do with place().

comment:36 Changed 10 years ago by bill

Cc: dante added

I'll put it on my list to convert dijit._Templated to use dojo.toDom().

BTW, I was also surprised by your break with the dojo philosophy of using dojo.withDoc(). If we have an (optional) document argument for dojo.toDom() we should also have one for dojo.create() (although when refNode is specified we can infer the document).

comment:37 Changed 10 years ago by Mike Wilson

3) It seems like "input" would also qualify for the selfClosedTags section?

I'll add <input> to the list.

For a complete list I think you also need to add:

area, base, basefont, col, frame, isindex, link, param, embed.

comment:38 Changed 10 years ago by James Burke

For selfClosedTags:

I don't think we should sweat deprecated HTML elements, so that removes BASEFONT and ISINDEX. Even though FRAME is not deprecated, I see it hard for someone wanting XML-ish markup passing in a FRAME tag with a />. Similar with AREA.

We should just add tags that would probably freak out if we added a </tag> with it, and I saw at least one example on an old Netscape site about <embed></embed> in a tag reference.

But I am fine with adding COL, PARAM, LINK, BASE

withDoc stuff and dojo.create():

do we need dojo.create anymore if we have dojo.toDom() and dojo.place that accepts an HTML string? When dojo.place calls dojo.toDom() it should use the refNode's ownerDocument. If we change dojo.toDom() to return just a node if there is only one top level node in the HTML string, I can see the case where we would not need dojo.create() anymore, and dojo.toDom() is likely to be better performing?

comment:39 Changed 10 years ago by Eugene Lazutkin

It looks like we need to compare the performance of dojo.create() and dojo.toDom(). If the latter is consistently more performant than the latter, we should think about outfitting the latter with placing arguments like dojo.create().

Regarding dojo.withDoc() I can go either way --- I don't have strong feelings about it. Removing it will not save a lot of code nor dramatically improve the performance.

comment:40 Changed 10 years ago by James Burke

So for now, let's track create() maybe needing a doc parameter in the create() ticket, and the discussion of create() vs. toDom()/augmented place() that uses toDom(), for a meeting/trac ticket/thread discussion and wrap up the changes Eugene outlined in comment 35, with the addition of some more selfClosedTags.

comment:41 Changed 10 years ago by Mike Wilson

We should just add tags that would probably freak out if we added a </tag> with it

Sorry if being thick, but is all this rewriting at all needed? Doesn't all current browsers correctly render both HTML and XHTML being supplied to innerHTML independent of page DOCTYPE?

I think the cpu cycles being spent on parsing and rewriting the supplied html string needs to be justified, and I think the user code could be required to supply a html string matching the page's DOCTYPE, if needed.

comment:42 Changed 10 years ago by James Burke

mikewse: At least in Firefox with this doctype:

!DOCTYPE html PUBLIC "-W3CDTD XHTML 1.0 TransitionalEN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

Doing a node.innerHTML = "<div/><b/>test"

generates a DOM where the structure is: -div --b ---test

instead of: -div -b -test

So that was the issue I was trying to correct. I may not have done the best job with it, other ideas are welcome.

comment:43 Changed 10 years ago by liucougar

I am with mikewse on this as well: if the user is trying to set something like "<div/><b/>test", I'd say they are asking for trouble, and I think we don't need to fix that case

comment:44 in reply to:  42 Changed 10 years ago by Mike Wilson

mikewse: At least in Firefox with this doctype:

!DOCTYPE html PUBLIC "-W3CDTD XHTML 1.0 TransitionalEN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

Doing a node.innerHTML = "<div/><b/>test"

generates a DOM where the structure is: -div --b ---test

instead of: -div -b -test

Hm, I suspect that you are serving as text/plain, right? Firefox will use its HTML parser when you serve with text/plain, independently of DOCTYPE. Do a "View Page Source" and you will see red highlights on the "/>" meaning you've been a bad boy ;-).
If you switch to application/xhtml+xml then you will get your expected DOM. I haven't done extensive tests but IIRC quite a few browsers do it this way. The XHTML 1.0 text/plain compatibility thing is hopelessly broken. (I usually avoid XHTML as serving it as text/plain is meaningless anyway.)

Note also that as long as you are serving as text/plain you will get the same unexpected DOM even for when putting "<div/><b/>test" right into the doc, without js stuff. The same parser is used for both document content and innerHTML assignments.

So, to sum up, my experience is that innerHTML usually does the same thing as is done to "normal" content in the page. If the user wants to insert fragments alien to the page, ie HTML into an XHTML(application/xhtml+xml) page or XHTML into a page that doesn't support it (which XHTML text/plain in practice doesn't), then I think it would be better to have a separate "convert/reparse HTML fragment" API so it may be chosen whether to spend the cycles. John Resig has done some elegant reparsing stuff on http://ejohn.org/blog/pure-javascript-html-parser/ with the latest code on http://github.com/jeresig/env-js/tree/master/src/htmlparser.js

comment:45 Changed 10 years ago by Eugene Lazutkin

(In [16264]) Adding more elements to the self closed tag dictionary, and to the wrap dictionary, !strict, refs #7890.

comment:46 Changed 10 years ago by Eugene Lazutkin

(In [16265]) Making the unique string scope-based, !strict, refs #7890.

comment:47 Changed 10 years ago by Eugene Lazutkin

(In [16266]) Adding dojo.toDom() to process dojo.attr(node, "innerHTML", frag) on IE correctly, minor changes in dojo.toDom() to save bytes, correcting the argument override in html element tests, adding unit tests for new functionality, !strict, refs #7890.

comment:48 Changed 10 years ago by bill

For dijit template we can't match them to the index.html's DOCTYPE since we don't know what it is.

But, we'd never has something like <b/>... we just use <b></b> which works in all cases.

Things like <input/> are trickier but I suspect that <input> (without the slash) will work in all cases? Or perhaps dijit just doesn't support XHTML mode (with the mode correctly specified in the metadata from the webserver) since it's pretty unusual.

comment:49 Changed 10 years ago by James Burke

The <div/> to <div></div> expansion/selfClosedTags discussion:

The doc is served as text/html, with XHTML 1.0 Transitional DOCTYPE -- it was the jquery test suite. dojo.toDom() grew out of what I needed to pass their tests. So jquery is doing something like that now, and I like the idea of matching behavior more closely across toolkits to make migration easier.

I also liked that it allowed for shorter syntax for html strings.

We do not have to include it if no one else sees the value, but I expect a form of it will come back in Dojo 2.0. Please put your votes for or against it. If it is a mixed vote, it will stay in. So far, liucougar and mikewse are against.

comment:50 Changed 10 years ago by Eugene Lazutkin

After talking to Alex I think that dojo.place() is the right place to invoke dojo.toDom() --- the first thing you usually want is to insert it somewhere after the creation. So this change will be my next goal.

On tags expansion: it doesn't affect our API, use cases, nor the existing documentation => we can remove this part easily if we want, so I don't see it as a priority. I understand points from both sides. It boils down to saving some bytes and CPU cycles by removing it vs. handling more relaxed syntax.

comment:51 Changed 10 years ago by Eugene Lazutkin

(In [16271]) Acting on James Burke's suggesting to simplify the placement of DocumentFragment?, !strict, refs #7890.

comment:52 Changed 10 years ago by Eugene Lazutkin

(In [16272]) Implementing James Burke's suggestion to return a singular element directly without wrapping it in DocumentFragment?, !strict, refs #7890.

comment:53 in reply to:  49 Changed 10 years ago by Mike Wilson

Replying to bill:

For dijit template we can't match them to the index.html's DOCTYPE since we don't know what it is.

That's a perfect use case for reparsing that John Resig mentioned in his article. There could be a method "toSafeHtml()", or similar, in the Dojo API that could be used whenever there is need for this. I think it could provide value to both users and other parts of Dojo.
Or you can write your HTML fragment to parse correctly in both modes from the start. See below.

But, we'd never has something like <b/>... we just use <b></b> which works in all cases. Things like <input/> are trickier but I suspect that <input> (without the slash) will work in all cases?

Putting "/>" on empty elements is currently the best for a universal fragment, and will be supported and valid in HTML5. For non-empty elements the syntax should be avoided as there is no current behaviour or future HTML standard that supports it. See overview below:

Empty elements (input, img, ...)
<input>:

  • works in all HTML and XHTML(text/html) modes
  • breaks in XHTML(application/xhtml+xml)
  • validates for HTML4 and HTML5
  • does not validate for XHTML1 or XHTML5

<input/>:

  • works in all current browsers in all modes
  • validates for HTML5, XHTML1 and XHTML5
  • does not validate for HTML4

Non-empty elements (div, b, ...)
<div></div>:

  • works in all browsers in all modes
  • validates for all DOCTYPEs

<div/>:

  • breaks in all current browsers in HTML and XHTML(text/html) modes
  • works in XHTML(application/xhtml+xml)
  • validates for XHTML1 and XHTML5
  • does not validate for HTML4 or HTML5

Or perhaps dijit just doesn't support XHTML mode (with the mode correctly specified in the metadata from the webserver) since it's pretty unusual.

If you are using empty elements without "/>" in your fragment it will break in browsers that enforce XML parsing rules for innerHTML in application/xhtml+xml mode, like f ex Firefox does.

comment:54 Changed 10 years ago by Mike Wilson

Replying to jburke:

The <div/> to <div></div> expansion/selfClosedTags discussion:

The doc is served as text/html, with XHTML 1.0 Transitional DOCTYPE

Yes, I was meaning text/html (but wrote text/plain :-).

it was the jquery test suite. dojo.toDom() grew out of what I needed to pass their tests. So jquery is doing something like that now, and I like the idea of matching behavior more closely across toolkits to make migration easier.

While I also appreciate that, I think in this case it is questionable to have tests that require your function to correct invalid HTML.

I also liked that it allowed for shorter syntax for html strings.

IMO it is bad to offer services that encourage the user to write invalid HTML. And I see no reason that dojo.toDom() should correct fragments that would not be correct when inserted in the document itself. For that purpose I would recommend:

dojo.toDom(dojo.toSafeHtml("<div/>"));

where toSafeHtml() produces a representation that works correctly both in HTML and (real) XHTML mode. (the method name is just an example)

I think you should offer the ability to do toDom without incurring regexp replacement overhead. Personally I will not use a function with this kind of overhead when I know it is not needed and I am working with large fragments.

I think this whole "invalid HTML correction" thing is something that you should discuss for your Global Goals (referring to the User Task matrix I shared with you).

comment:55 Changed 10 years ago by Mike Wilson

Eugene:
When I submitted this enhancement request I did it because I saw a need for functionality in my project, that could be beneficial for other Dojo users. Unfortunately, the way this method is being implemented now makes it uninteresting to actually use the result of the enhancement request, as my own implementation is more efficient.

Earlier on you said:

In general I favor small functions with a limited well-defined behavior.

That is what I favor too, and I think you should break up the current toDom() implementation in separate methods for the different tasks that could all be used in other contexts:

  1. reparse HTML fragment correcting "/>"
  2. detect if fragment needs wrapping
  3. convert fragment to DOM node(s)

(2) and (3) could be called by functions that do not want (1), and (2) could be used to decide whether a fast path to set innerHTML directly is possible, without translating to nodes.

Having these APIs (public or private) would make it possible for me to use the outcome of this enhancement instead of staying with my own codebase, which would be great. Probably others could also benefit from it.

comment:56 Changed 10 years ago by Eugene Lazutkin

@mikewse: Splitting introduces its own set of problems, so the balanced approach is the best. My ultimate goal is to introduce the new functionality (or repackage the old one) to reduce the overall complexity. Splitting one function in three like you suggested has following problems:

  1. Common tasks will require more complex expressions, not less complex. This alone will make their use more error-prone. Problem: these three functions should be made internal and some user-facing wrapper(s) should be introduced; we can be swamped with "this doesn't work" bug reports from users not digging the whole deal with the complex instantiation of HTML fragments.
  2. I don't see how newly defined functions can be used in different contexts, for example, when I need a function to correct HTML without instantiating the fragment. Problem: splitting does not introduce the synergy => doesn't buy us much in terms of additional functionality.
  3. Packaging some functionality as 3 functions will require ... more packaging increasing the footprint and introducing extra calls. Problem: more technical expense for roughly the same value.

In all fairness there is one good thing about the split: advanced users can custom-tailor their code to exclude unnecessary steps increasing the performance. Does it really buy us a lot?

One possible solution is to make split differently:

  1. Remove the reparsing code and move it in a separate function outside of the Base. It assumes that users are smart enough to pass in the well-formed HTML in normal cases. Otherwise they should call it.
  2. Keep the wrapping code in the main function. If you don't need wrapping => don't use this function, use innerHTML directly.
  3. Dojo facilities (dojo.attr(), dojo.place()) will use the main function (without the reparsing). If user needs reparsing, it is up to her to call the reparsing function on her HTML string before passing it in.

The drawback is if somebody wants raw speed (unadulterated innerHTML), she should do it on her own without much of our help, e.g., splitting out innerHTML property from dojo.attr() call and assigning it directly. Personally I think this is a fair trade-off.

How does it sound to you guys?

comment:57 Changed 10 years ago by James Burke

Eugene,

That sounds good. Some extra info:

  • Change dojo.toDom to dojo._toDom, from meeting yesterday.
  • Remove selfClosedTags from dojo._toDom. Let's not create another function for the selfClosedTags stuff until someone asks for it. I can empathize with the argument of wanting to encourage proper HTML construction. For the jquery compat stuff, I'll just put that logic in the area before I call dojo._toDom.

comment:58 Changed 10 years ago by Mike Wilson

Moving out the reparsing is the most important in my opinion, as correcting HTML is an API contract that could be hard to remove once released. Doing the innerHTML fast path can wait as it is not a change of API contract if f ex dojo.attr("innerHTML") would start using it. We can wait and see if there are any significant performance wins before considering it again. That's good enough for me.

Looking at the code, I think the "fieldset" entry is not necessary in tagWrap, as <fieldset> is an independent block-level element, and not a child bound to <form>:s. The "legend" entry can be adjusted accordingly.

comment:59 Changed 10 years ago by Eugene Lazutkin

(In [16283]) Taking out the HTML cleanup code, !strict, refs #7890.

comment:60 Changed 10 years ago by Eugene Lazutkin

(In [16284]) Removing fieldset-related records in the tag wrapping dictionary, !strict, refs #7890.

comment:61 Changed 10 years ago by Eugene Lazutkin

(In [16285]) Renaming dojo.toDom() to dojo._toDom(), updating all known referencing, !strict, refs #7890.

comment:62 Changed 10 years ago by Eugene Lazutkin

Relates to #7669.

comment:63 Changed 10 years ago by Eugene Lazutkin

Regarding dojo.place():

The first argument can be a string already --- it'll be treated as the id of a node. So we need to be smart about HTML fragments.

One possible solution is to check the first symbol for "<". It is fast, but precludes instantiating text nodes as the first node of a fragment.

Another possible solution is to check for "<" anywhere using indexOf() assuming that ids cannot have this symbol inside. It is more flexible, but assumes that the fragment always contains tags. The downside a possible slowdown to check for "<". Most probably the slowdown will be negligible comparing with the fragment instantiation.

Any preferences over these choices? Any other ideas?

comment:64 Changed 10 years ago by James Burke

Checking just for first character being a "<" should be good enough.

comment:65 Changed 10 years ago by bill

(In [16286]) Convert _Templated to use dojo._toDom(), refs #7890 !strict. Performance is about the same as before (7s or 8s on IE6 in parallels on my mac).

comment:66 Changed 10 years ago by Eugene Lazutkin

(In [16287]) Adding the HTML fragment support to dojo.place(), !strict, refs #7890.

comment:67 Changed 10 years ago by Eugene Lazutkin

(In [16288]) More comprehensive tests for dojo.place(), !strict, refs #7890.

comment:68 Changed 10 years ago by Eugene Lazutkin

The work on this ticket is essentially done. I am keeping it open to reuse its results systematically across the toolkit.

Further dojo.place() enhancements are separated into #8380.

Further reuse in NodeList will be tracked by #8282.

comment:69 Changed 10 years ago by Eugene Lazutkin

(In [16302]) Replacing the removed dijit._Templated._createNodesFromText() with dojo._toDom() in DojoX DTL, all unit tests work ok, !strict, fixes #8391, refs #7890.

comment:70 in reply to:  65 Changed 10 years ago by Mark Wubben

Replying to bill:

(In [16286]) Convert _Templated to use dojo._toDom(), refs #7890 !strict. Performance is about the same as before (7s or 8s on IE6 in parallels on my mac).

I'll assume !strict is meant to indicate this, but I'd like to point out that this change broke a template which had some old HTML after (and outside of) the container node. dojo._toDom() returned a Document Fragment rather than one Element, messing up template parsing after that.

This may be an issue others will run into when upgrading? Is this meant to fail, and was the previously working behaviour by accident, or should there be an explicit check for Document Fragments to use the first element rather than the fragment?

comment:71 Changed 10 years ago by dante

@mark - !strict means that code in the checkin doesn't pass some form of lint validation.

The template in question was a fragment, and didn't reduce to a single node:

<div dojoattachpoint='containernode'></div>
<div>somethingelse</div>

afaik, that has never been supported and is surprising it works in 1.x at all.

on an aside, I'm confused why this made it into dojo.place but not dojo.create

comment:72 Changed 10 years ago by Eugene Lazutkin

Reuse in dojo.create() is a separate ticket #8446.

comment:73 Changed 10 years ago by Eugene Lazutkin

Resolution: fixed
Status: assignedclosed

The only subject of this ticket is dojo._toDom() which is the internal function. The rest was "outsourced" to other tickets, or fixed in the code already. Ergo, the ticket can be closed now.

comment:74 Changed 10 years ago by bill

(In [20111]) Throw error when template has multiple nodes in it (usually by accident).

People keep getting tripped up by this, especially since dojo._toDom() silently returns a DocumentFragment? instead of a DomNode? in this case.

Refs #7890 !strict.

Note: See TracTickets for help on using tickets.