Opened 7 years ago

Closed 4 years ago

#16599 closed enhancement (fixed)

dojo/store: missing deep queries

Reported by: Adrian Vasiliu
Owned by: dylan
Priority: high
Milestone: 1.11
Component: Data
Version: 1.8.3
Keywords:
Cc: cjolif, ben hockey
Blocked By:
Blocking:

Description (last modified by Brian Arnold)

I think it would be helpful if dojo/store provided an implementation of a query engine that can (optionally) recurse into the children of items, so that hierarchical data can be queried. This could be either an option of the default engine (store/util/SimpleQueryEngine) (say, similar to the deep:true option of dojo/data), or it could be implemented in a new, distinct engine.

As a bonus, it would be nice if dojo/store provided an in-memory store that implements the getChildren() method. (A related ticket: #13781.)

Adding such features would help people replace the old dojo/data stores with dojo/store without losing features and without being forced to convert hierarchical data into flat data.

Attachments (2)

hierarchyChildren.js (446 bytes) - added by ben hockey 6 years ago.
hierarchyParent.js (395 bytes) - added by ben hockey 6 years ago.


Change History (28)

comment:1 Changed 7 years ago by cjolif

Cc: cjolif added

comment:2 Changed 7 years ago by ben hockey

Cc: ben hockey added

comment:3 Changed 7 years ago by ben hockey

deep queries can be supported by providing a function as a query (or a string that represents a property on a store which is a function). for example:

        function hasInCircle(name) {
            return function (item) {
                return ~item.circle.indexOf(name);
            };
        }
        
        var data = [
                {
                    id: 'joe',
                    circle: ['alice', 'bob', 'carl'],
                    address: {
                        state: 'NY'
                    }
                },
                {
                    id: 'alice',
                    circle: [],
                    address: {
                        state: 'KY'
                    }
                },
                {
                    id: 'bob',
                    circle: ['alice', 'carl'],
                    address: {
                        state: 'NY'
                    }
                },
                {
                    id: 'carl',
                    circle: ['joe'],
                    address: {
                        state: 'NJ'
                    }
                }
            ],
            store = new Memory({
                data: data,
                livesInNY: function (item) {
                    return item.address.state === 'NY';
                }
            });
        
        console.log('lives in NY...');
        console.log(store.query('livesInNY'));
        console.log('has joe in their circle');
        console.log(store.query(hasInCircle('joe')));

you can see a live copy of this at http://jsfiddle.net/neonstalwart/4EtWU/

is there something that can't be done this way?

as for getChildren, isn't that something that is going to be fairly unique to each store? i'm not sure how it could be written in a way that was generic enough and simpler than just writing your own. what approach would you suggest?

comment:4 Changed 7 years ago by ben hockey

Owner: changed from Kris Zyp to Adrian Vasiliu
Status: changed from new to pending

comment:5 Changed 7 years ago by Adrian Vasiliu

Status: changed from pending to new

Indeed, a code example helps to clarify the need I tried to describe. So here's a modified version of your code:

function nameContainsFoo() {
  // Returns a filter function usable as a store query.
  return function (item) {
    return ~item.name.indexOf("Foo");
  };
}

var data = [
  {
    name: 'item Foo 1',
    children: [
      {
        name: 'item Bar 1',
        children: [
          {
            name: 'item Bar 1.1',
            children: [
              {
                name: 'item Foo 1.1.1'
              }
            ]
          }
        ]
      }
    ]
  },
  {
    name: 'item Foo 2'
  }
],

store = new Memory({
  data: data,
  nameContainsFoo: function (item) {
    return ~item.name.indexOf("Foo");
  }
});

console.log('name contains Foo (using function defined in store):');
console.log(store.query('nameContainsFoo'));

console.log('name contains Foo (using function defined in app):');
console.log(store.query(nameContainsFoo()));
                                
var regExp = new RegExp("^.*Foo.*$", "mi");
console.log("name contains Foo (using regexp): ");
console.log(store.query({name: regExp}));

I've put it live at http://jsfiddle.net/adrian_vasiliu/vFnkR/3/

As you can see, I've added a regexp-based query that I'd like to return all items whose 'name' property contains "Foo". The code above returns only the two top-level items. What I need is to get all three matching items in this case, including the one located deep inside the children ('item Foo 1.1.1').


Concerning store's getChildren property: I fully agree the implementation would be fairly unique to each store. My point was that a store such as dojo/store/Memory could have a childrenProperty:String (say, default: "children"), and an implementation of getChildren which would rely on the childrenProperty.
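To make the childrenProperty idea concrete, here is a minimal sketch. It uses a plain object as a stand-in for a store rather than dojo/store/Memory, and withChildrenProperty is a hypothetical helper name, not an existing Dojo API.

```javascript
// Hypothetical helper (not part of dojo/store): adds a childrenProperty
// and a getChildren implementation that relies on it.
function withChildrenProperty(store, childrenProperty) {
    store.childrenProperty = childrenProperty || 'children';
    store.getChildren = function (item) {
        // Items that lack the property are treated as leaves.
        return item[this.childrenProperty] || [];
    };
    return store;
}

// Plain-object stand-in for a store holding nested data.
var store = withChildrenProperty({
    data: [
        { name: 'item Foo 1', children: [{ name: 'item Bar 1' }] },
        { name: 'item Foo 2' }
    ]
});

console.log(store.getChildren(store.data[0])); // the nested 'item Bar 1' object
console.log(store.getChildren(store.data[1])); // empty array for a leaf
```

This only covers data where a parent holds its children directly; data where children point to their parent would need a different getChildren.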

More generally speaking, I don't understand why getChildren is an (optional) part of the Store API while it is never called by any piece of code in dojo/store. I think it would be useful that, for a store that happens to implement getChildren, the queries rely on it for recursing inside child items (depending on a "deep" option in Store.QueryOptions).

comment:6 Changed 7 years ago by cjolif

I agree the lack of a "reference" implementation of getChildren is a problem. I have always been reluctant to build components that rely on it, as we have no implementation in the toolkit itself that I can test against. I think each API should have at least a "reference" implementation.

comment:7 Changed 7 years ago by ben hockey

Resolution: invalid
Status: changed from new to closed

this is turning into more of a support ticket than a bug so i'm going to end up closing this out because we try to avoid providing support in tickets, but here's my help...

Adrian, as far as any of dojo's existing store implementations are concerned, you only have 2 items in your store - so of course any query can only return at most 2 items. all the current dojo stores expect to have all data items exist at the top level of the data provided to the store. this means you have 2 options,

  • you manipulate your data (either on the client or on the server) to match the expectations of the store
  • you build a custom store that understands how it should traverse the data in order to find all the items.
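The first option (manipulating the data on the client) could look roughly like the sketch below. It assumes that nested items live under a `children` property and that `name` is unique enough to serve as an identifier; `flatten` and the `parent` key are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: flattens nested items into one array and records
// each item's parent by name. Assumes `children` and unique `name` values.
function flatten(items, parentId, out) {
    out = out || [];
    items.forEach(function (item) {
        var copy = { parent: parentId === undefined ? null : parentId };
        Object.keys(item).forEach(function (key) {
            // Copy everything except the nested children array.
            if (key !== 'children') { copy[key] = item[key]; }
        });
        out.push(copy);
        flatten(item.children || [], item.name, out);
    });
    return out;
}

var data = [
    { name: 'item Foo 1', children: [
        { name: 'item Bar 1', children: [
            { name: 'item Bar 1.1', children: [{ name: 'item Foo 1.1.1' }] }
        ] }
    ] },
    { name: 'item Foo 2' }
];

var flat = flatten(data);
var matches = flat.filter(function (item) {
    return /Foo/i.test(item.name);
});
console.log(flat.length, matches.length);
```

The flattened array is the kind of input the existing stores expect, so an ordinary query over it reaches every item.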

for data items that are related to each other, consumers of the store will use the getChildren function of the store to get related items. the store API is intentionally trivial to implement with the expectation that you will often need to implement your own custom store rather than use one provided by dojo. once you have a store that understands the data format you're providing, the examples we've used so far will work with SimpleQueryEngine. so the problem is not the query engine but rather that the store is not being populated with items as you seem to be expecting.

adding getChildren to something like Memory store is bloat. there is no one size fits all solution. most times a custom implementation will be needed and so providing anything is just unnecessary bloat. to demonstrate, even the idea of a single property on the parent that points to the children is short sighted... what if the children have a single property pointing to their parent? this is why getChildren is part of the API but not implemented.

"I don't understand why getChildren is an (optional) part of the Store API while it is never called by any piece of code in dojo/store."

based on the data sample you provided and this comment, i'm wondering if you have an expectation that a store would use getChildren to traverse the raw data provided to it? getChildren is to be used by consumers of the store - see dijit/tree/ObjectStoreModel for an example of a model that consumes a dojo/store and uses getChildren. it's possible that in a custom store implementation getChildren could double as the way to traverse the raw data when populating the store but its primary purpose is for consumers of the store to be able to get the children of an item they already have a reference to.

Christophe, we have a tutorial that provides a reference implementation of getChildren - http://dojotoolkit.org/documentation/tutorials/1.6/store_driven_tree/.

for both of you, building custom stores should become second nature to anyone using dojo and because there is no one size fits all solution for hierarchical data, you are going to have to customize somewhere - even if it's just to add getChildren to an existing store.

i'm closing this ticket out because i don't see any bugs or work to be done.

comment:8 Changed 7 years ago by Adrian Vasiliu

Let me answer some of your points (as an attempt to further clarify my expectations):

"i'm wondering if you have an expectation that a store would use getChildren to traverse the raw data provided to it?"

Yes, I do have this expectation. More exactly, I have the expectation to be able to execute a recursive query using a query engine provided by dojo/store. One of the reasons for which I have this expectation is that the dojo/data stores do provide this feature. Hence, this appears as a regression when migrating from the old to the new stores...

"you will often need to implement your own custom store"

My own context is different, let me describe it. I'm implementing a filtering feature for dojox/mobile list widgets. These widgets can be backed by either dojo/data or dojo/store stores. The filtering is done with a regexp-based query on the store. Now, the dojo/data stores can match items inside hierarchical data (thanks to their "deep" config option), and this is not the case with dojo/store. That's my context. Since our widgets are supposed to consume stores provided by our users, implementing a custom store is not something it makes sense to do on my side - our widgets need to consume stores provided by our end-users. And I can't really ask our end-users to implement stores with recursive capabilities by pointing them to the "store driven tree" tutorial that you pointed to - it would be of limited help for a user who doesn't use a dijit/tree, wouldn't it?

"you are going to have to customize somewhere - even if it's just to add getChildren to an existing store."

I think the trouble for these use-cases isn't so much that getChildren() isn't implemented by any of the store implementations in dojo/store. It would be feasible to ask the user to implement it. The trouble is that just implementing getChildren() is far from being enough, since the query engine provided by dojo/store doesn't use it. All in all, in the current situation, the user needs to implement getChildren() *and* a custom query engine. I still don't understand why it wouldn't make sense that dojo/store provides a query engine that calls getChildren (if and only if the store implements getChildren).
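A rough sketch of what such a recursive filter could look like follows. `deepFilter` is a hypothetical helper, not something dojo/store provides, and it assumes getChildren returns a plain array synchronously; the store here is a plain-object stand-in.

```javascript
// Hypothetical helper: applies `test` to every item reachable from `roots`,
// descending through store.getChildren only if the store implements it.
function deepFilter(store, roots, test) {
    var found = [];
    (function visit(items) {
        items.forEach(function (item) {
            if (test(item)) { found.push(item); }
            if (typeof store.getChildren === 'function') {
                visit(store.getChildren(item));
            }
        });
    })(roots);
    return found;
}

// Plain-object stand-in for a store whose getChildren reads `children`.
var store = {
    data: [
        { name: 'item Foo 1', children: [
            { name: 'item Bar 1', children: [
                { name: 'item Bar 1.1', children: [{ name: 'item Foo 1.1.1' }] }
            ] }
        ] },
        { name: 'item Foo 2' }
    ],
    getChildren: function (item) { return item.children || []; }
};

var matches = deepFilter(store, store.data, function (item) {
    return /^.*Foo.*$/mi.test(item.name);
});
console.log(matches.map(function (item) { return item.name; }));
```

Without a getChildren method on the store, the same helper degrades to a flat filter over the root items.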

comment:9 Changed 7 years ago by ben hockey

there is one simple change to make and your problem goes away... you need to have a store that understands the format of the hierarchical data you're providing. there are 2 ways to do this, manipulate the data to conform to an existing store OR create a custom store.

for simplicity, i've updated your fiddle at http://jsfiddle.net/neonstalwart/vFnkR/5/ to manipulate the data passed to Memory store in a way it expects and i've added an implementation of getChildren that will work with the data format i've used. the same could have been achieved by writing a custom store that understood the data passed to the constructor. without any changes to the query engine, everything you want is there once the store has data in a format it expects. hierarchical querying has been achieved. once you have an item from the store, you can find its children via store.getChildren(item);

the implementation i've provided does not recurse into children to populate their children. however, as a widget developer, you should not need to worry about that. to traverse the hierarchy, you use getChildren regardless and you assume that the store provided to you will handle it properly.

as a widget developer it's reasonable to require your users to provide a properly functioning store - so if they provide a dojo/store implementation, it should work according to that API and if they provide a dojo/data implementation, it should work according to that API.

it's also unreasonable for users to expect that they can switch between dojo/data and dojo/store just by changing the constructor of the store... that's insane - the APIs are different. changes to code that consumes the store and some manipulation of data should not be out of the question for a migration like this.

anyhow... i've shown that there is no need for changing the query engine once the store is working as expected and as a widget developer you just need to use getChildren to traverse hierarchies without a concern for the format of the original data.

comment:10 Changed 7 years ago by Adrian Vasiliu

Sure, it works fine provided that the data is flattened. The concern was about the ability of dojo/store to work with unflattened hierarchical data.

"it's also unreasonable for users to expect that they can switch between dojo/data and dojo/store just by changing the constructor of the store... that's insane - the APIs are different."

I have no problem with the API changes, which are perfectly understandable.

"changes to code that consumes the store and some manipulation of data should not be out of the question for a migration like this."

Well, if the change involves a flattening operation of the hierarchical data, plus building back the hierarchical info by browsing the results of the query (as in your fiddle), I would think this is a questionable overhead, a bit in terms of amount of code and a lot in terms of performance.

For comparison, this is all you need with the old dojo/data store:

store = new /*dojo/data/*/ItemFileReadStore({
  idProperty: 'label',
  data: data // hierarchical data (unflattened)
});
        
var regExp = new RegExp("^.*Foo.*$", "mi");

store.fetch({
  query: { name: regExp},
  queryOptions: { deep: true },
  onComplete: function(items){
    console.log("name contains Foo (fetch using regexp): ");
    console.log(items);
  }
});

comment:11 Changed 7 years ago by cjolif

I really don't understand why you consider this an invalid request. We can discuss it and maybe discard it in the end but just deciding so quickly this is not valid seems a bit arbitrary to me. If I'm not mistaken what you have proven with your example is that if you flatten the data the query engine works on it, which we knew. The idea is that _because_ store API provides hierarchical API (getChildren) the query engine that we provide should be on par and provide hierarchical API (deep: true). Otherwise there is an obvious inconsistency between the two. In other words either we should force stores to be flat or we should allow hierarchical treatment both at store and query level.

Obviously there are solutions from the outside as you have shown, but they are not as simple as if the two were consistent. I totally agree users should write their own stores for performance and specific data access (even though I think it would be good to provide a default memory implementation that covers the hierarchical case) but this does not remove the fact that as we provide them with a hierarchical API on store we should guide them at query level as well by defining a recursive option and implementing it in our query implementation.

Last edited 7 years ago by cjolif

comment:12 Changed 7 years ago by Kris Zyp

Resolution: invalid
Status: changed from closed to reopened

Brian Arnold is going to implement a hierarchical store, based on parent-keyed identification of children, to go in dojo/store, so we have a recommended best-practice approach to hierarchical data in dojo/stores.

comment:13 Changed 7 years ago by Kris Zyp

Owner: changed from Adrian Vasiliu to barnold
Status: changed from reopened to assigned

comment:14 Changed 7 years ago by Brian Arnold

The problem with this request is that it runs counter to the things that make the dojo/store API great. It's very light and flexible and imposes almost zero structure on your data, but this also means that it's difficult to write a generic solution in this case. For example, this is the logic that dojo/data uses in the IFRS to determine things that are items:

        function valueIsAnItem(/* anything */ aValue){
                // summary:
                //              Given any sort of value that could be in the raw json data,
                //              return true if we should interpret the value as being an
                //              item itself, rather than a literal value or a reference.
                // example:
                //      |       false == valueIsAnItem("Kermit");
                //      |       false == valueIsAnItem(42);
                //      |       false == valueIsAnItem(new Date());
                //      |       false == valueIsAnItem({_type:'Date', _value:'1802-05-14'});
                //      |       false == valueIsAnItem({_reference:'Kermit'});
                //      |       true == valueIsAnItem({name:'Kermit', color:'green'});
                //      |       true == valueIsAnItem({iggy:'pop'});
                //      |       true == valueIsAnItem({foo:42});
                return (aValue !== null) &&
                        (typeof aValue === "object") &&
                        (!lang.isArray(aValue) || addingArrays) &&
                        (!lang.isFunction(aValue)) &&
                        (aValue.constructor == Object || lang.isArray(aValue)) &&
                        (typeof aValue._reference === "undefined") &&
                        (typeof aValue._type === "undefined") &&
                        (typeof aValue._value === "undefined") &&
                        self.hierarchical;
        }

When the IFRS is processing its data, it basically applies that check to every top-level item in the array of data passed to it, and then it applies that check to every single attribute of those top-level items, and if those are items, it applies it to every attribute of those items, so on and so forth. Internally, it's building up a giant flattened structure of all items, which is how it's implementing its deep query later on.

This process is incredibly heavy and absolutely unnecessary for a vast majority of common cases for data management, and one of our goals should be meeting the common need. By not imposing all of this weight on the average user, we're providing them with much lighter, simpler to use stores that don't go through and perform incredibly heavy operations and data munging (as every item effectively gets copied and mutated heavily as well).

The solutions that Ben has proposed all work very well and require little effort from the end user. By putting the onus on the user, it allows them to easily adapt and adjust the store to their needs, and not adjust their data to match the store. Given that many of our users have no control over their backend, a flexible customized store is much easier than having to munge data in processing.

It does not seem unreasonable to have a reference implementation of a store with a getChildren method, though. In discussions with dozens of clients and hundreds of individuals at training classes, the vast majority of users are working with flat data that keys into parents, typically coming from some sort of RDBMS on the backend. As Kris mentioned, I'll implement a reference that works with the most common model that we see time and again with clients, and there may be great value in having that.

However, it definitely won't live in dojo/store/Memory. That should remain as-is as a very, very light store, to serve as a base for more complex in-memory stores.

As for deep queries, it'd be easy enough to implement a slightly more advanced query engine that supports a known property for children and to iterate through that, and so we could likely implement a reference version of that as well, but it's more debatable whether that is worth having in Dojo itself, as it really does seem to be fairly uncommon. We should be striving to meet the needs of all users, not just ourselves, and adding weight where it isn't usually required is a poor approach to that. :)

comment:15 Changed 7 years ago by Adrian Vasiliu

Thanks, I think this new store may be great, however honestly I'm mostly interested in a recursive query engine working on any store that happens to implement getChildren().

"As for deep queries, it'd be easy enough to implement a slightly more advanced query engine that supports a known property for children and to iterate through that"

I may be missing the point, but wouldn't it be better for it to rely on the store's getChildren() (if and only if the store implements this method), instead of a known property? Then, a getChildren() implementation in a concrete store may rely on a known property.

By the way, concerning the (heavy) logic that dojo/data uses to determine things that are items among the values of all item properties, I would think that dojo/store wouldn't need something equivalent for implementing a recursive query, precisely because it could rely instead on store's getChildren() (when it is implemented). And this may also avoid the need of an internal flattening of the data, as done by dojo/data (which lacks the getChildren method).

It seems to me that introducing a RecursiveQueryEngine (as alternative for the current SimpleQueryEngine) wouldn't necessarily make dojo/store significantly heavier. At least this is how it looks seen from the outside, I may of course miss some things. In any case I thank you all for your patience and time.

comment:16 Changed 7 years ago by Brian Arnold

Description: modified

I can appreciate what you're asking for, and understand its value to you, but the problem is that what you're asking for is a thing that very specifically meets your needs, and is not something that would make for a good general implementation.

Implementing what you've described, basically taking SimpleQueryEngine and making it detect getChildren and iterate when a deep flag is set, would honestly not be that hard to do, but it doesn't really meet any general needs. It meets yours.

I really hope that doesn't sound like a jerk thing to say, but the need for a data structure that's truly hierarchical and not flattened is fairly uncommon, and queries against flattened structures are inherently deep. If a user doesn't want a deep query in that context, it's easy enough to query for items with a specific parent key (or lack thereof).

The main problem with hierarchical data of this nature is that in order for a simple get to work, you'd still have to do a full traversal of the data in order to index all items by identifier, and this traversal would ideally happen at initialization. If your data is flat, you simply loop. If it's not, you have to recurse -- but you still wind up generating a flat representation of the data in the ideal case to simplify querying, and that starts to approach the weight (both LOC and memory) and prescriptive data structures that dojo/data requires.

I would go so far as to argue that representing your data with deeper hierarchical structures is an antipattern, and that flattening is going to save time and effort for both the developer of the store and the user of the store. A reference store that demonstrates that approach would not be immensely difficult, but hierarchical data structures are distinct enough that it'd be difficult to create a generalized implementation. Parent-based keying is incredibly common and simple to implement in a general fashion without being prescriptive.

Last edited 7 years ago by Brian Arnold

comment:17 Changed 7 years ago by Brian Arnold

Description: modified

Whoops, completely blew out the description by accident. Shouldn't update these when my mind is elsewhere. :) Fixed.

comment:18 Changed 7 years ago by ben hockey

adrian, without meaning to be rude i think you are missing the point and you're misunderstanding the purpose of getChildren - it is for converting a flat structure into hierarchical data. it is *not* for flattening a hierarchical structure into its components. maybe that explains why we have different viewpoints here.

the currently provided dojo/store implementations work on the following basis:

  • data provided to a store is flat (this is an implementation constraint and is not necessarily a constraint of the dojo/store API)
  • logical hierarchies can be formed by consumers of the store via getChildren (this is part of the dojo/store API)
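The two points above can be sketched as follows. This is a plain-object stand-in rather than dojo/store/Memory; the `parent` key, `flatStore`, and `toTree` are hypothetical names for illustration. With a real Memory store the same child lookup is typically written as an object query such as `{parent: id}`.

```javascript
// Plain-object stand-in for a store: flat data, each item keyed to its parent.
var flatStore = {
    data: [
        { id: 'root', parent: null },
        { id: 'a', parent: 'root' },
        { id: 'b', parent: 'root' },
        { id: 'a1', parent: 'a' }
    ],
    // Stand-in for a store query that accepts a filter function.
    query: function (test) {
        return this.data.filter(test);
    },
    // Children are simply the items whose parent key matches.
    getChildren: function (item) {
        return this.query(function (candidate) {
            return candidate.parent === item.id;
        });
    }
};

// A consumer reconstructs the logical hierarchy through getChildren only.
function toTree(store, item) {
    return {
        id: item.id,
        children: store.getChildren(item).map(function (child) {
            return toTree(store, child);
        })
    };
}

var tree = toTree(flatStore, flatStore.data[0]);
console.log(JSON.stringify(tree));
```

Because the data stays flat, an ordinary query already reaches every item, and the hierarchy exists only in how consumers walk getChildren.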

based on this, there is no need for a RecursiveQueryEngine. i've demonstrated how the SimpleQueryEngine works when your store works.

to further demonstrate why the problem is not the query engine, let's consider your example (http://jsfiddle.net/adrian_vasiliu/vFnkR/3/) and try to use parts of the store that don't rely on the query engine. assuming that we at least update your store so that it uses the name property as the identifier, let's try to see if this store works

    store = new Memory({
        idProperty: 'name',
        data: data,
        nameContainsFoo: function (item) {
            return ~item.name.indexOf("Foo");
        }
    });

// try to get one of the nested items - does not use a query engine at all
console.log(store.get('item Bar 1.1')); // hmm... it's undefined

// try to get one of the root items
console.log(store.get('item Foo 1')); // yay!  it works

// try to get all the items in the store
console.log(store.query()); // hmm... just 2 items?!

the problem is not the query engine but it is the mismatch between the way data is being provided to a store and the way the store is expecting the data. you seem to be of the understanding that if we were to provide a getChildren function to the store then it would be able to deconstruct your hierarchical data into its component items. this is not the purpose of getChildren. the purpose of getChildren is so that a consumer of your store can reconstruct a hierarchy.

comment:19 Changed 7 years ago by Adrian Vasiliu

brianarn:

"The main problem with hierarchical data of this nature is that in order for a simple get to work, you'd still have to do a full traversal of the data in order to index all items by identifier, and this traversal would ideally happen at initialization. If your data is flat, you simply loop. If it's not, you have to recurse."

Well, again I may miss something but in my eyes when an item gets added (store.put) its id (store.getIdentity) could be stored in a hash map that store.get could then use (as long as ids are immutable).
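As an illustration of that idea, a minimal sketch of a store that maintains an id-to-item map in put follows. `IndexedStore` is a hypothetical stand-in, not a Dojo class, and it assumes ids are immutable as stated above.

```javascript
// Hypothetical minimal store: put() records each item in a map keyed by
// its identity, and get() reads from that map.
function IndexedStore(idProperty) {
    this.idProperty = idProperty || 'id';
    this.index = Object.create(null); // no inherited keys to collide with ids
}

IndexedStore.prototype.getIdentity = function (item) {
    return item[this.idProperty];
};

IndexedStore.prototype.put = function (item) {
    var id = this.getIdentity(item);
    this.index[id] = item;
    return id;
};

IndexedStore.prototype.get = function (id) {
    return this.index[id];
};

var store = new IndexedStore('name');
store.put({ name: 'item Foo 1' });
store.put({ name: 'item Foo 1.1.1' });
console.log(store.get('item Foo 1.1.1'));
```

Note that this indexes only items passed to put individually; nested items handed over inside a parent would still need a traversal to be indexed, which is the cost described in the quoted comment.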

"but the need for a data structure that's truly hierarchical and not flattened is fairly uncommon"

I surely admit flat data structures are more frequent than hierarchical ones. That said, some application domains do use hierarchical data structures, for instance IBM Dojo Diagrammer, because they are a natural match given the data (nested diagrams/graphs in this case). Yes, such apps can also produce/convert a flat equivalent of the hierarchical data. I was simply thinking that also supporting hierarchical data would be a plus.


neonstalwart:

"you're misunderstanding the purpose of getChildren - it is for converting a flat structure into hierarchical data"

Thanks for the explanation. Yes, I didn't read the doc sentence "Returns the children of an object." that way. Both the API and the Reference doc of getChildren just say that (and the tutorial doesn't say anything), which somehow seems to open the door to wrong expectations ;-)

comment:20 Changed 6 years ago by dylan

Milestone: changed from tbd to 1.10
Owner: changed from barnold to Adrian Vasiliu
Status: changed from assigned to pending

So what should we do with this bug? Update the documentation? Create a community store example that isn't general purpose?

comment:21 in reply to:  14 Changed 6 years ago by ben hockey

Owner: changed from Adrian Vasiliu to Brian Arnold
Status: changed from pending to assigned

dylan,

the ticket was assigned to brian arnold because he was going to take a look at doing something whenever he had some time to get to it. it's probably very low priority but that was where we were up to with this ticket.

brianarn:

"It does not seem unreasonable to have a reference implementation of a store with a getChildren method, though. In discussions with dozens of clients and hundreds of individuals at training classes, the vast majority of users are working with flat data that keys into parents, typically coming from some sort of RDBMS on the backend. As Kris mentioned, I'll implement a reference that works with the most common model that we see time and again with clients, and there may be great value in having that."

Changed 6 years ago by ben hockey

Attachment: hierarchyChildren.js added

Changed 6 years ago by ben hockey

Attachment: hierarchyParent.js added

comment:22 Changed 6 years ago by ben hockey

brian, i had to write these the other day for some new training materials so i'm attaching them in case they are useful to you. of course, the hardest part still remains - commenting, formatting, naming the files, etc :)

comment:23 Changed 5 years ago by dylan

Owner: changed from Brian Arnold to Kris Zyp
Priority: changed from undecided to high

comment:24 Changed 5 years ago by dylan

Milestone: changed from 1.10 to 1.11

comment:25 Changed 4 years ago by dylan

Owner: changed from Kris Zyp to dylan

dstore has this built in (https://github.com/SitePen/dstore/blob/master/src/Tree.js), and with dojo/store, it's as simple as the attached examples. Will consider adding to docs for the 1.11 release.

comment:26 Changed 4 years ago by dylan

Resolution: fixed
Status: changed from assigned to closed