Hi list,
I am sure there are many ways of doing a comparison, but I'd like to see
what you would do if you had 2 dictionary sets (containing lots of
data - like 20000 keys, and each key contains a dozen or so records)
and you wanted to build a list of differences between these two sets.
I'd like to end up with 3 lists: what's in A and not in B, what's in B
and not in A, and of course, what's in both A and B.
What do you think is the cleanest way to do it? (I am sure you will
come up with ways that astonish me :=) )
Thanks,
I make it 4 bins:
a_exclusive_keys
b_exclusive_keys
common_keys_equal_values
common_keys_diff_values
Something like:
a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: 3, 5: 5}
keya = set(a.keys())
keyb = set(b.keys())
a_xclusive = keya - keyb    # keys only in a
b_xclusive = keyb - keya    # keys only in b
_common = keya & keyb       # keys in both
common_eq = set(k for k in _common if a[k] == b[k])
common_neq = _common - common_eq
If you know simple set arithmetic, it should read OK.
- Paddy.
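For anyone reading this on a current Python, the four bins can be sketched as one function. This is only an illustration, not Paddy's code: the name `classify_keys` is made up here, it assumes Python 3 (where dict key views support the set operators directly), and the value of key 3 in `b` has been changed so that the "different values" bin is not empty.

```python
def classify_keys(a, b):
    """Split the keys of dicts a and b into four bins (hypothetical helper)."""
    a_exclusive = a.keys() - b.keys()   # keys only in a
    b_exclusive = b.keys() - a.keys()   # keys only in b
    common = a.keys() & b.keys()        # keys in both
    common_eq = {k for k in common if a[k] == b[k]}
    common_neq = common - common_eq
    return a_exclusive, b_exclusive, common_eq, common_neq


a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: 30, 5: 5}   # 3 maps to a different value on purpose
a_only, b_only, same, different = classify_keys(a, b)
print(sorted(a_only), sorted(b_only), sorted(same), sorted(different))
# prints: [1, 4] [5] [2] [3]
```

Each bin is a set, so wrap it in `sorted()` or `list()` if lists are wanted.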
Thanks, that's very clean. Gives me a good reason to move up to Python
2.4.
Oh, wait, works in 2.3 too.
Just have to:
from sets import Set as set
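A common way to write that import so the same file runs on both versions is to try the built-in first. This is a sketch rather than anything posted in the thread; the two key sets are made-up sample data.

```python
try:
    set                            # built in from Python 2.4 onwards
except NameError:
    from sets import Set as set    # Python 2.3 fallback

keya = set([1, 2, 3, 4])
keyb = set([2, 3, 5])
print(keya - keyb == set([1, 4]))  # keys only in the first set
print(keya & keyb == set([2, 3]))  # keys in both sets
```

Both lines print True; the `-` and `&` operators behave the same on `sets.Set` and on the built-in type.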
Paddy has already pointed out a necessary addition to your requirement
definition: common keys with different values.
Here's another possible addition: you say that "each key contains a
dozen or so of records". I presume that you mean like this:
a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}  # "dozen" -> 2 to save typing :)
Now what happens if the other dictionary contains:
b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}
Key 42 would be marked as different by Paddy's classification, but the
values are the same, just not in the same order. How do you want to
treat that? avalue == bvalue? sorted(avalue) == sorted(bvalue)? Oh, and
are you sure the buckets don't contain duplicates? Maybe you need
set(avalue) == set(bvalue). What about 'rec1a' vs 'Rec1a' vs 'REC1A'?
All comparisons are equal, but some comparisons are more equal than
others :)
Cheers,
John
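To make the point concrete, here is a small sketch of how those candidate definitions of "equal" can disagree on the same pair of record lists. The sample records and the `lowered` helper are invented for illustration; which definition is right depends on the data.

```python
avalue = ['rec42a', 'rec42b']
bvalue = ['rec42b', 'rec42a', 'REC42A']   # reordered, plus a differently-cased repeat


def lowered(records):
    """Case-insensitive view of the records (hypothetical normalisation)."""
    return [r.lower() for r in records]


ordered = avalue == bvalue                                  # order matters
unordered = sorted(avalue) == sorted(bvalue)                # order ignored, duplicates count
as_sets = set(avalue) == set(bvalue)                        # order and duplicates ignored
nocase_sets = set(lowered(avalue)) == set(lowered(bvalue))  # ...and case ignored too

print(ordered, unordered, as_sets, nocase_sets)
# prints: False False False True
```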
John,
Yes, there are several scenarios.
a) Comparing keys only.
That's been answered (although I haven't gotten it to work under 2.3
yet)
b) Comparing records.
Now it gets more fun - as you pointed out. I was assuming that there
is no shortcut here. If the key exists in both sets, and if I wish to
know if the records are the same, I would have to do a record-by-record
comparison. However, since there are only a handful of records per
key, this wouldn't be so bad. Maybe I just overload the compare
operator or something.
Hi Johns,
The following is my attempt to give more/deeper comparison info.
Assume you have your data parsed and presented as two dicts a and b
each having as values a dict representing a record.
Further assume you have a function that can compute if two record level
dicts are the same and another function that can compute if two values
in a record level dict are the same.
With a slight modification of my earlier prog we get:
def komparator(a, b, check_equal):
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)
# First pass: compare whole records
a_xclusive, b_xclusive, common_eq, common_neq = komparator(
    a, b, record_dict__equality_checker)
# Second pass: compare the fields of each record that differed
common_neq = [(key, komparator(a[key], b[key], value__equality_checker))
              for key in common_neq]
Now we get extra info on intra record differences with little extra
code.
Look out though, you could get swamped with data :)
- Paddy.
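The two checker functions are left undefined in that post. As a rough sketch of what they might look like, assuming each record is a dict of field name to value and that values should match regardless of case and surrounding whitespace (both of those are assumptions, and the names `value_equal` and `record_equal` are made up here):

```python
def value_equal(x, y):
    """Hypothetical field-level check: case-insensitive, whitespace-stripped text."""
    return str(x).strip().lower() == str(y).strip().lower()


def record_equal(rec_a, rec_b):
    """Hypothetical record-level check: same field names, all values equal."""
    if set(rec_a) != set(rec_b):
        return False
    return all(value_equal(rec_a[field], rec_b[field]) for field in rec_a)


same = record_equal({'name': 'Ann ', 'city': 'york'},
                    {'name': 'ann', 'city': 'York'})
different = record_equal({'name': 'Ann'}, {'name': 'Bob'})
print(same, different)
# prints: True False
```

Functions of this shape could be passed as the `check_equal` argument at the record level and the field level respectively.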
John Henry wrote:
> John,
> Yes, there are several scenarios.
> a) Comparing keys only.
> That's been answered (although I haven't gotten it to work under 2.3
> yet)
(1) What's the problem with getting it to work under 2.3?
(2) Why not upgrade?
> b) Comparing records.
You haven't got that far yet. The next problem is actually comparing
two *collections* of records, and you need to decide whether for
equality purposes the collections should be treated as an unordered
list, an ordered list, a set, or something else. Then you need to
consider how equality of records is to be defined e.g. case sensitive
or not.
> Now it gets more fun - as you pointed out. I was assuming that there
> is no shortcut here. If the key exists in both sets, and if I wish to
> know if the records are the same, I would have to do a record-by-record
> comparison. However, since there are only a handful of records per
> key, this wouldn't be so bad. Maybe I just overload the compare
> operator or something.
IMHO, "something" would be better than "overload the compare operator".
In any case, you need to DEFINE what you mean by equality of a
collection of records, *then* implement it.
"only a handful": Naturally 0 and 1 are special, but otherwise the
number of records in the bag shouldn't really be a factor in your
implementation.
HTH,
John
John Machin wrote:
> (1) What's the problem with getting it to work under 2.3?
> (2) Why not upgrade?
Let me comment on this part first, I am still chewing other parts of
your message.
When I do it under 2.3, I get:
common_eq = set(k for k in _common if a[k] == b[k])
^
SyntaxError: invalid syntax
Don't know why that is.
I can't upgrade yet. Some part of my code doesn't compile under 2.4
and I haven't had a chance to investigate further.
In <11*********************@h48g2000cwc.googlegroups.com>, John Henry
wrote:
> When I do it under 2.3, I get:
> common_eq = set(k for k in _common if a[k] == b[k])
> ^
> SyntaxError: invalid syntax
> Don't know why that is.
There are no generator expressions in 2.3. Turn it into a list
comprehension::
common_eq = set([k for k in _common if a[k] == b[k]])
Ciao,
Marc 'BlackJack' Rintsch
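A quick way to convince yourself the two spellings are interchangeable here is to build the set both ways on a Python that accepts both. The sample dicts below are invented for the check.

```python
a = {1: 1, 2: 2, 3: 3}
b = {2: 2, 3: 30}
_common = set(a) & set(b)

# List comprehension inside set(): builds a temporary list first, fine on 2.3
from_list = set([k for k in _common if a[k] == b[k]])
# Generator expression: no temporary list, needs 2.4 or later
from_genexp = set(k for k in _common if a[k] == b[k])

print(from_list == from_genexp)
# prints: True
```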
Thank you. That works.
I have gone the whole hog and got something that's runnable:
========dict_diff.py=============================
from pprint import pprint as pp
a = {1:{'1':'1'}, 2:{'2':'2'}, 3:dict("AA BB CC".split()), 4:{'4':'4'}}
b = {             2:{'2':'2'}, 3:dict("BB CD EE".split()), 5:{'5':'5'}}
def record_comparator(a, b, check_equal):
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return {"A excl keys": a_xclusive, "B excl keys": b_xclusive,
            "Common & eq": common_eq, "Common keys neq values": common_neq}
comp_result = record_comparator(a, b, dict.__eq__)
# Further data on common keys, neq values
common_neq = comp_result["Common keys neq values"]
common_neq = [(key, record_comparator(a[key], b[key], str.__eq__))
              for key in common_neq]
comp_result["Common keys neq values"] = common_neq
print "\na =",; pp(a)
print "\nb =",; pp(b)
print "\ncomp_result = " ; pp(comp_result)
==========================================
When run it gives:
a ={1: {'1': '1'},
 2: {'2': '2'},
 3: {'A': 'A', 'C': 'C', 'B': 'B'},
 4: {'4': '4'}}
b ={2: {'2': '2'}, 3: {'C': 'D', 'B': 'B', 'E': 'E'}, 5: {'5': '5'}}
comp_result =
{'A excl keys': set([1, 4]),
 'B excl keys': set([5]),
 'Common & eq': set([2]),
 'Common keys neq values': [(3,
                             {'A excl keys': set(['A']),
                              'B excl keys': set(['E']),
                              'Common & eq': set(['B']),
                              'Common keys neq values': set(['C'])})]}
- Paddy.