Opened 10 years ago
Closed 10 years ago
#12428 closed enhancement (fixed)
improve performance of docscripts
Reported by: | liucougar | Owned by: | liucougar |
---|---|---|---|
Priority: | high | Milestone: | 1.7 |
Component: | Doc parser | Version: | 1.6.0 |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Description
The generate.php script currently has two loops with O(n²) complexity. By introducing a hash, they can be reduced to O(n).
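The patch itself is not reproduced in this ticket, so the following is only a minimal sketch of the general technique, written in Python rather than the PHP of generate.php. The symbol and reference lists and both function names are hypothetical; they stand in for whatever the two loops actually iterate over.

```python
# Illustration only: hypothetical data, not the structures used by generate.php.

def resolve_quadratic(symbols, references):
    """For each reference, scan the whole symbol list: O(n^2) overall."""
    resolved = {}
    for ref in references:
        for sym in symbols:
            if sym["name"] == ref:
                resolved[ref] = sym
                break
    return resolved


def resolve_with_hash(symbols, references):
    """Build a name -> symbol hash once, then look each reference up: O(n)."""
    by_name = {}
    for sym in symbols:
        # Keep the first symbol seen for a name, matching the scan above.
        by_name.setdefault(sym["name"], sym)
    return {ref: by_name[ref] for ref in references if ref in by_name}


symbols = [{"name": f"dojo.sym{i}", "line": i} for i in range(1000)]
references = [f"dojo.sym{i}" for i in range(0, 1000, 3)]

slow = resolve_quadratic(symbols, references)
fast = resolve_with_hash(symbols, references)
```

Both functions return the same mapping; only the number of comparisons differs, which is why the gap widens as more modules are parsed.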
In addition, the default file store is very inefficient: each time it writes a new row of information, it creates a new tmp file, copies over all the contents of the current file, writes the new content, and then renames the tmp file over the current file.
By again using a hash instead of writing to files as it goes, performance improves dramatically: this needs only O(n) file IO instead of O(n²), where n is the number of symbols in the parsed code.
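To make the difference in file IO concrete, here is a rough Python sketch of the two storage strategies as described above. It is not the PHP storage code from the patch: `append_row_by_rewrite`, `HashStore` and the file names are hypothetical, and the real on-disk format is not shown in this ticket.

```python
import os
import tempfile


def append_row_by_rewrite(path, row):
    """Mimic the described file store: copy everything to a tmp file, add the row, rename."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as tmp:
        if os.path.exists(path):
            with open(path) as current:
                tmp.write(current.read())  # re-copies all earlier rows every time
        tmp.write(row + "\n")
    os.replace(tmp_path, path)


class HashStore:
    """Collect rows in an in-memory hash and write the file once at the end."""

    def __init__(self):
        self.rows = {}

    def set(self, key, row):
        self.rows[key] = row  # no file IO here

    def flush(self, path):
        with open(path, "w") as out:
            for row in self.rows.values():
                out.write(row + "\n")


rows = [f"symbol{i}" for i in range(200)]
with tempfile.TemporaryDirectory() as workdir:
    file_path = os.path.join(workdir, "file_store.txt")
    hash_path = os.path.join(workdir, "hash_store.txt")

    for row in rows:
        append_row_by_rewrite(file_path, row)  # 200 full rewrites

    store = HashStore()
    for i, row in enumerate(rows):
        store.set(i, row)
    store.flush(hash_path)  # one write

    with open(file_path) as a, open(hash_path) as b:
        file_contents = a.read()
        hash_contents = b.read()

same_output = file_contents == hash_contents
```

The two files end up identical, but the first approach copies every earlier row again on each write, while the second holds all rows in memory until the single flush. That is the memory-for-time trade reported in the measurements.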
Before the change:
time php generate.php --serialize=xml --store=file dojo dijit
real 3m20.817s
user 3m11.700s
sys 0m8.309s
(memory usage 2,601KB before the script exits)
After applying the attached patch:
time php generate.php --serialize=xml --store=hash dojo dijit
real 0m57.803s
user 0m56.968s
sys 0m0.396s
(memory usage 9,475KB before the script exits)
So with the patch, parsing dojo and dijit is about 3.5x faster than with the current version. (The improvement is much greater when parsing dojo, dijit and dojox, because the current algorithm has two O(n²) loops and file IO that grows with the number of symbols in the parsed content.)
It does require about 3.6x the memory of the current approach.
If dojo, dijit and dojox are parsed, before this patch:
time php generate.php --serialize=xml --store=file dojo dijit dojox
real 60m13.970s
user 56m41.373s
sys 3m19.736s
(memory usage 5,828KB before the script exits)
After the patch, parsing dojo, dijit and dojox takes:
time php generate.php --serialize=xml --store=hash dojo dijit dojox
real 4m25.095s
user 4m21.132s
sys 0m1.856s
(memory usage 35,650KB before the script exits)
That is 13.6x faster while using about 6x the memory.
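The ratios quoted in this description can be re-derived from the `real` times and memory figures above. The short Python check below does only that arithmetic; the helper name `to_seconds` is illustrative.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '60m13.970s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# (real time, memory in KB) copied from the measurements in this ticket
runs = {
    "dojo dijit": {"file": ("3m20.817s", 2601), "hash": ("0m57.803s", 9475)},
    "dojo dijit dojox": {"file": ("60m13.970s", 5828), "hash": ("4m25.095s", 35650)},
}

ratios = {}
for targets, stores in runs.items():
    file_time, file_mem = stores["file"]
    hash_time, hash_mem = stores["hash"]
    speedup = to_seconds(file_time) / to_seconds(hash_time)
    memory = hash_mem / file_mem
    ratios[targets] = (round(speedup, 1), round(memory, 1))
    print(f"{targets}: {speedup:.1f}x faster, {memory:.1f}x memory")
```

This works out to roughly 3.5x faster at 3.6x the memory for dojo and dijit, and 13.6x faster at 6.1x the memory once dojox is included.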
Attachments (1)
Change History (8)
Changed 10 years ago by
Attachment: | 12428.patch added |
---|
comment:1 Changed 10 years ago by
Description: | modified (diff) |
---|
comment:7 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [24080]) fixes #12428: improve performance of generate.php by eliminating two O(n²) loops with the help of a hash
added a new hash storage (which becomes the default storage); it is much faster than the original default file storage, though it uses more memory. this hash storage uses the same on-disk format as the file storage
the attached patch also fixes problems such as code blocks not showing up in http://dojotoolkit.org/api/1.5/dojo/Deferred