Opened 8 years ago

Closed 8 years ago

#12428 closed enhancement (fixed)

improve performance of docscripts

Reported by: liucougar Owned by: liucougar
Priority: high Milestone: 1.7
Component: Doc parser Version: 1.6.0
Keywords: Cc:
Blocked By: Blocking:

Description (last modified by liucougar)

the generate.php script currently has two loops with O(n^2) complexity; by introducing a hash, they can be reduced to O(n) complexity
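
As a rough illustration of this kind of change (a Python sketch, not the PHP from the attached patch; the record layout and the function names are hypothetical), a nested scan that compares every symbol against every other one can be replaced by one pass that builds a hash and one pass that looks keys up:

```python
def link_parents_quadratic(symbols):
    """O(n^2): for every symbol, scan the whole list looking for its parent."""
    links = {}
    for sym in symbols:
        for candidate in symbols:
            if candidate["name"] == sym["parent"]:
                links[sym["name"]] = candidate["name"]
                break
    return links


def link_parents_hashed(symbols):
    """O(n): build a name -> symbol hash once, then do constant-time lookups."""
    by_name = {sym["name"]: sym for sym in symbols}
    links = {}
    for sym in symbols:
        parent = by_name.get(sym["parent"])
        if parent is not None:
            links[sym["name"]] = parent["name"]
    return links


# Hypothetical sample data; names are assumed to be unique.
symbols = [
    {"name": "dijit._Widget", "parent": None},
    {"name": "dijit.form.Button", "parent": "dijit._Widget"},
    {"name": "dijit.form.ToggleButton", "parent": "dijit.form.Button"},
]

quadratic_links = link_parents_quadratic(symbols)
hashed_links = link_parents_hashed(symbols)
```

Both functions produce the same mapping as long as symbol names are unique; only the number of comparisons differs.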

in addition, the default file store is very inefficient: for each new row of information it writes, it creates a new tmp file, copies over the entire contents of the current file, writes the new content, and then renames the tmp file to the current file

by again using a hash instead of writing to files, performance can be dramatically improved (this involves only O(n) file IO instead of O(n^2) file IO, where n is the number of symbols in the parsed code)
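
The difference in file IO can be sketched as follows. This is a Python illustration only, under the assumption that the hash store buffers rows in memory and writes them out once at the end; the function names and the row format are hypothetical and not taken from the patch:

```python
import os
import tempfile


def store_rewrite_per_row(path, rows):
    """Mimics the described file store: each new row copies the whole file."""
    chars_written = 0
    open(path, "w").close()  # start with an empty store file
    for row in rows:
        tmp_path = path + ".tmp"
        with open(path) as src, open(tmp_path, "w") as dst:
            existing = src.read()
            dst.write(existing)      # copy everything written so far
            dst.write(row + "\n")    # then append the new row
            chars_written += len(existing) + len(row) + 1
        os.replace(tmp_path, path)   # rename the tmp file over the current file
    return chars_written


def store_hash_then_flush(path, rows):
    """Assumed hash-store behaviour: buffer rows in a dict, write them once."""
    buffered = {}
    for index, row in enumerate(rows):
        buffered[index] = row
    chars_written = 0
    with open(path, "w") as out:
        for index in sorted(buffered):
            out.write(buffered[index] + "\n")
            chars_written += len(buffered[index]) + 1
    return chars_written


with tempfile.TemporaryDirectory() as workdir:
    rows = ["symbol_%04d" % i for i in range(200)]
    file_store_path = os.path.join(workdir, "file_store.txt")
    hash_store_path = os.path.join(workdir, "hash_store.txt")
    slow_chars = store_rewrite_per_row(file_store_path, rows)
    fast_chars = store_hash_then_flush(hash_store_path, rows)
    with open(file_store_path) as a, open(hash_store_path) as b:
        same_content = a.read() == b.read()
```

Both approaches leave identical file contents, but the per-row rewrite writes a total that grows quadratically with the number of rows, while the buffered version writes each row exactly once.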

before the change:

time php generate.php --serialize=xml --store=file dojo dijit
real    3m20.817s
user    3m11.700s
sys     0m8.309s
(memory usage 2,601KB before the script exits)

after applying the attached patch:

time php generate.php --serialize=xml --store=hash dojo dijit
real    0m57.803s
user    0m56.968s
sys     0m0.396s
(memory usage 9,475KB  before the script exits)

so with the patch, parsing dojo and dijit is about 3.5x faster than with the current version. (the improvement is much greater when parsing dojo, dijit and dojox, because the current algorithm has two loops with O(n^2) complexity, plus O(n^2) file IO in the number of symbols in the parsed content)

it does require about 3.6x the memory of the current approach

if dojo, dijit and dojox are parsed, before this patch:

time php generate.php --serialize=xml --store=file dojo dijit dojox
real    60m13.970s
user    56m41.373s
sys     3m19.736s
(memory usage 5,828KB  before the script exits)

after the patch, parsing dojo, dijit and dojox takes:

time php generate.php --serialize=xml --store=hash dojo dijit dojox
real    4m25.095s
user    4m21.132s
sys     0m1.856s
(memory usage 35,650KB  before the script exits)

it is 13.6x faster while using about 6x the memory
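
For reference, the ratios can be recomputed from the `real` times and memory figures reported above. This is a small Python sketch; the dictionary layout and the helper name are only for illustration:

```python
def to_seconds(minutes, seconds):
    """Convert the 'XmY.YYYs' format printed by `time` into seconds."""
    return minutes * 60 + seconds


# (real time in seconds, memory in KB) copied from the runs in this ticket.
runs = {
    "dojo dijit": {
        "before": (to_seconds(3, 20.817), 2601),
        "after": (to_seconds(0, 57.803), 9475),
    },
    "dojo dijit dojox": {
        "before": (to_seconds(60, 13.970), 5828),
        "after": (to_seconds(4, 25.095), 35650),
    },
}

ratios = {}
for name, run in runs.items():
    before_time, before_mem = run["before"]
    after_time, after_mem = run["after"]
    ratios[name] = {
        "speedup": before_time / after_time,   # how many times faster
        "memory": after_mem / before_mem,      # how many times more memory
    }
    print("%s: %.1fx faster, %.1fx memory"
          % (name, ratios[name]["speedup"], ratios[name]["memory"]))
```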

Attachments (1)

12428.patch (8.8 KB) - added by liucougar 8 years ago.


Change History (8)

Changed 8 years ago by liucougar

Attachment: 12428.patch added

comment:1 Changed 8 years ago by liucougar

Description: modified (diff)

the attached patch also fixes problems such as code blocks not showing up at http://dojotoolkit.org/api/1.5/dojo/Deferred

comment:2 Changed 8 years ago by liucougar

Description: modified (diff)

comment:3 Changed 8 years ago by liucougar

Description: modified (diff)

comment:4 Changed 8 years ago by liucougar

Description: modified (diff)

comment:5 Changed 8 years ago by liucougar

Description: modified (diff)

comment:6 Changed 8 years ago by dante

+1. go go go.

comment:7 Changed 8 years ago by liucougar

Resolution: fixed
Status: new → closed

(In [24080]) fixes #12428: improve performance of generate.php by eliminating two O(n^2) loops with the help of a hash

added a new hash storage (which becomes the default storage); it is much faster than the original default file storage, though it uses more memory. this hash storage uses the same on-disk format as the file storage
