Opened 10 years ago

Last modified 10 years ago

#12428 closed enhancement

improve performance of docscripts — at Version 1

Reported by: liucougar Owned by: liucougar
Priority: high Milestone: 1.7
Component: Doc parser Version: 1.6.0
Keywords: Cc:
Blocked By: Blocking:

Description (last modified by liucougar)

the generate.php script currently has two loops with O(n^2) complexity; by introducing a hash, they can be reduced to O(n) complexity
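A minimal sketch of the general technique, in Python rather than the PHP of generate.php. The record layout (`name`, `parent`) and both function names are hypothetical and not taken from the patch; the sketch also assumes symbol names are unique. It only illustrates how a one-time hash index can replace a nested scan.

```python
def resolve_parents_nested(symbols):
    """O(n^2): for every symbol, scan the whole list to find its parent."""
    pairs = []
    for sym in symbols:
        for other in symbols:
            if other["name"] == sym["parent"]:
                pairs.append((sym["name"], other["name"]))
                break
    return pairs


def resolve_parents_hashed(symbols):
    """O(n): build a name -> record index once, then do constant-time lookups."""
    by_name = {s["name"]: s for s in symbols}  # hypothetical index, built in one pass
    pairs = []
    for sym in symbols:
        parent = by_name.get(sym["parent"])
        if parent is not None:
            pairs.append((sym["name"], parent["name"]))
    return pairs
```

Both functions return the same pairs for unique names; only the number of comparisons differs, which is what matters as the symbol count grows.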

in addition, the default file store is very inefficient: each time it writes a new row of information, it creates a new tmp file, copies the entire contents of the current file into it, appends the new row, and then renames the tmp file over the current file

by again using a hash instead of writing to files row by row, the performance can be dramatically improved (this involves only one file read and one file write, instead of file IO proportional to the number of symbols in the parsed code)
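The difference between the two stores can be sketched as follows, again in Python and not as the patch's actual code. The class names, the `add`/`flush` methods, the `writes` counter and the JSON-lines file layout are all assumptions made for illustration. The sketch covers only the write side; the single read mentioned above is left out.

```python
import json
import os
import tempfile


class RewritingFileStore:
    """Mimics the described file store: every new row rewrites the whole file."""

    def __init__(self, path):
        self.path = path
        self.writes = 0  # counts full-file rewrites
        open(path, "w").close()  # start with an empty file

    def add(self, key, value):
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as tmp, open(self.path) as current:
            for line in current:  # copy all existing content
                tmp.write(line)
            tmp.write(json.dumps({key: value}) + "\n")  # then append the new row
        os.replace(tmp_path, self.path)  # rename the tmp file over the current file
        self.writes += 1


class HashStore:
    """Keeps rows in an in-memory dict and writes the file once at the end."""

    def __init__(self, path):
        self.path = path
        self.rows = {}
        self.writes = 0

    def add(self, key, value):
        self.rows[key] = value  # no file IO here

    def flush(self):
        with open(self.path, "w") as out:
            for key, value in self.rows.items():
                out.write(json.dumps({key: value}) + "\n")
        self.writes += 1
```

With n rows, the first store copies 0 + 1 + ... + (n - 1) existing rows in total, while the second writes each row exactly once, at the cost of holding all rows in memory until `flush()`.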

before the change:

time php generate.php --serialize=xml --store=file dojo dijit
real    3m20.817s
user    3m11.700s
sys     0m8.309s
(memory usage 2,601KB before the script exits)

after applying the attached patch:

time php generate.php --serialize=xml --store=hash dojo dijit
real    0m57.803s
user    0m56.968s
sys     0m0.396s
(memory usage 9,475KB before the script exits)

so with the patch, parsing dojo and dijit is about 3.5x faster than with the current version (3m20.8s versus 57.8s wall-clock). the improvement should be much greater when parsing dojo, dijit and dojox, because the current algorithm has two loops with O(n^2) complexity and file IO proportional to the number of symbols in the parsed content

it does require roughly 3.6x the memory of the current approach (9,475KB versus 2,601KB), which I think is well worth it
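The ratios can be checked directly from the timings and memory figures quoted above. This is a small Python calculation, not part of the patch; the helper name `to_seconds` is made up for the example.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.ZZZs reading to seconds."""
    return minutes * 60 + seconds


before_real = to_seconds(3, 20.817)  # --store=file, real time
after_real = to_seconds(0, 57.803)   # --store=hash, real time

speedup = before_real / after_real   # wall-clock speedup factor
memory_ratio = 9475 / 2601           # reported KB after / KB before

print(f"speedup: {speedup:.2f}x, memory: {memory_ratio:.2f}x")
```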

Change History (2)

Changed 10 years ago by liucougar

Attachment: 12428.patch added

comment:1 Changed 10 years ago by liucougar

Description: modified (diff)

the attached patch also fixes problems such as code block not showing up in
