Opened 10 years ago

Closed 8 years ago

#8985 closed enhancement (patchwelcome)

Use full range of alpha characters in ShrinkSafe

Reported by: Adam Peller Owned by: Adam Peller
Priority: low Milestone: future
Component: ShrinkSafe Version: 1.3.0b3
Keywords: Cc: alex, dante, Adam Peller
Blocked By: Blocking:

Description (last modified by Eugene Lazutkin)

instead of toHexString in TokenMapper, perhaps we could have slightly shorter vars if we used the full range of alphanumerics, something like base64? There would be some build-time performance tradeoff.

Attachments (2)

master.diff (3.3 KB) - added by Eugene Lazutkin 10 years ago.
master2.diff (1.5 KB) - added by Eugene Lazutkin 10 years ago.

Download all attachments as: .zip

Change History (16)

comment:1 Changed 10 years ago by Eugene Lazutkin

Description: modified (diff)
Milestone: tbdfuture
Owner: changed from alex to Eugene Lazutkin
Status: newassigned

comment:2 Changed 10 years ago by James Burke

I believe I tried something like this before, at least keeping shorter names, but the end result was larger file sizes when gzipped, so be sure to test the gzipped resulting size.

comment:3 Changed 10 years ago by dante

@jburke - good point. a larger win would be if a scope could keep it's own token count, and know if the local goes up scope (to global) at all, and reset the counter.

eg: if every first-local variable was named _1 (because no closure magic was going on), we'd win in size and gzip repetition patterns.

comment:4 Changed 10 years ago by dante

but it is worth noting too, the rough numbers from uhop on the var sizes anyway (without auto _ prefix) 1-55: 1 char, 55-3400: 2 char, 3400+ 3 char. We currently have 0..256 as 1 char, but go to 3 char a 512 (I think) ... but yes. need to test gzip sizes too.

comment:5 Changed 10 years ago by Eugene Lazutkin

Cc: Richard Backhouse alex dante Adam Peller added
Keywords: rbackhouse alex removed

I implemented the algorithm I was talking about. In addition to that I needed to weed out accidentally generated keywords --- in tests I hit "if" and "in". For completeness, I check against the whole list of keywords and reserved words.

Now we need to check the resulting id against global names or any names from any other closures.

Most tests work fine, but I hit some strange errors that look like name clashes. Instead of committing the change I add the patch, so the problem can be investigated fully by dante and peller (added to CC).

comment:6 Changed 10 years ago by Eugene Lazutkin

It looks like right now we generate ids on per-file basis. Simple optimization: generate ids on per-function basis. I don't know if it is possible with the current architecture.

comment:7 Changed 10 years ago by Eugene Lazutkin

To clarify my previous comment:

In order to debug the problem I used

./build.sh action=release profile=standard optimize=shrinksafe.keepLines

While all files were compressed as specified, dojo.js was one huge honking line --- keepLines option was not honored. Looking inside the file revealed that instead of 1-character names, a lot of 2-character names were generated for local temps => it looks like all files of the base were concatenated than compressed without keeping lines. I expected to see that files are compressed individually, then concatenated. (And I wanted to see lines instead of one line.)

It all would be a moot point if names were generated on the per-function basis. On per-file basis all temporary variables are globally unique even if they were meant to be locally unique. The low-hanging optimization will be to compress files individually, then concatenate them --- in this case we will have more names reused => shorter names will be frequent, more chances to compress names with gzip.

comment:8 Changed 10 years ago by James Burke

To get line breaks in layers, use layerOptimize=shrinksafe.keepLines: the optimization code uses two different flags for layers vs. normal files because of historical purposes. It might be good to consolidate those two when we can break backward compatibility.

comment:9 Changed 10 years ago by Eugene Lazutkin

Obviously it is no good to have a dependence on "_" in names, and being forced to prefix all internal names with "_". The same goes for the globalization of local names described above. But we can still be more efficient than "_" + an ordinal number. master2.diff implements this patch, it is less code, and it works on our tests without other changes.

Changed 10 years ago by Eugene Lazutkin

Attachment: master.diff added

comment:10 Changed 10 years ago by Eugene Lazutkin

I updated the original patch as well to account for recent changes to TokenMapper.java.

Changed 10 years ago by Eugene Lazutkin

Attachment: master2.diff added

comment:11 Changed 10 years ago by Eugene Lazutkin

I updated the "practical" patch (master2.diff) again. But it needs more work: just like jburke said it generates smaller file sizes (2190 bytes smaller for dojo.js) in the plain text, but when gzipped the final size of dojo.js is 385 bytes bigger. It can be improved by generating better names (e.g., more prefix-oriented), and suggestions from previous comments.

comment:12 Changed 10 years ago by Adam Peller

Cc: Richard Backhouse removed
Owner: changed from Eugene Lazutkin to Richard Backhouse
Status: assignednew

comment:13 Changed 8 years ago by ben hockey

Owner: changed from Richard Backhouse to Adam Peller
Priority: highlow
Status: newpending

Much of this is accounted for by using closure rather than shrinksafe. Any objections to closing this?

If there's no response in 14 days the ticket will be closed automatically.

comment:14 Changed 8 years ago by Adam Peller

Resolution: patchwelcome
Status: pendingclosed
Note: See TracTickets for help on using tickets.