tame/test at a22e8e79f70108dc62c310d41ab7b13740ff9e5c - tame


..
quick_xml	tamer: tameld: Skip fragment unescaping only to re-escape on write	2021-08-18 11:39:06 -04:00
mod.rs	Copyright year update 2021	2021-07-22 15:00:15 -04:00

Mike Gerwitz 1cdb3fbbc5 tamer: tameld: Skip fragment unescaping only to re-escape on write

Fragment text was unescaped on read, which produced an owned String and
spent time parsing the text to unescape it.  We then copied that String into
an internment pool (so, effectively, copying twice).

Further, we were then _re-escaping_ on write.

This was all wasteful, since we do not manipulate the fragment before
writing it to the xmle file; we know that Saxon produced properly escaped
XML to begin with, and can trust it and propagate it as-is.
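
To illustrate why the round trip is redundant, here is a minimal sketch
using only the standard library.  The helpers `escape`, `unescape`,
`write_round_trip`, and `write_pass_through` are hypothetical and are not
the functions used by tameld or quick_xml; the sketch also assumes the input
was escaped using the same five predefined XML entities that `escape` emits.

```rust
/// Escape the five predefined XML entities (hypothetical helper).
fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Unescape only the five predefined entities (hypothetical helper).
fn unescape(raw: &str) -> String {
    raw.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        // `&amp;` goes last so that "&amp;lt;" becomes "&lt;", not "<".
        .replace("&amp;", "&")
}

/// Old approach: allocate an unescaped String, then re-escape on write.
fn write_round_trip(raw: &str) -> String {
    escape(&unescape(raw))
}

/// New approach: trust the already-escaped text and propagate it as-is.
fn write_pass_through(raw: &str) -> &str {
    raw
}
```

Under that assumption, `write_round_trip(raw)` and `write_pass_through(raw)`
yield the same bytes, but the former allocates two intermediate Strings and
scans the text twice.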

This also introduces a new global `clone_uninterned_utf8_unchecked` method.

In profiling this change, I tested (a) before this change, (b) after writing
without escaping, and (c) after both reading escaped and writing without
escaping.

     (a)              (b)              (c)
  sec   mem (B)    sec     B        sec     B
0:00.95 47896 -> 0:00.91 47988 -> 0:00.87 48288
0:00.40 30176 -> 0:00.37 25656 -> 0:00.36 25788
0:00.39 45672 -> 0:00.37 45756 -> 0:00.35 34952
0:00.39 20716 -> 0:00.38 19604 -> 0:00.36 19956
0:00.33 16836 -> 0:00.32 16988 -> 0:00.31 16892
0:00.23 15268 -> 0:00.23 15236 -> 0:00.22 15312
0:00.44 20780 -> 0:00.44 20048 -> 0:00.41 20148
0:00.54 44516 -> 0:00.50 36964 -> 0:00.49 36728
0:00.62 55976 -> 0:00.57 46204 -> 0:00.54 41468
0:00.31 28016 -> 0:00.30 27308 -> 0:00.28 23844
0:00.23 15388 -> 0:00.22 15316 -> 0:00.21 15304
0:00.05 4888  -> 0:00.05 4760  -> 0:00.05 4948
0:00.41 19756 -> 0:00.41 19852 -> 0:00.40 19992
0:00.47 20828 -> 0:00.46 20844 -> 0:00.44 20968
0:00.27 18152 -> 0:00.26 18184 -> 0:00.25 18312

Interestingly, the peak memory usage increases very slightly between the
second and third steps (though it decreases from the first), likely because
the raw (escaped) text is larger than the unescaped text (e.g. `&gt;` takes
more space than `>`).
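
The size difference can be sketched as follows.  `escape_overhead` is a
hypothetical helper, not part of tamer; it assumes that all five predefined
XML entities are escaped, which may not match what Saxon actually emits.

```rust
/// Number of extra bytes the escaped form of `text` occupies compared to
/// the unescaped form, assuming the five predefined XML entities are used.
fn escape_overhead(text: &str) -> usize {
    text.chars()
        .map(|c| match c {
            '&' => "&amp;".len() - 1,
            '<' => "&lt;".len() - 1,
            '>' => "&gt;".len() - 1,
            '"' => "&quot;".len() - 1,
            '\'' => "&apos;".len() - 1,
            _ => 0, // every other character is stored unchanged
        })
        .sum()
}
```

For example, `escape_overhead("a > b")` is 3, since `&gt;` occupies four
bytes where `>` occupies one.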

2021-08-18 11:39:06 -04:00
