Known problems and areas to work on
===================================
* Not yet able to handle the specification of multiple projects
for one CVS repository. For example, I can import all of tcllib,
or a single sub-project of tcllib such as tklib, but not multiple
sub-projects in one go.
* We have to look into the pass 'InitCsets' and hunt for the
cause of the large amount of memory it is gobbling up.
Results from the first look using the new memory tracking
subsystem:
(1) The general architecture (workflow) is a bit wasteful. All
changesets are generated and kept in memory before getting
persisted. This means that allocated memory piles up over
time, with later changesets pushing the boundaries. This is
made worse by the fact that some of the preliminary changesets
seem to require a lot of temporary memory as part of getting
broken down into the actual ones. InitializeBreakState seems
to be the culprit here. Its memory usage is possibly quadratic
in the number of items in the changeset.
(2) A number of small inefficiencies, like 'state eval' always
pulling the whole result into memory before it is processed
with 'foreach'. Here these are potentially large lists.
(3) We maintain an in-memory map from tagged items to their
changesets. While this map is needed later, in the sorting
passes, during creation it is wasted space. Maintaining it
during creation and breaking is also wasted time.
Changes:
(a) Re-architect to create, break, and persist changesets one
by one, completely releasing all associated in-memory data
before going to the next. This should be low-hanging fruit
with high impact: we already have all the necessary
operations, just not in that order. That alone should keep
the pile from forming, and make the spikes of (2) more
manageable.
(b) Look into the smaller problems described in (2), and
especially (3). These should still be low-hanging fruit,
although of lesser effect than (a). For (3), disable the
map and its maintenance during construction, and move it
into a separate command, to be used when loading the
created changesets at the end.
(c) With larger effect, but more difficult to achieve, go into
the command 'InitializeBreakState' and the preceding
'internalsuccessors', and re-architect them. Definitely not
low-hanging fruit. Possibly also something we can skip if
(a) has a large enough effect.
* Consider reworking the breaker and sort passes so that they
do not need all changesets in memory as objects.
Current memory consumption after all changesets are loaded,
per project (bytes, and MiB where 1 MiB = 2^20 bytes):
Project               Bytes     MiB
bwidget             6971627     6.6
cvs-memchan         4634049     4.4
cvs-sqlite         45674501    43.6
cvs-trf             8781289     8.4
faqs                2835116     2.7
libtommath          4405066     4.2
mclistbox           3350190     3.2
newclock            5020460     4.8
oocore              4064574     3.9
sampleextension     4729932     4.5
tclapps             8482135     8.1
tclbench            4116887     3.9
tcl_bignum          2545192     2.4
tclconfig           4105042     3.9
tcllib             31707688    30.2
tcltutorial         3512048     3.3
tcl               109926382   104.8
thread              8953139     8.5
tklib              13935220    13.3
tk                 66149870    63.1
widget              2625609     2.5
* Look at the dependencies on external packages and consider
which of them can be moved into the importer, either as a
simple utility command, or wholesale.
struct::list
assign, map, reverse, filter
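
Change (a) of the 'InitCsets' item can be sketched as a streaming loop. This is Python rather than the importer's own Tcl, and everything in it is hypothetical: the `changeset`/`csitem` tables, the grouping key, and the fixed-size breaker only stand in for the real logic. What it shows is the ordering: each preliminary changeset is broken and its pieces persisted before the next one is generated, so nothing accumulates across changesets.

```python
import sqlite3
from itertools import groupby
from typing import Iterator, List, Tuple

# Hypothetical, simplified model: a changeset is (key, [item ids]).
# The importer's real changesets are Tcl objects with far more state.
Changeset = Tuple[str, List[int]]


def preliminary_changesets(items: List[Tuple[int, str]]) -> Iterator[Changeset]:
    """Lazily yield preliminary changesets by grouping consecutive items
    that share a key (a stand-in for the grouping done in 'InitCsets')."""
    for key, group in groupby(items, key=lambda item: item[1]):
        yield (key, [item_id for item_id, _ in group])


def break_changeset(cs: Changeset, max_items: int) -> Iterator[Changeset]:
    """Hypothetical breaker: split a changeset into pieces of at most
    max_items items. The real breaking logic is dependency-driven."""
    key, ids = cs
    for start in range(0, len(ids), max_items):
        yield (key, ids[start:start + max_items])


def init_csets(db: sqlite3.Connection, items, max_items: int = 2) -> int:
    """Create, break and persist changesets one at a time.

    Only the current preliminary changeset and its pieces are referenced
    at any moment; once persisted they are dropped, so memory does not
    pile up over the run.
    """
    cid = 0
    for prelim in preliminary_changesets(items):
        for cs in break_changeset(prelim, max_items):
            cid += 1
            db.execute("INSERT INTO changeset VALUES (?, ?)", (cid, cs[0]))
            db.executemany("INSERT INTO csitem VALUES (?, ?)",
                           [(cid, item_id) for item_id in cs[1]])
    db.commit()
    return cid


db = sqlite3.connect(":memory:")
db.executescript(
    "CREATE TABLE changeset (cid INTEGER PRIMARY KEY, key TEXT);"
    "CREATE TABLE csitem (cid INTEGER, iid INTEGER);"
)
n = init_csets(db, [(1, "a"), (2, "a"), (3, "a"), (4, "b")], max_items=2)
```

Here the "a" group is broken into two pieces and "b" stays whole, so three changesets are persisted without any list of all changesets ever existing.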
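
For inefficiency (2) of the 'InitCsets' item, the difference between materializing a query result and consuming it row by row looks like this in Python's `sqlite3` module, where iterating over a cursor steps through the result one row at a time. Tcl's sqlite3 binding offers the same choice: `db eval` with a trailing script runs that script once per row instead of returning a list. Whether the importer's 'state eval' wrapper exposes that form is an assumption to check. The `item` table is hypothetical.

```python
import sqlite3


def total_size_materialized(db: sqlite3.Connection) -> int:
    # fetchall() builds the complete result as a Python list first,
    # like a 'state eval' whose result is then walked with 'foreach'.
    rows = db.execute("SELECT size FROM item").fetchall()
    return sum(size for (size,) in rows)


def total_size_streamed(db: sqlite3.Connection) -> int:
    # Iterating the cursor fetches one row per step, so the Python side
    # never holds the whole result list in memory.
    total = 0
    for (size,) in db.execute("SELECT size FROM item"):
        total += size
    return total


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, size INTEGER)")
db.executemany("INSERT INTO item (size) VALUES (?)", [(k,) for k in range(1, 1001)])
```

Both functions compute the same total; only the peak memory of the second stays flat as the table grows.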
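
For point (3), change (b) drops the item-to-changeset map during construction and builds it in one pass when the created changesets are loaded. A Python sketch, assuming the persisted changeset membership lives in a hypothetical `csitem(cid, type, iid)` table; the `type` column plays the role of the tag that keeps, say, a revision and a symbol with the same numeric id apart (the "rev"/"sym" names are illustrative only).

```python
import sqlite3
from typing import Dict, Tuple

# Hypothetical key: a tagged item is (item type, item id).
TaggedItem = Tuple[str, int]


def load_item_map(db: sqlite3.Connection) -> Dict[TaggedItem, int]:
    """Build the tagged-item -> changeset map in a single scan of the
    persisted data. Nothing maintains this map while changesets are
    created or broken; it is only built for the sorting passes."""
    return {(itype, iid): cid
            for cid, itype, iid in db.execute("SELECT cid, type, iid FROM csitem")}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE csitem (cid INTEGER, type TEXT, iid INTEGER)")
db.executemany("INSERT INTO csitem VALUES (?, ?, ?)",
               [(1, "rev", 10), (1, "rev", 11), (2, "sym", 10)])
item_map = load_item_map(db)
```

The cost moves from every creation and breaking step to a single scan at load time.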