50 most recent check-ins
|
2026-02-10
| ||
| 23:42 | chore: final renames, internal rewrites & fixes, some renames reverted because of the latter note: better structured fragment setup, and use (def, make) feat: initial changes for the row reducers, spec, fragments, generator, operator note: added reducers `profile` and `rprofile` note: consider making band commands for these Leaf check-in: 20139ce33f user: aku tags: trunk | |
| 17:42 | chore: more renames, plus some internal rewrites check-in: 460dce4aee user: aku tags: trunk | |
|
2026-02-08
| ||
| 22:12 | chore: renamed most reducer-related files to mention `band`, as they are about by-band reduction note: this prepares for the appearance of row and column reducers check-in: 0630960fcb user: aku tags: trunk | |
| 21:28 | feat: performance work on the by-band statistics feat: exposing the by-band reducer variants (baseline, bands unrolled, + pixels 4-unrolled) to testing and benchmarking note: testing cross-checks the implementations against each other for equality chore: updated the notes and plots for reduction performance note: benchmarking is slow because each iteration of the reducer is given a new random input. this broader sampling of the input space should make the average a better estimate of implementation performance check-in: b05421b92c user: aku tags: trunk | |
|
2026-02-05
| ||
| 21:34 | fix: typos in kahan test case names check-in: 53e3c91c17 user: aku tags: trunk | |
| 17:58 | feat: extended `from sparse points` with a `geometry` parameter, like is already supported by `from sparse ranges` note: added examples, extended and updated the testsuite check-in: 8d0c9c446c user: aku tags: trunk | |
| 16:45 | fix: path typo in template, was not updated by preceding work. check-in: 087ccf9b20 user: aku tags: trunk | |
|
2025-12-04
| ||
| 15:59 | feat: moved kahan summation (KS) to macros for inlining. chore: inlined KS at all users. chore: benchmarked new state, compared to pre-inline. note: inline is a boost for the reductions using the KS. next: unroll the pixel loops. check-in: 0e92d3c874 user: aku tags: trunk | |
| 13:22 | fix: issue in kahan summation (typo, `+` had to be `+=`). chore: added testing support for kahan summation. chore: added tests for kahan summation, vs plain. fix: broken existing tests, updated results. fix: broken example images in the docs. note: none of the changes were visually apparent. check-in: 84098c2508 user: aku tags: trunk | |
| 11:17 | fix: reducer definitions (finalize, reduce), first use of this code. feat: first boost for by-band statistics. note: fully unrolled inner loop (bands) for #bands <= 4. note: generic loop for #bands > 4, inlined actual code. todo: look into inlining the kahan summation. todo: look into unrolling the pixel loops. check-in: 36af8a2b88 user: aku tags: trunk | |
| 09:01 | fix: trace code missed in [ae6d6a67c9] (region sharing fix). note: had to be updated for the changes to the block regions. note: owner/reader instead of single region. check-in: 3c3a23dc17 user: aku tags: trunk | |
|
2025-12-03
| ||
| 20:12 | tweak: shuffled the various generators and helper files around for nicer structure. chore: updated all users to match the new paths. check-in: bcc5907141 user: aku tags: trunk | |
| 19:41 | feat: reducer code generation - foundations. feat: reducer code generation - baseline implementation, reuses existing functions. feat: redirected by-band reducers to the generated code. feat: benchmarking support for reducer. feat: benchmarks for reducer, and baseline results. check-in: cc95f55707 user: aku tags: trunk | |
| 13:40 | note: the easy by-band simplifications of arg max/min apply to rows and columns as well. chore: updated tests idoc: annotations about unroll possibilities for by-row/column stats check-in: 00545fb259 user: aku tags: trunk | |
| 13:10 | feat: stats by band - added easy simplifications for arg::max/min note: single-band image -> result is const 0 idoc: annotations about unroll possibilities for by-band stats idoc: thoughts about arg::<rel> simplification for single-band input idoc: looks to require extensions to the simplifier dsl idoc: multi-input rewrite, dag output check-in: 194ca87028 user: aku tags: trunk | |
| 12:05 | tweak: dropped ternaries of the form `x ? 1 : 0` and `x ? 0 : 1`. note: These are `x` and `!x` respectively. note: and C auto-casts these ints to double on assignment. check-in: 06c0da1251 user: aku tags: trunk | |
| 11:46 | chore: removed all references to highway from the trunk, except in benchmark commentary. refactor: reworked the loop code generation. note: replaced few large fragments with lots of replication with more smaller fragments for specific parts. note: more work in assembly, less replication in the generator input. check-in: bfdd91612a user: aku tags: trunk | |
|
2025-12-02
| ||
| 17:04 | separating the scalar and simd-based loops from each other, i.e. use iteration over vector length as inner loop instead of outer has not changed the result in a material way. we now only know that it is the scalar loops mostly driving the cpu temperature, not the simd-loops. with this the simd experiment is closed. in the following I will work on unrolling more core loops as a means of gaining performance. Leaf check-in: d9994631e7 user: aku tags: highway-simd-experiment | |
| 12:25 | benchmarked unrolled simd-loops to see if that gains us performance. result: no, it does not. each level of unrolling actually has worse performance. I.e. 4-unrolled is slower than 2-unrolled, is slower than no unrolling at all. strongly suspecting that the simd loops encounter thermal throttling and more unrolling ==> more use of the vector unit ==> higher temps, more throttling, eating up any performance gains we may have had. and some behaviour of the scalar loops may indicate that even they get throttled, likely because vector length is the outer loop, interleaving the scalar and simd loops. check-in: 39b2e0f85e user: aku tags: highway-simd-experiment | |
|
2025-12-01
| ||
| 14:23 | fix: missed two, remove references to ops which were not compared. check-in: 2a9b817a3a user: aku tags: highway-simd-experiment | |
| 14:19 | fix: remove references to ops which were not compared. check-in: 0444d7616c user: aku tags: highway-simd-experiment | |
| 14:08 | benchmarking the new simd code, and comparing against the 4-unrolled super-scalar code the results heavily favor the latter, except for some ops and very large vectors. as noted in the associated README, this is likely because we have only 2 lanes for simd double ops, making it equivalent to 2-unrolled super-scalar code at best. check-in: 9dfb3154ab user: aku tags: highway-simd-experiment | |
| 14:03 | fixed all the issues in the pre-existing unused theoretical highway code (templates, generator, etc). with the fixes in place, and activated highway code is generated properly, compiles ok, links ok, loads properly, and returns correct results. disabled the vector.bench(mark) file, and corrected a typo in the chart defs. check-in: 55730fb71e user: aku tags: highway-simd-experiment | |
|
2025-11-28
| ||
| 17:48 | chore: put the benchmarks results for loop-unrolling of the vector math into the permanent record. note: even the readme is generated. check-in: 4c6a79be86 user: aku tags: trunk | |
| 17:01 | tweak: messaging emitted for generic file reader (with format detection) check-in: 7364530827 user: aku tags: trunk | |
| 17:00 | feat: reworked math vector gen to properly support production/benchmarking modes. note: benchmarking => gen everything (scalar, unroll 2, 4, highway). note: production => gen either highway or unroll 4, highway preferred. note: we are now prepped for experimenting with basic highway/simd support. note: will be compared against fastest scalar code (4-unroll), per current benchmarks. check-in: b44d9d7a6b user: aku tags: trunk | |
| 15:02 | feat: updated benchmarking to use the loop-unrolled forms as well. tweak: extended size range to log10 ==8, i.e. 100 million values. note: moved from static to heap allocated arrays. note: static was too large for linker, triggered relocation errors. check-in: 631d512abc user: aku tags: trunk | |
| 14:59 | tweak: moved function templates into separate files. feat: extended scalar templates with loop-unrolled variants (2, 4). tweak: reworked generator to generate multiple scalar implementations. note: some formatting changes. check-in: 514b0e3dca user: aku tags: trunk | |
| 14:56 | tweak: execution tracing, chart generation feat: abort with error on empty charts check-in: 2d6d9e3389 user: aku tags: trunk | |
|
2025-11-06
| ||
| 22:35 | chore: regenerated embedded documentation check-in: 14b2502b2a user: aku tags: trunk | |
| 22:34 | feat: added operator computing structure tensor feat: added command computing line energy from structure tensor feat: extended example dsl with override preventing prefixing the last line with the command to show note: enables post-processing around the explicit use of command to show check-in: 241a73153a user: aku tags: trunk | |
| 22:32 | fix: file format description, added data for new AKTIVELE, shifted old AKTIVE fix: missing document about pixel interpolation methods used by the various warp commands fix: added references to the new interpolation document check-in: ea1163c719 user: aku tags: trunk | |
|
2025-11-05
| ||
| 23:03 | chore: regenerated docs check-in: c303efad8e user: aku tags: trunk | |
| 23:02 | fix: missing closing brackets check-in: 9429dbad0a user: aku tags: trunk | |
|
2025-11-04
| ||
| 21:27 | chore: regenerated embedded documentation check-in: 2f13409408 user: aku tags: trunk | |
| 21:27 | fix: use of `xz` for standard binary blitter. xz does not work for images of different depths feat: reworked all unary math ops to use blitter with xz axis and vector function feat: reworked all binary math ops to use blitter with xz axis and vector function for images of identical depth note: falls back to standard blitter without xz to handle images with different depths chore: renamed the files holding unary simplification rules to match the changed unary code note: the work on unary here exposed the issue with shared regions in the runtime structures again and solved with commit [ae6d6a67c9] note: now that this is solved the vector work could be concluded as well future: trial use of highway/simd in the vector functions check-in: 2a8e20e766 user: aku tags: trunk | |
| 21:17 | feat: vector function utilities for benchmarking check-in: 5b04993921 user: aku tags: trunk | |
| 21:15 | feat: extended blit generator with vector function support for xz axis (hiding the xz loop in the functions) feat: made the vector functions of [746d6abd3b] available as blit vector functions note: done directly from the mathfunc spec check-in: 15d3e65af5 user: aku tags: trunk | |
| 21:09 | rework: moved code generator for css color database into separate file feat: created code generator for vector functions implementing the unary/binary math ops beware: the highway output is untested, and not yet used check-in: 746d6abd3b user: aku tags: trunk | |
| 21:02 | fixup: undone unwanted changes committed to build(-only) and trial.tcl check-in: 7b87443206 user: aku tags: trunk | |
| 21:01 | fixup: ensure use of global variable, regardless of context sourcing it check-in: 0609d03d50 user: aku tags: trunk | |
| 20:18 | fix: deep-seated conceptual problem with the sharing of regions in the runtime structure ware: fixed the 2 bad test results exposed by the work (morphology) note: issue was seen before, not fully understood, tracked it down on this occurrence note: solved via on-demand creation of additional pixel storage in the problem nodes of the graph note: detailed explanation of issue and solution is found in the `region.h` comments chore: updated all the fetch call sites to match changed signatures chore: regenerated docs for accumulated changes. check-in: ae6d6a67c9 user: aku tags: trunk | |
|
2025-10-08
| ||
| 16:58 | feat: create new conversion helper. reads any supported format. writes supported base formats. fix: decoder for AKTIVE to handle AKTIVELE as well. check-in: fa29ff49c7 user: aku tags: trunk | |
| 16:24 | refactor: move color management helpers into separate file. feat: read image in any supported format, with format auto-detection. chore: make netpbm, aktive(-be) known as supported, and added tests. check-in: 6f0df5320e user: aku tags: trunk | |
| 07:01 | tweak: chart helper. report charts to merge, with rough size info check-in: 12f9f2ab5e user: aku tags: trunk | |
|
2025-10-04
| ||
| 11:16 | fixups ... fix: missing C-level / operator support for big-endian AKTIVE read/write fix: missing README for the new bench plot section. fix: plot definitions. fix: tooling for conversion into AKTIVE little- and big-endian variants. check-in: 191a343c47 user: aku tags: trunk | |
| 10:56 | ---*** BACKWARD INCOMPATIBILITY ***--- The default AKTIVE file format is now little-endian, with new magic `AKTIVELE`. The old big-endian format is still available via `aktive-be` commands. ---*** BACKWARD INCOMPATIBILITY ***--- chore: extended aktive reader/writer with little-endian functionality. chore: moved the old reader/writer commands to `aktive-be`. chore: move old big-endian test assets to `.aktive-be` files. feat: added little-endian support under the old `aktive` names. feat: added little-endian test assets under the old `.aktive`. chore: extended the tests to handle little- and big-endian commands. feat: added benchmarks comparing the little- and big-endian readers. note: big-endian reading is measurably slower than little-endian, as expected, due to the byte-swapping it has to perform. chore: regenerated documentation. check-in: 46db736f89 user: aku tags: trunk | |
| 10:44 | feat: expanded reader/writer to support little endian, full symmetry reached. chore: updated netpbm reader to match swap macro changes. check-in: fba02b0566 user: aku tags: trunk | |
|
2025-10-02
| ||
| 22:39 | feat: switched standard blitters to `xz`, where possible. chore: removed clear-all, replaced by `memset` (see [0ad289a03b]). warn: unclear why the two examples changed. check-in: e4cccc51e7 user: aku tags: trunk | |
| 22:35 | feat: benchmarks comparing current `from value` against new implementation using `xz` axis. result: somewhat better, good enough to keep. also a step towards further vectorization of that kind of loop. check-in: 5df4b87603 user: aku tags: trunk | |