# metagraph-optimization Directed Acyclic Graph optimization for QED ## Usage For all the julia calls, use `-t n` to give julia `n` threads. Instantiate the project first: `julia --project=./ -e 'import Pkg; Pkg.instantiate()'` ### Run Tests To run all tests, run `julia --project=./ -e 'import Pkg; Pkg.test()' -O0` ### Run Examples Get the correct environment for the examples folder: `julia --project=examples/ -e 'import Pkg; Pkg.develop(Pkg.PackageSpec(path=pwd())); Pkg.instantiate();'` Then execute a specific example: `julia --project=examples examples/.jl -O3` ## Concepts ### Generate Operations from chains We assume we have a (valid) graph given. We can generate all initially possible graph operations from it, and we can calculate the graph properties like compute effort and total data transfer. Goal: For some operation, regenerate possible operations after that one has been applied, but without having to copy the entire graph. This would be helpful for optimization algorithms to try paths of optimizations and build up tree structures, like for example chess computers do. Idea: Keep the original graph, a list of possible operations at the current state, and a queue of applied operations together. The "actual" graph is then the original graph with all operations in the queue applied. We can push and pop new operations to/from the queue, automatically updating the graph's global metrics and possible optimizations from there. Problems: - The list of operations can be very large, it would be helpful to somehow separately know which possible operations changed after applying one. - Operations need to be perfectly reversible, so we need to store replaced nodes and new nodes. - Lots of testing required because mistakes will propagate and multiply. ## Other TODOs - Reduce memory footprint of the graph - Memory layout of Nodes? They should lie linearly in memory, right now probably on heap? - Add scaling functions ## Benchmarks of graphs For graphs AB->AB^n: - Number of Sums should always be 1 - Number of ComputeTaskABC_S2 should always be (n+1)! - Number of ComputeTaskABC_U should always be (n+3) Times are from my home machine: AMD Ryzen 7900X3D, 64GB DDR5 RAM @ 6000MHz (not necessarily up to date, check Jupyter Notebooks in `notebooks/` instead) ``` $ julia --project examples/import_bench.jl AB->AB: Graph: Nodes: Total: 34, DataTask: 19, ComputeTaskABC_P: 4, ComputeTaskABC_S2: 2, ComputeTaskABC_V: 4, ComputeTaskABC_U: 4, ComputeTaskABC_Sum: 1 Edges: 37 Total Compute Effort: 185 Total Data Transfer: 102 Total Compute Intensity: 1.8137254901960784 Graph size in memory: 8.3594 KiB 11.362 μs (522 allocations: 29.70 KiB) AB->ABBB: Graph: Nodes: Total: 280, DataTask: 143, ComputeTaskABC_P: 6, ComputeTaskABC_S2: 24, ComputeTaskABC_V: 64, ComputeTaskABC_U: 6, ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 36 Edges: 385 Total Compute Effort: 2007 Total Data Transfer: 828 Total Compute Intensity: 2.4239130434782608 Graph size in memory: 88.2188 KiB 95.234 μs (4781 allocations: 270.82 KiB) AB->ABBBBB: Graph: Nodes: Total: 7854, DataTask: 3931, ComputeTaskABC_P: 8, ComputeTaskABC_S2: 720, ComputeTaskABC_V: 1956, ComputeTaskABC_U: 8, ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 1230 Edges: 11241 Total Compute Effort: 58789 Total Data Transfer: 23244 Total Compute Intensity: 2.5292118396145242 Graph size in memory: 2.0988 MiB 2.810 ms (136432 allocations: 7.57 MiB) AB->ABBBBBBB: Graph: Nodes: Total: 438436, DataTask: 219223, ComputeTaskABC_P: 10, ComputeTaskABC_S2: 40320, ComputeTaskABC_V: 109600, ComputeTaskABC_U: 10, ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 69272 Edges: 628665 Total Compute Effort: 3288131 Total Data Transfer: 1297700 Total Compute Intensity: 2.53381444093396 Graph size in memory: 118.4037 MiB 463.082 ms (7645256 allocations: 422.57 MiB) AB->ABBBBBBBBB: Graph: Nodes: Total: 39456442, DataTask: 19728227, ComputeTaskABC_S1: 6235290, ComputeTaskABC_P: 12, ComputeTaskABC_U: 12, ComputeTaskABC_V: 9864100, ComputeTaskABC_S2: 3628800, ComputeTaskABC_Sum: 1 Edges: 56578129 Total Compute Effort: 295923153 Total Data Transfer: 175407750 Total Compute Intensity: 1.6870585991782006 Graph size in memory: 12.7845 GiB ABAB->ABAB: Graph: Nodes: Total: 3218, DataTask: 1613, ComputeTaskABC_P: 8, ComputeTaskABC_S2: 288, ComputeTaskABC_V: 796, ComputeTaskABC_U: 8, ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 504 Edges: 4581 Total Compute Effort: 24009 Total Data Transfer: 9494 Total Compute Intensity: 2.528860332841795 Graph size in memory: 891.375 KiB 1.155 ms (55467 allocations: 3.09 MiB) ABAB->ABC: Graph: Nodes: Total: 817, DataTask: 412, ComputeTaskABC_P: 7, ComputeTaskABC_S2: 72, ComputeTaskABC_V: 198, ComputeTaskABC_U: 7, ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 120 Edges: 1151 Total Compute Effort: 6028 Total Data Transfer: 2411 Total Compute Intensity: 2.5002073828287017 Graph size in memory: 225.0625 KiB 286.583 μs (13996 allocations: 804.48 KiB) ```