metagraph-optimization
Directed Acyclic Graph optimization for QED
Usage
For all the julia calls, use -t n to give julia n threads.
Instantiate the project first:
julia --project=./ -e 'import Pkg; Pkg.instantiate()'
Run Tests
To run all tests, run
julia --project=./ -e 'import Pkg; Pkg.test()' -O0
Run Examples
Get the correct environment for the examples folder:
julia --project=examples/ -e 'import Pkg; Pkg.develop(Pkg.PackageSpec(path=pwd())); Pkg.instantiate();'
Then execute a specific example:
julia --project=examples examples/<file>.jl -O3
Concepts
Generate Operations from chains
We assume we have a (valid) graph given. We can generate all initially possible graph operations from it, and we can calculate the graph properties like compute effort and total data transfer.
Goal: For some operation, regenerate possible operations after that one has been applied, but without having to copy the entire graph. This would be helpful for optimization algorithms to try paths of optimizations and build up tree structures, like for example chess computers do.
Idea: Keep the original graph, a list of possible operations at the current state, and a queue of applied operations together. The "actual" graph is then the original graph with all operations in the queue applied. We can push and pop new operations to/from the queue, automatically updating the graph's global metrics and possible optimizations from there.
Problems:
- The list of operations can be very large, it would be helpful to somehow separately know which possible operations changed after applying one.
- Operations need to be perfectly reversible, so we need to store replaced nodes and new nodes.
- Lots of testing required because mistakes will propagate and multiply.
Other TODOs
- Reduce memory footprint of the graph
- Memory layout of Nodes? They should lie linearly in memory, right now probably on heap?
- Add scaling functions
Benchmarks of graphs
For graphs AB->AB^n:
- Number of Sums should always be 1
- Number of ComputeTaskABC_S2 should always be (n+1)!
- Number of ComputeTaskABC_U should always be (n+3)
Times are from my home machine: AMD Ryzen 7900X3D, 64GB DDR5 RAM @ 6000MHz (not necessarily up to date, check Jupyter Notebooks in notebooks/ instead)
$ julia --project examples/import_bench.jl
AB->AB:
Graph:
  Nodes: Total: 34, DataTask: 19, ComputeTaskABC_P: 4, 
         ComputeTaskABC_S2: 2, ComputeTaskABC_V: 4, ComputeTaskABC_U: 4, 
         ComputeTaskABC_Sum: 1
  Edges: 37
  Total Compute Effort: 185
  Total Data Transfer: 102
  Total Compute Intensity: 1.8137254901960784
  Graph size in memory: 8.3594 KiB
  11.362 μs (522 allocations: 29.70 KiB)
AB->ABBB:
Graph:
  Nodes: Total: 280, DataTask: 143, ComputeTaskABC_P: 6, 
         ComputeTaskABC_S2: 24, ComputeTaskABC_V: 64, ComputeTaskABC_U: 6, 
         ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 36
  Edges: 385
  Total Compute Effort: 2007
  Total Data Transfer: 828
  Total Compute Intensity: 2.4239130434782608
  Graph size in memory: 88.2188 KiB
  95.234 μs (4781 allocations: 270.82 KiB)
AB->ABBBBB:
Graph:
  Nodes: Total: 7854, DataTask: 3931, ComputeTaskABC_P: 8, 
         ComputeTaskABC_S2: 720, ComputeTaskABC_V: 1956, ComputeTaskABC_U: 8, 
         ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 1230
  Edges: 11241
  Total Compute Effort: 58789
  Total Data Transfer: 23244
  Total Compute Intensity: 2.5292118396145242
  Graph size in memory: 2.0988 MiB
  2.810 ms (136432 allocations: 7.57 MiB)
AB->ABBBBBBB:
Graph:
  Nodes: Total: 438436, DataTask: 219223, ComputeTaskABC_P: 10, 
         ComputeTaskABC_S2: 40320, ComputeTaskABC_V: 109600, ComputeTaskABC_U: 10, 
         ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 69272
  Edges: 628665
  Total Compute Effort: 3288131
  Total Data Transfer: 1297700
  Total Compute Intensity: 2.53381444093396
  Graph size in memory: 118.4037 MiB
  463.082 ms (7645256 allocations: 422.57 MiB)
AB->ABBBBBBBBB:
Graph:
  Nodes: Total: 39456442, DataTask: 19728227, ComputeTaskABC_S1: 6235290, ComputeTaskABC_P: 12, ComputeTaskABC_U: 12, ComputeTaskABC_V: 9864100, ComputeTaskABC_S2: 3628800, ComputeTaskABC_Sum: 1
  Edges: 56578129
  Total Compute Effort: 295923153
  Total Data Transfer: 175407750
  Total Compute Intensity: 1.6870585991782006
  Graph size in memory: 12.7845 GiB
ABAB->ABAB:
Graph:
  Nodes: Total: 3218, DataTask: 1613, ComputeTaskABC_P: 8, 
         ComputeTaskABC_S2: 288, ComputeTaskABC_V: 796, ComputeTaskABC_U: 8, 
         ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 504
  Edges: 4581
  Total Compute Effort: 24009
  Total Data Transfer: 9494
  Total Compute Intensity: 2.528860332841795
  Graph size in memory: 891.375 KiB
  1.155 ms (55467 allocations: 3.09 MiB)
ABAB->ABC:
Graph:
  Nodes: Total: 817, DataTask: 412, ComputeTaskABC_P: 7,
         ComputeTaskABC_S2: 72, ComputeTaskABC_V: 198, ComputeTaskABC_U: 7,
         ComputeTaskABC_Sum: 1, ComputeTaskABC_S1: 120
  Edges: 1151
  Total Compute Effort: 6028
  Total Data Transfer: 2411
  Total Compute Intensity: 2.5002073828287017
  Graph size in memory: 225.0625 KiB
  286.583 μs (13996 allocations: 804.48 KiB)