Prepare things to work with heterogeneity, make things work on GPU

Reviewed-on: Rubydragon/MetagraphOptimization.jl#27
Co-authored-by: Anton Reinhard <anton.reinhard@proton.me>
Co-committed-by: Anton Reinhard <anton.reinhard@proton.me>
CPU: AMD EPYC 7452 32 Cores, 64 Threads | 122.8 GFLOPS (?, source: https://www.cpubenchmark.net/cpu.php?cpu=AMD+EPYC+7452)
GPU: A30 24GB | 5.161 TFLOPS (source: https://www.techpowerup.com/gpu-specs/a30-pcie.c3792)

Benchmark Summary for QED Process: 'ke->ke':
Measured FLOPS by LIKWID: 5657.0
Total input size: 1.394 GiB
CPU, 64 threads
Time: 0.594810558
Rate: 1.681207548437632e7
GFLOPS: 88.57428190774921
GPU, NVIDIA A30
Time: 1.547648257
Rate: 6.461416510353748e6
GFLOPS: 34.041919930904314

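The Rate and GFLOPS fields of a summary appear to follow from Time and the LIKWID count. The sketch below checks this for the 'ke->ke' CPU run. Several things here are assumptions inferred from the numbers rather than stated in the log: Time is taken to be in seconds, the batch size is recovered as Rate × Time, and the GFLOPS field is assumed to use a divisor of 2^30 rather than 10^9. The function names are illustrative only.

```python
import math

# Values copied from the 'ke->ke' CPU summary.
flop_per_input = 5657.0              # "Measured FLOPS by LIKWID"
time_s = 0.594810558                 # "Time", assumed to be seconds
reported_rate = 1.681207548437632e7  # "Rate"
reported_gflops = 88.57428190774921  # "GFLOPS"

# Assumption: the batch size is not in the log; infer it from Rate * Time.
n_inputs = round(reported_rate * time_s)


def rate(n: int, seconds: float) -> float:
    """Inputs processed per second."""
    return n / seconds


def gflops(inputs_per_s: float, flop: float, divisor: float) -> float:
    """Floating-point throughput scaled by the given 'giga' divisor."""
    return inputs_per_s * flop / divisor


r = rate(n_inputs, time_s)
binary = gflops(r, flop_per_input, 2**30)  # assumed scaling of the log field
decimal = gflops(r, flop_per_input, 1e9)   # SI giga, for comparison

print(f"inputs       = {n_inputs}")
print(f"rate         = {r:.6e} inputs/s")
print(f"GFLOPS, 2^30 = {binary:.3f}")
print(f"GFLOPS, 10^9 = {decimal:.3f}")
```

If the 2^30 assumption holds, the reported GFLOPS values are roughly 7% lower than the same throughput expressed with an SI giga prefix, which matters when comparing against the vendor figures in the header.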
Benchmark Summary for QED Process: 'ke->kke':
Measured FLOPS by LIKWID: 16256.0
Total input size: 1.768 GiB
CPU, 64 threads
Time: 1.294064702
Rate: 7.7275888790914565e6
GFLOPS: 116.99244828756034
GPU, NVIDIA A30
Time: 4.973188906
Rate: 2.0107822544072892e6
GFLOPS: 30.442398346629826

Benchmark Summary for QED Process: 'ke->kkke':
Measured FLOPS by LIKWID: 43433.0
Total input size: 2.632 GiB
CPU, 64 threads
Time: 3.232029091
Rate: 3.094031556784648e6
GFLOPS: 125.15398916399816
GPU, NVIDIA A30
Time: 14.597070187
Rate: 685068.9810963502
GFLOPS: 27.711131662091034

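The 'ke->kkke' figures can be put next to the peak values quoted in the header of this file. The sketch below is a rough comparison only: it takes the quoted peaks at face value (the CPU one is marked "?" in the header), uses the reported GFLOPS fields unchanged, and the helper name is hypothetical.

```python
# Peak figures quoted in the header of this file.
CPU_PEAK_GFLOPS = 122.8    # marked "?" in the header
GPU_PEAK_GFLOPS = 5161.0   # 5.161 TFLOPS

# Reported GFLOPS for 'ke->kkke', copied from the summary.
cpu_gflops = 125.15398916399816
gpu_gflops = 27.711131662091034


def fraction_of_peak(measured: float, peak: float) -> float:
    """Ratio of measured throughput to the quoted peak."""
    return measured / peak


cpu_fraction = fraction_of_peak(cpu_gflops, CPU_PEAK_GFLOPS)
gpu_fraction = fraction_of_peak(gpu_gflops, GPU_PEAK_GFLOPS)

print(f"CPU: {cpu_fraction:.1%} of quoted peak")
print(f"GPU: {gpu_fraction:.2%} of quoted peak")
```

With these numbers the CPU run lands slightly above 100% of the quoted CPU figure, which is consistent with the "?" the header attaches to that figure, while the GPU run stays below 1% of the quoted GPU peak.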
Benchmark Summary for ABC Process: 'AB->AB':
Measured FLOPS by LIKWID: 41.0
Total input size: 2.201 GiB
CPU, 64 threads
Time: 0.688079611
Rate: 1.453320203089116e7
GFLOPS: 0.5549390644454747
GPU, NVIDIA A30
Time: 0.013803574
Rate: 7.244500590933913e8
GFLOPS: 27.662564462822903

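With only 41 floating-point operations per input, 'AB->AB' is probably better described by how fast the input data is consumed than by GFLOPS. The sketch below divides the total input size by the measured time. It assumes Time is in seconds; the log does not say whether the GPU time includes host-to-device transfer, so the GPU number should be read with that caveat. The function name is illustrative.

```python
# Values copied from the 'AB->AB' summary.
input_gib = 2.201          # "Total input size"
cpu_time_s = 0.688079611   # CPU "Time", assumed to be seconds
gpu_time_s = 0.013803574   # GPU "Time", assumed to be seconds


def throughput_gib_per_s(size_gib: float, seconds: float) -> float:
    """Effective input throughput in GiB per second."""
    return size_gib / seconds


cpu_throughput = throughput_gib_per_s(input_gib, cpu_time_s)
gpu_throughput = throughput_gib_per_s(input_gib, gpu_time_s)

print(f"CPU: {cpu_throughput:.2f} GiB/s")
print(f"GPU: {gpu_throughput:.2f} GiB/s")
```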
Benchmark Summary for ABC Process: 'AB->ABBB':
Measured FLOPS by LIKWID: 899.0
Total input size: 3.079 GiB
CPU, 64 threads
Time: 0.855687624
Rate: 1.1686507692204276e7
GFLOPS: 9.784633680518386
GPU, NVIDIA A30
Time: 0.014804518
Rate: 6.754694749265056e8
GFLOPS: 565.542893445984
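To compare the two devices across all five processes, the measured times can be reduced to a single speedup factor per process. The sketch below computes CPU time divided by GPU time from the values in this file; a factor above 1 means the GPU run was faster. The dictionary layout is just one convenient way to hold the numbers.

```python
# (CPU time, GPU time) per process, copied from the summaries in this file.
times = {
    "ke->ke": (0.594810558, 1.547648257),
    "ke->kke": (1.294064702, 4.973188906),
    "ke->kkke": (3.232029091, 14.597070187),
    "AB->AB": (0.688079611, 0.013803574),
    "AB->ABBB": (0.855687624, 0.014804518),
}

# Speedup of the GPU over the CPU: values below 1 mean the GPU was slower.
speedups = {name: cpu / gpu for name, (cpu, gpu) in times.items()}

for name, factor in speedups.items():
    faster = "GPU" if factor > 1 else "CPU"
    print(f"{name:>9}: GPU speedup {factor:8.3f}  ({faster} faster)")
```

In these runs the GPU is slower than the CPU for all three QED processes and faster for both ABC processes.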