14.2.2 Examples
The following chart presents the speed of QW-Simulator GPU, expressed in millions of FDTD cells processed per second [Mcells/s]. The time needed to process one cell depends on the total number of cells in the scenario and usually decreases as that number grows. We therefore show the [Mcells/s] performance as a function of the total number of FDTD cells in each scenario. Five standard QW-3D scenarios (available on the QuickWave installation DVD) were used for the comparison. In each case, the number of cells was adjusted by an appropriate choice of the FDTD cell size.
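As a rough illustration of the [Mcells/s] metric, the sketch below assumes the usual FDTD throughput convention: the number of cells multiplied by the number of time iterations, divided by the elapsed wall-clock time. The function name and the sample numbers are hypothetical and are not taken from the QuickWave benchmarks.

```python
def mcells_per_second(total_cells, iterations, elapsed_seconds):
    """Throughput in millions of FDTD cells processed per second.

    Assumed convention: every cell is updated once per iteration,
    so the work done is total_cells * iterations cell updates.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_cells * iterations / elapsed_seconds / 1e6


# Hypothetical run: 10 million cells, 1000 iterations, 5 seconds.
example_speed = mcells_per_second(10_000_000, 1000, 5.0)
print(f"{example_speed:.0f} Mcells/s")
```

Because the per-cell time usually drops as the model grows, the same function applied to a small and a large version of one scenario would typically return a higher value for the large one.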
For very small models, the highly accelerated QW-Simulator GPU version is unnecessary because the CPU version is already sufficiently efficient. For large models, the total number of cells is limited by the GPU memory size, which currently (2018) reaches 32 GB for the NVIDIA Quadro GV100. On such a card, the user can simulate models composed of approximately 120 million FDTD cells.
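The two figures quoted above (32 GB and about 120 million cells) imply an average memory footprint per cell. The sketch below derives it and scales it linearly to another memory size. It assumes 1 GB = 10^9 bytes and a footprint that does not depend on the model; in practice the footprint varies with media, boundary conditions and post-processing, so the result is only an order-of-magnitude guide. All function names are hypothetical.

```python
def bytes_per_cell(memory_bytes, max_cells):
    """Average GPU memory used per FDTD cell, from a known capacity point."""
    return memory_bytes / max_cells


def estimate_max_cells(memory_bytes, per_cell_bytes):
    """Linear extrapolation of the cell limit to another memory size."""
    return memory_bytes / per_cell_bytes


# Capacity point quoted in the text: 32 GB holds about 120 million cells.
per_cell = bytes_per_cell(32e9, 120e6)
print(f"about {per_cell:.0f} bytes per cell")

# Rough estimate for a 6 GB card (assumption: same per-cell footprint).
cells_6gb = estimate_max_cells(6e9, per_cell)
print(f"about {cells_6gb / 1e6:.1f} million cells in 6 GB")
```

With binary gigabytes (2^30 bytes) instead of decimal ones, the per-cell figure comes out somewhat higher, so the estimate should be treated as approximate either way.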
Note that the CPU speeds presented here for comparison were obtained with a very fast contemporary CPU (Intel Core i7-950 desktop processor) and with the multithreaded OMP version of QuickWave. On older CPUs and/or with the sequential QuickWave version, the CPU performance would be significantly lower, and the advantage of the QW-Simulator GPU version would therefore be more pronounced.
Three different GPU cards are used in the benchmarks:
· NVIDIA GTX 470 graphics card - 1280 MB memory, max memory bandwidth of 133.9 GB/s, GDDR5 memory, 448 thread processors, compute capability 2.0
· NVIDIA GTX 580 graphics card - 3 GB memory, max memory bandwidth of 192.4 GB/s, GDDR5 memory, 512 thread processors, compute capability 2.0
· NVIDIA GTX TITAN graphics card - 6 GB memory, max memory bandwidth of 288.4 GB/s, GDDR5 memory, 2688 thread processors, compute capability 3.5
For example, the charts show that the QW-V2D Cassegrain antenna model reaches an FDTD speed of about 2000 [Mcells/s], which is a speed-up of 27 for the GTX TITAN GPU over the OMP version running on the I7 950 CPU. The 3D beefburger model reaches a speed of about 1500 [Mcells/s], which is a speed-up of about 14.
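The speed-up is the ratio of GPU to CPU throughput, so the CPU speed implied by each pair of figures can be recovered by division. The sketch below does this arithmetic for the two models mentioned above; the resulting CPU values are derived from the rounded numbers in the text, not read from the charts, and the function name is hypothetical.

```python
def implied_cpu_speed(gpu_mcells_per_s, speed_up):
    """CPU throughput implied by a GPU throughput and a GPU/CPU speed-up."""
    if speed_up <= 0:
        raise ValueError("speed_up must be positive")
    return gpu_mcells_per_s / speed_up


# Approximate values quoted in the text (GTX TITAN vs. OMP on the I7 950).
quoted = {
    "Cassegrain antenna (QW-V2D)": (2000.0, 27.0),
    "beefburger (3D)": (1500.0, 14.0),
}

implied = {name: implied_cpu_speed(gpu, s) for name, (gpu, s) in quoted.items()}
for name, cpu in implied.items():
    print(f"{name}: about {cpu:.0f} Mcells/s on the CPU")
```

The same relation can be used in the other direction: multiplying a measured CPU throughput by the expected speed-up gives a first guess of the GPU throughput for a comparable model.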