Abstract [eng] |
Understanding the propagation of elastic waves (or of acoustic waves, or waves of any other type) is of great importance in areas such as seismology and non-destructive testing (NDT). In an elastic medium this phenomenon is described by the differential equations of dynamic elasticity. However, computational models such as the finite element method consume large amounts of computational power, because even relatively small problems require dividing the region of interest into millions of elements. The advent of general-purpose GPU computing brings new opportunities for speeding up computations, as well as challenges in developing high-performance algorithms suited to the new kinds of processors. This work therefore concentrates on developing a finite-element-based model of short elastic wave propagation for both the GPU and the CPU. An explicit central difference scheme was chosen for integrating the wave equation. It was then slightly modified to separate the integration algorithm into three phases: evaluation of external forces, evaluation of the forces that arise from element stresses, and recalculation of node displacements, velocities and forces. A parallel algorithm was developed for executing the second and third phases, based on the strategy suggested in [1]. The algorithm of the second phase was then optimized twice: first the array of element node indices was eliminated, yielding a 20% performance boost; then the algorithm was modified to process elements in blocks, using the strategy described in [22]. This brought a further 60% reduction in computation time. The algorithm was then modified again to allow cracks and irregularities to be introduced into the model, which increased computation time by only 20%. Finally, suggestions were made for using 2 GPUs and 1 CPU effectively. The following conclusions were drawn from the analysis of the results:
• When the model size increases, so should the work performed by one thread. In other words, the bigger the model, the more elements each thread must process to achieve maximum processor efficiency. The exact amount of work per thread can only be determined experimentally.
• It is beneficial to try modifying whatever integration scheme is used so that the algorithm becomes easier to parallelize.
• The small parts of the integration algorithm that account for most of the calculations should be optimized further by unrolling loops and avoiding similar programming conveniences, to reduce the number of operations performed.
• More universal models and algorithms are not as efficient as algorithms developed for wave simulation in a medium with specific restrictions. |
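The three-phase split of the explicit central difference scheme described in the abstract can be illustrated with a minimal serial sketch. This is not the thesis code: it assumes a 1D bar of two-node elements with lumped masses, and the function names, the material constants and the tone-burst load are all hypothetical choices made for illustration. Element `e` is taken to join nodes `e` and `e + 1`, so no array of element node indices is needed, which loosely mirrors the first optimization mentioned above for a regular mesh.

```python
import math


def external_forces(n_nodes, t, amplitude=1.0, freq=5.0e4, cycles=3):
    """Phase 1: external load, here a short tone burst on node 0 (assumed)."""
    f = [0.0] * n_nodes
    if t < cycles / freq:
        f[0] = amplitude * math.sin(2.0 * math.pi * freq * t)
    return f


def internal_forces(u, stiffness):
    """Phase 2: nodal forces caused by element stresses."""
    f = [0.0] * len(u)
    for e in range(len(u) - 1):               # element e joins nodes e, e + 1
        pull = stiffness * (u[e + 1] - u[e])  # tension in the element
        f[e] += pull                          # stretched element pulls node e forward
        f[e + 1] -= pull                      # and node e + 1 backward
    return f


def update_nodes(u, v, f_ext, f_int, mass, dt):
    """Phase 3: recompute node accelerations, velocities and displacements."""
    for i in range(len(u)):
        a = (f_ext[i] + f_int[i]) / mass[i]
        v[i] += a * dt      # velocity at the half step
        u[i] += v[i] * dt   # displacement at the next full step


def simulate(n_nodes=200, n_steps=100):
    # Assumed steel-like bar: Young's modulus, density, cross-section, length.
    young, density, area, length = 2.0e11, 7800.0, 1.0e-4, 1.0
    h = length / (n_nodes - 1)                 # element length
    stiffness = young * area / h
    mass = [density * area * h] * n_nodes      # lumped nodal masses
    mass[0] *= 0.5
    mass[-1] *= 0.5
    dt = 0.5 * h / math.sqrt(young / density)  # half the critical time step
    u = [0.0] * n_nodes
    v = [0.0] * n_nodes
    for step in range(n_steps):
        f_ext = external_forces(n_nodes, step * dt)
        f_int = internal_forces(u, stiffness)
        update_nodes(u, v, f_ext, f_int, mass, dt)
    return u, v, mass, dt


if __name__ == "__main__":
    u, v, mass, dt = simulate()
    print("displacement of node 0 after 100 steps:", u[0])
```

In this sketch the node update in phase 3 touches each node independently, and the element loop in phase 2 is independent apart from the accumulation into shared nodes, which is presumably why these two phases are the natural candidates for parallel execution.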