The memory usage of the implemented code is not optimised. However, the following methods are used to address the memory limitations of ISPH on a GPU:
\begin{itemize}
\item To ease the memory limitations of the GPU, mixed-precision storage is used. With the exception of the particle positions and the CSR matrix arrays, all particle data are stored in single precision. Particle positions, $\mathbf{r}$, are stored in double precision to maintain accuracy in the kernel. Numerical experiments indicate that the CSR matrix arrays must be stored in double precision for the linear solver to converge in simulations of more than approximately 500,000 particles.
\item The majority of the computations are performed in single precision, taking advantage of the high single-precision FLOPS of Nvidia GPUs. For the CSR matrix arrays, calculations are performed with {\fontfamily{pcr}\selectfont float} register variables, and the final values of the PPE matrix entries are cast to {\fontfamily{pcr}\selectfont double} for storage in the matrix arrays. The library then solves the linear system in double precision. To minimise error when inverting the matrices for the MLS kernel and the kernel gradient normalisation (Eqs~(\ref{eq:MLS}) and~(\ref{eq:KernelNormalistion})), {\fontfamily{pcr}\selectfont double} registers are used for the calculation and the results are then cast to {\fontfamily{pcr}\selectfont float} for storage.
\item As in the WCSPH DualSPHysics code, all memory is allocated before the main simulation execution to save time. In ISPH, the number of non-zero entries required for the PPE matrix is not known in advance because the computational points move. However, given the kernel support size, it is possible to estimate the maximum number of non-zero entries per row, $Nnz_{max}$, required for the matrix, and therefore the allocation size of the CSR matrix arrays {\fontfamily{pcr}\selectfont aValues} and {\fontfamily{pcr}\selectfont column}. In this work, $Nnz_{max}\approx1.5N_{n,max}$, where $N_{n,max}$ is the maximum number of neighbours a particle has within a uniformly distributed arrangement. For example, for the 2-D cases in this work using the quintic spline, $N_{n,max}=44$ and therefore $Nnz_{max}=70$. The memory for the CSR arrays is subsequently allocated as follows:
\begin{itemize}
\item {\fontfamily{pcr}\selectfont aValues=new double[$Nnz_{max}\times Np$];}
\item {\fontfamily{pcr}\selectfont column=new unsigned[$Nnz_{max}\times Np$];}
\end{itemize}
This estimate is usually sufficient because the use of shifting maintains an even distribution of particles throughout most of a simulation.
\end{itemize}
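
The three points above can be illustrated with a minimal host-side C++ sketch. It is not the DualSPHysics implementation: the names {\fontfamily{pcr}\selectfont CsrStorage}, {\fontfamily{pcr}\selectfont rowStart}, {\fontfamily{pcr}\selectfont allocateCsr}, {\fontfamily{pcr}\selectfont storeEntry}, {\fontfamily{pcr}\selectfont csrBytes} and {\fontfamily{pcr}\selectfont invert2x2} are hypothetical, {\fontfamily{pcr}\selectfont std::vector} is used in place of raw {\fontfamily{pcr}\selectfont new}, and a 2-D (i.e. $2\times2$) normalisation matrix is assumed. Only {\fontfamily{pcr}\selectfont aValues} and {\fontfamily{pcr}\selectfont column} are taken from the text. The value of $Nnz_{max}$ is passed in directly rather than recomputed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the preallocated CSR arrays.
// aValues and column follow the text; rowStart is an assumed row-pointer array.
struct CsrStorage {
    std::vector<double>   aValues;   // matrix entries, stored in double precision
    std::vector<unsigned> column;    // column index of each stored entry
    std::vector<unsigned> rowStart;  // assumed: Np + 1 row offsets
};

// Allocate once, before the simulation starts, using the upper bound
// nnzMax (non-zeros per row) times np (number of particles).
CsrStorage allocateCsr(std::size_t np, std::size_t nnzMax) {
    CsrStorage m;
    m.aValues.assign(nnzMax * np, 0.0);
    m.column.assign(nnzMax * np, 0u);
    m.rowStart.assign(np + 1, 0u);
    return m;
}

// Bytes needed for aValues and column together (rowStart excluded).
std::size_t csrBytes(std::size_t np, std::size_t nnzMax) {
    return nnzMax * np * (sizeof(double) + sizeof(unsigned));
}

// A matrix entry is computed in single precision and widened to double
// only when it is written to the matrix array.
void storeEntry(CsrStorage& m, std::size_t slot, unsigned col, float value) {
    m.aValues[slot] = static_cast<double>(value);
    m.column[slot]  = col;
}

// Invert a row-major 2x2 matrix held in single precision. The arithmetic is
// done in double and the result is narrowed back to float for storage.
// Returns false if the determinant is exactly zero.
bool invert2x2(const float a[4], float inv[4]) {
    const double a00 = a[0], a01 = a[1], a10 = a[2], a11 = a[3];
    const double det = a00 * a11 - a01 * a10;
    if (det == 0.0) return false;
    const double r = 1.0 / det;
    inv[0] = static_cast<float>( a11 * r);
    inv[1] = static_cast<float>(-a01 * r);
    inv[2] = static_cast<float>(-a10 * r);
    inv[3] = static_cast<float>( a00 * r);
    return true;
}
```

Calling {\fontfamily{pcr}\selectfont allocateCsr(Np, 70)} reserves $70\times Np$ entries in each of {\fontfamily{pcr}\selectfont aValues} and {\fontfamily{pcr}\selectfont column}, and {\fontfamily{pcr}\selectfont csrBytes} gives the corresponding footprint, which can be checked against the available GPU memory before a run.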
