# Cite Article

### Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster

## BibTeX

@article{IJASEIT8308,
  author = {Roberto Ammendola and Pierpaolo Loreti},
  title = {Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster},
  journal = {International Journal on Advanced Science, Engineering and Information Technology},
  volume = {9},
  number = {2},
  year = {2019},
  pages = {677--684},
  keywords = {3D-FFT; FPGA; high-performance computing; cluster.},
  abstract = {The Three Dimensional Fast Fourier Transform (3D-FFT) is commonly used to solve the partial differential equations describing the system evolution in several physical phenomena, such as the motion of viscous fluids described by the Navier–Stokes equations. Simulation of such problems requires the use of a parallel High-Performance Computing architecture since the size of the problem grows with the cube of the FFT size, and the representation of the single point comprises several double precision floating-point complex numbers. Modern High-Performance Computing (HPC) systems are considering the inclusion of FPGAs as components of this computing architecture because they can combine effective hardware acceleration capabilities and dedicated communication facilities. Furthermore, the network topology can be optimized for the specific calculation that the cluster must perform, especially in the case of algorithms limited by the data exchange delay between the processors. In this paper, we explore an HPC design that uses FPGA accelerators to compute the 3D-FFT. We devise a scalable FFT engine based on a custom radix-2 double-precision core that is used to implement the Decimation in Frequency version of the Cooley–Tukey FFT algorithm. The FFT engine can be adapted to different technology constraints and networking topologies by adjusting the number of cores and configuration parameters in order to minimize the overall calculation time. We compare the various possible configurations with the technological limits of available hardware. Finally, we evaluate the bandwidth required for continuous FFT execution in the APEnet toroidal mesh network.},
  issn = {2088-5334},
  publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
  url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8308},
  doi = {10.18517/ijaseit.9.2.8308}
}

## EndNote

%A Ammendola, Roberto
%A Loreti, Pierpaolo
%D 2019
%T Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster
%B 2019
%9 3D-FFT; FPGA; high-performance computing; cluster.
%! Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster
%K 3D-FFT; FPGA; high-performance computing; cluster.
%X The Three Dimensional Fast Fourier Transform (3D-FFT) is commonly used to solve the partial differential equations describing the system evolution in several physical phenomena, such as the motion of viscous fluids described by the Navier–Stokes equations. Simulation of such problems requires the use of a parallel High-Performance Computing architecture since the size of the problem grows with the cube of the FFT size, and the representation of the single point comprises several double precision floating-point complex numbers. Modern High-Performance Computing (HPC) systems are considering the inclusion of FPGAs as components of this computing architecture because they can combine effective hardware acceleration capabilities and dedicated communication facilities. Furthermore, the network topology can be optimized for the specific calculation that the cluster must perform, especially in the case of algorithms limited by the data exchange delay between the processors. In this paper, we explore an HPC design that uses FPGA accelerators to compute the 3D-FFT. We devise a scalable FFT engine based on a custom radix-2 double-precision core that is used to implement the Decimation in Frequency version of the Cooley–Tukey FFT algorithm. The FFT engine can be adapted to different technology constraints and networking topologies by adjusting the number of cores and configuration parameters in order to minimize the overall calculation time. We compare the various possible configurations with the technological limits of available hardware. Finally, we evaluate the bandwidth required for continuous FFT execution in the APEnet toroidal mesh network.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8308
%R doi:10.18517/ijaseit.9.2.8308
%J International Journal on Advanced Science, Engineering and Information Technology
%V 9
%N 2
%@ 2088-5334

## IEEE

Roberto Ammendola and Pierpaolo Loreti, "Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster," International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 2, pp. 677-684, 2019. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.9.2.8308.

## RefMan/ProCite (RIS)

TY  - JOUR
AU  - Ammendola, Roberto
AU  - Loreti, Pierpaolo
PY  - 2019
TI  - Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster
JF  - International Journal on Advanced Science, Engineering and Information Technology; Vol. 9 (2019) No. 2
Y2  - 2019
SP  - 677
EP  - 684
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - 3D-FFT; FPGA; high-performance computing; cluster.
N2  - The Three Dimensional Fast Fourier Transform (3D-FFT) is commonly used to solve the partial differential equations describing the system evolution in several physical phenomena, such as the motion of viscous fluids described by the Navier–Stokes equations. Simulation of such problems requires the use of a parallel High-Performance Computing architecture since the size of the problem grows with the cube of the FFT size, and the representation of the single point comprises several double precision floating-point complex numbers. Modern High-Performance Computing (HPC) systems are considering the inclusion of FPGAs as components of this computing architecture because they can combine effective hardware acceleration capabilities and dedicated communication facilities. Furthermore, the network topology can be optimized for the specific calculation that the cluster must perform, especially in the case of algorithms limited by the data exchange delay between the processors. In this paper, we explore an HPC design that uses FPGA accelerators to compute the 3D-FFT. We devise a scalable FFT engine based on a custom radix-2 double-precision core that is used to implement the Decimation in Frequency version of the Cooley–Tukey FFT algorithm. The FFT engine can be adapted to different technology constraints and networking topologies by adjusting the number of cores and configuration parameters in order to minimize the overall calculation time. We compare the various possible configurations with the technological limits of available hardware. Finally, we evaluate the bandwidth required for continuous FFT execution in the APEnet toroidal mesh network.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8308
DO  - 10.18517/ijaseit.9.2.8308

## RefWorks

RT Journal Article
ID 8308
A1 Ammendola, Roberto
A1 Loreti, Pierpaolo
T1 Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster
JF International Journal on Advanced Science, Engineering and Information Technology
VO 9
IS 2
YR 2019
SP 677
OP 684
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 3D-FFT; FPGA; high-performance computing; cluster.
AB