Overview of Reconfigurable Computing Platforms and Their
Applications in Electromagnetics Applications

O. Kilic; M. Huang; M. Ujaldon; V. Demir; A. Z. Elsherbeni; M. Weldon; L. Maxwell; D. Cyca; M. Hughes; C. Whelan; M. Okoniewski; V. Demir; N. Gödel; N. Nunn; T. Warburton; M. Clemens; M. J. Inman; A. Z. Elsherbeni; C. J. Reddy; F. Rossi; C. McQuay; P. So; A. Capozzoli; C. Curcio; G. D'Elia; A. Liseno; P. Vinetti; K. Chatterjee; M. Sandora; C. Mitchell; D. Stefan; D. Nummey; J. Poggie; H. Fangjing; N. Zaiping; H. Jun

ACES Publication Search

There are 13 search results for:

Title:	ACES April 2010 Full Journal
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	6648 KB

Title:	ACES April 2010 Front/Back Matter
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	319 KB

Title:	Overview of Reconfigurable Computing Platforms and Their Applications in Electromagnetics Applications
Abstract:	This paper investigates the use of field programmable gate arrays (FPGAs) to accelerate numerically intensive electromagnetics applications. We investigate the speed improvement obtained by employing FPGAs for two different applications: (i) the optimization of a phased array antenna pattern by amplitude control using the ant colony optimization algorithm, and (ii) the implementation of the rigorous coupled wave (RCW) analysis technique for the design of engineered materials. The first application uses the FPGA as the only processor; i.e., all functionality of the algorithm resides on the FPGA. The second employs a hybrid hardware/software approach in which the FPGA serves as a coprocessor to the CPU. The hybrid approach identifies the most numerically intensive part of the RCW algorithm and implements it on the FPGA. In both applications we demonstrate orders-of-magnitude improvement in speed, showing that FPGAs are highly flexible platforms well suited to challenging electromagnetics problems. An overview of available FPGA platforms for scientific computing, and of how they compare, is also presented.
Author(s):	O. Kilic, M. Huang
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	521 KB

Title:	Using GPUs for Accelerating Electromagnetic Simulations
Abstract:	The computational power and memory bandwidth of graphics processing units (GPUs) have turned them into attractive platforms for general-purpose applications, with significant speed gains over their CPU counterparts [1]. In addition, an increasing number of today's state-of-the-art supercomputers include commodity GPUs to bring us unprecedented levels of performance in terms of raw GFLOPS and GFLOPS/cost. Inspired by the latest trends and developments in GPUs, we propose a new paradigm for implementing some of the major aspects of electromagnetic simulations on GPUs, a domain traditionally used as a benchmark for codes running on some of the most expensive and powerful supercomputers worldwide. After reviewing related achievements and ongoing projects, we provide a guideline for exploiting SIMD parallelism and high memory bandwidth using the CUDA programming model and the hardware architecture offered by Nvidia graphics cards at an affordable cost. As a result, performance gains of several orders of magnitude can be attained versus thread-level methods, such as pthreads, used to run those simulations on emerging multicore architectures.
Author(s):	M. Ujaldon
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	309 KB

Title:	Compute Unified Device Architecture (CUDA) Based Finite-Difference Time-Domain (FDTD) Implementation
Abstract:	Recent developments in the design of graphics processing units (GPUs) have made it possible to use these devices as alternatives to central processor units (CPUs) and to perform high performance scientific computing on them. Though several implementations of the finite-difference time-domain (FDTD) method have been reported, the unavailability of high level languages for programming graphics cards had been a major obstacle for scientists and engineers who wanted to develop codes for graphics cards. Relatively recently, the compute unified device architecture (CUDA) development environment was introduced by NVIDIA and made GPU computing much easier. This paper presents an implementation of the FDTD method based on CUDA. Two thread-to-cell mapping algorithms are presented. The details of the implementation are provided and strategies to improve the performance of the FDTD simulations are discussed.
Author(s):	V. Demir, A. Z. Elsherbeni
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	483 KB
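
As a rough illustration of why FDTD maps naturally onto a one-thread-per-cell GPU kernel, the sketch below writes a one-dimensional Yee update in plain Python rather than CUDA. Each inner-loop iteration reads only a cell and its immediate neighbour, so every iteration could in principle be handed to its own thread. The function name `fdtd_1d`, the normalized units, the Courant number of 0.5 and the soft Gaussian source are assumptions made for this example; none of them is taken from the paper, and the paper's two mapping algorithms are not reproduced here.

```python
import math


def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=100):
    """Advance a 1D Yee grid in normalized units (hypothetical helper).

    ez[i] lives on node i and hy[i] lives between nodes i and i+1.
    The two end nodes of ez are never updated, so they act as PEC walls.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for step in range(n_steps):
        # H update: hy[i] depends only on ez[i] and ez[i+1],
        # so all iterations of this loop are independent of each other.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # E update: interior ez[i] depends only on hy[i-1] and hy[i].
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source (assumed): add a Gaussian pulse at a single node.
        ez[src] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez, hy
```

In a GPU version the two `for i` loops would disappear: the loop index would come from the thread and block indices, and the H and E updates would run as separate kernel launches (or be separated by a synchronization) so that every H value is current before any E value is updated.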

Title:	A Practical Look at GPU-Accelerated FDTD Performance
Abstract:	This paper outlines several key features and conditions that impact the performance of FDTD on GPUs. It includes relevant performance measurements as well as practical suggestions on how to mitigate their impact. Among these factors are: PML depth, the number of unique materials, dispersive materials, the impact of field reads/observations, simulation orientation, and domain decomposition using multiple GPUs. The paper shows that the performance of FDTD on GPUs can be limited in certain extreme cases, but with proper care on the part of the designer these cases can be managed and maximum performance guaranteed.
Author(s):	M. Weldon, L. Maxwell, D. Cyca, M. Hughes, C. Whelan, M. Okoniewski
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	598 KB

Title:	A Stacking Scheme to Improve the Efficiency of Finite-Difference Time-Domain Solutions on Graphics Processing Units
Abstract:	Advances in computer hardware technologies, accompanied by easy-to-use parallel programming software platforms, have led to the widespread use of parallel processing architectures, such as multi-core central processor units (CPUs) and graphics processing units (GPUs), in technical and scientific computing. Among electromagnetic numerical analysis methods, the finite-difference time-domain (FDTD) method is very well suited to parallel programming, and several implementations of FDTD have been developed and reported to solve electromagnetics problems orders of magnitude faster. Examination of the performance of these implementations reveals that, in general, it is more efficient to solve larger FDTD domains than smaller domains. In this paper it is demonstrated that one can exploit the higher efficiency inherent in the solution of larger problem sizes to solve parameter sweep and optimization problems faster: instead of solving multiple smaller FDTD domains separately, these domains can be combined, or stacked, to form a larger problem, and the large problem can be solved more efficiently. It is shown that solutions up to 40% faster can be achieved on GPUs with this method.
Author(s):	V. Demir
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	302 KB
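
The stacking idea can be sketched in one dimension: several small FDTD domains are concatenated into one long array and advanced together, with the nodes at each sub-domain boundary held at zero so that the sub-problems do not couple. The isolation by zeroed (PEC-like) wall nodes, the helper names `step_fields`, `solve_separately`, `solve_stacked` and `pulse`, and the Courant number are all assumptions made for this illustration; the paper may isolate and arrange its domains differently. The sketch does not measure speed; it only checks that stacking leaves each sub-problem's solution unchanged.

```python
def step_fields(ez, hy, walls, n_steps, courant=0.5):
    """Leapfrog update of a 1D grid; nodes listed in `walls` are held at zero."""
    for _ in range(n_steps):
        for i in range(len(hy)):
            hy[i] += courant * (ez[i + 1] - ez[i])
        for i in range(1, len(ez) - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        for w in walls:
            ez[w] = 0.0  # re-impose the walls before the next H update
    return ez


def solve_separately(initial_fields, n_steps):
    """Run each small domain on its own."""
    results = []
    for ez0 in initial_fields:
        ez = list(ez0)
        hy = [0.0] * (len(ez) - 1)
        results.append(step_fields(ez, hy, [0, len(ez) - 1], n_steps))
    return results


def solve_stacked(initial_fields, n_steps):
    """Concatenate the domains, run once, then cut the result apart again."""
    ez, walls, offsets = [], [], []
    for ez0 in initial_fields:
        offsets.append(len(ez))
        walls += [len(ez), len(ez) + len(ez0) - 1]
        ez += list(ez0)
    hy = [0.0] * (len(ez) - 1)
    step_fields(ez, hy, walls, n_steps)
    return [ez[o:o + len(ez0)] for o, ez0 in zip(offsets, initial_fields)]


def pulse(n_cells, centre):
    """Initial field with a single excited node (one sweep parameter)."""
    ez = [0.0] * n_cells
    ez[centre] = 1.0
    return ez


sweep = [pulse(40, c) for c in (10, 20, 30)]
separate = solve_separately(sweep, n_steps=60)
stacked = solve_stacked(sweep, n_steps=60)
```

Here `stacked` and `separate` hold the same three field arrays. On a GPU the benefit claimed in the abstract would come from launching one large update instead of three small ones, which this CPU sketch does not attempt to show.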

Title:	Accelerating Multi GPU Based Discontinuous Galerkin FEM Computations for Electromagnetic Radio Frequency Problems
Abstract:	A Graphics Processing Unit (GPU) accelerated simulation of Maxwell’s equations in the time domain is presented. The Discontinuous Galerkin Finite Element Method (DG-FEM) is used for discretization since its elementwise structure fits the parallelization design aspects of the GPU architecture and of the NVIDIA Compute Unified Device Architecture (CUDA), a GPU programming model. The focus is on the parallelization strategy for a multi-GPU setup using CUDA. Several performance improvements are analyzed and investigated with the help of a realistic 3D electromagnetic scattering example.
Author(s):	N. Gödel, N. Nunn, T. Warburton, M. Clemens
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	1302 KB

Title:	CUDA Based LU Decomposition Solvers for CEM Applications
Abstract:	The use of graphics processing units to perform the numerical computations required by electromagnetic analyses has been shown over the past several years to significantly increase computational speed. Most of the previous work concentrated on electromagnetic analyses that do not require matrix inversion. This paper uses NVIDIA’s compute unified device architecture (CUDA) language to develop and modify routines for matrix solution based on the LU decomposition procedure, in order to enhance and speed up a class of electromagnetic simulations. The implementation uses both the CPU and the GPU for the inversion procedure. Various implementations for real, complex, single precision and double precision arithmetic are examined. The performance details of the developed LU decomposition routines, especially for complex and double precision arithmetic, are presented.
Author(s):	M. J. Inman, A. Z. Elsherbeni, C. J. Reddy
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	359 KB
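
For readers unfamiliar with the procedure, the following is a minimal CPU-only sketch of LU decomposition with partial pivoting, followed by forward and back substitution, written in plain Python so that it works for real and complex entries alike. It is a textbook Doolittle-style variant and is not the CUDA routine developed in the paper; the function names `lu_decompose` and `lu_solve` are hypothetical.

```python
def lu_decompose(a):
    """Return (lu, perm) for a square matrix, using partial pivoting.

    `lu` stores L below the diagonal (unit diagonal implied) and U on and
    above it. Row k of the factored matrix is row perm[k] of the input.
    """
    n = len(a)
    lu = [list(row) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest-magnitude entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if abs(lu[pivot][k]) == 0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the factors returned by lu_decompose."""
    n = len(lu)
    x = [b[p] for p in perm]
    # Forward substitution with the unit lower-triangular factor.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Back substitution with the upper-triangular factor.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

The innermost `j` loop of the elimination step is the part that dominates the cost for large matrices, and its iterations are independent of one another, which is what makes the update of the trailing submatrix a natural candidate for offloading to a GPU.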

Title:	GPU Based TLM Algorithms in CUDA and OpenCL
Abstract:	Recent advancements in graphics computing technology have brought highly parallel processing power to desktop computers. As multi-core, multi-processor computing technology matures, a new front in parallel computing technology based on graphics processing units has emerged. This paper reports a highly parallel symmetrical condensed node TLM procedure for NVIDIA graphics processing units. The algorithm has been tested on three NVIDIA processors, ranging from a low-end laptop graphics card to high-end workstation graphics processors.
Author(s):	F. Rossi, C. McQuay, P. So
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	1177 KB

Title:	Fast CPU/GPU Pattern Evaluation of Irregular Arrays
Abstract:	An approach for the fast analysis of “irregular” 2D arrays, i.e., conformal, periodic or aperiodic arrays, is proposed, based on the p-series approach and Non-Uniform FFT (NUFFT) routines. The approach allows the computational burden to be modulated according to the array curvature and, thanks to the use of the NUFFT, the asymptotic growth of the computing time reduces to that of a few standard FFTs. A sub-array partition strategy is also sketched and shown to further unburden the procedure and control the accuracy. The approach has been implemented in both sequential and parallel codes, enabling its execution on CPUs and on cost-effective, massively parallel computing platforms like Graphics Processing Units (GPUs). Its performance in terms of computational efficiency and accuracy has been assessed by an extensive numerical analysis and also against benchmarks provided by algorithms based on fast Matrix-Vector Multiplication routines.
Author(s):	A. Capozzoli, C. Curcio, G. D'Elia, A. Liseno, P. Vinetti
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	910 KB
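
To make concrete what is being accelerated, the sketch below evaluates the array factor of a planar aperiodic array by direct summation over elements and observation directions. This is only the brute-force baseline, with a cost proportional to the number of elements times the number of directions; it is not the p-series/NUFFT method of the paper. The function name `array_factor`, the restriction to a planar array, and the use of direction cosines (u, v) are assumptions made for the example.

```python
import cmath
import math


def array_factor(positions, weights, directions, wavelength=1.0):
    """Directly sum the far-field array factor of a planar array.

    positions  : list of (x, y) element coordinates
    weights    : list of complex excitation coefficients
    directions : list of (u, v) direction cosines
    """
    k = 2.0 * math.pi / wavelength  # free-space wavenumber
    pattern = []
    for u, v in directions:
        total = 0j
        for (x, y), w in zip(positions, weights):
            # Phase of each element seen from direction (u, v).
            total += w * cmath.exp(1j * k * (x * u + y * v))
        pattern.append(total)
    return pattern
```

Because the element positions are not on a regular grid, this sum cannot be handed to a standard FFT directly, which is the gap that non-uniform FFT routines are designed to fill.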

Title:	A New Software and Hardware Parallelized Floating Random-Walk Algorithm for the Modified Helmholtz Equation Subject to Neumann and Mixed Boundary Conditions
Abstract:	A new floating random-walk algorithm for the one-dimensional modified Helmholtz equation subject to Neumann and mixed boundary conditions is developed in this paper. Traditional floating random-walk algorithms for Neumann and mixed boundary condition problems have involved “reflecting boundaries”, resulting in relatively large computational times. In a recent paper, we proposed eliminating reflecting boundaries through the use of novel Green’s functions that mimic the boundary conditions of the problem of interest. The methodology was validated by a solution of the one-dimensional Laplace’s equation. In this paper, we extend the methodology to the floating random-walk solution of the one-dimensional modified Helmholtz equation, and excellent agreement has been obtained between an analytical solution and the floating random-walk results. The algorithm has been parallelized, and a near-linear rate of parallelization has been obtained with as many as thirty-two processors. These results have previously been published in [1]. In addition, a GPU implementation employing 4096 simultaneous threads displayed a similar near-linear parallelization gain and an improvement of one to two orders of magnitude over the CPU implementation. An immediate application of this research is the numerical solution of the electromagnetic diffusion equation in magnetically permeable and electrically conducting objects, with applications in dielectrometry and magnetometry sensors that can detect sub-surface objects such as landmines. The ultimate goal of this research is the application of this methodology to the solution of aerodynamical flow problems.
Author(s):	K. Chatterjee, M. Sandora, C. Mitchell, D. Stefan, D. Nummey, J. Poggie
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	229 KB

Title:	An Efficient Parallel Multilevel Fast Multipole Algorithm for Large-scale Scattering Problems
Abstract:	In this paper, we present an efficient parallel multilevel fast multipole algorithm (MLFMA) for three-dimensional scattering problems involving large-scale objects. Several parallel implementation tricks are discussed and analyzed. First, we propose a method that reduces the truncation number without loss of accuracy. Furthermore, a matrix-sliced technique, which allows data in memory to be transferred to the hard disk, is applied in order to handle extremely large targets. Finally, a transition level scheme is adopted to improve the parallel efficiency. We demonstrate the capability of our code by considering a sphere of 220λ discretized with 48,879,411 unknowns and a square patch of 200λ discretized with 10,150,143 unknowns. The bi-static RCS is calculated within 41.5 GB of memory for the first object and 14.7 GB for the second.
Author(s):	H. Fangjing, N. Zaiping, H. Jun
File Type:	Journal Paper
Issue:	Volume: 25 Number: 4 Year: 2010
File Size:	817 KB