ACES Publication Search
There are 13 search results for:
Title: | ACES April 2010 Full Journal |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 6648 KB |
Title: | ACES April 2010 Front/Back Matter |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 319 KB |
Title: | Overview of Reconfigurable Computing Platforms and Their Applications in Electromagnetics Applications |
Abstract: | This paper investigates the utilization of field programmable gate arrays (FPGAs) in the acceleration of numerically intensive electromagnetics applications. We investigate the speed improvement by employing FPGAs for two different applications: (i) the optimization of a phased array antenna pattern by amplitude control using the ant colony optimization algorithm, (ii) implementation of the rigorous coupled wave (RCW) analysis technique for the design of engineered materials. The first application utilizes FPGAs as the only processor; i.e., all functionalities of the algorithm reside on the FPGA. The second one employs a hybrid hardware/software approach where the FPGA serves as a coprocessor to the CPU. The hybrid approach identifies the most numerically intensive part of the RCW algorithm and implements it on the FPGA. In both applications we demonstrate orders of magnitude of improvement in speed, proving that FPGAs are highly flexible platforms well suited for challenging electromagnetics problems. An overview of available FPGA platforms for scientific computing and how they compare is also presented in the paper. |
Author(s): | O. Kilic, M. Huang |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 521 KB |
Title: | Using GPUs for Accelerating Electromagnetic Simulations |
Abstract: | The computational power and memory bandwidth of graphics processing units (GPUs) have turned them into attractive platforms for general-purpose applications at significant speed gains versus their CPU counterparts [1]. In addition, an increasing number of today's state-of-the-art supercomputers include commodity GPUs to bring us unprecedented levels of performance in terms of raw GFLOPS and GFLOPS/cost. Inspired by the latest trends and developments in GPUs, we propose a new paradigm for implementing on GPUs some of the major aspects of electromagnetic simulations, a domain traditionally used as a benchmark to run codes in some of the most expensive and powerful supercomputers worldwide. After reviewing related achievements and ongoing projects, we provide a guideline to exploit SIMD parallelism and high memory bandwidth using the CUDA programming model and hardware architecture offered by Nvidia graphics cards at an affordable cost. As a result, performance gains of several orders of magnitude can be attained versus thread-level methods like pthreads used to run those simulations on emerging multicore architectures. |
Author(s): | M. Ujaldon |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 309 KB |
Title: | Compute Unified Device Architecture (CUDA) Based Finite-Difference Time-Domain (FDTD) Implementation |
Abstract: | Recent developments in the design of graphics processing units (GPUs) have made it possible to use these devices as alternatives to central processor units (CPUs) and perform high performance scientific computing on them. Though several implementations of the finite-difference time-domain (FDTD) method have been reported, the unavailability of high level languages to program graphics cards had been a major obstacle for scientists and engineers who would want to develop codes for graphics cards. Relatively recently, the compute unified device architecture (CUDA) development environment has been introduced by NVIDIA and has made GPU computing much easier. This paper presents an implementation of the FDTD method based on CUDA. Two thread-to-cell mapping algorithms are presented. The details of the implementation are provided and strategies to improve the performance of the FDTD simulations are discussed. |
Author(s): | V. Demir, A. Z. Elsherbeni |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 483 KB |
Title: | A Practical Look at GPU-Accelerated FDTD Performance |
Abstract: | This paper outlines several key features and conditions that impact the performance of FDTD on GPUs. It includes relevant performance measurements as well as practical suggestions on how to mitigate their impact. Among these factors are: PML depth, the number of unique materials, dispersive materials, the impact of field reads/observations, simulation orientation, and domain decomposition using multiple GPUs. The paper shows that the performance of FDTD on GPUs can be limited in certain extreme cases, but with proper care on the part of the designer these cases can be managed and maximum performance guaranteed. |
Author(s): | M. Weldon, L. Maxwell, D. Cyca, M. Hughes, C. Whelan, M. Okoniewski |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 598 KB |
Title: | A Stacking Scheme to Improve the Efficiency of Finite-Difference Time-Domain Solutions on Graphics Processing Units |
Abstract: | Advances in computer hardware technologies accompanied by easy-to-use parallel programming software platforms have led to the widespread use of parallel processing architectures, such as multi-core central processor units (CPUs) and graphic processing units (GPUs), in technical and scientific computing. Among electromagnetic numerical analysis methods, the finite-difference time-domain (FDTD) method is very well suited for parallel programming, and several implementations of FDTD have been developed and reported to solve electromagnetics problems orders of magnitude faster. Examination of performances of these implementations reveals that, in general, it is more efficient to solve larger FDTD domains than smaller domains. In this paper it is demonstrated that one can exploit the higher efficiency inherent to the solution of larger problem sizes to solve parameter sweep and optimization problems faster: instead of solving multiple smaller FDTD domains separately, these domains can be combined or stacked to form a larger problem and the large problem can be solved more efficiently. It has been shown that solutions up to 40% faster can be achieved on GPUs with this method. Index Terms: FDTD methods, |
Author(s): | V. Demir |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 302 KB |
Title: | Accelerating Multi GPU Based Discontinuous Galerkin FEM Computations for Electromagnetic Radio Frequency Problems |
Abstract: | A Graphics Processing Unit (GPU) accelerated simulation of Maxwell’s equations in the time domain is presented. The Discontinuous Galerkin Finite Element Method (DG-FEM) is used for discretization since the elementwise structure fits the parallelization design aspects of the GPU architecture and the NVIDIA Compute Unified Device Architecture (CUDA), a GPU programming model. The focus is on the parallelization strategy for a multi-GPU setup using CUDA. Several performance improvements are analyzed and investigated with the help of a realistic 3D electromagnetic scattering example. |
Author(s): | N. Gödel, N. Nunn, T. Warburton, M. Clemens |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 1302 KB |
Title: | CUDA Based LU Decomposition Solvers for CEM Applications |
Abstract: | The use of graphical processing units to perform numerical computations required by electromagnetic analyses has been shown over the past several years to significantly increase the computational speed. Most of the previous work concentrated on electromagnetic analyses that do not require matrix inversion. This paper uses NVIDIA’s compute unified device architecture (CUDA) language to develop and modify routines for matrix solution based on the LU decomposition procedure to enhance and speed up a class of electromagnetic simulations. This implementation utilizes the CPU and GPU for the inversion procedure. Various implementations for real, complex, single precision and double precision will be examined. The performance details of the developed LU decomposition routines, especially for complex and double precision arithmetic, are presented. |
Author(s): | M. J. Inman, A. Z. Elsherbeni, C. J. Reddy |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 359 KB |
Title: | GPU Based TLM Algorithms in CUDA and OpenCL |
Abstract: | Recent advancements in graphics computing technology have brought highly parallel processing power to desktop computers. As multi-core multi-processor computing technology becomes mature, a new front in parallel computing technology based on graphics processing units has emerged. This paper reports a highly parallel symmetrical condensed node TLM procedure for the NVIDIA graphics processing units. The algorithm has been tested on three NVIDIA processors, from a low-end laptop graphics card to high-end workstation graphics processors. |
Author(s): | F. Rossi, C. McQuay, P. So |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 1177 KB |
Title: | Fast CPU/GPU Pattern Evaluation of Irregular Arrays |
Abstract: | An approach for the fast analysis of “irregular”, i.e., of conformal, periodic or aperiodic, 2D arrays, based on the use of the p-series approach and Non-Uniform FFT (NUFFT) routines is proposed. The approach allows for modulating the computational burden depending on the array curvature and, thanks to the use of the NUFFT, the asymptotic growth of the computing time reduces to that of a few, standard FFTs. A sub-array partition strategy is also sketched and shown to further unburden the procedure and control the accuracy. The approach has been implemented in both sequential and parallel codes enabling its execution on CPUs and on cost-effective, massively parallel computing platforms like Graphic Processing Units (GPUs). Its performance in terms of computational efficiency and accuracy has been assessed by an extensive numerical analysis and also against benchmarks provided by algorithms based on fast Matrix-Vector Multiplication routines. |
Author(s): | A. Capozzoli, C. Curcio, G. D'Elia, A. Liseno, P. Vinetti |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 910 KB |
Title: | A New Software and Hardware Parallelized Floating Random-Walk Algorithm for the Modified Helmholtz Equation Subject to Neumann and Mixed Boundary Conditions |
Abstract: | A new floating random-walk algorithm for the one-dimensional modified Helmholtz equation subject to Neumann and mixed boundary conditions problems is developed in this paper. Traditional floating random-walk algorithms for Neumann and mixed boundary condition problems have involved “reflecting boundaries” resulting in relatively large computational times. In a recent paper, we proposed the elimination of the use of reflecting boundaries through the use of novel Green’s functions that mimic the boundary conditions of the problem of interest. The methodology was validated by a solution of the one-dimensional Laplace’s equation. In this paper, we extend the methodology to the floating random-walk solution of the one-dimensional modified Helmholtz equation, and excellent agreement has been obtained between an analytical solution and floating random-walk results. The algorithm has been parallelized and a near linear rate of parallelization has been obtained with as many as thirty-two processors. These results have previously been published in [1]. In addition, a GPU implementation employing 4096 simultaneous threads displayed a similar near-linear parallelization gain and a one to two orders of magnitude improvement over the CPU implementation. An immediate application of this research is in the numerical solution of the electromagnetic diffusion equation in magnetically permeable and electrically conducting objects with applications in dielectrometry and magnetometry sensors that have the ability to detect sub-surface objects such as landmines. The ultimate goal of this research is the application of this methodology to the solution of aerodynamical flow problems. |
Author(s): | K. Chatterjee, M. Sandora, C. Mitchell, D. Stefan, D. Nummey, J. Poggie |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 229 KB |
Title: | An Efficient Parallel Multilevel Fast Multipole Algorithm for Large-scale Scattering Problems |
Abstract: | In this paper, we present an efficient parallel multilevel fast multipole algorithm (MLFMA) for three-dimensional scattering problems of large-scale objects. Several parallel implementation tricks are discussed and analyzed. Firstly, we propose a method that reduces the truncation number without loss of accuracy. Furthermore, a matrix-sliced technique, allowing data in memory to be transferred to the hard disk, is applied here in order to solve the problem of extremely large targets. Finally, a transition level scheme is adopted to improve the parallel efficiency. We demonstrate the capability of our code by considering a sphere of 220λ discretized with 48,879,411 unknowns and a square patch of 200λ discretized with 10,150,143 unknowns. The bi-static RCS is calculated within 41.5 GB memory for the first object and 14.7 GB for the second one. |
Author(s): | H. Fangjing, N. Zaiping, H. Jun |
File Type: | Journal Paper |
Issue: | Volume: 25      Number: 4      Year: 2010 |
Download Link: | Click here to download PDF File Size: 817 KB |