CoE 163

Computing Architectures and Algorithms

Advanced course on the foundations and techniques in high performance software development for signal processing and other numerical functions including transforms, filters, and basic linear algebra algorithms, taking into account memory hierarchy and other microarchitectural features.

Semester Offered: 2nd semester
Course Credit: Lecture: 3 units

Prerequisites

Math 40 (Linear Algebra)
EEE 121 (Data Structures and Algorithms for EEE)
EEE 153 (Computer Organization and Embedded Systems I)

Course Goal

To analyze the connection between algorithms, implementation and computer architecture; provide tools needed to write and apply fast numerical code; and to present representative fundamental numerical algorithms.

Content

This course introduces the foundations and techniques in high performance software development for signal processing and other numerical functionality including transforms, filters, and basic linear algebra algorithms. Topics include: fundamental tools in algorithm theory and analysis; fast signal processing and numerical algorithms; writing software that overcomes compiler limitations; memory hierarchy and other microarchitectural features and their role in software development; special instruction sets; and introduction to the concepts of self-adaptable software and software generators.

Introduction

Overview and motivation; Problem, algorithms, asymptotic analysis, divide-and-conquer algorithms
Asymptotic analysis (multiple variables), cost analysis, solving recurrences;
Architecture, microarchitecture, cache

Implementing Efficient Linear Algebra Operations

Runtime and performance, cache behavior of code; Linear algebra software, blocking, matrix-matrix multiplication (MMM)
Optimizing MMM for the Memory hierarchy, automatically tuned linear algebra software (ATLAS); Model-based ATLAS
Gauss elimination, LU factorization; Matrix inversion, determinant
Sparse matrix-vector multiplication (MVM), Sparsity/BeBOP; Singular value decomposition (SVD)

Architectural Features of Computing Systems

SIMD vector instructions
GPU SIMT instructions
Transforms, structured matrices, fast Fourier transform (FFT); From structured matrices to code, complex arithmetic, recursive and iterative FFT
Fast discrete Fourier transform (DFT), fastest FFT in the west (FFTW)
Parallelism; Shared memory parallelism, OpenMP
Spiral, library generator for transforms; Matlab, how it works, profiling and short performance guide, including C code
Filtering and convolution
Sorting; Optimized and adaptive sorting

References

Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.,Introduction to algorithms, MIT Press, Cambridge, MA, USA, 2001.
Cooley, J.W., Tukey, J.W., An algorithm for the machine calculation of complex Fourier series, Math. of Computation 19 (1965) 297–301.
D’Alberto, P., Milder, P.A., Sandryhaila, A., Franchetti, F., Hoe, J.C., Moura, J.M.F., Puschel, M., Johnson, J., Generating FPGA accelerated DFT libraries, IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2007.
Demmel, J.W., Applied numerical linear algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.
F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N., SPIRAL: Code generation for DSP transforms, Proc. of the IEEE 93 (2) (2005) 232–275 Special issue on Program Generation, Optimization, and Adaptation.
Frigo, M., Johnson, S.G., FFTW: An adaptive software architecture for the FFT, Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP). Volume 3. (1998) 1381–1384.
Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.,LAPACK Users’ Guide, 3rd ed. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999.
Hennessy, J. L., Patterson, D. A., Computer Architecture: A Quantitative Approach, 5th ed., Morgan Kaufmann, 2012.
Im, E.J., Yelick, K., Vuduc, R., Sparsity: Optimization framework for sparse matrix kernels, Int’l J. High Performance Computing Applications 18 (1) (2004) 135–158
Johnson, J.R., Johnson, R.W., Rodriguez, D., Tolimieri, R.,A methodology for designing, modifying, and implementing FFT algorithms on various architectures, Circuits Systems Signal Processing 9 (4) (1990) 449–500.
Milder, P.A., Franchetti, F., Hoe, J.C., Puschel, M., Formal datapath representation and manipulation for implementing DSP transforms, Design Automation Conference (DAC), 2008.
Nussbaumer, H.J., Fast Fourier Transformation and Convolution Algorithms, 2nd ed. Springer, 1982.
Press, W.H., Flannery, B.P., A., T.S., T., V.W., Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge University Press, 1992.
Puschel, M., Moura, J.M.F., Johnson, J., Padua, D., Veloso, M., Singer, B.W., Xiong, J., Franchetti, Tolimieri, R., An, M., Lu, C.,Algorithms for discrete Fourier transforms and convolution, 2nd ed. Springer, 1997.
Trefethen, L. N., Bau, D. Numerical Linear Algebra, SIAM, 1997.
Whaley, R.C., Dongarra, J., Automatically Tuned Linear Algebra Software (ATLAS), Proc. Supercomputing, 1998.

CoE 163

Contents

Prerequisites

Course Goal

Content

Introduction

Implementing Efficient Linear Algebra Operations

Architectural Features of Computing Systems

References

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools