CoE 163

From Microlab Classes
Revision as of 10:29, 18 December 2020 by Louis Alarcon (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Computing Architectures and Algorithms

Advanced course on the foundations and techniques in high performance software development for signal processing and other numerical functions including transforms, filters, and basic linear algebra algorithms, taking into account memory hierarchy and other microarchitectural features.

  • Semester Offered: 2nd semester
  • Course Credit: Lecture: 3 units

Prerequisites

  • Math 40 (Linear Algebra)
  • EEE 121 (Data Structures and Algorithms for EEE)
  • EEE 153 (Computer Organization and Embedded Systems I)

Course Goal

  • To analyze the connection between algorithms, implementation and computer architecture; provide tools needed to write and apply fast numerical code; and to present representative fundamental numerical algorithms.

Content

This course introduces the foundations and techniques in high performance software development for signal processing and other numerical functionality including transforms, filters, and basic linear algebra algorithms. Topics include: fundamental tools in algorithm theory and analysis; fast signal processing and numerical algorithms; writing software that overcomes compiler limitations; memory hierarchy and other microarchitectural features and their role in software development; special instruction sets; and introduction to the concepts of self-adaptable software and software generators.

Introduction

  • Overview and motivation; Problem, algorithms, asymptotic analysis, divide-and-conquer algorithms
  • Asymptotic analysis (multiple variables), cost analysis, solving recurrences;
  • Architecture, microarchitecture, cache

Implementing Efficient Linear Algebra Operations

  • Runtime and performance, cache behavior of code; Linear algebra software, blocking, matrix-matrix multiplication (MMM)
  • Optimizing MMM for the Memory hierarchy, automatically tuned linear algebra software (ATLAS); Model-based ATLAS
  • Gauss elimination, LU factorization; Matrix inversion, determinant
  • Sparse matrix-vector multiplication (MVM), Sparsity/BeBOP; Singular value decomposition (SVD)

Architectural Features of Computing Systems

  • SIMD vector instructions
  • GPU SIMT instructions
  • Transforms, structured matrices, fast Fourier transform (FFT); From structured matrices to code, complex arithmetic, recursive and iterative FFT
  • Fast discrete Fourier transform (DFT), fastest FFT in the west (FFTW)
  • Parallelism; Shared memory parallelism, OpenMP
  • Spiral, library generator for transforms; Matlab, how it works, profiling and short performance guide, including C code
  • Filtering and convolution
  • Sorting; Optimized and adaptive sorting

References

  • Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.,Introduction to algorithms, MIT Press, Cambridge, MA, USA, 2001.
  • Cooley, J.W., Tukey, J.W., An algorithm for the machine calculation of complex Fourier series, Math. of Computation 19 (1965) 297–301.
  • D’Alberto, P., Milder, P.A., Sandryhaila, A., Franchetti, F., Hoe, J.C., Moura, J.M.F., Puschel, M., Johnson, J., Generating FPGA accelerated DFT libraries, IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2007.
  • Demmel, J.W., Applied numerical linear algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.
  • F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N., SPIRAL: Code generation for DSP transforms, Proc. of the IEEE 93 (2) (2005) 232–275 Special issue on Program Generation, Optimization, and Adaptation.
  • Frigo, M., Johnson, S.G., FFTW: An adaptive software architecture for the FFT, Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP). Volume 3. (1998) 1381–1384.
  • Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.,LAPACK Users’ Guide, 3rd ed. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999.
  • Hennessy, J. L., Patterson, D. A., Computer Architecture: A Quantitative Approach, 5th ed., Morgan Kaufmann, 2012.
  • Im, E.J., Yelick, K., Vuduc, R., Sparsity: Optimization framework for sparse matrix kernels, Int’l J. High Performance Computing Applications 18 (1) (2004) 135–158
  • Johnson, J.R., Johnson, R.W., Rodriguez, D., Tolimieri, R.,A methodology for designing, modifying, and implementing FFT algorithms on various architectures, Circuits Systems Signal Processing 9 (4) (1990) 449–500.
  • Milder, P.A., Franchetti, F., Hoe, J.C., Puschel, M., Formal datapath representation and manipulation for implementing DSP transforms, Design Automation Conference (DAC), 2008.
  • Nussbaumer, H.J., Fast Fourier Transformation and Convolution Algorithms, 2nd ed. Springer, 1982.
  • Press, W.H., Flannery, B.P., A., T.S., T., V.W., Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge University Press, 1992.
  • Puschel, M., Moura, J.M.F., Johnson, J., Padua, D., Veloso, M., Singer, B.W., Xiong, J., Franchetti, Tolimieri, R., An, M., Lu, C.,Algorithms for discrete Fourier transforms and convolution, 2nd ed. Springer, 1997.
  • Trefethen, L. N., Bau, D. Numerical Linear Algebra, SIAM, 1997.
  • Whaley, R.C., Dongarra, J., Automatically Tuned Linear Algebra Software (ATLAS), Proc. Supercomputing, 1998.