

# CoE 163

Computing Architectures and Algorithms

03b: x86 Assembly Introduction

## COMPUTER ARCHITECTURE

A computer architecture describes how a computer should work, from the hardware to the software.

This also includes any peripherals that may be connected to it.



## **COMPUTER ARCHITECTURE**

| OCBB:0100 |     |     |     |
|-----------|-----|-----|-----|
| OCBB:0103 | mov | ax, | CX  |
| OCBB:0105 | out | 70, | al  |
| OCBB:0107 | mov | ax, | 0   |
| OCBB:010A | out | 71, | al  |
| OCBB:010C | inc | CX  |     |
| OCBB:010D | cmp | cx, | 100 |
| ØCBB:0111 |     |     |     |
| ØCBB:0113 | int | 20  |     |
| ØCBB:0115 |     |     |     |
|           |     |     |     |

#### Instruction Set Architecture (ISA)

Abstract model that defines the what a computer should look like from a programmer's perspective.



#### Microarchitecture

Implementation of an ISA with several factors in consideration, such as component interconnection, cache and memory sizes, etc.



#### Implementation

Hardware design engineering - the microarch is converted into a concrete and tangible processor.

### **ISA CLASSIFICATION**

ISAs can be classified in different ways,. More commonly, they can be split into two types based on instruction complexity.

- Complex instruction set
- Reduced instruction set



#### **ISA: CISC**

A complex instruction set computer (CISC) has many specialized instructions to support different use cases.

This is easier to program, but hardware should be able to support the huge amount of instructions.



### **ISA: RISC**

A reduced instruction set computer (RISC) has a basic set of instructions

- at least enough for it to be Turing complete.

They are easy to implement in hardware, but software has to make do with the reduced number of instructions.



### **MAINSTREAM ISAs**



#### x86 (CISC)

Most commonly found in desktop and laptop computers



#### ARM (RISC)

Most commonly found in cellphones, tablets, and small single-board computers like the Raspberry Pi



#### **MIPS (RISC)**

Most commonly found in gaming consoles and small electronics such as routers

MIPS is recently abandoned in favor of RISC-V



## ASSEMBLY PROGRAMMING MODEL



#### x86\_64 ISA

In a past course, MIPS was introduced (which is currently being abandoned). However, desktop and laptops use 64-bit x86 processors.

To be able to maximize programs, we should also be able to read and understand programs written in the x86\_64 ISA.



#### x86\_64 ISA

An instruction can have at most 15 bytes. It is little-endian (i.e. legacy prefixes/LSB at lowest address).

Normal (32-bit) x86 are also encoded in the same way.





## x86\_64 ISA



#### x86\_64 REGISTERS

A *register* is a small memory cell within the CPU. There are 16 64-bit *general-purpose registers* in x86\_64.

Some of these registers have special purposes, such as the stack pointer rsi.

The registers have "root words" (a, b, c, d, si, di, bp, 8-15) affixed by a length identifier (r, e, w, d, h, l, b).





## x86\_64 REGISTERS

A portion of each of the 16 registers can be referred to by writing the relevant affix, such as getting a quad (8 bytes), double word (4 bytes), word (2 bytes), and byte.



### x86\_64 SYNTAX

Unlike MIPS, the destination is implied to contain one of the operands to instructions that require at least two operands

x86 has two major syntax flavors: AT&T and Intel.

- AT&T is common in Linux systems.
- Intel is common in Windows systems.

For the rest of the discussion, we will be using Intel syntax.





## x86\_64 SYNTAX

#### Intel

- Order is dest, src
- Uses registers as is
- Uses immediates as is
- Hex numbers are suffixed by "h"
- Memory dereferencing uses square brackets []
- Memory addressing uses math operations
- Uses type keywords (e.g. DWORD PTR, WORD PTR) to denote operand size

#### AT&T

- Order is src, dest
- Registers are prefixed by a percent %
- Immediates are prefixed by a dollar sign \$
- Hex numbers are prefixed by "0x"
- Memory dereferencing uses parentheses ()
- Memory addressing uses function syntax
- Uses operation affixes (e.g. movd, movl) to denote operand size



## x86\_64 SYNTAX

move a 32-bit value of 1 into eax

move the value in address ebx +
3 into eax

add the values in addresses ebx + ecx \* 0x2 and eax, then store the answer into eax

move the 64-bit value of 2 in address held by ebx

mov eax, 1
movl \$1, %eax

mov eax, [ebx + 3] movl 3(%ebx), eax

add eax, [ebx + ecx \* 2h] addl (%ebx, %ecx, 0x2), %eax

mov QWORD PTR [ebx], 2
movq \$2, (ebx)

#### x86\_64 ASSEMBLERS

There are several x86 assemblers available. The following lists the most popular ones

- GNU Assembler (GAS)
- Flat Assembler (FASM)
- Netwide Assembler (NASM)
- Microsoft Macro Assembler (MASM)



GCC supports writing and compiling of assembly code within C/C++.

It is also possible to write the assembly code separately as a . s file.





Subroutines in assembly should have an equivalent prototype in the C/C++ part of the code.

C++
// outside main()
extern "C" int add\_ints(int, int);
[...]
// in main()
int c = add\_ints(a, b)
cout << "add\_ints() returned " << c << endl;
[...]</pre>



Subroutines should be written using the calling convention suitable for the operating system. The Windows convention is used in this example





Using NASM, the asm code should be first compiled into object code. This object code is then used with gcc to compile the program into an executable.

\$ nasm.exe -f win64 hello.s
\$ g++ hello.cpp hello.obj -o hello

CMD

### RESOURCES

- <u>x86 introduction</u> from the University of Washington (US)
- <u>x86 reference</u> from the University of Virginia (US)





# CoE 163

Computing Architectures and Algorithms

03b: x86 Assembly Introduction