Intel's 64-bit extension technology

A processor with 64-bit extension technology can run in either legacy IA-32 mode or IA-32e mode. In legacy IA-32 mode the processor runs in protected mode, real-address mode or virtual-8086 mode.

IA-32e mode is the mode the processor uses when running a 64-bit operating system. A processor with 64-bit extension technology first enters legacy paged protected mode; it then enters IA-32e mode once the enable bit in the IA32_EFER register is set and paging is enabled with PAE (Physical Address Extension). The following table shows the operating modes supported by 64-bit extension technology and their differences.

1. IA-32e mode

IA-32e mode has two sub-modes: 64-bit mode and compatibility mode. IA-32e mode can be entered only when a 64-bit operating system is loaded.

2. 64-bit mode

64-bit mode is used by 64-bit applications running under a 64-bit operating system. It supports the following features:

64-bit linear addressing; however, IA-32 processors supporting 64-bit extension technology implement fewer than 64 address bits.

Register extensions, accessed through a new opcode prefix (REX)

Existing general-purpose registers widened to 64 bits (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP)

Eight new general-purpose registers (R8–R15)

Eight new 128-bit Streaming SIMD Extensions (SSE) registers (XMM8–XMM15)

A 64-bit instruction pointer (RIP)

A new RIP-relative data addressing mode

A flat address space with a single address space for code, data and stack

Extended instructions and new instructions

Physical addresses above 64 GB; however, the physical address width actually available on an IA-32 processor supporting 64-bit extension technology is implementation-specific.

A new interrupt priority control mechanism

The operating system enables 64-bit mode on a per-code-segment basis. Its default address size is 64 bits and its default operand size is 32 bits. These defaults can be overridden instruction by instruction using the new REX opcode prefix: in 64-bit mode, a REX prefix lets an instruction specify a 64-bit operand. Through this mechanism, many existing instructions have been promoted or redefined to allow 64-bit registers and 64-bit addresses.

3. Compatibility mode

Compatibility mode allows legacy 16-bit and 32-bit applications to run under a 64-bit operating system without being recompiled (however, legacy applications that run in virtual-8086 mode or use hardware task management will not work). As with 64-bit mode, the operating system enables compatibility mode on a per-code-segment basis. This means that 64-bit applications can run on the processor in 64-bit mode while 32-bit applications that have not been recompiled for 64-bit run in compatibility mode.

Compatibility mode behaves like legacy protected mode. Applications can access only the first 4 GB of the linear address space and use the standard IA-32 instruction prefixes and registers. REX prefixes are not available in compatibility mode; the REX prefix encodings are treated as the legacy IA-32 instructions they originally represented. Compatibility mode uses 16-bit and 32-bit address and operand sizes. As in legacy protected mode, applications can reach up to 64 GB of physical memory through PAE (Physical Address Extension).

The following legacy protected-mode features are not available in compatibility mode: virtual-8086 mode, task switching and stack parameter copying.

From the operating system's point of view, system data structures, address translation, and interrupt and exception handling use the 64-bit mechanisms rather than the 32-bit ones.

4. Legacy mode

Legacy mode comprises protected mode, real-address mode and virtual-8086 mode. Existing software written for these modes runs on IA-32 processors with 64-bit extension technology.

5. System management mode

System Management Mode (SMM) gives the system management interrupt (SMI) handler the same execution environment as in the legacy IA-32 architecture. SMM supports transitions from and to both IA-32e mode and legacy mode. The SMI handler can address any physical memory page through the PSE mechanism. However, because PAE is not supported, the SMM environment does not support 64-bit linear addresses. When an SMI is delivered, the processor switches to SMM and saves the processor state in SMRAM according to the SMM save map. The SMI handler therefore executes in the same environment as in the legacy IA-32 architecture.

The following table compares the register and data structures seen by applications running in 64-bit mode with those seen in the legacy IA-32 environment. The legacy environment covers existing IA-32 processors as well as processors with 64-bit extension technology running in legacy mode or in IA-32e compatibility mode. An application in compatibility mode does not run in 64-bit mode, even though the operating system is 64-bit; it sees the legacy IA-32 protected-mode environment.

1. General-purpose registers (GPRs)

The IA-32 architecture has eight general-purpose registers when running in legacy or compatibility mode. AX, BX, CX, DX, DI, SI, BP and SP are available for 16-bit operands, while EAX, EBX, ECX, EDX, EDI, ESI, EBP and ESP are available for 32-bit operands.

In 64-bit mode, the default operand size is 32 bits, but the GPRs can hold both 32-bit and 64-bit operands. With a 32-bit operand size, EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP and R8D–R15D are available; with a 64-bit operand size, RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP and R8–R15 are available. All of these registers can be accessed at four granularities: byte, word, doubleword and quadword. Which granularity and which registers an instruction can reach depends largely on the REX prefix.

In 64-bit mode there are restrictions on byte-register access. An instruction cannot reference a legacy high byte (AH, BH, CH or DH) and one of the new byte registers (such as the low byte of RSI or R8) at the same time. It can, however, reference a legacy low byte (AL, BL, CL or DL) together with a new byte register (such as the low byte of R8 or RBP). The architecture enforces this restriction by changing high-byte references (AH, BH, CH, DH) into low-byte references (BPL, SPL, DIL, SIL, the low 8 bits of RBP, RSP, RDI and RSI) in instructions that carry a REX prefix.
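The remapping of byte-register encodings can be sketched in Python. This is an illustrative model only, not decoder code: the function name `byte_register` and the two tables are made up for the example. The encoding order (values 4 to 7 selecting AH, CH, DH, BH without REX and SPL, BPL, SIL, DIL with REX) and the names R8L to R15L come from Intel's published register encodings rather than from the text above.

```python
# 3-bit register field -> 8-bit register name, without and with a REX prefix.
# Table order is an assumption taken from Intel's register-encoding tables.
LEGACY_BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REX_BYTE_REGS = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"]


def byte_register(field: int, rex_present: bool, rex_ext: int = 0) -> str:
    """Name the 8-bit register selected by a 3-bit register field.

    rex_ext is the REX bit that extends this field (hypothetical parameter
    name); it only has an effect when a REX prefix is present.
    """
    if not 0 <= field <= 7:
        raise ValueError("register field must be in 0..7")
    if not rex_present:
        return LEGACY_BYTE_REGS[field]   # AH/CH/DH/BH are reachable
    if rex_ext:
        return f"R{8 + field}L"          # low byte of R8..R15
    return REX_BYTE_REGS[field]          # high bytes replaced by SPL..DIL
```

With this model, field value 4 names AH when no REX prefix is present and SPL when one is, which is why a single instruction cannot mix a legacy high byte with a new byte register.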

In 64-bit mode, the operand size determines how many bits of the destination GPR are valid:

A 64-bit operand produces a 64-bit result in the destination general-purpose register.

A 32-bit operand produces a 32-bit result, which is zero-extended to 64 bits in the destination general-purpose register.

An 8-bit or 16-bit operand produces an 8-bit or 16-bit result. The upper 56 or 48 bits of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is to be used in a 64-bit address calculation, software must explicitly sign-extend the register to the full 64 bits.
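The three rules above can be modelled with plain integer arithmetic. The following Python sketch is illustrative: `write_gpr` and `sign_extend` are hypothetical helper names, and the 8-bit case models only the low byte of the register (AL-style access), not the legacy high bytes.

```python
MASK64 = (1 << 64) - 1


def write_gpr(old: int, result: int, operand_bits: int) -> int:
    """Return the new 64-bit register value after writing `result`."""
    if operand_bits == 64:
        return result & MASK64
    if operand_bits == 32:
        # 32-bit results are zero-extended: the upper 32 bits become 0.
        return result & 0xFFFF_FFFF
    if operand_bits in (8, 16):
        # 8-bit and 16-bit results leave the upper bits untouched.
        mask = (1 << operand_bits) - 1
        return (old & ~mask & MASK64) | (result & mask)
    raise ValueError("operand size must be 8, 16, 32 or 64 bits")


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64
```

The asymmetry is the point of the example: a 32-bit write clears the upper half of the register, while a 16-bit write preserves everything above bit 15, so a 16-bit result needs an explicit `sign_extend(value, 16)` before it is used as a 64-bit address.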

Because the upper 32 bits of the 64-bit general-purpose registers are undefined in the 32-bit modes, they are not preserved when switching from 64-bit mode to any 32-bit mode (legacy mode or compatibility mode). Software must therefore not rely on these bits holding a value after a switch from 64-bit to 32-bit mode. Their contents may differ from one hardware implementation to the next or from one cycle to the next.

2. Streaming SIMD Extensions (SSE) registers

In compatibility and legacy modes, the SSE register set consists of the eight legacy 128-bit registers XMM0–XMM7. 64-bit mode adds eight more 128-bit SSE registers, XMM8–XMM15, which are accessed using a REX instruction prefix. The XMM registers can be used by SSE, SSE2 and SSE3 instructions in any mode.

3. System registers

64-bit extension technology also changes the existing system registers and adds new ones:

MSRs. The extended feature enable MSR (IA32_EFER) contains the bits that control, enable and disable features of 64-bit extension technology.

Control registers. All control registers are extended to 64 bits, and a new control register, CR8 (the task priority register, TPR), is added.

Descriptor table registers. The global descriptor table register (GDTR) and the interrupt descriptor table register (IDTR) are extended to 10 bytes so that they can hold a full 64-bit base address. The local descriptor table register (LDTR) and the task register (TR) are also extended to hold 64-bit base addresses.

Debug registers. The debug registers are extended to 64 bits.
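To make the 10-byte size of GDTR and IDTR concrete, here is a small Python sketch that packs and unpacks such a register image. The layout used (a 16-bit limit followed by a 64-bit base, little-endian) is an assumption taken from Intel's description of the descriptor-table register format, not from the text above, and both function names are hypothetical.

```python
import struct
from typing import Tuple

# "<HQ": little-endian, 16-bit limit then 64-bit base, no padding.
_DTR_FORMAT = "<HQ"


def pack_dtr(limit: int, base: int) -> bytes:
    """Build the 10-byte image of a descriptor table register."""
    return struct.pack(_DTR_FORMAT, limit, base)


def unpack_dtr(image: bytes) -> Tuple[int, int]:
    """Split a 10-byte image back into (limit, base)."""
    limit, base = struct.unpack(_DTR_FORMAT, image)
    return limit, base
```

Two bytes of limit plus eight bytes of base give the 10 bytes mentioned above; the legacy image with a 32-bit base would be 6 bytes.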

1) Extended feature enable register (IA32_EFER)

LMA (IA-32e mode active, bit 10): This is a read-only status bit; any write to it is ignored. When IA-32e mode and paging are both enabled, the processor sets this bit to 1, indicating that it is running in either compatibility mode or 64-bit mode, depending on the L and D bits of the code segment descriptor. When LMA = 0, the processor runs in legacy mode and behaves like a standard 32-bit IA-32 processor.

LME (IA-32e mode enable, bit 8): Setting this bit to 1 enables the processor to switch to IA-32e mode but does not by itself activate it; IA-32e mode becomes active only once software has also enabled paging with PAE. When PAE paging is enabled and LME is 1, the processor sets LMA to 1, indicating that IA-32e mode is both enabled and active. All reserved bits of IA32_EFER must be 0.

SCE (SYSCALL/SYSRET enable, bit 0): Setting this bit to 1 enables the SYSCALL and SYSRET instructions, which are supported only in 64-bit mode. The operating system is responsible for enabling them for 64-bit operation.
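The interaction of these bits can be summarised in a short Python sketch. The IA32_EFER bit positions come from the text above; the CR0.PG (bit 31) and CR4.PAE (bit 5) positions and the rule that L = 1, D = 0 selects 64-bit mode while L = 0 selects compatibility mode are assumptions taken from Intel's manuals. The function names are hypothetical.

```python
EFER_SCE = 1 << 0    # SYSCALL/SYSRET enable
EFER_LME = 1 << 8    # IA-32e mode enable
EFER_LMA = 1 << 10   # IA-32e mode active (read-only status)

CR0_PG = 1 << 31     # paging enable (assumed from Intel's manuals)
CR4_PAE = 1 << 5     # PAE enable (assumed from Intel's manuals)


def ia32e_active(efer: int, cr0: int, cr4: int) -> bool:
    """Condition under which the processor reports LMA = 1."""
    return bool(efer & EFER_LME and cr0 & CR0_PG and cr4 & CR4_PAE)


def operating_mode(efer: int, cs_l: int, cs_d: int) -> str:
    """Classify the current mode from LMA and the code segment's L and D bits."""
    if not efer & EFER_LMA:
        return "legacy"
    if cs_l:
        if cs_d:
            raise ValueError("CS.L = 1 with CS.D = 1 is a reserved combination")
        return "64-bit"
    return "compatibility"
```

Setting LME alone is not enough in this model: `ia32e_active` stays false until paging with PAE is also on, matching the description of LME and LMA.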

2) Control registers

With 64-bit extension technology, the control registers CR0–CR4 are extended to 64 bits. In 64-bit mode, the MOV CRn instruction reads or writes all 64 bits of these registers, and operand-size prefixes are ignored. In compatibility and legacy modes, a write to a control register fills the upper 32 bits with zeros, and a read returns only the lower 32 bits.

In 64-bit mode, the upper 32 bits of CR0 and CR4 are reserved and must be written as 0. Writing a 1 to any of the upper 32 bits results in a general-protection exception, #GP(0). All 64 bits of CR2 are writable by software. Bits [51:40] of CR3 are reserved and must be 0. However, the MOV CRn instruction does not check whether an address written to CR2 or CR3 lies within the implemented linear or physical address limits.
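As a rough model of these write rules, the Python sketch below covers only what the text states about CR0 and CR4 reserved bits and about 32-bit modes; it deliberately does not model the CR3 reserved bits or any other reserved-bit checks. `mov_to_cr` and `GeneralProtectionFault` are hypothetical names.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class GeneralProtectionFault(Exception):
    """Stands in for the #GP(0) exception."""


def mov_to_cr(n: int, value: int, mode64: bool) -> int:
    """Return the value stored in CRn by a MOV CRn write (simplified)."""
    if not mode64:
        # Compatibility and legacy modes: upper 32 bits are written as zeros.
        return value & MASK32
    if n in (0, 4) and (value & MASK64) >> 32:
        # Upper 32 bits of CR0 and CR4 are reserved in 64-bit mode.
        raise GeneralProtectionFault("#GP(0)")
    return value & MASK64
```

A write such as `mov_to_cr(4, 1 << 32, mode64=True)` raises the fault, while the same value written from a 32-bit mode would simply lose its upper half.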

The 64-bit extension architecture introduces a new control register, CR8, defined as the task priority register (TPR). The operating system can use the TPR to control whether external interrupts are allowed to interrupt the processor, based on their priority.

3) Descriptor table registers

The four system descriptor table registers (GDTR, IDTR, LDTR and TR) are extended to hold 64-bit base addresses. This allows an operating system running in IA-32e mode to place its descriptor tables anywhere in the available linear address space. The following table shows these four registers. In all cases, the base address must be in canonical form. The number of implemented linear and physical address bits can be determined by executing CPUID with EAX set to 80000008H.
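The value that CPUID returns for this leaf can be decoded as in the Python sketch below. The field positions (physical address bits in EAX[7:0], linear address bits in EAX[15:8]) are taken from Intel's CPUID documentation rather than from the text above, the function name is hypothetical, and the sketch decodes a value you already have; it does not execute CPUID.

```python
from typing import Tuple


def decode_address_sizes(eax: int) -> Tuple[int, int]:
    """Decode EAX from CPUID leaf 80000008H into (physical_bits, linear_bits)."""
    physical_bits = eax & 0xFF          # EAX[7:0]
    linear_bits = (eax >> 8) & 0xFF     # EAX[15:8]
    return physical_bits, linear_bits
```

For instance, an EAX value of 3028H decodes to 40 physical address bits and 48 linear address bits.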

4) Debug registers

In 64-bit mode, the debug registers DR0–DR7 are 64 bits wide. The MOV DRn instruction reads or writes all 64 bits, and operand-size prefixes are ignored.

On an IA-32e platform, a write to a debug register from 16-bit or 32-bit mode (legacy mode or compatibility mode) fills the upper 32 bits with zeros, and a read returns only the lower 32 bits. In 64-bit mode, the upper 32 bits of DR6 and DR7 are reserved and must be 0. Writing a 1 to any of the upper 32 bits causes a #GP(0) exception.

All 64 bits of DR0–DR3 are writable by software. However, the MOV DRn instruction does not check whether an address written to DR0–DR3 lies within the implemented linear address limits. Address matching is supported only on valid addresses generated by the processor.

1. Address size and operand size prefixes

In 64-bit mode, the default address size is 64 bits and the default operand size is 32 bits. Address-size and operand-size prefixes allow 32-bit and 64-bit data and addresses to be mixed on an instruction-by-instruction basis. The following table (1-7) shows the instruction prefixes required to select each address size in IA-32e mode. Note that 64-bit mode does not support 16-bit addresses. In compatibility and legacy modes, address sizes work as they do in the legacy IA-32 architecture.

In 64-bit mode, the default operand size is 32 bits. The REX prefix carries a 4-bit field, giving 16 different values. The W bit of the REX prefix is referred to as REX.W. When REX.W = 1, the prefix specifies a 64-bit operand size. Software can still switch to a 16-bit operand size with the 66H operand-size prefix. If REX.W and 66H are used together, however, REX.W takes precedence.

For SSE/SSE2/SSE3 SIMD instructions, the 66H, F2H and F3H prefixes serve as opcode extensions and are treated as part of the opcode. In these cases there is no interaction between a valid REX.W prefix and the 66H opcode-extension prefix.
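The precedence rules for ordinary (non-SIMD) instructions in 64-bit mode can be written as two small Python functions. This is a sketch under stated assumptions: the function names are hypothetical, 67H as the address-size prefix byte comes from Intel's manuals rather than the text above, and instructions whose operand size defaults to 64 bits (such as near branches and implicit stack operations) are not modelled.

```python
def operand_size_64bit_mode(rex_w: bool, prefix_66h: bool) -> int:
    """Effective operand size in bits for an ordinary instruction."""
    if rex_w:
        return 64      # REX.W takes precedence over 66H
    if prefix_66h:
        return 16
    return 32          # default operand size in 64-bit mode


def address_size_64bit_mode(prefix_67h: bool) -> int:
    """Effective address size in bits; 16-bit addressing is unavailable."""
    return 32 if prefix_67h else 64
```

The first function returns 64 whenever REX.W is set, whether or not a 66H prefix is also present.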

2. REX prefix

The REX prefix is a new instruction prefix byte introduced for 64-bit mode. It performs the following tasks:

Specifies the new GPRs and SSE registers

Specifies a 64-bit operand size

Specifies the extended control registers (used by system software only)

Not all instructions require a REX prefix. The prefix is needed only if the instruction references an extended register or uses a 64-bit operand. If a REX prefix is used where it has no meaning, it is ignored.

An instruction can have only one REX prefix. When used, the prefix must immediately precede the opcode byte or the two-byte opcode escape prefix. A REX prefix in any other position is ignored.

Instructions that contain a REX prefix must still obey the legacy instruction length limit of 15 bytes. The following figure shows where the REX prefix fits in the byte order of an instruction.
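As a small illustration of the prefix byte itself, the Python sketch below decodes the four REX bit fields. The layout 0100WRXB (byte values 40H to 4FH) is taken from Intel's instruction-format documentation, not from the text above, and the names `Rex` and `decode_rex` are hypothetical.

```python
from typing import NamedTuple, Optional


class Rex(NamedTuple):
    w: int  # 1 selects a 64-bit operand size
    r: int  # extends the ModRM reg field
    x: int  # extends the SIB index field
    b: int  # extends the ModRM r/m, SIB base or opcode register field


def decode_rex(byte: int) -> Optional[Rex]:
    """Return the REX fields if `byte` is in 40H..4FH, otherwise None."""
    if byte & 0xF0 != 0x40:
        return None
    return Rex((byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1)
```

For example, `decode_rex(0x48)` gives W = 1 with R, X and B clear, the form used to request a 64-bit operand size without touching the extended registers. In compatibility and legacy modes the same byte values are not REX prefixes at all, so a real decoder would apply this only in 64-bit mode.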

3. New encodings for control and debug registers

In 64-bit mode, additional encodings are defined for the control and debug registers. When the reg field of the ModRM byte encodes a control or debug register, the REX.R bit is used to extend that field. These encodings allow the processor to address CR8–CR15 and DR8–DR15.

One additional control register, CR8, is defined in 64-bit mode. CR8 becomes the task priority register (TPR). In the first implementations of IA-32e technology, CR9–CR15 and DR8–DR15 are not implemented, and any attempt to access them results in an invalid-opcode exception (#UD).

4. New instructions

The following new instructions are introduced in 64-bit mode with 64-bit extension technology.

SWAPGS instruction

SYSCALL and SYSRET instructions

CDQE instruction

CMPSQ instruction

CMPXCHG16B instruction

LODSQ instruction

MOVSQ instruction

MOVZX (64-bit) instruction

STOSQ instruction

5. Stack pointer

In 64-bit mode, the stack pointer is 64 bits wide. Unlike in compatibility mode or legacy mode, the stack size is not controlled by a bit in the SS segment descriptor, nor can it be overridden by an instruction prefix.

Address-size overrides are ignored for implicit stack references. In 64-bit mode, all instructions that implicitly reference RSP default to a 64-bit operand size, except far branches. The affected instructions include PUSH, POP, PUSHF, POPF, ENTER and LEAVE. In 64-bit mode these instructions cannot push or pop 32-bit stack values. 16-bit pushes and pops are supported by using the 66H operand-size prefix.

Because the operand size of these instructions defaults to 64 bits, no REX prefix is needed when the operand is one of the registers RAX–RSP. A REX prefix is still required when R8–R15 is used as the operand, because the prefix is needed to access the new extended registers.

6. Branches

64-bit extension technology extends two branching mechanisms to accommodate branches in the 64-bit linear address space:

Near branches are redefined in 64-bit mode.

In 64-bit mode and compatibility mode, a 64-bit call gate descriptor is defined for far calls.

In 64-bit mode, all near branches (CALL, RET, Jcc, JCXZ, JMP and LOOP) are forced to 64 bits. These instructions update the full 64-bit RIP without needing a REX prefix. The following aspects of near branches are controlled by the effective operand size:

Truncation of the instruction pointer

Size of the stack pop or push caused by a CALL or RET

Size of the stack-pointer increment or decrement caused by a CALL or RET

Indirect-branch operand size

In 64-bit mode, all of the above are forced to 64 bits regardless of operand-size prefixes (the prefixes are ignored). However, the displacement field of a relative branch is still limited to 32 bits, and the address size of near branches is not forced to 64 bits.

Address size affects the size of RCX used by JCXZ and LOOP, and it also affects the address calculation for memory-indirect branches. This address size defaults to 64 bits but can be overridden to 32 bits with an address-size prefix.

Software uses far branches to change privilege levels. The legacy IA-32 architecture provides the call gate mechanism, which allows software to move from one privilege level to another, although call gates can also be used for branches that do not change privilege level. When a call gate is used, the selector part of the direct or indirect pointer references a gate descriptor (the offset in the instruction is ignored), and the offset into the target code segment is taken from the call gate descriptor. IA-32e mode redefines the type value of the 32-bit call gate descriptor to mean a 64-bit call gate descriptor, and enlarges the descriptor to hold a 64-bit offset. The 64-bit call gate descriptor allows far branches to reach any location in the supported linear address space. These call gates also hold the target code segment selector (CS), allowing the privilege level and default size to change as a result of the gate transition.

Because immediates are generally limited to 32 bits, the only way to specify a full 64-bit absolute RIP in 64-bit mode is with an indirect branch. For this reason, direct far branches are removed from the instruction set in 64-bit mode.

IA-32e mode extends the semantics of the SYSENTER and SYSEXIT instructions so that they can operate in a 64-bit memory space. IA-32e mode also introduces two new instructions, SYSCALL and SYSRET, which are valid only in 64-bit mode.

1. Address calculation in 64-bit mode

In 64-bit mode (when there is no address-size override), effective address calculations are 64 bits wide. An effective address calculation uses 64-bit base and index registers and sign-extends displacements to 64 bits.

In the flat address space of 64-bit mode, linear addresses are equal to effective addresses. This rule does not hold when the FS or GS segment is used with a non-zero base. In 64-bit mode, the effective address components are added and the effective address is truncated before the full 64-bit segment base is added. The base itself is never truncated, whatever the addressing mode in 64-bit mode.

In IA-32e mode, the instruction pointer is extended to 64 bits to support 64-bit code offsets. The 64-bit instruction pointer is called RIP. The following table describes the differences between RIP, EIP and IP.

In general, displacements and immediates are not extended to 64 bits in 64-bit mode. They are still limited to 32 bits and are sign-extended during effective address calculation. 64-bit mode does, however, provide 64-bit displacement and immediate forms of the MOV instruction.

All 16-bit and 32-bit address calculations in IA-32e mode are zero-extended to form 64-bit addresses. An address calculation is first truncated to the effective address size of the current mode, as overridden by any address-size prefix, and the result is then zero-extended to the full 64-bit address width. Consequently, 16-bit and 32-bit applications running in compatibility mode can access only the low 4 GB of the 64-bit effective address space. Likewise, a 32-bit address generated in 64-bit mode can access only the low 4 GB of the 64-bit effective address space.
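The calculation described in this section can be followed step by step in a short Python model. It is a sketch, not a description of real hardware data paths: the function names are hypothetical, the base-plus-scaled-index-plus-displacement form and the scale values 1, 2, 4 and 8 are the usual IA-32 addressing form assumed here, and only 32-bit and 64-bit address sizes are modelled.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed displacement."""
    value &= MASK32
    return (value ^ 0x8000_0000) - 0x8000_0000


def effective_address(base: int, index: int, scale: int,
                      disp32: int, address_bits: int = 64) -> int:
    """Add the components, then truncate to the effective address size."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if address_bits not in (32, 64):
        raise ValueError("only 32-bit and 64-bit address sizes are modelled")
    ea = base + index * scale + sign_extend_32(disp32)
    mask = MASK64 if address_bits == 64 else MASK32
    # Truncation leaves the upper bits clear, i.e. zero-extended to 64 bits.
    return ea & mask


def linear_address(segment_base: int, ea: int) -> int:
    """Add a full 64-bit segment base (non-zero only for FS and GS)."""
    return (segment_base + ea) & MASK64
```

With a 32-bit address size the result can never exceed FFFFFFFFH, which is the low-4-GB restriction described above; the segment base is added afterwards and is not truncated.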

2. Canonical addressing

An address is canonical if address bits 63 down to the most significant bit implemented by the microarchitecture are set to either all ones or all zeros.

IA-32e mode defines a 64-bit linear address, but implementations support fewer bits. The first IA-32 processors with 64-bit extension technology support 48-bit linear addresses. This means that bits 63 to 48 of a canonical address must all be 0 or all be 1, depending on whether bit 47 is 0 or 1.

Although an implementation may not use all 64 bits of the linear address, it should check bits 63 down to the most significant implemented bit to determine whether the address is in canonical form. If a linear memory reference is not in canonical form, the implementation generates an exception. In most cases this is a general-protection exception (#GP). For explicit or implicit stack references, however, a stack fault (#SS) is generated. Instructions with implicit stack references use the SS segment register by default; they include the PUSH/POP family and any instruction that uses RSP or RBP as its base register. If such an instruction uses RSP or RBP as the base register together with a segment override prefix that selects a non-SS segment, a canonical-form violation raises a general-protection exception (#GP) instead of #SS. The canonical-form check is performed after the privilege checks and before the paging and boundary checks.
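The canonical-form rule and the choice of exception can be expressed compactly in Python. This is a minimal sketch assuming the 48-bit implementation described above as the default; `is_canonical` and `canonical_fault` are hypothetical names, and the segment is passed as a plain string for illustration.

```python
MASK64 = (1 << 64) - 1


def is_canonical(addr: int, implemented_bits: int = 48) -> bool:
    """True if bits 63 down to (implemented_bits - 1) are all 0 or all 1."""
    addr &= MASK64
    upper = addr >> (implemented_bits - 1)              # bits 63..47 for 48 bits
    all_ones = (1 << (64 - implemented_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def canonical_fault(segment: str) -> str:
    """Exception raised by a non-canonical reference through `segment`."""
    # Stack-segment references fault with #SS; everything else with #GP.
    return "#SS" if segment == "SS" else "#GP"
```

Under the 48-bit assumption, 00007FFFFFFFFFFFH and FFFF800000000000H are the last canonical address of the lower half and the first of the upper half; every address strictly between them is non-canonical.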