1. Introduction to LoongArch
LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V. There are currently 3 variants: a reduced 32-bit version (LA32R), a standard 32-bit version (LA32S) and a 64-bit version (LA64). There are 4 privilege levels (PLVs) defined in LoongArch: PLV0~PLV3, from highest to lowest privilege. The kernel runs at PLV0 while applications run at PLV3. This document introduces the registers, basic instruction set, virtual memory and some other topics of LoongArch.
1.1. Registers
LoongArch registers include general purpose registers (GPRs), floating point registers (FPRs), vector registers (VRs) and control status registers (CSRs) used in privileged mode (PLV0).
1.1.1. GPRs
LoongArch has 32 GPRs ($r0 ~ $r31); each one is 32-bit wide in LA32 and 64-bit wide in LA64. $r0 is hard-wired to zero, and the other registers are not architecturally special. (Except $r1, which is hard-wired as the link register of the BL instruction.)
The kernel uses a variant of the LoongArch register convention, as described in the LoongArch ELF psABI spec, in References:
Name | Alias | Usage | Preserved across calls |
---|---|---|---|
$r0 | $zero | Constant zero | Unused |
$r1 | $ra | Return address | No |
$r2 | $tp | TLS/Thread pointer | Unused |
$r3 | $sp | Stack pointer | Yes |
$r4-$r11 | $a0-$a7 | Argument registers | No |
$r4-$r5 | $v0-$v1 | Return value | No |
$r12-$r20 | $t0-$t8 | Temp registers | No |
$r21 | $u0 | Percpu base address | Unused |
$r22 | $fp | Frame pointer | Yes |
$r23-$r31 | $s0-$s8 | Static registers | Yes |
Note
The register $r21 is reserved in the ELF psABI, but used by the Linux kernel for storing the percpu base address. It normally has no ABI name, but is called $u0 in the kernel. You may also see $v0 or $v1 in some old code; however, they are deprecated aliases of $a0 and $a1 respectively.
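The register convention above can be expressed as a small lookup table. The following is only an illustrative sketch: the alias ranges are taken from the LoongArch ELF psABI as summarized in this section, and the helper names `build_gpr_aliases` and `gpr_number` are hypothetical, not part of any kernel or toolchain API.

```python
# Sketch: map LoongArch GPR ABI aliases to register numbers.
# $u0 is the kernel-only name for $r21; $v0/$v1 are the deprecated
# aliases of $a0/$a1. Helper names here are hypothetical.

def build_gpr_aliases():
    aliases = {"zero": 0, "ra": 1, "tp": 2, "sp": 3, "u0": 21, "fp": 22}
    for i in range(8):            # $a0-$a7 are $r4-$r11
        aliases[f"a{i}"] = 4 + i
    for i in range(9):            # $t0-$t8 are $r12-$r20
        aliases[f"t{i}"] = 12 + i
    for i in range(9):            # $s0-$s8 are $r23-$r31
        aliases[f"s{i}"] = 23 + i
    aliases["v0"] = aliases["a0"]  # deprecated alias
    aliases["v1"] = aliases["a1"]  # deprecated alias
    return aliases


GPR_ALIASES = build_gpr_aliases()


def gpr_number(name):
    """Accept '$r5', '$a1', 'a1', ... and return the register number (0..31)."""
    name = name.lstrip("$")
    if name.startswith("r") and name[1:].isdigit():
        number = int(name[1:])
        if not 0 <= number <= 31:
            raise ValueError(f"no such GPR: $r{number}")
        return number
    return GPR_ALIASES[name]
```

For instance, `gpr_number("$u0")` and `gpr_number("$r21")` resolve to the same register, which is why the percpu base never collides with psABI-visible registers.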
1.1.2. FPRs
LoongArch has 32 FPRs ($f0 ~ $f31) when FPU is present. Each one is 64-bit wide on the LA64 cores.
The floating-point register convention is the same as described in the LoongArch ELF psABI spec:
Name | Alias | Usage | Preserved across calls |
---|---|---|---|
$f0-$f7 | $fa0-$fa7 | Argument registers | No |
$f0-$f1 | $fv0-$fv1 | Return value | No |
$f8-$f23 | $ft0-$ft15 | Temp registers | No |
$f24-$f31 | $fs0-$fs7 | Static registers | Yes |
Note
You may see $fv0 or $fv1 in some old code; however, they are deprecated aliases of $fa0 and $fa1 respectively.
1.1.3. VRs
There are currently 2 vector extensions to LoongArch:
LSX (Loongson SIMD eXtension) with 128-bit vectors,
LASX (Loongson Advanced SIMD eXtension) with 256-bit vectors.
LSX brings $v0 ~ $v31 while LASX brings $x0 ~ $x31 as the vector registers.
The VRs overlap with FPRs: for example, on a core implementing LSX and LASX, the lower 128 bits of $x0 are shared with $v0, and the lower 64 bits of $v0 are shared with $f0; the same applies to all other VRs.
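The overlap can be pictured by treating one 256-bit LASX register as a single integer and reading the narrower registers as masked views of it. This is a conceptual sketch only; the function names are hypothetical, and it models reads, not what a write to the narrower register does to the upper bits.

```python
# Sketch: $x0 (256 bits), $v0 (128 bits) and $f0 (64 bits) as views
# of the same storage. Names are illustrative, not a real API.

LASX_BITS = 256
LSX_BITS = 128
FPR_BITS = 64


def lsx_view(x_value):
    """Value seen through $vN: the lower 128 bits of $xN."""
    return x_value & ((1 << LSX_BITS) - 1)


def fpr_view(x_value):
    """Value seen through $fN: the lower 64 bits of $vN (and of $xN)."""
    return x_value & ((1 << FPR_BITS) - 1)


# A 256-bit pattern with a distinct marker in each region.
x0 = (0xAAAA << 192) | (0xBBBB << 64) | 0x1234
v0 = lsx_view(x0)   # keeps the 0xBBBB and 0x1234 parts, drops 0xAAAA
f0 = fpr_view(x0)   # keeps only the 0x1234 part
```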
1.1.4. CSRs
CSRs can only be accessed from privileged mode (PLV0):
Address | Full Name | Abbrev Name |
---|---|---|
0x0 | Current Mode Information | CRMD |
0x1 | Pre-exception Mode Information | PRMD |
0x2 | Extension Unit Enable | EUEN |
0x3 | Miscellaneous Control | MISC |
0x4 | Exception Configuration | ECFG |
0x5 | Exception Status | ESTAT |
0x6 | Exception Return Address | ERA |
0x7 | Bad (Faulting) Virtual Address | BADV |
0x8 | Bad (Faulting) Instruction Word | BADI |
0xC | Exception Entrypoint Address | EENTRY |
0x10 | TLB Index | TLBIDX |
0x11 | TLB Entry High-order Bits | TLBEHI |
0x12 | TLB Entry Low-order Bits 0 | TLBELO0 |
0x13 | TLB Entry Low-order Bits 1 | TLBELO1 |
0x18 | Address Space Identifier | ASID |
0x19 | Page Global Directory Address for Lower-half Address Space | PGDL |
0x1A | Page Global Directory Address for Higher-half Address Space | PGDH |
0x1B | Page Global Directory Address | PGD |
0x1C | Page Walk Control for Lower-half Address Space | PWCL |
0x1D | Page Walk Control for Higher-half Address Space | PWCH |
0x1E | STLB Page Size | STLBPS |
0x1F | Reduced Virtual Address Configuration | RVACFG |
0x20 | CPU Identifier | CPUID |
0x21 | Privileged Resource Configuration 1 | PRCFG1 |
0x22 | Privileged Resource Configuration 2 | PRCFG2 |
0x23 | Privileged Resource Configuration 3 | PRCFG3 |
0x30+n (0≤n≤15) | Saved Data register | SAVEn |
0x40 | Timer Identifier | TID |
0x41 | Timer Configuration | TCFG |
0x42 | Timer Value | TVAL |
0x43 | Compensation of Timer Count | CNTC |
0x44 | Timer Interrupt Clearing | TICLR |
0x60 | LLBit Control | LLBCTL |
0x80 | Implementation-specific Control 1 | IMPCTL1 |
0x81 | Implementation-specific Control 2 | IMPCTL2 |
0x88 | TLB Refill Exception Entrypoint Address | TLBRENTRY |
0x89 | TLB Refill Exception BAD (Faulting) Virtual Address | TLBRBADV |
0x8A | TLB Refill Exception Return Address | TLBRERA |
0x8B | TLB Refill Exception Saved Data Register | TLBRSAVE |
0x8C | TLB Refill Exception Entry Low-order Bits 0 | TLBRELO0 |
0x8D | TLB Refill Exception Entry Low-order Bits 1 | TLBRELO1 |
0x8E | TLB Refill Exception Entry High-order Bits | TLBREHI |
0x8F | TLB Refill Exception Pre-exception Mode Information | TLBRPRMD |
0x90 | Machine Error Control | MERRCTL |
0x91 | Machine Error Information 1 | MERRINFO1 |
0x92 | Machine Error Information 2 | MERRINFO2 |
0x93 | Machine Error Exception Entrypoint Address | MERRENTRY |
0x94 | Machine Error Exception Return Address | MERRERA |
0x95 | Machine Error Exception Saved Data Register | MERRSAVE |
0x98 | Cache TAGs | CTAG |
0x180+n (0≤n≤3) | Direct Mapping Configuration Window n | DMWn |
0x200+2n (0≤n≤31) | Performance Monitor Configuration n | PMCFGn |
0x201+2n (0≤n≤31) | Performance Monitor Overall Counter n | PMCNTn |
0x300 | Memory Load/Store WatchPoint Overall Control | MWPC |
0x301 | Memory Load/Store WatchPoint Overall Status | MWPS |
0x310+8n (0≤n≤7) | Memory Load/Store WatchPoint n Configuration 1 | MWPnCFG1 |
0x311+8n (0≤n≤7) | Memory Load/Store WatchPoint n Configuration 2 | MWPnCFG2 |
0x312+8n (0≤n≤7) | Memory Load/Store WatchPoint n Configuration 3 | MWPnCFG3 |
0x313+8n (0≤n≤7) | Memory Load/Store WatchPoint n Configuration 4 | MWPnCFG4 |
0x380 | Instruction Fetch WatchPoint Overall Control | FWPC |
0x381 | Instruction Fetch WatchPoint Overall Status | FWPS |
0x390+8n (0≤n≤7) | Instruction Fetch WatchPoint n Configuration 1 | FWPnCFG1 |
0x391+8n (0≤n≤7) | Instruction Fetch WatchPoint n Configuration 2 | FWPnCFG2 |
0x392+8n (0≤n≤7) | Instruction Fetch WatchPoint n Configuration 3 | FWPnCFG3 |
0x393+8n (0≤n≤7) | Instruction Fetch WatchPoint n Configuration 4 | FWPnCFG4 |
0x500 | Debug Register | DBG |
0x501 | Debug Exception Return Address | DERA |
0x502 | Debug Exception Saved Data Register | DSAVE |
ERA, TLBRERA, MERRERA and DERA are sometimes also known as EPC, TLBREPC, MERREPC and DEPC respectively.
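Several CSR families in the table are indexed (SAVEn, DMWn, PMCFGn/PMCNTn, MWPnCFGk, FWPnCFGk). A minimal sketch of the address arithmetic, using only the base addresses, strides and index limits listed above, might look like this; the function names are hypothetical and do not correspond to kernel macros.

```python
# Sketch: compute addresses of the indexed CSR families from the table.
# Bases, strides and limits come from the table; names are illustrative.

def _indexed(base, stride, n, limit, name):
    """Return base + stride * n after checking 0 <= n <= limit."""
    if not 0 <= n <= limit:
        raise ValueError(f"{name}: index {n} out of range 0..{limit}")
    return base + stride * n


def csr_save(n):
    return _indexed(0x30, 1, n, 15, "SAVEn")


def csr_dmw(n):
    return _indexed(0x180, 1, n, 3, "DMWn")


def csr_pmcfg(n):
    return _indexed(0x200, 2, n, 31, "PMCFGn")


def csr_pmcnt(n):
    return _indexed(0x201, 2, n, 31, "PMCNTn")


def csr_mwp_cfg(n, k):
    """MWPnCFGk: watchpoint n (0..7), configuration register k (1..4)."""
    if not 1 <= k <= 4:
        raise ValueError(f"MWPnCFGk: k={k} out of range 1..4")
    return _indexed(0x310 + (k - 1), 8, n, 7, "MWPnCFGk")


def csr_fwp_cfg(n, k):
    """FWPnCFGk: watchpoint n (0..7), configuration register k (1..4)."""
    if not 1 <= k <= 4:
        raise ValueError(f"FWPnCFGk: k={k} out of range 1..4")
    return _indexed(0x390 + (k - 1), 8, n, 7, "FWPnCFGk")
```

Note that PMCFGn and PMCNTn interleave (stride 2), while each watchpoint occupies a block of 8 addresses of which only the first 4 are named in the table.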
1.2. Basic Instruction Set
1.2.1. Instruction formats
LoongArch instructions are 32 bits wide, belonging to 9 basic instruction formats (and variants of them):
Format name | Composition |
---|---|
2R | Opcode + Rj + Rd |
3R | Opcode + Rk + Rj + Rd |
4R | Opcode + Ra + Rk + Rj + Rd |
2RI8 | Opcode + I8 + Rj + Rd |
2RI12 | Opcode + I12 + Rj + Rd |
2RI14 | Opcode + I14 + Rj + Rd |
2RI16 | Opcode + I16 + Rj + Rd |
1RI21 | Opcode + I21L + Rj + I21H |
I26 | Opcode + I26L + I26H |
Rd is the destination register operand, while Rj, Rk and Ra (“a” stands for “additional”) are the source register operands. I8/I12/I14/I16/I21/I26 are immediate operands of respective width. The longer I21 and I26 are stored in separate higher and lower parts in the instruction word, denoted by the “L” and “H” suffixes.
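To make the field layout concrete, here is a hedged sketch of packing three of these formats into a 32-bit word. The bit positions (Rd in bits 4:0, Rj in bits 9:5, Rk in bits 14:10, immediates starting at bit 10) and the example opcode values are assumptions taken from the LoongArch ISA manual rather than from this document, and the function names are hypothetical.

```python
# Sketch: pack 3R, 2RI12 and I26 instruction words.
# Field positions and opcodes are assumptions based on the ISA manual.

def _check(value, bits, name):
    """Return value if it fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_3r(opcode, rk, rj, rd):
    """17-bit opcode, then Rk, Rj, Rd (5 bits each)."""
    return (_check(opcode, 17, "opcode") << 15
            | _check(rk, 5, "rk") << 10
            | _check(rj, 5, "rj") << 5
            | _check(rd, 5, "rd"))


def encode_2ri12(opcode, imm, rj, rd):
    """10-bit opcode, signed 12-bit immediate, then Rj, Rd."""
    if not -(1 << 11) <= imm < (1 << 11):
        raise ValueError(f"imm={imm} does not fit in signed 12 bits")
    return (_check(opcode, 10, "opcode") << 22
            | (imm & 0xFFF) << 10
            | _check(rj, 5, "rj") << 5
            | _check(rd, 5, "rd"))


def encode_i26(opcode, imm):
    """6-bit opcode; the low 16 bits of imm (I26L) sit above the high 10 bits (I26H)."""
    if not -(1 << 25) <= imm < (1 << 25):
        raise ValueError(f"imm={imm} does not fit in signed 26 bits")
    imm &= (1 << 26) - 1
    return (_check(opcode, 6, "opcode") << 26
            | (imm & 0xFFFF) << 10   # I26L
            | imm >> 16)             # I26H


# Assumed opcodes: ADDI.D = 0b0000001011 (2RI12), ADD.W = 0x20 (3R).
addi_d_sp = encode_2ri12(0b0000001011, -16, 3, 3)   # addi.d $sp, $sp, -16
add_w = encode_3r(0x20, 6, 5, 4)                     # add.w $a0, $a1, $a2
```

The split immediate in I26 is the reason the format table writes it as I26L + I26H: the two halves are not contiguous in the order a reader would expect.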
1.2.2. List of Instructions
For brevity, only instruction names (mnemonics) are listed here; please see the References for details.
Arithmetic Instructions:
ADD.W SUB.W ADDI.W ADD.D SUB.D ADDI.D SLT SLTU SLTI SLTUI AND OR NOR XOR ANDN ORN ANDI ORI XORI MUL.W MULH.W MULH.WU DIV.W DIV.WU MOD.W MOD.WU MUL.D MULH.D MULH.DU DIV.D DIV.DU MOD.D MOD.DU PCADDI PCADDU12I PCADDU18I LU12I.W LU32I.D LU52I.D ADDU16I.D
Bit-shift Instructions:
SLL.W SRL.W SRA.W ROTR.W SLLI.W SRLI.W SRAI.W ROTRI.W SLL.D SRL.D SRA.D ROTR.D SLLI.D SRLI.D SRAI.D ROTRI.D
Bit-manipulation Instructions:
EXT.W.B EXT.W.H CLO.W CLO.D CLZ.W CLZ.D CTO.W CTO.D CTZ.W CTZ.D BYTEPICK.W BYTEPICK.D BSTRINS.W BSTRINS.D BSTRPICK.W BSTRPICK.D REVB.2H REVB.4H REVB.2W REVB.D REVH.2W REVH.D BITREV.4B BITREV.8B BITREV.W BITREV.D MASKEQZ MASKNEZ
Branch Instructions:
BEQ BNE BLT BGE BLTU BGEU BEQZ BNEZ B BL JIRL
Load/Store Instructions:
LD.B LD.BU LD.H LD.HU LD.W LD.WU LD.D ST.B ST.H ST.W ST.D LDX.B LDX.BU LDX.H LDX.HU LDX.W LDX.WU LDX.D STX.B STX.H STX.W STX.D LDPTR.W LDPTR.D STPTR.W STPTR.D PRELD PRELDX
Atomic Operation Instructions:
LL.W SC.W LL.D SC.D AMSWAP.W AMSWAP.D AMADD.W AMADD.D AMAND.W AMAND.D AMOR.W AMOR.D AMXOR.W AMXOR.D AMMAX.W AMMAX.D AMMIN.W AMMIN.D
Barrier Instructions:
IBAR DBAR
Special Instructions:
SYSCALL BREAK CPUCFG NOP IDLE ERTN(ERET) DBCL(DBGCALL) RDTIMEL.W RDTIMEH.W RDTIME.D ASRTLE.D ASRTGT.D
Privileged Instructions:
CSRRD CSRWR CSRXCHG IOCSRRD.B IOCSRRD.H IOCSRRD.W IOCSRRD.D IOCSRWR.B IOCSRWR.H IOCSRWR.W IOCSRWR.D CACOP TLBP(TLBSRCH) TLBRD TLBWR TLBFILL TLBCLR TLBFLUSH INVTLB LDDIR LDPTE
1.3. Virtual Memory
LoongArch supports direct-mapped virtual memory and page-mapped virtual memory.
Direct-mapped virtual memory is configured by CSR.DMWn (n=0~3). It has a simple relationship between virtual address (VA) and physical address (PA):
VA = PA + FixedOffset
Page-mapped virtual memory has an arbitrary relationship between VA and PA, which is recorded in the TLB and page tables. LoongArch’s TLB includes a fully-associative MTLB (Multiple Page Size TLB) and a set-associative STLB (Single Page Size TLB).
By default, the whole virtual address space of LA32 is configured like this:
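Name | Address Range | Attributes |
---|---|---|
UVRANGE | 0x00000000 - 0x7FFFFFFF | Page-mapped, Cached, PLV0~3 |
KPRANGE0 | 0x80000000 - 0x9FFFFFFF | Direct-mapped, Uncached, PLV0 |
KPRANGE1 | 0xA0000000 - 0xBFFFFFFF | Direct-mapped, Cached, PLV0 |
KVRANGE | 0xC0000000 - 0xFFFFFFFF | Page-mapped, Cached, PLV0 |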
User mode (PLV3) can only access UVRANGE. For direct-mapped KPRANGE0 and KPRANGE1, PA is equal to VA with bits 29~31 cleared. For example, the uncached direct-mapped VA of 0x00001000 is 0x80001000, and the cached direct-mapped VA of 0x00001000 is 0xA0001000.
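The LA32 direct-mapped translation can be sketched as simple bit arithmetic. The window bases below are read off the two example addresses; the 512 MiB window size (so that the mask clears bits 29~31, which both examples require) and the function names are assumptions made for illustration.

```python
# Sketch: LA32 direct-mapped VA <-> PA conversion.
# Bases come from the example addresses; window size and names are assumed.

KPRANGE0_BASE = 0x80000000       # direct-mapped, uncached
KPRANGE1_BASE = 0xA0000000       # direct-mapped, cached
DIRECT_END = 0xBFFFFFFF          # last address of KPRANGE1 (assumed)
OFFSET_MASK = 0x1FFFFFFF         # keeps bits 0..28, clears bits 29..31


def la32_direct_va(pa, cached):
    """Return the direct-mapped VA for a physical address."""
    if not 0 <= pa <= OFFSET_MASK:
        raise ValueError(f"PA {pa:#x} is outside the direct-mapped window")
    base = KPRANGE1_BASE if cached else KPRANGE0_BASE
    return base | pa


def la32_direct_pa(va):
    """Return the PA behind a VA in KPRANGE0 or KPRANGE1."""
    if not KPRANGE0_BASE <= va <= DIRECT_END:
        raise ValueError(f"VA {va:#x} is not direct-mapped")
    return va & OFFSET_MASK
```

Both views of the same physical page differ only in the window base, so converting between the cached and uncached alias is a round trip through `la32_direct_pa`.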
By default, the whole virtual address space of LA64 is configured like this:
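Name | Address Range | Attributes |
---|---|---|
XUVRANGE | 0x00000000_00000000 - 0x3FFFFFFF_FFFFFFFF | Page-mapped, Cached, PLV0~3 |
XSPRANGE | 0x40000000_00000000 - 0x7FFFFFFF_FFFFFFFF | Direct-mapped, Cached / Uncached, PLV0 |
XKPRANGE | 0x80000000_00000000 - 0xBFFFFFFF_FFFFFFFF | Direct-mapped, Cached / Uncached, PLV0 |
XKVRANGE | 0xC0000000_00000000 - 0xFFFFFFFF_FFFFFFFF | Page-mapped, Cached, PLV0 |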
User mode (PLV3) can only access XUVRANGE. For direct-mapped XSPRANGE and XKPRANGE, PA is equal to VA with bits 60~63 cleared, and the cache attribute is configured by bits 60~61 in VA: 0 is for strongly-ordered uncached, 1 is for coherent cached, and 2 is for weakly-ordered uncached.
Currently we only use XKPRANGE for direct mapping and XSPRANGE is reserved.
To put this in action: the strongly-ordered uncached direct-mapped VA (in XKPRANGE) of 0x00000000_00001000 is 0x80000000_00001000, the coherent cached direct-mapped VA (in XKPRANGE) of 0x00000000_00001000 is 0x90000000_00001000, and the weakly-ordered uncached direct-mapped VA (in XKPRANGE) of 0x00000000_00001000 is 0xA0000000_00001000.
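The same arithmetic for LA64 can be sketched as follows, using the XKPRANGE base from the examples above and the cache attribute encoding in bits 60~61. The constant and function names are hypothetical; real physical addresses are far narrower than 60 bits, so the range check here only reflects the bit layout described in this section.

```python
# Sketch: LA64 XKPRANGE direct-mapped VA <-> PA conversion.
# Names are illustrative; only the bit layout comes from the text.

XKPRANGE_BASE = 0x8000_0000_0000_0000
PA_MASK = (1 << 60) - 1          # clears bits 60..63

CACHE_ATTR = {
    "strongly-ordered uncached": 0,
    "coherent cached": 1,
    "weakly-ordered uncached": 2,
}


def la64_direct_va(pa, attr):
    """Return the XKPRANGE VA for a PA with the given cache attribute."""
    if not 0 <= pa <= PA_MASK:
        raise ValueError(f"PA {pa:#x} does not fit below bit 60")
    return XKPRANGE_BASE | (CACHE_ATTR[attr] << 60) | pa


def la64_direct_pa(va):
    """Return the PA: the VA with bits 60~63 cleared."""
    return va & PA_MASK


def la64_cache_attr(va):
    """Return the cache attribute field held in bits 60~61 of the VA."""
    return (va >> 60) & 0x3
```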
1.4. Relationship of Loongson and LoongArch
LoongArch is a RISC ISA that is distinct from all other existing ISAs, while Loongson is a family of processors. Loongson includes 3 series: Loongson-1 is the 32-bit processor series, Loongson-2 is the low-end 64-bit processor series, and Loongson-3 is the high-end 64-bit processor series. Old Loongson is based on MIPS, while New Loongson is based on LoongArch. Take Loongson-3 as an example: Loongson-3A1000/3B1500/3A2000/3A3000/3A4000 are MIPS-compatible, while Loongson-3A5000 (and future revisions) are all based on LoongArch.
1.5. References
Official web site of Loongson Technology Corp. Ltd.:
Developer web site of Loongson and LoongArch (Software and Documentation):
Documentation of LoongArch ISA:
Documentation of LoongArch ELF psABI:
Linux kernel repository of Loongson and LoongArch: