Overview

The T-Head XuanTie E906 is a fully synthesizable, mid-range, microcontroller-class processor that is compatible with the RISC-V RV32IMA[F][D]C[P] ISA. It delivers considerable integer performance and enhanced, energy-efficient floating-point performance, especially in double precision, together with a deeply optimized DSP execution unit supported by the CSI-DSP lib and fast interrupt response.

 

Feature                             Description
Architecture                        RV32IMA[F][D]C[P]
Pipeline                            5-stage (integer)
Bus interface                       AMBA3 AHB-Lite 32-bit master
FPU                                 Architecture optimized for double-precision floating point
DSP                                 Enhanced, deeply optimized DSP unit with CSI-DSP lib, compliant with the v0.9.4 P-extension spec
Hybrid Branch Predictor             BHT/BTB/RAS
Instruction cache                   Up to 32KB
Data cache                          Up to 32KB
Interrupts                          Up to 240 interrupts + Non-maskable interrupt (NMI)
Hardware Performance Monitor (HPM)  HPM for performance profiling
XuanTie Extensions                  XuanTie MCU Enhanced Extensions include interrupt acceleration technology to reduce response latency and an enhanced ISA to improve instruction set performance
Sleep modes                         Sleep and deep sleep modes
RISC-V Debug                        Three levels of hardware configurations

 

Processor Overview

The E906 processor uses a mixed 16/32-bit instruction set and implements a classic five-stage integer pipeline. It can also be configured with single-precision floating point, with both single- and double-precision floating point, and with the DSP ISA. The processor offers high floating-point compute performance, enhanced ISA extensions, and extended fast interrupt handling.

 

Floating Point Unit (FPU)

Targeting the motor control and navigation domains, the E906 processor implements a powerful FPU to accelerate their algorithms. The FPU has the following features:

 

  • Compliant with the RISC-V RV32F and RV32D extensions
  • Compliant with the IEEE 754 standard
  • Dedicated double-precision floating-point unit design when configured with RV32D; single-precision operations reuse the same pipeline
  • 64-bit data path to the data cache, wide enough for double-precision accesses

 

DSP

The DSP execution unit in the E906 is compliant with version 0.9.4 of the P extension spec. It includes operations such as 8/16-bit SIMD multiply and multiply-accumulate with 32/64-bit data, which are key to accelerating signal-processing and filter algorithms such as FFT, FIR, and IIR, as well as AI workloads such as matrix multiplication and vector multiplication.

 

The DSP execution unit can make full use of the 32 integer GPRs in the E906, which gives software ample register resources for optimization. In addition, the E906 micro-architecture is optimized to reduce execution latency and adopts hybrid branch prediction to lower the misprediction rate. As a result, the CPI of key DSP lib routines is close to 1. The DSP execution unit also has the following features:

  • Sufficient GPR read and write ports for Zp64 64-bit arithmetic
  • Tuned CSI-DSP lib produced through software and hardware co-optimization
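
To illustrate the kind of packed 16-bit multiply-accumulate described above, here is a minimal portable C reference model. It is only a sketch of the arithmetic: the function names (pack16, dual_mac16, dot_packed16) are hypothetical, it is not tied to any specific P-extension opcode, and it does not use the CSI-DSP lib API.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit lanes into one 32-bit word (hypothetical helper). */
static uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Reference model of a packed 16-bit SIMD multiply-accumulate:
 * multiply the two signed 16-bit lanes of a and b pairwise and add
 * both products to a 64-bit accumulator. */
static int64_t dual_mac16(int64_t acc, uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(uint16_t)(a & 0xFFFFu);
    int16_t a_hi = (int16_t)(uint16_t)(a >> 16);
    int16_t b_lo = (int16_t)(uint16_t)(b & 0xFFFFu);
    int16_t b_hi = (int16_t)(uint16_t)(b >> 16);

    acc += (int32_t)a_lo * (int32_t)b_lo;
    acc += (int32_t)a_hi * (int32_t)b_hi;
    return acc;
}

/* Dot product over packed samples and coefficients, the inner loop
 * of an FIR filter: two taps are consumed per 32-bit word. */
static int64_t dot_packed16(const uint32_t *x, const uint32_t *h, size_t n_words)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n_words; i++) {
        acc = dual_mac16(acc, x[i], h[i]);
    }
    return acc;
}
```

On hardware with packed SIMD support, each call to dual_mac16 would ideally map to a single instruction, which is why the loop processes two taps per iteration.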

 

Memory Subsystem

The E906 implements an optional instruction cache and data cache. Both caches have the following features:

 

  • 2-way set-associative with a 32-byte cache line
  • FIFO cache replacement policy
  • Support software invalidate and clear (D-cache only) operations through extended instructions
  • Configurable size of 2KB/4KB/8KB/16KB/32KB
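
The cache geometry follows directly from the parameters above. The sketch below derives the number of sets and the address bit fields for each configurable size, assuming the conventional mapping in which the set index is taken from the address bits just above the line offset. The source does not state the actual E906 index mapping, and the names used here are hypothetical.

```c
#include <stdint.h>

#define CACHE_WAYS       2u   /* 2-way set-associative (from the feature list) */
#define CACHE_LINE_BYTES 32u  /* 32-byte cache line (from the feature list)   */

typedef struct {
    uint32_t sets;        /* number of sets                    */
    uint32_t offset_bits; /* address bits selecting a byte     */
    uint32_t index_bits;  /* address bits selecting a set      */
} cache_geometry;

/* Integer log2 for power-of-two inputs. */
static uint32_t log2_u32(uint32_t v)
{
    uint32_t n = 0;
    while (v > 1u) {
        v >>= 1;
        n++;
    }
    return n;
}

/* size_bytes is expected to be one of 2KB, 4KB, 8KB, 16KB or 32KB. */
static cache_geometry cache_geometry_for(uint32_t size_bytes)
{
    cache_geometry g;
    g.sets = size_bytes / (CACHE_WAYS * CACHE_LINE_BYTES);
    g.offset_bits = log2_u32(CACHE_LINE_BYTES);
    g.index_bits = log2_u32(g.sets);
    return g;
}

/* Assumed conventional mapping: index = address bits above the line offset. */
static uint32_t cache_set_index(uint32_t addr, cache_geometry g)
{
    return (addr >> g.offset_bits) & (g.sets - 1u);
}
```

With these parameters a 2KB cache has 32 sets (5 index bits) and a 32KB cache has 512 sets (9 index bits), with 5 offset bits in both cases.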

 

Physical Memory Protection (PMP)

The E906 processor has an optional RISC-V PMP, which allows machine and user privilege modes to access different address ranges. Only machine mode has the authority to define the memory access permissions. If an unauthorized access is detected, an access fault exception is triggered. The PMP has the following features:

 

  • Up to 16 regions can be configured
  • Read/write/execute memory protection
  • Minimum 128B address range
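
As an illustration of how a region could be described to the PMP, the sketch below computes a pmpaddr value using the NAPOT (naturally aligned power-of-two) encoding from the RISC-V privileged specification and enforces the 128-byte minimum listed above. Which address-matching modes the E906 supports is not stated here, so the use of NAPOT is an assumption, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMP_MIN_REGION_BYTES 128u /* minimum address range from the feature list */

/* Encode a naturally aligned power-of-two region as a NAPOT pmpaddr value.
 * Standard RISC-V encoding: pmpaddr = (base >> 2) | ((size >> 3) - 1).
 * Returns false if the region is smaller than the minimum, not a power
 * of two, or not aligned to its own size. */
static bool pmp_napot_encode(uint32_t base, uint32_t size, uint32_t *pmpaddr)
{
    bool is_pow2 = (size != 0u) && ((size & (size - 1u)) == 0u);

    if (!is_pow2 || size < PMP_MIN_REGION_BYTES) {
        return false;
    }
    if ((base & (size - 1u)) != 0u) {
        return false;
    }
    *pmpaddr = (base >> 2) | ((size >> 3) - 1u);
    return true;
}
```

For example, a 128-byte region at the hypothetical base address 0x20000000 encodes to 0x0800000F. Writing the value to the PMP CSRs must be done from machine mode and is not shown.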

 

Core Local Interrupt Controller (CLIC)

The E906 processor implements the RISC-V standard interrupt controllers, the CLIC and the CLINT. The CLIC has the following features:

 

  • Support up to 240 external interrupts
  • Up to 32 priority settings
  • Support level-triggered and positive/negative edge-triggered interrupts
  • Support hardware vectored interrupts
  • Memory-mapped control registers

 

Debug Components

The E906 processor implements version 0.13.2 of the RISC-V debug spec, with standard JTAG connecting the host to the E906 debug unit. The debugger and probe have been heavily optimized and achieve a download speed of 800KB/s-900KB/s, 4 times faster than common solutions on the market.

 

The debug unit supports the following:

  • Multi-level configurations to adapt to various needs
  • Hardware and software breakpoints
  • A variety of trigger settings
  • An independent master port to access SoC resources
  • Inspecting and modifying CPU registers
  • Flexible single-step and multi-step execution

 

Hardware Performance Monitor (HPM)

The E906 processor implements an optional RISC-V standard HPM that lets software developers profile performance. The HPM has the following features:

  • Support profiling of the branch prediction ratio
  • Support profiling of the cache miss ratio
  • Support profiling of the number of executed instructions and CPU cycles
  • Support profiling in machine and user mode
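
The metrics above are ratios of counter deltas taken around a region of code. The sketch below shows the arithmetic only. The hpm_sample structure and its field names are hypothetical, and how each counter is selected and read on the E906 is not covered here.

```c
#include <stdint.h>

/* Hypothetical snapshot of counter values taken before or after a region. */
typedef struct {
    uint64_t cycles;
    uint64_t instructions;
    uint64_t cache_accesses;
    uint64_t cache_misses;
    uint64_t branches;
    uint64_t branch_mispredicts;
} hpm_sample;

/* Safe ratio: returns 0.0 when the denominator is zero. */
static double hpm_ratio(uint64_t num, uint64_t den)
{
    return den != 0u ? (double)num / (double)den : 0.0;
}

/* Cycles per instruction over the measured region. Unsigned subtraction
 * keeps the delta correct across a single counter wraparound. */
static double hpm_cpi(const hpm_sample *start, const hpm_sample *end)
{
    return hpm_ratio(end->cycles - start->cycles,
                     end->instructions - start->instructions);
}

/* Fraction of cache accesses that missed. */
static double hpm_cache_miss_ratio(const hpm_sample *start, const hpm_sample *end)
{
    return hpm_ratio(end->cache_misses - start->cache_misses,
                     end->cache_accesses - start->cache_accesses);
}

/* Fraction of branches that were mispredicted. */
static double hpm_mispredict_ratio(const hpm_sample *start, const hpm_sample *end)
{
    return hpm_ratio(end->branch_mispredicts - start->branch_mispredicts,
                     end->branches - start->branches);
}
```

A typical use is to take one snapshot before a DSP kernel and one after it, then check how close the CPI of the kernel comes to 1.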

 

Interface

The E906 has three 32-bit AMBA3 AHB-Lite master buses for communicating with external memory and peripheral IP: the instruction bus, the data bus, and the system bus. Internal requests are routed to one of these buses according to the address. The instruction and data buses can only be connected to memory, not to peripheral IPs.
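
Address-based routing of this kind can be modeled as a lookup over address ranges. The sketch below is purely illustrative: the address ranges are HYPOTHETICAL placeholders and are not the E906 memory map, and treating the system bus as the fallback for unmatched addresses is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { BUS_INSTRUCTION, BUS_DATA, BUS_SYSTEM } bus_id;

typedef struct {
    uint32_t base; /* first address of the range */
    uint32_t size; /* length of the range in bytes */
    bus_id bus;    /* bus that serves the range */
} bus_region;

/* HYPOTHETICAL address map for illustration only; real ranges are
 * defined during SoC integration. */
static const bus_region example_map[] = {
    { 0x00000000u, 0x10000000u, BUS_INSTRUCTION }, /* e.g. code memory */
    { 0x20000000u, 0x10000000u, BUS_DATA },        /* e.g. data memory */
};

/* Pick the bus for an address. Anything outside the listed memory
 * ranges (peripherals, for instance) falls through to the system bus. */
static bus_id route_request(uint32_t addr)
{
    for (size_t i = 0; i < sizeof example_map / sizeof example_map[0]; i++) {
        /* Unsigned subtraction wraps for addr < base, so one compare suffices. */
        if (addr - example_map[i].base < example_map[i].size) {
            return example_map[i].bus;
        }
    }
    return BUS_SYSTEM;
}
```

Keeping peripherals off the instruction and data ranges in such a map mirrors the restriction that those two buses connect only to memory.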

 

XuanTie MCU Enhanced Extensions (XME)

The E906 processor implements the XME to deliver more powerful features such as:

  • Support fast interrupt handling with a response time of 18 CPU cycles
  • Support tail-chaining for both vectored and non-vectored interrupts
  • Support hardware interrupt stack swapping
  • Support NMI
  • Support Lockup
  • Support sleep and deep sleep
  • Support soft reset operation
  • Support a configurable reset address through a top-level port during integration
  • Beyond the standard RV32IMA[F][D]C[P] ISA, the E906 supports extended functional instructions and performance-enhancing instructions

 

Software Ecosystems
  • Optimized compiler, assembler, linker and binary tools are contributed to GNU and officially supported
  • Enhanced ISA is supported by GCC and LLVM
  • QEMU support is contributed and officially supported
  • Deeply optimized CSI-DSP lib
  • Code-size-optimized runtime lib
  • A Keil-like Integrated Development Environment (CDK) is provided, and mainstream IDEs and debug probes are supported, such as IAR IDE, OpenOCD, Lauterbach debugger, and Segger J-Link
  • Support FreeRTOS, uCos, RT-Thread and AliOS-Things

 

Fabrication Process for E906 RISC-V

Process is TSMC28 HPCPlus, 9T, RVT

 
