I’ve been working on a PC Engine emulator (aka TurboGrafx-16) after getting the bug to start looking at a new system, and I’ve found it to be pretty interesting hardware-wise. Originally released in 1987, its hardware is in a sort of awkward spot between the 3rd generation gaming consoles (NES, Sega Master System) and the 4th generation consoles (Genesis / Mega Drive, SNES), though it’s generally grouped with the latter due to featuring notably improved graphics over NES and SMS. It sold pretty well in Japan, but in North America it couldn’t really compete with the Genesis and Super Nintendo, and it never even released in Europe (officially). It’s historically notable for being the first gaming console to support CD-based games via its CD-ROM2 add-on (aka TurboGrafx-CD).
This post is specifically an overview of the PC Engine’s CPU, which I think is interesting in that it’s very fast for its time but has much more limited instruction set capabilities than its immediate competition.
Despite being called the TurboGrafx-“16” in North America, this console doesn’t actually have a 16-bit CPU! This isn’t like the 68000 where people sometimes quibble over whether it’s a 16-bit CPU or 32-bit; there is simply nothing 16-bit about the PC Engine CPU. It has 8-bit registers, an 8-bit ALU, and an 8-bit data bus. It tries to make up for this with raw speed though.
The CPU is part of a custom-designed package by Hudson called the HuC6280, which includes some other hardware alongside the CPU core such as a PSG sound chip and a hardware timer. The CPU itself is heavily based on the 65C02, Western Design Center’s enhanced version of the venerable 6502 8-bit CPU. The instruction set will look very familiar to anyone with experience programming for the NES (6502 minus BCD mode) or the SNES (65C816, WDC’s 16-bit extension of the 65C02), though it also adds a number of additional instructions exclusive to the HuC6280.
The HuC6280 inherits most of the 65C02’s new instructions and the new addressing mode (zero page indirect), along with the fixes for some of the jankier aspects of the 6502, like how jmp ($xxFF) instructions would unintuitively wrap within the same 256-byte page when reading the jump address. There’s also none of the 6502’s crazy illegal opcode behavior; the PC Engine CPU has 22 unused opcodes, but they all seem to function as plain old NOPs, not even the funny multi-byte NOPs that some of the illegal 6502 opcodes would perform.
The CPU can run at one of two clock speeds: “low” (~1.79 MHz, same as NES) or “high” (~7.16 MHz). The CPU always powers on in low speed, and games can switch between the two speeds using the CPU instructions CSL (Clock Speed Low, or maybe Change Speed Low?) and CSH (Clock Speed High). As far as I can tell there are no downsides to running at the faster 7.16 MHz speed, so games tend to almost immediately execute a CSH instruction and then leave the CPU in high speed.
7.16 MHz is fast for a 6502-based CPU in the late 80s / early 90s! That’s exactly twice as fast as the SNES CPU, and even better, the PC Engine CPU mostly doesn’t suffer from memory latency like the SNES CPU does. It gets a wait cycle every time it accesses one of the video processor ports, but otherwise there is no memory latency in the PC Engine – all areas of ROM and RAM can respond in a single 7.16 MHz clock cycle. In practice this makes the PC Engine CPU usually more than twice as fast as the SNES CPU…as long as you don’t need to perform any math or logic on 16-bit values.
In terms of raw speed, the PC Engine also compares favorably to the Genesis’ main 7.67 MHz 68000 CPU, though that’s much less straightforward of a comparison. The 68000 gets much less work done per clock cycle than 6502-based CPUs, but it makes up for that with a large number of mostly-general-purpose 32-bit registers (“mostly” because of the data/address register split) and a much more powerful instruction set. Which CPU will perform better depends on the code and how it’s written.
For cycle counting, it’s worth noting that many HuC6280 instructions take 1 or 2 cycles longer than the equivalent 6502 or 65816 instruction. For example, ADC with absolute addressing takes 4 cycles on 6502 but 5 cycles on HuC6280. Also, for instructions where 6502 has a potential penalty cycle on page crossing (e.g. absolute indexed instructions), it seems like the HuC6280 always takes the penalty cycle even when there’s not a page crossing. This was maybe done to make it easier to support the high clock speed, or maybe to save costs by sharing more circuitry between different instructions. That’s purely a guess though.
Software running on the HuC6280 operates on 16-bit memory addresses, same as 6502 software, but the HuC6280 has a built-in MMU that expands the physical address space from 16-bit to 21-bit (2 MB address range). It’s extremely simple as far as MMUs go: it splits the 16-bit logical address space into eight 8 KB pages, and each page has its own 8-bit MPR (Memory Page Register) that maps it directly to an 8 KB physical page. It’s very similar to the memory-banking mappers seen in many NES / Game Boy / Master System / Game Gear cartridges, only it’s built into the CPU, so game cartridges don’t need to include their own mapper hardware (though one game still does due to its massive ROM size).
| MPR | Logical Addresses |
|---|---|
| MPR0 | $0000-$1FFF |
| MPR1 | $2000-$3FFF |
| MPR2 | $4000-$5FFF |
| MPR3 | $6000-$7FFF |
| MPR4 | $8000-$9FFF |
| MPR5 | $A000-$BFFF |
| MPR6 | $C000-$DFFF |
| MPR7 | $E000-$FFFF |
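The address translation is very simple: the top 3 bits of the 16-bit logical address select one of the eight MPRs, the 8-bit value in that MPR becomes the top 8 bits of the 21-bit physical address, and the low 13 bits of the logical address pass through unchanged.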
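Here is a rough sketch of that translation in Python. The function name, the `mpr` list, and the register values are made up for illustration; only the bit layout comes from the description of the MMU above.

```python
PAGE_BITS = 13                       # 8 KB pages
PAGE_MASK = (1 << PAGE_BITS) - 1     # 0x1FFF

def translate(mpr, logical):
    """Map a 16-bit logical address to a 21-bit physical address."""
    page = (logical >> PAGE_BITS) & 0x7   # top 3 bits pick MPR0-MPR7
    offset = logical & PAGE_MASK          # low 13 bits pass through
    return (mpr[page] << PAGE_BITS) | offset

# Example register contents (illustrative): I/O page in MPR0,
# working RAM page in MPR1, first ROM page everywhere else.
mpr = [0xFF, 0xF8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

print(hex(translate(mpr, 0x2000)))   # 0x1f0000
print(hex(translate(mpr, 0xE123)))   # 0x123
```

Since an MPR value is just a physical page number, remapping a logical page only means writing a different value into one slot of `mpr`.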
Conventionally, PC Engine games seem to always map page 0 ($0000-$1FFF) to the memory-mapped I/O page ($FF), page 1 ($2000-$3FFF) to the working RAM page ($F8), and page 7 ($E000-$FFFF) to the first 8 KB of cartridge ROM (page $00) while freely remapping pages 2-6 as needed. Games can interact with the eight MPRs using the CPU instructions TAMi (Transfer Accumulator to MPRi) and TMAi (Transfer MPRi to Accumulator).
The HuC6280 zero page and hardware stack are located at $2000-$21FF, rather than $0000-$01FF as on 6502. This is presumably because games were expected to use the $0000-$1FFF page exclusively for memory-mapped I/O. Yes, the name “zero page” is a little awkward when it’s not actually at $0000, but I guess Hudson thought it was more beneficial to use the same terminology as 6502. For contrast, the 65816 supports relocating the zero page, and WDC renamed it to “direct page” as part of this.
The console’s physical memory map is quite straightforward, far more so than the convoluted SNES memory map:
| Pages | Description |
|---|---|
| $00-$7F | HuCard game cartridge or CD-ROM2 System Card |
| $80-$F7 | Expansion (used by CD-ROM2 add-on for extra RAM) |
| $F8 | 8 KB working RAM |
| $F9-$FB | Expansion (used by SuperGrafx for extra RAM) |
| $FF | Memory-mapped I/O registers and ports |
This limits game cards to 1 MB in size, though Street Fighter II works around that with a bank-switching mapper to manage access to its massive 2.5 MB of ROM (massive by PCE standards at least). The Genesis version of Super Street Fighter II also happens to be the only officially licensed Genesis game that has a bank-switching mapper in the cartridge, so I guess Capcom was just willing to use custom hardware to ship ports of their flagship arcade game with an abnormally high amount of ROM.
8 KB of working RAM is not a lot! It’s a lot more than you get on NES (2 KB), but equal to what you get on the original Game Boy and Sega Master System / Game Gear, and much less than you get on Genesis (64 KB) or SNES (128 KB). The various revisions of the CD-ROM2 add-on augment this by adding between 64 KB and 2 MB of additional RAM, not counting sound chip buffer RAM. That said, 64 KB is really not much once you take into account that CD-ROM2 games need to load code and assets into RAM well in advance of using them due to the extremely high latency of reading from disc, whereas HuCard-based games can execute code and read data directly from the cartridge at any time.
Among the new instructions, the standouts are the five block transfer instructions: TAI, TDD, TIA, TII, and TIN. These all do the same thing, a bulk copy from one memory location to another memory location, but each applies different address steps to the source and destination addresses. These are sort of similar to the 65816’s MVN and MVP instructions (Move Memory Negative/Positive), but unlike SNES the PC Engine doesn’t have any DMA hardware that can access cartridge ROM or CPU working RAM, so these instructions are actually useful on PC Engine.
| Instruction | Source Step | Destination Step |
|---|---|---|
| TAI | Alternate | Increment |
| TDD | Decrement | Decrement |
| TIA | Increment | Alternate |
| TII | Increment | Increment |
| TIN | Increment | None (Fixed) |
HuC6280 block transfer instructions
“Alternate” alternates between incrementing by 1 and decrementing by 1 after each byte copied, which is very useful for bulk copies in and out of e.g. the video processor’s 16-bit data port.
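To make the step behavior concrete, here is a minimal Python model of the five instructions. It is only a sketch: it works on a flat 64 KB `bytearray` and ignores the MMU, timing, and I/O side effects. It also assumes that "alternate" starts with the increment, which is my reading of the description above rather than something I have confirmed. All names are hypothetical.

```python
# (source step, destination step) for each instruction, per the table above
STEPS = {
    "TAI": ("alt", "inc"),
    "TDD": ("dec", "dec"),
    "TIA": ("inc", "alt"),
    "TII": ("inc", "inc"),
    "TIN": ("inc", "fixed"),
}

def _advance(addr, kind, i):
    """Return the next address after copying byte number i (0-based)."""
    if kind == "inc":
        return (addr + 1) & 0xFFFF
    if kind == "dec":
        return (addr - 1) & 0xFFFF
    if kind == "alt":
        # Assumption: +1 after even-numbered bytes, -1 after odd-numbered ones
        return (addr + (1 if i % 2 == 0 else -1)) & 0xFFFF
    return addr  # "fixed": the address never moves

def block_transfer(mem, op, src, dst, length):
    """Copy `length` bytes within `mem` using the step rules for `op`."""
    src_kind, dst_kind = STEPS[op]
    for i in range(length):
        mem[dst] = mem[src]
        src = _advance(src, src_kind, i)
        dst = _advance(dst, dst_kind, i)

mem = bytearray(0x10000)
mem[0x3000:0x3004] = bytes([1, 2, 3, 4])
block_transfer(mem, "TII", 0x3000, 0x4000, 4)
print(list(mem[0x4000:0x4004]))  # [1, 2, 3, 4]
```

With `"TIA"` the same four bytes would land on `dst`, `dst + 1`, `dst`, `dst + 1`, which is the access pattern you want when the destination is a two-byte port rather than a buffer.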
These instructions copy at a rate of 1 byte per 6 cycles, plus a 17-cycle overhead per instruction. The overhead is negligible if you’re copying a large enough amount of data, and 6 cycles per byte isn’t blazing fast but it’s much faster than you can copy in software. A dedicated hardware DMA unit would probably be faster than this, but on the PC Engine these instructions seem like the best way to copy large amounts of data into VRAM or working RAM.
One thing to be mindful of is that the CPU can’t respond to interrupts while it’s in the middle of a block transfer. You can copy up to 64 KB in a single block transfer instruction, and the CPU only gets around 120,000 cycles per frame in 7.16 MHz mode, so you can easily blow past multiple VBlank interrupts if you run a large enough block transfer. This is fine if the screen is blanked out during a screen transition or something, but you probably don’t want to run a multi-frame transfer during gameplay.
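Putting rough numbers on that, using the 17-cycle overhead, the 6 cycles per byte, and the approximate 120,000 cycles per frame quoted above (the helper names are mine, and the per-frame figure is only an estimate):

```python
OVERHEAD_CYCLES = 17
CYCLES_PER_BYTE = 6
CYCLES_PER_FRAME = 120_000   # rough figure for 7.16 MHz mode

def transfer_cycles(num_bytes):
    """Estimated cycles for one block transfer instruction."""
    return OVERHEAD_CYCLES + CYCLES_PER_BYTE * num_bytes

def max_bytes_within(cycle_budget):
    """Largest transfer that fits inside a given cycle budget."""
    return max(0, (cycle_budget - OVERHEAD_CYCLES) // CYCLES_PER_BYTE)

print(transfer_cycles(0x10000))              # 393233
print(max_bytes_within(CYCLES_PER_FRAME))    # 19997
```

So a maximum-size 64 KB transfer costs a bit over three frames' worth of CPU time, and even a transfer that monopolizes an entire frame tops out at just under 20 KB.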
The other new instruction that I think is notable is SET (Set T), which sets the new T flag, but only for the immediately following instruction. When executed immediately before an ADC, AND, EOR, or ORA instruction, the ADC/AND/EOR/ORA instruction operates on a zero page value in memory instead of operating on the accumulator, specifically the value at ZeroPage[X]. This is useful for manipulating values in the zero page without needing to go through the accumulator.
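In emulator terms, this could look something like the sketch below. The `Cpu` class and its method names are hypothetical, the operand is assumed to be already fetched, and the usual N/Z flag updates are omitted. ADC follows the same pattern but also involves the carry flag, so I have left it out to keep the example short.

```python
import operator

LOGIC_OPS = {"AND": operator.and_, "EOR": operator.xor, "ORA": operator.or_}

class Cpu:
    def __init__(self):
        self.a = 0
        self.x = 0
        self.t = False                   # T flag, set by SET
        self.zero_page = bytearray(256)

    def set_t(self):
        """SET: arm the T flag for the next instruction only."""
        self.t = True

    def logic(self, op, operand):
        """AND/EOR/ORA with an already-fetched 8-bit operand."""
        fn = LOGIC_OPS[op]
        if self.t:
            # T set: read-modify-write ZeroPage[X], accumulator untouched
            self.zero_page[self.x] = fn(self.zero_page[self.x], operand) & 0xFF
        else:
            self.a = fn(self.a, operand) & 0xFF
        self.t = False                   # T never outlives one instruction

cpu = Cpu()
cpu.a, cpu.x = 0x0F, 0x10
cpu.zero_page[0x10] = 0xF0
cpu.set_t()
cpu.logic("ORA", 0x0C)   # zero_page[0x10] becomes 0xFC, A stays 0x0F
cpu.logic("ORA", 0x30)   # T already cleared, so A becomes 0x3F
```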
A few of the new instructions are highly specific to PC Engine: ST0, ST1, and ST2 each write an immediate operand directly to one of the three VDC ports (Video Display Controller). These can be useful for slightly speeding up code that updates the various VDC registers.
Beyond that, there’s a new BSR instruction (Branch to Subroutine), which works pretty much exactly like JSR (Jump to Subroutine) except the operand is a PC-relative displacement instead of an absolute address. I think the main benefit of BSR over JSR is that code using BSR is agnostic to what memory bank it’s mapped into, while code using JSR requires that it’s mapped into a specific 8 KB logical page.
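A small sketch of why that matters. It assumes BSR takes a signed 8-bit displacement relative to the address of the next instruction, the same way 6502 branch instructions work; the function names and load addresses are made up for illustration.

```python
def bsr_target(instr_addr, displacement_byte):
    """Destination of a 2-byte BSR located at instr_addr."""
    # Interpret the operand byte as a signed 8-bit value
    disp = displacement_byte - 0x100 if displacement_byte & 0x80 else displacement_byte
    next_instr = (instr_addr + 2) & 0xFFFF
    return (next_instr + disp) & 0xFFFF

def jsr_target(operand_lo, operand_hi):
    """Destination of a JSR: the absolute address baked into the operand."""
    return (operand_hi << 8) | operand_lo

# The same code bytes mapped at two hypothetical logical pages
for base in (0x4000, 0x8000):
    print(hex(bsr_target(base + 0x10, 0x20) - base))  # 0x32 both times
```

The BSR lands at the same offset within the bank wherever the bank is mapped, while `jsr_target(0x32, 0x40)` always returns `0x4032`, which is only the right destination if the code happens to be mapped at `$4000`.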
TST (Test Bits) is like the 65C02’s TRB and TSB instructions (Test and Reset/Set Bits) except it doesn’t mutate the memory value, only tests the specified bits. There are new swap instructions SAX, SAY, and SXY to swap values between two registers. And finally there are new clear instructions CLA, CLX, and CLY to quickly zero out a register without modifying CPU flags.
I’d like to follow this up with posts on other parts of the PC Engine hardware, but we’ll see. I think the video hardware is interesting in how it contrasts to the Genesis and SNES video hardware so I will probably at least write a post on that.
#️⃣ **#Engine #CPU #jsgroths #blog**
🕒 **Posted on**: 1778261738
