What’s DMA, Anyway?

Direct Memory Access (DMA) is a specialized hardware feature in microcontrollers and microprocessors, designed to move data between memory regions without stalling the CPU. In simple terms, the DMA controller moves data while the CPU is not using the memory bus, so data transfers and program execution can proceed concurrently.

How does DMA work in PIC?

DMA basically works in two modes:

Stalling Mode, where the DMA controller takes control of the memory bus and stalls the CPU while it uses it.
Cycle Stealing Mode, where the DMA controller only accesses the memory bus when the CPU doesn’t.

Mode selection is done indirectly, by setting priorities in the System Arbiter. Giving the CPU a higher priority than the DMA selects Cycle Stealing Mode; giving the DMA the higher priority selects Stalling Mode.
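
To make the priority rule concrete, here is a minimal sketch of the decision as plain C. It is only a model: the type and function names are hypothetical (not a Microchip API), and it assumes that a lower number means a higher priority, which you should verify against your device's datasheet.

```c
#include <stdint.h>

/* Hypothetical model of the arbiter's decision; not a Microchip API. */
typedef enum {
    DMA_MODE_CYCLE_STEALING,
    DMA_MODE_STALLING
} dma_mode_t;

/* Assumption: a LOWER number means a HIGHER priority. */
static dma_mode_t dma_mode_from_priorities(uint8_t cpu_priority,
                                           uint8_t dma_priority)
{
    /* CPU outranks DMA: DMA only gets the bus on cycles the CPU leaves free. */
    if (cpu_priority < dma_priority) {
        return DMA_MODE_CYCLE_STEALING;
    }
    /* DMA outranks the CPU: the CPU is stalled while DMA uses the bus.
     * Equal values are lumped in here only to keep the sketch total. */
    return DMA_MODE_STALLING;
}
```

With this model, a CPU priority of 1 and a DMA priority of 2 yields Cycle Stealing Mode, and swapping the two values yields Stalling Mode.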

I’ll focus on Cycle Stealing Mode because this is where the DMA controller really shines. According to Microchip, the DMA performs its read and write operations during unused CPU cycles, referred to as bubbles. A rather vague term, don’t you think?

I’ll try to explain a bit further.

During each instruction cycle, a potential window opens for DMA to perform one read or write operation.

Suppose that the DMA needs to access the instruction bus (FLASH memory). At first glance, this might seem impossible, as the CPU typically uses the instruction bus during every instruction cycle to fetch the next instruction. However, there are some single-word instructions, such as the conditional branch, that introduce an additional NOP instruction when executed. This NOP does not require fetching, leaving the instruction bus free for one cycle. The DMA controller can take advantage of this cycle to perform a data read operation.

Interestingly, this characteristic seems to be unique to PIC MCUs. Their strongest rival, the AVR MCU, does not provide flash memory DMA reads due to its different architectural design.

Additionally, when the DMA needs to access the data bus (SRAM), it can take advantage of instructions during which the CPU does not need to access it. For instance, an ALU operation involving the W register and a literal, a NOP instruction, or any of the flash memory bubble instructions mentioned earlier can be considered a bubble in this scenario. Generally speaking, apart from the instructions already mentioned, any multi-word instruction that decodes some of its words to a NOP (e.g., CALL or GOTO) will also act as a bubble.

The same principle applies when the DMA needs to access the EEPROM data bus. In this case, potentially any instruction could be considered to be a bubble, as long as the CPU is not actively fetching or writing data to the EEPROM. Thus, the DMA can seamlessly perform its operations without interrupting the CPU's workflow, offering almost parallel processing.

By exploiting those clock cycles, not only is the effective bandwidth for handling data increased, but the CPU can also carry on with any other task uninterrupted, resulting in concurrent processing.
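
The bubble idea can be sketched as a tiny simulation. This is a simplified model under my own assumptions, not a description of the actual silicon: each instruction cycle is reduced to a single flag saying whether the CPU needs the data bus, and the DMA completes at most one read or write per free cycle. The names `cycle_t` and `dma_accesses_completed` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One simulated instruction cycle: does the CPU need the data bus? */
typedef struct {
    bool cpu_uses_data_bus;
} cycle_t;

/* Counts how many of the `pending` DMA bus accesses complete during the
 * trace when the DMA may only use cycles the CPU leaves free (bubbles).
 * The CPU is never delayed in this model. */
static size_t dma_accesses_completed(const cycle_t *trace, size_t n_cycles,
                                     size_t pending)
{
    size_t done = 0;
    for (size_t i = 0; i < n_cycles && done < pending; i++) {
        if (!trace[i].cpu_uses_data_bus) {
            done++; /* bubble: the DMA performs one read or one write */
        }
    }
    return done;
}
```

For a five-cycle trace in which the CPU uses the data bus only on the first and fourth cycles, the DMA gets three bubbles, which is enough for one full transaction (a read plus a write) with one access to spare, while the CPU still finishes in the same five cycles.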

How is DMA configured?

The DMA basically has two pointers, a source pointer and a destination pointer. Each can be configured to remain unchanged, increment by one, or decrement by one on every data movement, which is technically called a DMA data transaction. A transaction always consists of two actions: reading from the source address and writing to the destination address. The total number of DMA data transactions that the controller is configured to perform constitutes the DMA message. The size of a message is configured by two additional registers, one for the source message and one for the destination message. On each transaction, both message counters are decremented, and when one reaches zero, the corresponding pointer is re-initialized to its original setting.

I’ll offer an example to clarify the above. Suppose you need to automate the process of reading data from the UART port. In such a scenario, the source pointer should be configured to stay fixed on the memory-mapped address of the UART RX register, while the destination pointer should point to the starting address of an array in RAM and be configured to increment by one on each transaction. Since the UART register holds one byte at a time, the source message size is configured to be one byte too. The destination message size, on the other hand, should be equal to the size of the array. On each transaction the source pointer will remain unchanged while the destination pointer will increment until it exceeds the upper bound of the array. Then it’s simply reset to its initial value, thus wrapping around to the beginning of the array, almost mimicking the behavior of a circular buffer.
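
The pointer and counter behavior can be modeled in a few lines of portable C. Keep in mind this is a software sketch of the mechanism, not register-level code: all names (`dma_side_t`, `dma_channel_t`, `dma_side_init`, `dma_transaction`) are hypothetical, and an ordinary variable stands in for the UART RX register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one DMA channel; names are illustrative. */
typedef enum { ADDR_FIXED, ADDR_INCREMENT, ADDR_DECREMENT } addr_mode_t;

typedef struct {
    volatile uint8_t *start; /* original pointer setting */
    volatile uint8_t *ptr;   /* current pointer */
    size_t size;             /* message size in bytes (must be >= 1) */
    size_t count;            /* bytes left in the current message */
    addr_mode_t mode;        /* how the pointer moves per transaction */
} dma_side_t;

typedef struct {
    dma_side_t src;
    dma_side_t dst;
} dma_channel_t;

static void dma_side_init(dma_side_t *s, volatile uint8_t *start,
                          size_t size, addr_mode_t mode)
{
    s->start = start;
    s->ptr = start;
    s->size = size;
    s->count = size;
    s->mode = mode;
}

/* Decrement the message counter; reload pointer and counter at zero,
 * otherwise move the pointer according to its mode. */
static void dma_side_advance(dma_side_t *s)
{
    if (--s->count == 0) {
        s->ptr = s->start;
        s->count = s->size;
        return;
    }
    if (s->mode == ADDR_INCREMENT) {
        s->ptr++;
    } else if (s->mode == ADDR_DECREMENT) {
        s->ptr--;
    }
}

/* One DMA data transaction: read from the source, write to the destination. */
static void dma_transaction(dma_channel_t *ch)
{
    *ch->dst.ptr = *ch->src.ptr;
    dma_side_advance(&ch->src);
    dma_side_advance(&ch->dst);
}
```

To mirror the UART scenario, initialize the source side with `ADDR_FIXED` and a size of 1, pointing at the variable that plays the RX register, and the destination side with `ADDR_INCREMENT` and the size of the array. Every call to `dma_transaction` then stores one received byte, and once the array is full the destination pointer wraps back to its first element.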

A DMA transfer can be initiated by either software or hardware, but the latter is what really automates the process. A hardware trigger is essentially a mechanism that samples the hardware signal that sets an interrupt flag. In the above scenario, the hardware trigger would be set to check for the UART RX interrupt flag. Once it samples a set signal, the DMA will start transferring on the next available bubble cycle.
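
The interplay between the trigger and the bubbles can be sketched the same way. Again, this is a simplified model with hypothetical names (`cycle_state_t`, `dma_transfer_cycle`), and it assumes that a trigger sampled in one cycle is serviced no earlier than the following cycle; the real timing is device-specific.

```c
#include <stdbool.h>
#include <stddef.h>

/* State of one simulated instruction cycle. */
typedef struct {
    bool trigger_flag; /* e.g. the UART RX interrupt flag as sampled this cycle */
    bool bubble;       /* true when the CPU leaves the bus free this cycle */
} cycle_state_t;

/* Returns the index of the cycle in which the transfer starts,
 * or -1 if no transfer happens within the trace. */
static long dma_transfer_cycle(const cycle_state_t *cycles, size_t n)
{
    bool pending = false;
    for (size_t i = 0; i < n; i++) {
        if (pending && cycles[i].bubble) {
            return (long)i; /* first bubble after the trigger was sampled */
        }
        if (cycles[i].trigger_flag) {
            pending = true; /* trigger sampled; wait for the next bubble */
        }
    }
    return -1;
}
```

If the flag is sampled as set in the second cycle and the next bubble only shows up in the fourth, the transfer starts in the fourth cycle, and the CPU never has to poll the flag or service an interrupt for it.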

This simple mechanism fully automates the process and eliminates the need for repetitive software-based data handling routines, giving the PIC MCU true multitasking capabilities.