The Serial VGA Controller - Part II


Software Architecture

The Software Architecture of the VGA controller is based on a layered model, which separates the low-level hardware management from the high-level application logic. This design approach enhances modularity, maintainability, and scalability of the codebase.

Software Architecture

The MCU operates at 14.3182 MIPS to maintain exact alignment with VGA timing requirements. Each video frame contains 525 total scanlines, of which 480 are visible. During the visible region the MCU is fully occupied generating pixel output, leaving only the remaining 45 non-visible lines, barely 8.6% of the total frame time, for secondary tasks such as interpreting incoming commands and handling auxiliary system logic. This extremely limited processing window was a key constraint that shaped the software architecture of the system.
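The size of that window can be estimated with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the frame rate used in the usage note (59.94 Hz, the nominal rate of the standard 640x480 VGA mode) is an assumption rather than a figure taken from this article.

```c
#include <stdint.h>

/* Percentage of the frame spent outside the visible region. */
static double blanking_percent(uint16_t total_lines, uint16_t visible_lines)
{
    return 100.0 * (double)(total_lines - visible_lines) / (double)total_lines;
}

/* Instruction cycles available per scanline, given the instruction rate
 * (instructions per second) and the frame rate (frames per second).
 * The frame rate is an input because the article does not state it. */
static double cycles_per_line(double instr_per_sec, double frames_per_sec,
                              uint16_t total_lines)
{
    return instr_per_sec / (frames_per_sec * (double)total_lines);
}
```

With the figures above, blanking_percent(525, 480) evaluates to roughly 8.57. Under the assumed 59.94 Hz frame rate, cycles_per_line(14318200.0, 59.94, 525) comes out at roughly 455 instruction cycles per scanline.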

The time-critical video generation is performed entirely inside the ISR to ensure that VGA timing requirements are met with absolute precision. During the active vertical region, the ISR is triggered at the beginning of each scanline by the compare interrupt, which is the only active interrupt and is driven by a dedicated hardware timer. The ISR is carefully designed to complete its work within the strict timing constraints of the VGA signal. Each active scanline is generated in real time using a “racing the beam” approach, where pixel data is produced on-the-fly as the scanline progresses across the display.

Outside the active vertical region, the ISR is not invoked on each scan line. Instead, the compare interrupt is reconfigured to fire at the beginning of the front porch, the sync pulse, and the back porch intervals, ensuring that the VSYNC signal is asserted and cleared at the correct times. Additionally, during these non-active intervals, the ISR checks for any pending interrupt flags and sets the corresponding Reactor event flags, initiating the appropriate processing of asynchronous events.
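The vertical sequencing described above can be pictured as a small four-state machine. The following is a minimal sketch, not the actual ISR: the type and function names are hypothetical, and the split of the 45 non-visible lines into 10 front-porch, 2 sync and 33 back-porch lines is the standard 640x480 timing, assumed here because the article only gives the totals (525 and 480).

```c
#include <stdint.h>

typedef enum { V_ACTIVE, V_FRONT_PORCH, V_SYNC, V_BACK_PORCH } VRegion;

/* Assumed line counts per region; they sum to the 525 lines of a frame. */
static const uint16_t region_lines[] = { 480, 10, 2, 33 };

/* Region that follows the given one; the frame wraps back to V_ACTIVE. */
static VRegion next_region(VRegion r)
{
    return (VRegion)((r + 1) % 4);
}

/* VSYNC is asserted only while the sync region is being scanned. */
static int vsync_asserted(VRegion r)
{
    return r == V_SYNC;
}

/* Compare interrupts needed for a region: one per scanline while active,
 * a single one at the start of each blanking interval. */
static uint16_t interrupts_in_region(VRegion r)
{
    return (r == V_ACTIVE) ? region_lines[r] : 1;
}
```

In this model the ISR would reprogram the compare period whenever it moves to the next region, so that the long blanking intervals cost one interrupt each instead of one per line.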

The program's main loop, implemented by the Reactor module, orchestrates all non-time-critical activity in the system. It continuously evaluates a set of registered event sources such as buffer updates, DMA transfer notifications, command-processing triggers, UART error conditions, and external inputs such as button events. When an event is detected, it invokes the appropriate event handler.
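A reactor of this kind can be sketched as a table of handlers indexed by event number plus a set of pending flags. The code below is an illustration under stated assumptions: every name in it (reactor_register, reactor_raise, reactor_poll, EVENT_COUNT, demo_handler) is hypothetical and does not come from the project's sources.

```c
#include <stdint.h>

#define EVENT_COUNT 4u   /* hypothetical number of event sources */

typedef void (*EventHandler)(void);

static volatile uint8_t event_flags;          /* one bit per event source */
static EventHandler handlers[EVENT_COUNT];    /* NULL = nothing registered */

/* Example handler used only to demonstrate dispatching. */
static uint8_t demo_count;
static void demo_handler(void) { demo_count++; }

/* Register a handler for event id (0 .. EVENT_COUNT - 1). */
static void reactor_register(uint8_t id, EventHandler h)
{
    if (id < EVENT_COUNT) handlers[id] = h;
}

/* Called by an event source (for example the ISR) to mark an event pending. */
static void reactor_raise(uint8_t id)
{
    if (id < EVENT_COUNT) event_flags |= (uint8_t)(1u << id);
}

/* One pass of the main loop: dispatch every pending event once.
 * Returns the number of handlers invoked. */
static uint8_t reactor_poll(void)
{
    uint8_t dispatched = 0;
    for (uint8_t id = 0; id < EVENT_COUNT; id++) {
        uint8_t mask = (uint8_t)(1u << id);
        if (event_flags & mask) {
            event_flags &= (uint8_t)~mask;    /* clear before handling */
            if (handlers[id]) { handlers[id](); dispatched++; }
        }
    }
    return dispatched;
}
```

The main loop would simply call reactor_poll() forever. On real hardware the flag updates shared with the ISR would also need to be atomic or protected, which this sketch does not address.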

When new data arrives through the UART interface, the DMA engine stores it into a circular buffer without consuming CPU time. The appropriate event flag is then raised, prompting the Reactor to hand the buffered data to the Interpreter module, which parses the command stream and executes the corresponding operation through the video terminal module.

The video terminal supports three video modes: text mode, plot mode and image mode. It also provides a range of functions for manipulating the video buffer, including clearing the screen, setting the cursor position, changing colors, drawing lines and circles, painting pixels, and more.

The VGA controller has been extensively tested at several UART speeds and it can process incoming commands reliably without suffering any data loss. The sweet spot has been found to be 28800 baud. At higher baud rates, the system is designed to signal back a wait message whenever the circular buffer approaches a critical fill level, indicating that it is temporarily unable to accept additional data at the current rate. This mechanism ensures that command processing remains stable and prevents buffer overruns even under heavy input load.
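The fill-level check behind such a wait message might look like the sketch below. The buffer size, the threshold and the function names are all assumptions made for illustration; the article does not state the actual capacity or the critical level.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BUF_SIZE   256u   /* assumed capacity, must be a power of two */
#define RX_HIGH_WATER 192u   /* assumed critical fill level (75%) */

/* Bytes waiting in the ring: distance from the reader's index to the
 * writer's index (the position the DMA engine fills next), modulo the size. */
static uint16_t rx_fill(uint16_t write_idx, uint16_t read_idx)
{
    return (uint16_t)((write_idx - read_idx) & (RX_BUF_SIZE - 1u));
}

/* True when the sender should be asked to pause. */
static bool rx_should_wait(uint16_t write_idx, uint16_t read_idx)
{
    return rx_fill(write_idx, read_idx) >= RX_HIGH_WATER;
}
```

Masking with the size minus one handles the case where the write index has wrapped around and is numerically smaller than the read index.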


Interpreter

Each command received over the UART is checked for syntactic correctness, and then the corresponding function is executed. This made it necessary to design a rudimentary communication language together with an interpreter for that language.

Terminals of the past used a standardized set of control commands, known as ANSI Escape Codes. This protocol remains in use today for configuring and controlling character display in terminal emulators, as well as for interaction with systems based on serial communication.

Following this approach, a set of commands was designed based on a context-free grammar (CFG), with the aim of clarity and strict formalization of syntactic rules. The terminal and non-terminal symbols, along with the production rules of the language, were defined using the Backus-Naur Form (BNF) notation:


<esc code> ::= <esc> <command> | <character>

<command>  ::= <alpha> <parameters seq> | <alpha>

<parameters seq> ::= <parameter> <parameter tail>

<parameter tail> ::= <parameter delimiter> <parameters seq> |
                     <parameter end>

<parameter> ::= <digit> <parameter> | <digit>

<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

<parameter delimiter> ::= ","

<parameter end> ::= 0x0d | ";" 

<esc> ::= 0x1b

<alpha> ::=     "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" |
                "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" |
                "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | 
                "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"           

<character> ::= <digit> | <alpha>
  

From the above set of grammar rules, three groups of commands are formed, each corresponding to one of the operating states of the VGA controller:

Command                             Description
<ESC>E;                             Clear Screen
<ESC>O;                             Set Cursor ON
<ESC>F;                             Set Cursor OFF
<ESC>B;                             Blink ON
<ESC>B;                             Blink OFF
<ESC>y <x>,<y>;                     Set Cursor Position
<ESC>b <color>;                     Set Background Color
<ESC>f <color>;                     Set Foreground Color
<ESC>m <mode>;                      Change Video Mode
<ESC>d <x1>,<y1>,<x2>,<y2>,<1>;     Draw Thin Rectangle
<ESC>d <x1>,<y1>,<x2>,<y2>,<2>;     Draw Thick Rectangle
<ESC>d <x1>,<y1>,<x2>,<y2>,<10>;    Draw Thin Window
<ESC>d <x1>,<y1>,<x2>,<y2>,<20>;    Draw Thick Window

Text Mode Commands

 

Command                                Description
<ESC>E;                                Clear Screen
<ESC>b <color>;                        Set Background Color
<ESC>f <color>;                        Set Foreground Color
<ESC>p <f>,<b>;                        Set Both Colors
<ESC>l <x1>,<y1>,<x2>,<y2>,<color>;    Draw Line
<ESC>c <x>,<y>,<radius>,<color>;       Draw Circle
<ESC>s <x>,<y>,<color>;                Set Pixel
<ESC>m <mode>;                         Change Video Mode

Plot Mode Commands

 

Command                     Description
<ESC>p <f>,<b>,<x>,<y>;     Set foreground/background color for image cell at {x,y}
<ESC>p <f>,<b>;             Paint Picture
<ESC>l <picture>;           Load Picture
<ESC>m <mode>;              Change Video Mode

Image Mode Commands

 

For the interpretation of commands, it was necessary to construct a Finite State Machine (FSM), which forms the core of the command interpreter. Symbols received from the UART bus are checked against the grammar of the command language, making the FSM a real-time syntactic parser that transitions between predefined states based on each symbol received.

The method chosen for implementing the FSM is based on the State Pattern, a design pattern that allows an object to change its behavior depending on the current state of the system. This approach was selected because it offers significant advantages compared to the traditional construction of FSMs using multiple nested conditional statements, particularly in terms of time complexity. It also provides excellent maintainability and future extensibility, with the only real cost being the additional design and development time.
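In C, the State Pattern is commonly expressed by letting each state be a function and keeping a pointer to the current one, so that feeding a symbol is a single indirect call with no chain of conditionals. The following is a minimal sketch under several assumptions, not the project's interpreter: all identifiers are hypothetical, the limit of five parameters is taken from the longest command listed above, every command must end with a terminator (as in the tables), and, following the grammar literally, no space is accepted between the command letter and its parameters.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_PARAMS 5u            /* the longest listed command takes five */

typedef struct Parser Parser;
typedef void (*StateFn)(Parser *p, uint8_t ch);

struct Parser {
    StateFn  state;              /* handler for the current state */
    char     command;            /* letter that followed ESC */
    uint16_t params[MAX_PARAMS];
    uint8_t  count;              /* completed parameters */
    bool     in_number;          /* currently inside a run of digits */
    bool     ready;              /* a complete command is available */
    bool     error;              /* last sequence violated the grammar */
};

static void state_idle(Parser *p, uint8_t ch);
static void state_command(Parser *p, uint8_t ch);
static void state_params(Parser *p, uint8_t ch);

static void fail(Parser *p) { p->error = true; p->state = state_idle; }

static void state_idle(Parser *p, uint8_t ch)
{
    if (ch == 0x1B) {            /* <esc> starts a new command */
        p->count = 0;
        p->in_number = false;
        p->ready = false;
        p->error = false;
        p->state = state_command;
    }
    /* any other byte is a plain <character>; printing is omitted here */
}

static void state_command(Parser *p, uint8_t ch)
{
    bool alpha = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    if (!alpha) { fail(p); return; }
    p->command = (char)ch;
    p->state = state_params;
}

static void state_params(Parser *p, uint8_t ch)
{
    if (ch >= '0' && ch <= '9') {
        if (!p->in_number) {     /* first digit of a new parameter */
            if (p->count == MAX_PARAMS) { fail(p); return; }
            p->params[p->count] = 0;
            p->in_number = true;
        }
        p->params[p->count] =
            (uint16_t)(p->params[p->count] * 10u + (ch - '0'));
    } else if (ch == ',' && p->in_number) {
        p->count++;              /* <parameter delimiter> closes a parameter */
        p->in_number = false;
    } else if (ch == ';' || ch == 0x0D) {
        if (p->in_number) p->count++;
        else if (p->count > 0) { fail(p); return; }   /* trailing comma */
        p->in_number = false;
        p->ready = true;         /* <parameter end>: command complete */
        p->state = state_idle;
    } else {
        fail(p);
    }
}

static void parser_init(Parser *p)
{
    p->state = state_idle;
    p->command = 0;
    p->count = 0;
    p->in_number = false;
    p->ready = false;
    p->error = false;
}

/* Feeding one symbol is a single indirect call on the current state. */
static void parser_feed(Parser *p, uint8_t ch) { p->state(p, ch); }

static void parser_feed_str(Parser *p, const char *s)
{
    while (*s) parser_feed(p, (uint8_t)*s++);
}
```

Feeding the bytes ESC, then d10,20,30,40,1; leaves the parser with command 'd', five parameters and the ready flag set. Adding a new state means adding one function, which is the maintainability benefit mentioned above. Overflow of a parameter beyond 16 bits is not checked in this sketch.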


Video Terminal

The Video Terminal module is designed to support multiple operating modes, each defined by its resolution, color depth, and pixel dimensions. Notably, the Image Mode emulates the display characteristics of the classic ZX Spectrum, while extending the resolution to 360x480 with 16 colors at 4-bit depth. The following table summarizes the available display modes and their key characteristics.

Code  Description  Resolution  Colors  Color depth  Cell size
0     Text mode    360x480     16      4-bit        8x12
1     Image mode   360x480     16      4-bit        8x8
2     Plot mode    360x120     16      1-bit        1x1

Video Modes

Text Mode

In text mode, the video buffer is organized into 40 rows and 45 columns. Each element of the buffer corresponds to a display position and consists of 16-bit data. The first byte specifies the ASCII character to be displayed, while the second byte defines its color attributes, as illustrated in the following table.

Byte      Character          Color
Bit       7 6 5 4 3 2 1 0    7 6 5 4             3 2 1 0
Content   ASCII Code         Background Color    Foreground Color

Organization of the Video Buffer in Text Mode
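The layout in the table can be modeled with a two-byte cell and a helper that packs the two color nibbles. This is a sketch for illustration: the type and function names are hypothetical, and placing the background color in the upper nibble is read off the column order of the table rather than confirmed by the source code.

```c
#include <stdint.h>

#define TEXT_ROWS 40u
#define TEXT_COLS 45u

/* One screen position: character byte followed by color byte. */
typedef struct {
    uint8_t ch;      /* ASCII code */
    uint8_t attr;    /* bits 7..4 background, bits 3..0 foreground (assumed) */
} TextCell;

static TextCell text_buffer[TEXT_ROWS][TEXT_COLS];   /* 40 x 45 x 2 bytes */

/* Pack two 4-bit color indices into one attribute byte. */
static uint8_t make_attr(uint8_t background, uint8_t foreground)
{
    return (uint8_t)(((background & 0x0Fu) << 4) | (foreground & 0x0Fu));
}

/* Store a character and its colors; out-of-range positions are ignored. */
static void put_char(uint8_t row, uint8_t col, char c, uint8_t bg, uint8_t fg)
{
    if (row >= TEXT_ROWS || col >= TEXT_COLS) return;
    text_buffer[row][col].ch = (uint8_t)c;
    text_buffer[row][col].attr = make_attr(bg, fg);
}
```

With this layout the whole text buffer occupies 40 x 45 x 2 = 3,600 bytes of RAM.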

 

Each character corresponds to a graphical symbol (glyph) with dimensions of 12 x 8 pixels. The following figure shows an indicative rendering of the character “A”:

Glyph Representation

It is evident that in order to cover the entire range of the ASCII table, 12 x 256 = 3,072 bytes are required. The storage of the symbol table is not done in RAM but in FLASH memory. This choice was not only strategic but also technically necessary, since the controller’s RAM is limited and therefore reserved strictly for dynamic data.
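One way to organize such a table is as 12 pages of 256 bytes, one page per glyph row, so that the character code can serve directly as the low byte of the table address. This layout is an inference from the Text Generator listing later in the article, where the character code is written straight into TBLPTRL; it is not stated explicitly. The names and the sample bitmap below are hypothetical.

```c
#include <stdint.h>

#define GLYPH_ROWS 12u

/* 12 pages of 256 bytes = 3,072 bytes. On the target this would be a
 * constant table in flash; here it is left zeroed except for one sample. */
static uint8_t font[GLYPH_ROWS][256];

/* Illustrative 8x12 bitmap for 'A' (not the article's actual font). */
static const uint8_t sample_A[GLYPH_ROWS] = {
    0x00, 0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x66, 0x00, 0x00, 0x00
};

static void font_load_sample(void)
{
    for (uint8_t r = 0; r < GLYPH_ROWS; r++)
        font[r]['A'] = sample_A[r];
}

/* The 8 pixels of character ch on glyph row `row`, leftmost pixel in bit 7. */
static uint8_t glyph_row(uint8_t ch, uint8_t row)
{
    return font[row % GLYPH_ROWS][ch];
}

/* Glyph row that applies on a given visible scanline (0..479). */
static uint8_t scanline_to_glyph_row(uint16_t scanline)
{
    return (uint8_t)(scanline % GLYPH_ROWS);
}
```

With this arrangement the page only has to be selected once per scanline, and each lookup inside the line needs nothing more than the character code.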

Text mode demo

Image Mode

Image mode displays images of 360 x 480 pixels (width x height) in 16 colors. At 4 bits per pixel, each image of this size requires 480 x 360 x 4 bits = 86,400 bytes. Storing even a single image in the microcontroller’s FLASH memory would occupy almost 70% of its total capacity. Therefore, a different approach was necessary.

The chosen solution keeps the display resolution unchanged while applying compression to the color data, thereby reducing both the memory requirements and the timing complexity for preparing and transmitting data to the VGA port.

Two tables are used, with dimensions 480 x 45 bytes and 60 x 45 bytes respectively. The first stores the 1-bit depth image data, requiring a total of 480 x 360 x 1 bit = 21,600 bytes of memory. The second stores the compressed color information, which is logically divided into 60 rows and 45 columns. Each element has a size of 8 bits, where the first four bits represent the background color and the next four bits represent the foreground color. Each logical unit of 8 x 8 pixels in the image data table is linked to one element of the color table. Thus, 60 x 45 = 2,700 bytes are required for storing the colors, and in total 24,300 bytes (about 24.3 KB) are needed to store a complete image. With this approach, the overall memory requirements are reduced to less than 30% of the original, making it possible to store up to four images in the microcontroller’s FLASH memory.
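The sizes quoted above, and the link between a pixel and its color cell, can be checked with a short sketch. The macro and function names are hypothetical; the dimensions are the ones given in the text.

```c
#include <stdint.h>

#define IMG_WIDTH      360u
#define IMG_HEIGHT     480u
#define CELL_SIZE      8u
#define BYTES_PER_ROW  (IMG_WIDTH / 8u)               /* 45 */
#define COLOR_COLS     (IMG_WIDTH / CELL_SIZE)        /* 45 */
#define COLOR_ROWS     (IMG_HEIGHT / CELL_SIZE)       /* 60 */

#define BITMAP_BYTES   (IMG_HEIGHT * BYTES_PER_ROW)   /* 21,600 */
#define COLOR_BYTES    (COLOR_ROWS * COLOR_COLS)      /*  2,700 */
#define RAW_BYTES      (IMG_WIDTH * IMG_HEIGHT / 2u)  /* 86,400 at 4 bits/pixel */

/* Offset of the bitmap byte that holds pixel (x, y). */
static uint16_t bitmap_index(uint16_t x, uint16_t y)
{
    return (uint16_t)(y * BYTES_PER_ROW + x / 8u);
}

/* Offset of the color byte shared by the 8x8 cell containing (x, y). */
static uint16_t color_index(uint16_t x, uint16_t y)
{
    return (uint16_t)((y / CELL_SIZE) * COLOR_COLS + x / CELL_SIZE);
}
```

BITMAP_BYTES + COLOR_BYTES gives 24,300 bytes, which is about 28% of RAW_BYTES.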

This compression technique was widely used by computers of the 1980s, including the ZX Spectrum. As a result, free applications were available that could convert image files using this compression method, eliminating the need to develop additional image processing software. Later tests showed that it was technically feasible to double the resolution of the color information to 4 x 4 cells instead of 8 x 8; however, no free conversion software could be found, and since the time investment required to develop such software was significant, the original approach was retained.

To illustrate the above compression process, the following example shows the result of converting a 16-color image with 4-bit depth per pixel, into a 16-color image with 4-bit depth per 8 x 8 cell.

Original 16-color picture

Compressed 16-color picture

Image Data with 1-bit Depth     Color Information

The original image was processed using the Image Spectrumizer, a free software tool by Jari Komppa.

Plot Mode

In Plot Mode, the Video Buffer is once again accessed as a two-dimensional structure, with dimensions of 120 x 45. Each element of the buffer is of byte type and corresponds to 8 consecutive pixels. A color depth of 1-bit is supported, resulting in an overall resolution of 360 x 120. Unlike Text Mode, where modifications were limited to replacing standardized 12 x 8 characters, it is now possible to modify each pixel individually. Sixteen colors are supported, but only two can be visible simultaneously on the screen — one for the background and one for the foreground.

Functions are provided for activating individual pixels as well as for drawing lines, circles, and ellipses. These operations rely on Bresenham’s algorithms, which are ideal for resource-constrained systems such as microcontrollers without a Floating Point Unit (FPU).

The traditional approach to drawing geometric shapes requires trigonometric functions such as sin and cos, along with double-type variables to achieve the necessary decimal precision. Such an approach would impose a heavy burden on the microcontroller, since all calculations would have to be performed in software. By following Bresenham’s approach, the use of the C math library was completely avoided, and the computational process was significantly accelerated.
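For reference, one common integer-only formulation of Bresenham's line algorithm is sketched below. It is not necessarily the variant used in this project; the buffer plot_buf and the helper plot_on are local stand-ins defined only to keep the example self-contained.

```c
#include <stdint.h>
#include <stdlib.h>

#define PLOT_ROWS 120
#define PLOT_COLS 45              /* bytes per row, 8 pixels each */

static uint8_t plot_buf[PLOT_ROWS][PLOT_COLS];   /* stand-in video buffer */

/* Turn on one pixel; coordinates outside the 360 x 120 area are ignored. */
static void plot_on(int16_t x, int16_t y)
{
    if (x < 0 || x >= PLOT_COLS * 8 || y < 0 || y >= PLOT_ROWS) return;
    plot_buf[y][x >> 3] |= (uint8_t)(0x80u >> (x & 7));
}

/* Integer-only Bresenham line from (x0, y0) to (x1, y1), both inclusive.
 * Only additions, comparisons and a doubling are used per step. */
static void draw_line(int16_t x0, int16_t y0, int16_t x1, int16_t y1)
{
    int16_t dx = (int16_t)abs(x1 - x0);
    int16_t dy = (int16_t)-abs(y1 - y0);
    int16_t sx = (x0 < x1) ? 1 : -1;
    int16_t sy = (y0 < y1) ? 1 : -1;
    int16_t err = (int16_t)(dx + dy);

    for (;;) {
        plot_on(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        int16_t e2 = (int16_t)(2 * err);
        if (e2 >= dy) { err = (int16_t)(err + dy); x0 = (int16_t)(x0 + sx); }
        if (e2 <= dx) { err = (int16_t)(err + dx); y0 = (int16_t)(y0 + sy); }
    }
}
```

No floating-point value or math library call appears anywhere in the loop, which is what makes the method attractive on a microcontroller without an FPU.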

To control an individual pixel, the corresponding bit in the video buffer must be modified. This is achieved by calculating its relative position in the 360 x 120 matrix and performing logical shifts to select its exact position within the byte. The result is the construction of a bitmask, where a logical OR with the corresponding byte activates the pixel on the screen, while a logical AND with the inverted mask deactivates it.

For example, to activate the pixel at screen position {y = 60, x = 84}, the following steps are performed:

                    
        y = 60
        x = ⌊84 / 8⌋ = 10
        byte = VideoBuffer[y][x]
        bit_mask = 128 >> (84 % 8) = b10000000 >> 4 = b00001000 = 8
        byte |= bit_mask
        VideoBuffer[y][x]=byte
        
        

  And the corresponding C code:

                 
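        // Sets (color != 0) or clears (color == 0) the pixel at (x, y) in Plot Mode.
        void SetPixel(int16_t x, int16_t y, uint8_t color) {
            // Ignore coordinates outside the 360 x 120 plot area
            if (x < 0 || x >= 360 || y < 0 || y >= 120) {
                return;
            }

            // View the video buffer as an array of 120 rows x 45 bytes
            uint8_t (*gfx_ptr)[120][45] = (uint8_t (*)[120][45]) &video_buffer;

            int16_t x_offset = x >> 3; // byte column: x / 8

            // Bit within the byte, leftmost pixel in the MSB: 0x80 >> (x % 8)
            uint8_t pixel_mask = (uint8_t)(0x80 >> (x & 0x07));

            if (color) {
                (*gfx_ptr)[y][x_offset] |= pixel_mask;
            } else {
                (*gfx_ptr)[y][x_offset] &= ~pixel_mask;
            }
        }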
        
        

 

Plot mode demo

 

Pixel Generator (ISR)


The Pixel Generator module belongs to the Interrupt Layer and was implemented almost entirely in Assembly. It is responsible for generating a signal suitable for driving the DACs. It consists of the TEXT Generator, IMAGE Generator, and PLOT Generator modules, each of which produces an appropriately formatted signal depending on the current state of the Video Terminal.

Text Generator

In Text Mode, the module operates as follows:

It scans the Video Buffer from left to right and reads each entry. For each entry, it looks up the corresponding symbol in the glyph table. The symbol is loaded into the PISO (SPI) register and its serial transmission to the color multiplexer selector is initiated. Next, it reads the memory location that contains the color information and writes its contents to the LATD port, whose outputs are connected to the multiplexer inputs. This process is repeated for every visible character on the screen.

The procedure of reading, processing, and transmitting the data of a single character takes exactly 8 cycles, which matches the precise time window available to load the SPI register and shift out all bits. The decision to implement this in Assembly was mandatory, since the equivalent C code consumed significantly more cycles, making it practically impossible to achieve the desired resolution.

The above procedure, however, conceals a fundamental problem that arose during signal generation. Because instructions are executed sequentially, the selection signal reached the multiplexers two cycles earlier than the color information, resulting in incorrect rendering (color bleeding).

Color Bleeding effect

This is one of the obvious reasons why such applications are typically implemented on FPGAs, where signals can be generated in parallel. Nevertheless, the solution proved to be simple.

Before the start of the process, the PISO (SPI) register is preloaded with data in such a way that an intentional two-cycle delay is introduced before the next load operation. In this way, both the selection signal and the color information arrive at the multiplexers simultaneously, and the above phenomenon is eliminated.

Free from Color Bleeding effect

In the following snippet, INDF1 indexes into the video buffer while the TBLPTR registers address the glyph table. The sequence forms an 8-cycle pipeline in which a character code is fetched from the video buffer, its corresponding glyph byte is retrieved from program memory, and both the glyph and color data are output with cycle-perfect timing. First, the character is read and used to update the table pointer; the glyph byte is then fetched via TBLRD* and written directly to the SPI register through INDF0. Immediately afterward, the color attribute is read from the video buffer and written to LATD, which drives the external multiplexers. This tightly optimized 8-cycle routine is what makes it possible to display 360 logical pixels scaled across a 640-pixel horizontal resolution.

                    
        MOVF POSTINC1,W,C   ; READ ASCII CHAR - 1 CY
        MOVWF TBLPTRL,C     ; 1 CY
        TBLRD*              ; READ ASCII GLYPH - 2 CYs
        MOVF TABLAT,W,C     ; 1 CY
        MOVWF INDF0,C       ; WRITE GLYPH DATA TO SPI - 1 CY
        MOVF POSTINC1,W,C   ; READ COLOR - 1 CY
        MOVWF LATD,C        ; WRITE COLOR DATA TO MULTIPLEXERS - 1 CY - TOTAL: 8 CYs
            
        

Image Generator

The graphics generation module follows a similar approach. The Video Buffer stores only the color information, while the remaining data are fetched directly from the microcontroller's flash memory. The procedure proved to be simpler compared to the text mode.

Its operating mechanism is as follows: The selected image is scanned from flash memory byte by byte. Each byte is loaded into the PISO register. The color corresponding to the specific 8x8 region is read from the Video Buffer and sent to the LATD port. This process provides a timing margin of two additional cycles, which indicates the possibility of increasing the color resolution in a future version. In the current implementation, for the sake of simplicity, two NOP instructions were inserted into the code in order to maintain the same number of cycles per generated element.

Assembly code:

                    
        TBLRD*+             ; READ IMAGE DATA AND POST INCREMENT - 2 CYs
        MOVF TABLAT, W,C    ; 1 CY
        MOVWF INDF0,C       ; WRITE IMAGE DATA TO SPI - 1 CY
        NOP                 ; 1 CY
        NOP                 ; 1 CY
        MOVF POSTINC1,W,C   ; READ COLOR - 1 CY
        MOVWF LATD,C        ; WRITE COLOR DATA TO MULTIPLEXERS - 1 CY - TOTAL: 8 CYs
            
        

Plot Generator

The PLOT Generator module is responsible for producing a signal suitable for supporting the Plot mode. All available data reside in the Video Buffer, since the generated signal is now 1-bit deep.

Its operating mechanism is as follows: It loads the multiplexers with the selected foreground and background colors, which are uniform across the entire screen. It scans the Video Buffer from left to right, byte by byte. It loads the contents of each location into the PISO register and activates the multiplexer selectors.

As in the previous module, the code has spare cycles, four in this case, and for simplicity NOP instructions were inserted to fill them. The resolution of Plot mode is limited not by the microcontroller’s clock speed but by its memory capacity: although additional data could be displayed within the eight-cycle time window, there is not enough memory to store them.

A snapshot of the Assembly code follows:

                    
        MOVF POSTINC1,W,C   ; READ IMAGE DATA AND POST INCREMENT - 1 CY
        NOP                 ; 1 CY
        NOP                 ; 1 CY
        NOP                 ; 1 CY
        NOP                 ; 1 CY
        MOVWF INDF0,C       ; WRITE IMAGE DATA TO SPI - 1 CY
        MOVF INDF2,W,C      ; READ COLOR - 1 CY
        MOVWF LATD,C        ; WRITE COLOR DATA TO MULTIPLEXERS - 1 CY - TOTAL: 8 CYs
            
        

The insertion of NOP instructions may appear to be a waste of computational time; however, it simultaneously creates a “bubble” that is exploited by the DMA controller to transfer data to and from the UART bus.


Schematic Diagrams

vga
vga
vga
pcb