Software design of the SFP-Buddy and SFP-Media-Buddy

Created: 05-13-2024

Last Updated: 08-28-2024 18:48

How assumptions led to much longer development times than expected, and how confusing pin numbers made simple processes complicated

Where to buy

I have noticed that this post has been attracting a lot of attention. If you are searching for where to buy this media converter, you can find it at https://whinis.com/sfp-buddy/

Choice of Microcontroller

As mentioned in the previous blog post, the SFP-Buddy and SFP-Media-Buddy use an RP2040 as their brain. This is now a very common microcontroller, designed and sold by Raspberry Pi, that runs a dual-core Arm Cortex-M0+. It has ~256KB of onboard RAM, which will become important later, but no onboard flash. This means it needs an external flash chip, or at minimum an external SPI interface, to feed the micro its code. I mostly chose this particular micro because it's extremely cheap for a dual-core chip, has a significant amount of RAM, and has plenty of GPIOs.

While it doesn't have as many peripherals as other micros, it does have a very interesting feature called PIO: programmable input/output blocks. There are 2 PIO instances with 4 state machines each, and each instance can access every GPIO pin. Each instance has a very small instruction memory, shared by its 4 state machines, that can hold only 32 instructions, and the instruction set itself is very limited. However, the power comes from each state machine effectively being its own tiny core, which lets you handle both simple and complex protocols at speed without taking time away from either of the main cores.

Code Design

With all of that being said, I originally broke the code for the SFP-Buddy up into three main sections, plus a few accessory sections. First there is the UART section, which handles the GPIO for pins 1-10 from the previous blog post: configuring them as inputs with pull-ups or pull-downs and deciding which ones are tied to an internal hardware UART or to a PIO. Then there is the console, which acts as a virtual UART console used to get feedback from the SFP-Buddy, change config values, and generally get information about the SFP module and board. Finally, there is the I2C section, which handles all I2C communication between the PC and the SFP module. The accessory modules are CDC, which handles the actual USB communication using the TinyUSB library; the USB-PD library, which talks to the on-board USB-PD controller to set USB voltage levels; and MDIO, which handles communication with the media converter chips on media variants. I'll explain more in the coming sections.
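To make that split concrete, here is one way such sections could be organized: each one becomes an object that a main loop polls in turn. This is a minimal sketch under my own assumptions; the `Module` interface, `poll()`, `StubModule`, and the helper functions are hypothetical names, not taken from the SFP-Buddy source.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: each firmware section does a bounded amount of
// work every time the main loop calls poll().
struct Module {
    virtual ~Module() = default;
    virtual std::string name() const = 0;
    virtual void poll() = 0;
    int polls = 0;  // bookkeeping so the sketch can show what ran
};

// Stub module; the real GPIO, console, I2C, USB, USB-PD, or MDIO work
// would replace the counter increment.
class StubModule : public Module {
public:
    explicit StubModule(std::string n) : name_(std::move(n)) {}
    std::string name() const override { return name_; }
    void poll() override { ++polls; }

private:
    std::string name_;
};

// The three main sections followed by the accessory sections.
std::vector<std::unique_ptr<Module>> make_modules() {
    std::vector<std::unique_ptr<Module>> mods;
    for (const char* n : {"uart", "console", "i2c", "cdc", "usb_pd", "mdio"})
        mods.push_back(std::make_unique<StubModule>(n));
    return mods;
}

// Cooperative super-loop: nothing may block, or every other section stalls.
void run_passes(const std::vector<std::unique_ptr<Module>>& mods, int passes) {
    assert(passes >= 0);
    for (int i = 0; i < passes; ++i)
        for (const auto& m : mods) m->poll();
}
```

A cooperative loop like this only works if every `poll()` returns quickly, which is one reason the post later moves UART handling onto its own core.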

UART Controller

I want to start with this section because I feel it's the most important one and, at the same time, the one that gave me the most issues. To start, here is a simple illustration of how it should work: the 10 available pins are on the left and the 3 potential destinations are on the right.

[Figure: hw_graph, routing of the 10 SFP-side pins to PIO0, PIO1, and UART_HW]

The 10 input pins can go to either PIO0 or PIO1; however, only the bottom 2 red pins can go to the UART_HW. The difference between the PIO and UART_HW is that UART_HW is a separate peripheral with its own memory and clock. It has a dedicated 32-entry FIFO for transmit and another 32-entry FIFO for receive. Also, rather than requiring you to write your own state machines, its logic is implemented in hardware, with configurable data bits, parity, and stop bits covering the common UART formats. Another advantage of the logic being in hardware is that it is unlikely to contain a bug that would stop it from working entirely. For this reason, originally only the RX and TX pins were connected to the RP2040, and the remaining pins required a jumper. This is also how other current solutions work, with a dedicated IC handling the UART-to-USB conversion: a jumper selects RX and TX, although the jumper naming partially obscures this.
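The routing constraint in the illustration comes from the RP2040 itself: the hardware UART can only be muxed onto specific GPIOs, while PIO can use any of them. The sketch below encodes the UART rows of the RP2040 GPIO function table as I understand them from the datasheet (worth double-checking against it) and picks a backend for a TX/RX pair. Which SFP-side pins are wired to those GPIOs is a board decision this sketch does not model; `uart_function_for_gpio` and `pick_backend` are hypothetical helper names.

```cpp
#include <cassert>
#include <optional>

// RP2040 UART pin muxing, as I read the datasheet's GPIO function table:
// GPIOs come in groups of four (TX, RX, CTS, RTS), and the groups alternate
// UART0, UART1, UART1, UART0, UART0, UART1, UART1, UART0.
enum class UartRole { TX, RX, CTS, RTS };

struct UartFunc {
    int instance;   // 0 = UART0, 1 = UART1
    UartRole role;
};

std::optional<UartFunc> uart_function_for_gpio(int gpio) {
    if (gpio < 0 || gpio > 29) return std::nullopt;  // RP2040 has GPIO 0-29
    int instance = ((gpio + 4) / 8) % 2;
    UartRole role = static_cast<UartRole>(gpio % 4);
    return UartFunc{instance, role};
}

enum class Backend { UartHw, Pio };

// Use the hardware UART only when the chosen TX and RX pins both land on the
// same UART instance in the right roles; anything else falls back to PIO.
Backend pick_backend(int tx_gpio, int rx_gpio) {
    auto tx = uart_function_for_gpio(tx_gpio);
    auto rx = uart_function_for_gpio(rx_gpio);
    if (tx && rx && tx->role == UartRole::TX && rx->role == UartRole::RX &&
        tx->instance == rx->instance) {
        return Backend::UartHw;
    }
    return Backend::Pio;
}
```

For example, `pick_backend(0, 1)` selects the hardware UART, while the same pins swapped, `pick_backend(1, 0)`, falls back to PIO, since GPIO 1 can only act as UART0 RX.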

While the hardware UART is preferred, it obviously cannot serve all of these pins, hence the introduction of the PIO. Due to how the PIO state machines function, it's essentially impossible to handle transmit and receive in the same program. To overcome this limitation, transmit runs in one PIO state machine and receive runs in another. Let's start with the simpler of the two, transmit. The following is just the PIO assembly, and it is essentially copied verbatim from the Raspberry Pi Pico uart_tx PIO example.

TX PIO Walkthrough

; Copyright (c) 2020 Raspberry Pi (Trading) Ltd.
;
; SPDX-License-Identifier: BSD-3-Clause
;

.program uart_tx
.side_set 1 opt

; An 8n1 UART transmit program.
; OUT pin 0 and side-set pin 0 are both mapped to UART TX pin.

    pull       side 1 [7]  ; Assert stop bit, or stall with line in idle state
    set x, 7   side 0 [7]  ; Preload bit counter, assert start bit for 8 clocks
bitloop:                   ; This loop will run 8 times (8n1 UART)
    out pins, 1            ; Shift 1 bit from OSR to the first OUT pin
    jmp x-- bitloop   [6]  ; Each loop iteration is 8 cycles.
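Before walking through it line by line, it can help to see the waveform this program produces. The sketch below is a host-side, cycle-by-cycle model of one pass through the four instructions, written for illustration only (it is not part of the Pico SDK, and it assumes the FIFO already holds a word so `pull` never stalls). `sample_bits` then reads the middle of each 8-cycle bit period to recover the start bit and the data bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Cycle-level model of one pass through the uart_tx program.
// Returns the TX pin level on every state machine clock cycle.
std::vector<int> simulate_uart_tx(uint32_t fifo_word) {
    std::vector<int> trace;
    int pin = 1;
    // Every PIO instruction takes 1 cycle plus its [delay] cycles.
    auto execute = [&](int level, int delay) {
        pin = level;
        for (int i = 0; i < 1 + delay; ++i) trace.push_back(pin);
    };

    uint32_t osr = fifo_word;  // pull: FIFO word -> OSR
    execute(1, 7);             // pull side 1 [7]: stop bit / idle, 8 cycles
    uint32_t x = 7;            // set x, 7 ...
    execute(0, 7);             // ... side 0 [7]: start bit, 8 cycles
    for (;;) {
        int bit = static_cast<int>(osr & 1u);  // out pins, 1 with right shift:
        osr >>= 1;                             // the LSB leaves the OSR first
        execute(bit, 0);                       // out: 1 cycle
        bool taken = (x != 0);                 // jmp x--: test X, then decrement
        --x;
        execute(pin, 6);                       // jmp: pin holds, 1 + 6 cycles
        if (!taken) break;                     // fall through: program wraps
    }
    return trace;
}

// Sample the middle of each 8-cycle bit period, skipping the leading stop bit.
std::vector<int> sample_bits(const std::vector<int>& trace) {
    std::vector<int> bits;
    for (std::size_t i = 8 + 4; i < trace.size(); i += 8) bits.push_back(trace[i]);
    return bits;
}
```

For the byte 0x41 ('A'), one pass lasts 80 cycles (10 bit periods), and the sampled bits are the start bit 0 followed by 1, 0, 0, 0, 0, 0, 1, 0, least significant bit first. The stop bit for that byte is produced by the next `pull`, whether it stalls or not.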

The code starts with the pull instruction, which retrieves a 32-bit word from the TX FIFO and places it into the OSR (output shift register). Its side-set drives the TX pin high, and the [7] delay holds it there for 7 more cycles, so this stop (or idle) bit lasts 8 cycles in total. The next line, set, stores the value 7 into the X scratch register, a 32-bit storage area, while its side-set drives the pin low for 8 cycles to produce the start bit. Next comes bitloop:, a label that marks the target of the jump at the end of the program. After the label, out shifts 1 bit out of the OSR (loaded by pull earlier) onto the output pin, driving it high or low to match the bit we want to write. Because the OSR is configured to shift right, the least significant bit goes out first, as UART requires. The final line, jmp x--, checks whether X is non-zero, decrements it either way, and jumps back to bitloop if it was non-zero, so one more bit gets shifted out. Because the check happens before the decrement, starting from 7 the loop runs 8 times, and the [6] delay makes each iteration exactly 8 cycles (1 for out, 1 + 6 for jmp). Once X reaches zero, execution falls off the end of the program and wraps back to the first instruction, where it pulls another 32-bit word from the FIFO and does it all again; only the low 8 bits of each word are sent. When the FIFO is empty, the program stalls at the pull instruction with the line held high in the idle state. Overall, it is a very simple and easy-to-use state machine, and the C code to feed it is not particularly complicated either.

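#include "hardware/clocks.h"   // clock_get_hz()
#include "hardware/pio.h"
#include "uart_tx.pio.h"       // generated from the program above by pioasm (file name assumed)

this->tx_offset = pio_add_program(this->tx_pio, &uart_tx_program);
this->tx_sm = pio_claim_unused_sm(this->tx_pio, true);
pio_sm_config c_tx = uart_tx_program_get_default_config(this->tx_offset);

// OUT shifts to right, no autopull
sm_config_set_out_shift(&c_tx, true, false, 32);

// We only need TX, so get an 8-deep FIFO!
sm_config_set_fifo_join(&c_tx, PIO_FIFO_JOIN_TX);

// SM transmits 1 bit per 8 execution cycles, so the SM clock must run at 8 * baud.
// this->baud is assumed to be a member holding the requested baud rate.
float div = (float)clock_get_hz(clk_sys) / (8 * this->baud);
sm_config_set_clkdiv(&c_tx, div);

// Drive the pin high (UART idle) and make it an output before the PIO takes it over
pio_sm_set_pins_with_mask(this->tx_pio, this->tx_sm, 1u << this->tx_pin, 1u << this->tx_pin);
pio_sm_set_pindirs_with_mask(this->tx_pio, this->tx_sm, 1u << this->tx_pin, 1u << this->tx_pin);
pio_gpio_init(this->tx_pio, this->tx_pin);
// OUT writes the data bits and side-set writes the start/stop bits, so both map to the TX pin
sm_config_set_out_pins(&c_tx, this->tx_pin, 1);
sm_config_set_sideset_pins(&c_tx, this->tx_pin);
pio_sm_init(this->tx_pio, this->tx_sm, this->tx_offset, &c_tx);
pio_sm_set_enabled(this->tx_pio, this->tx_sm, true);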

That is the entire block; let's go through it a few sections at a time.

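this->tx_offset = pio_add_program(this->tx_pio, &uart_tx_program);
this->tx_sm = pio_claim_unused_sm(this->tx_pio, true);

pio_add_program loads the program into the chosen PIO instance's 32-instruction memory and returns the offset where it was placed, which the config and init calls need later. pio_claim_unused_sm then claims a free state machine on that same instance; passing true means it panics instead of returning -1 when none is free.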
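The `div` passed to `sm_config_set_clkdiv` in the full block follows directly from the program's timing: each bit lasts 8 state machine cycles, so the divider is clk_sys / (8 × baud). Here is a host-side sketch of that arithmetic. The 125 MHz system clock and 115200 baud are example values, not necessarily what the SFP-Buddy uses, and truncating the fraction into the PIO's 16.8 fixed-point divider is my assumption about how the float gets converted; `ClkDiv` and `uart_tx_clkdiv` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct ClkDiv {
    uint16_t integer;    // integer part of the divider
    uint8_t frac;        // fractional part, in 1/256 steps
    double actual_baud;  // baud rate the truncated divider really produces
};

// The TX program spends 8 state machine cycles per bit, so the state machine
// clock must be 8 * baud, i.e. div = clk_sys / (8 * baud).
ClkDiv uart_tx_clkdiv(uint32_t clk_sys_hz, uint32_t baud) {
    // Compute in 1/256ths with integer math; the fraction is truncated.
    uint64_t div_256 = (static_cast<uint64_t>(clk_sys_hz) * 256u) / (8ull * baud);
    assert(div_256 >= 256 && div_256 < (65536ull << 8));  // 1 <= div < 65536
    ClkDiv d{};
    d.integer = static_cast<uint16_t>(div_256 >> 8);
    d.frac = static_cast<uint8_t>(div_256 & 0xFF);
    double div = d.integer + d.frac / 256.0;
    d.actual_baud = clk_sys_hz / (8.0 * div);
    return d;
}
```

At those example values the divider works out to 135 + 162/256, which produces about 115200.7 baud, comfortably within normal UART tolerance.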

After a year, I did end up finding the bigger issue with getting this to work. It only surfaced after I switched to a slightly different PIO state machine, and it is a trap for embedded engineers at any level. Nearly all of my C++ code referenced the SFP pin numbers, which range from 2 to 9, but the pins I actually needed to reference are the RP2040 pins, which are in the 0 to 9 range, and the two often overlap. Many times I would reference, say, pin 2 on the SFP side, since that is how the pins are commonly talked about in the Discord and other channels, with good reason, as that's where the action is happening. However, I needed to reference the RP2040 pin instead, and one such pin was pin 0. In many sections of my code I had assumed pin 0 was invalid and used it as a default, leading to conflicting states.

I only found this out after the third PIO state machine that didn't work, while I was going line by line rubber duck debugging (if you don't know it, look it up). That is when I realized my massive, incorrect assumption that the SFP pin numbers equaled the RP2040 pin numbers. For reference, here is the pinout on the RP2040 side:

[Figure: RP2040 pins]

And here is the SFP side:

[Figure: SFP pins]
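One way to turn this SFP-versus-RP2040 mix-up into a compile-time error is to give each numbering scheme its own type, and to represent "no pin" as an empty `std::optional` rather than 0. This is a minimal sketch of that idea, not the SFP-Buddy code: `SfpPin`, `GpioPin`, `to_gpio`, `UartPins`, and `configure_rx` are hypothetical names, and the SFP 2..9 to GPIO 0..7 mapping is a placeholder, not the board's real wiring (which is in the pinout images above).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Distinct wrapper types: an SFP connector pin cannot be passed where an
// RP2040 GPIO is expected without going through to_gpio().
struct SfpPin  { uint8_t n; };
struct GpioPin { uint8_t n; };

// HYPOTHETICAL placeholder mapping (SFP 2..9 -> GPIO 0..7), for illustration only.
std::optional<GpioPin> to_gpio(SfpPin p) {
    if (p.n >= 2 && p.n <= 9) return GpioPin{static_cast<uint8_t>(p.n - 2)};
    return std::nullopt;  // not routed to the RP2040
}

// "No pin selected" is an empty optional, not GPIO 0, which is a real pin.
struct UartPins {
    std::optional<GpioPin> tx;
    std::optional<GpioPin> rx;
};

// Stand-in for code that claims a pin; it only accepts GPIO numbers.
// Returns false instead of creating a conflicting state.
bool configure_rx(UartPins& pins, GpioPin gpio) {
    if (pins.tx && pins.tx->n == gpio.n) return false;
    pins.rx = gpio;
    return true;
}
```

With distinct types, a call like `configure_rx(pins, SfpPin{6})` fails to compile instead of quietly claiming GPIO 6, and an unset pin can no longer be confused with GPIO 0.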

One thing to note is that TX Fault, which is RX on the Pontronic stick I am using for testing, is 3 on the SFP side, which leads to an unconnected pin that would never work. Meanwhile, 6/MOD_ABS goes to what would be RX and would work most of the time, even if reversed, which is another common mistake.

That said, rewriting my code nearly entirely from scratch and going through it line by line did let me remove or optimize large sections of the code and fix major bugs. One of those was memory utilization: as I mentioned earlier, lots of memory buffers were in use. In the rewrite, I switched from a read-per-loop approach to a DMA approach. For those who don't know, DMA means direct memory access; in this case it means that as soon as a byte arrives in the PIO's FIFO, it is almost instantly transferred to another location in RAM. This matters because the FIFO is limited to 8 characters, even after joining the two FIFOs. With these sticks potentially printing hundreds of characters a second during boot-up, the only practical options are interrupts or DMA.
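To put the 8-entry limit in numbers, here is the arithmetic as a tiny sketch. The 115200 baud figure is my assumption (a common console rate); the post does not state what rate the sticks use, and the function names are hypothetical.

```cpp
#include <cassert>

// 8N1 framing: 1 start + 8 data + 1 stop = 10 bit times per character.
constexpr double char_time_us(double baud) { return 10.0 * 1e6 / baud; }

// Time until `entries` characters arrive back-to-back at full line rate,
// i.e. how long a FIFO or buffer of that depth can go unserviced.
constexpr double buffer_fill_time_us(double baud, int entries) {
    return entries * char_time_us(baud);
}
```

At 115200 baud a character arrives about every 86.8 µs, so the joined 8-entry FIFO overflows after roughly 0.69 ms without service, while a 2KB buffer buys about 178 ms.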

Interrupts are a common way to handle this. With interrupts, once the first character is in the buffer, the processor is signaled, stops what it's currently doing, and jumps to the code that handles it. That adds complexity: the handler runs out of order with respect to the main code, and you have to prevent the other core from overwriting the same data. With DMA, the RP2040's DMA controller performs the transfer in the background without involving the processor; with a large enough buffer, you have plenty of time to handle the data later.

Another trick I employed was making the buffer that the DMA transferred into a ring buffer, so that even if I could not get to it in time, the code would not crash. With the DMA in place, I could handle the UART whenever I had time rather than in interrupts. With this in mind, I put all of the code handling the UART on the first core, leaving the second core for CDC and other functions such as USB-PD and debugging. In the end, I went from 10KB of buffers and ~5 memory transfers down to a single 2KB buffer and 1 memory transfer. I handed this new version out for testing and was told that the crashing issue had been fixed.
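The ring buffer idea can be sketched on the host as follows. It is a single-threaded model under my own assumptions: `dma_write` stands in for the DMA channel, and `RxRing` and its methods are hypothetical names. On the RP2040, the SDK's `channel_config_set_ring()` can make a DMA channel wrap its write address within a power-of-two, size-aligned buffer, but the consumer would then derive the write position from the channel's current write address, and detecting that it was lapped takes extra bookkeeping that this sketch replaces with a free-running counter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Power-of-two ring so wraparound is a cheap mask.
template <std::size_t N>
class RxRing {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side, standing in for the DMA channel: store one byte and
    // advance a free-running write counter. Old data is simply overwritten.
    void dma_write(uint8_t b) {
        buf_[head_ & (N - 1)] = b;
        ++head_;
    }

    // Consumer side: drain whatever arrived since the last call. If the
    // producer lapped us, skip to the oldest byte still in the buffer and
    // count the loss instead of reading stale data or crashing.
    std::string drain() {
        std::string out;
        std::size_t avail = head_ - tail_;
        if (avail > N) {
            dropped_ += avail - N;
            tail_ = head_ - N;
        }
        while (tail_ != head_) {
            out.push_back(static_cast<char>(buf_[tail_ & (N - 1)]));
            ++tail_;
        }
        return out;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::array<uint8_t, N> buf_{};
    std::size_t head_ = 0;     // total bytes written
    std::size_t tail_ = 0;     // total bytes consumed or skipped
    std::size_t dropped_ = 0;  // bytes overwritten before they were read
};
```

With `N = 2048`, this matches the single 2KB buffer size mentioned above. The design choice is that falling behind costs old characters rather than a crash, which is the behavior the rewrite was after.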

Conclusions

This post got a bit longer than I wanted; next I will cover CDC, USB-PD, and the fun of media converting. I learned that even with 256KB of RAM you need to be careful about ballooning buffers, that you should always check your pinouts, even in code, and that sometimes the simplest errors are the hardest to fix.