Design Notes

Project Origins

Web-888 originates from the RX-888 project. Since the launch of RX-888, we have noticed a growing number of applications using it for web-based functionality; notable examples include the KA9Q-Radio and PhantomSDR projects. However, running a Web-SDR this way typically requires a Linux PC or a Raspberry Pi 5, which is neither the most convenient nor the most economical solution.

Therefore, we decided to design a plug-and-play Web-888. We believe that Web-888 should have the following features:

Plug-and-play capability: Requires only an Ethernet cable and an antenna to work. With a well-designed UI, it runs directly in any browser and needs no Linux knowledge, as not everyone is a software expert.
Multi-user support: Support for as many online users as possible, with enough RX channels to allow one SDR to function like multiple SDRs working in parallel.
Single-board solution: No need for an additional PC, Raspberry Pi, or Beaglebone board; it works as a single board.
Low network bandwidth usage: Decoding is completed on the SDR end, reducing the dependency on network conditions and speed.
Excellent reception performance: Featuring high sensitivity, high dynamic range, and wide bandwidth.
Cost-effective and affordable: Providing professional-grade performance at an accessible price point.

Based on these requirements, we drew up an initial architecture for Web-888. It requires a high-resolution, high-sampling-rate ADC, an LNA, an attenuator (ATT), a low-pass filter (LPF), a resource-rich FPGA, a sufficiently powerful CPU, ample memory, and a fast Ethernet PHY.

Hardware Design

We started with the original RX-888, retaining its excellent ATT+LPF+LNA+ADC RF design, and re-optimized the LPF to improve suppression above 60MHz for wideband reception from 0-60MHz. We also added a second antenna channel equipped with a 118-150MHz BPF and a +20dB LNA. This lets the LTC2208, with its 130MSPS sampling rate, directly under-sample Air-Band/VHF in the second Nyquist zone, eliminating the need for an additional converter.

For the digital backend, we drew inspiration from the classic PlutoSDR design and used Xilinx's ZYNQ XC7Z010. The ZYNQ integrates the FPGA (PL) and the ARM processor (PS) in one chip, with high-speed data sharing between them via AXI (Advanced eXtensible Interface). This resolves the traditional data-exchange challenges of separate FPGA+CPU designs and covers the need for both an FPGA and a CPU, so the entire SDR hardware fits on a single PCB. We also included Realtek's Gigabit (1000M) Ethernet in the Web-888 to ensure low network latency and a stable connection.

Considering the convenience of connecting to the SDR via Wi-Fi, we provide a USB-C port with a USB 2.0 host for directly connecting a USB Wi-Fi dongle. The port can also be used as a UART bridge to support CAT commands from other SDR software.

Using a 0.5ppm TCXO as the ADC's clock source reduces the need for GPS. Nonetheless, we have included a GPS module supporting BDS/GPS/GLONASS/GALILEO, with PPS for clock calibration. We also offer a reference clock port that can be used as an input or an output. When it is used as an output, the GPS module stays active, allowing the Web-888 to provide a GPSDO-grade variable clock source.

Firmware (FPGA Implementation)

This project implements fully digital intermediate-frequency signal processing in the FPGA. It includes 15 channels of digital downconversion. Thirteen channels carry 12kHz sampled audio data, processed through carefully designed and debugged multistage filters and a custom-developed AXI bus system. Two channels carry spectrum data over a custom high-bandwidth DMA bus design, with a maximum rate of 2GB/s and an operating rate of 1GB/s. The maximum sampling rate of 130MHz keeps signals within the 60MHz bandwidth clearly visible at every zoom level, allowing seamless zooming with full detail. The timing has been optimized with ample margins, ensuring long-term stable operation in various environments.

The Web-888 utilizes a full AMBA-AXI bus architecture, providing reliable high-bandwidth multi-channel data transmission. Data transfer between the dual-core ARM Cortex-A9 processor and the FPGA (PL) is handled by four AXI3-Full buses. One GP (General Purpose) bus is used for configuring registers and reading status, while one HP (High Performance) bus handles DMA transfers for 13 channels of received IQ data. Two additional HP buses manage DMA transfers for the two spectrum data channels. These two DMAs use a time-division multiplexing scheduler to serve 13 channels of 60MHz full-bandwidth waterfall spectrum.

The general configuration interface supports high-performance burst transfers, ensuring real-time configuration of a large number of internal registers and accurate reading of the GPS PPS counter values.

The high-performance data bus features a unique design that achieves 13-channel digital downconversion, IQ data transmission, and multistage filtering with minimal resource usage. It combines digital mixing with two-stage filtering. Digital downconversion of the 13 channels produces 13 IQ pairs (26 signals), which pass through a first-stage CIC decimation filter. A custom-developed bus interconnect module then converts the 26 signals into 26 serial data packets, which enter the second-stage multiplexed CIC filter. A custom-developed packaging module packs the second-stage CIC output into data packets of the specified format, which are then transmitted through a custom-developed high-performance DMA.
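
The integrate, decimate, comb structure of a single CIC decimation stage can be sketched in C as follows. This is a minimal software model for illustration only, not the FPGA implementation: the number of integrator/comb pairs (`CIC_STAGES`), the decimation ratio, and the names `cic_t`, `cic_init`, and `cic_push` are all assumptions that do not come from the Web-888 sources.

```c
#include <stdint.h>
#include <string.h>

#define CIC_STAGES 3  /* assumed number of integrator/comb pairs */

typedef struct {
    uint64_t integ[CIC_STAGES]; /* integrator accumulators (input rate) */
    uint64_t comb[CIC_STAGES];  /* comb delay registers (output rate) */
    uint32_t decim;             /* decimation ratio R */
    uint32_t phase;             /* input samples since the last output */
} cic_t;

void cic_init(cic_t *c, uint32_t decim)
{
    memset(c, 0, sizeof(*c));
    c->decim = decim;
}

/* Push one input sample. Returns 1 and writes *out once every `decim`
 * inputs, otherwise returns 0. The accumulators are unsigned so that
 * wrap-around is well defined; CIC filters rely on modular arithmetic. */
int cic_push(cic_t *c, int32_t in, int64_t *out)
{
    uint64_t v = (uint64_t)(int64_t)in;

    /* Integrator cascade, running at the input sample rate. */
    for (int i = 0; i < CIC_STAGES; i++) {
        c->integ[i] += v;
        v = c->integ[i];
    }

    /* Keep only every decim-th integrator output. */
    if (++c->phase < c->decim)
        return 0;
    c->phase = 0;

    /* Comb cascade, running at the decimated rate (delay of one sample). */
    for (int i = 0; i < CIC_STAGES; i++) {
        uint64_t prev = c->comb[i];
        c->comb[i] = v;
        v -= prev;
    }

    *out = (int64_t)v;
    return 1;
}
```

With a constant input of 1 and a decimation ratio of 4, the output of this model settles at 4^3 = 64, the DC gain R^N of an N-stage CIC with unit differential delay. A real design removes this gain by scaling after the filter.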

The high-performance spectrum bus design is inspired by KiwiSDR to achieve a 60MHz full-bandwidth display. This section consists of digital downconversion, a variable sampling rate CIC, and a high-performance DMA. After downconversion, the sampling rate is adjusted according to the spectrum's zoom settings. The DMA used for spectrum data transmission operates in two modes:

Exclusive Mode: In this mode, two spectrum channels are dedicated to two clients, while other clients only receive a 12kHz bandwidth spectrum stream. The DMA operates in address loopback continuous transmission mode.
Time-Division Multiplexing (TDM) Mode: In this mode, the two spectrum channels are time-division multiplexed across 13 clients. After collecting a line of data, the DMA stops automatically and switches to collect the spectrum data for the next online user. The TDM mechanism is optimized by software based on the number of online users and zoom level, ensuring maximum DMA utilization.

If you are interested in more technical details about the FPGA implementation, please read this document.

Software Architecture

We worked on the RaspSDR project, which is largely a re-platforming project of KiwiSDR. KiwiSDR has an easy-to-use UI and a rich set of features that align with our goal for Web-888.

On the other hand, we wanted to fix some problems of KiwiSDR while maintaining its nice user experience. The key problems we set out to solve were:

Software Updates

Distributing updates as source code is painful. It requires users to spend minutes to hours rebuilding the project, and it also limits the build technology we can use.

So in Web-888, we decided to use binary updates to speed this up. In addition, we created two update channels. The alpha channel is for users who always want the latest features and are able to fix problems they encounter. The stable channel is for users who want a stable build or who run unattended servers.

SD Card Corruption

Running a Debian system from an SD card is a good choice but not the only one, and it carries a significant risk of SD card corruption. We decided to use a Linux distribution that supports a read-only root partition. Alpine Linux has become popular and is also used by the Red Pitaya Notes project.

Keeping the SD card read-only significantly reduces the chance of corrupting it.

One side benefit is that we are able to put the config files in the root partition of the SD card. They can be backed up easily via a copy tool or edited manually.

User Mode Task Scheduler

Both the KiwiSDR and RaspSDR code use user-mode scheduling to meet the strict timing requirements for polling the SPI bus.

Thanks to the Zynq7010 design, we have a much more flexible way to move data from the FPGA to CPU memory. In Web-888, we implemented a DMA controller in the FPGA to move the data to CPU memory. The CPU is not involved at all during the transfer, so we no longer have such strict timing requirements.

We changed the code to use the Linux kernel scheduler. If you check the CPU usage on Web-888, you may notice that the CPU is mostly idle when there are no active users, and that the load is balanced between the two cores. We can now use all of the CPU's computing power and rely on the Linux kernel to schedule the workload across both cores.

One RX channel takes about 3% of one CPU core, and one WF channel takes about 9%. In addition to the reduction in CPU usage, we also have much more flexibility to call blocking APIs, which had to be called in an RPC way in Kiwi. This simplified many code paths, especially for extensions like DRM, FT8, and WSPR.
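
As a rough worked example of these figures, the total load can be estimated by simple multiplication. The sketch below assumes the per-channel costs scale linearly and ignores extensions and other system overhead; the function name `estimated_load_pct` is hypothetical.

```c
/* Approximate per-channel cost in percent of one core, from the text. */
enum { RX_COST_PCT = 3, WF_COST_PCT = 9 };

/* Estimated total load in percent of one core.
 * 200 would mean both cores are fully busy. */
int estimated_load_pct(int rx_channels, int wf_channels)
{
    return rx_channels * RX_COST_PCT + wf_channels * WF_COST_PCT;
}
```

Under these assumptions, 13 RX channels plus 13 WF channels come to 13 * 3 + 13 * 9 = 156% of one core, which still fits within the 200% available from two cores.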

Other Innovations

We also incorporated several innovations in Web-888.

13 WF Channels - Shared Mode

The Zynq7010 can only provide resources for 2 WF channels. In order to support 13 WF channels, we developed a time-sharing solution to use 2 WF channels shared between 13 concurrent users.

Inside the FPGA, two DMA controllers move the data from the FPGA FIFO to CPU memory. When enough data (8192 samples) has been moved, the CPU copies the data out and triggers another transfer with the next channel's frequency and decimation settings.

We had to implement a DMA controller that supports burst transfer in order to meet the timing requirements.
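
The time-sharing idea can be illustrated with a simple round-robin selection: whenever one of the two DMA engines finishes a line, the software picks the next active client and reprograms the engine with that client's settings. The sketch below is an assumption-laden illustration, not the Web-888 code: `wf_client_t`, its fields, and `wf_next_client` are hypothetical names, and the real scheduler also takes the zoom level into account.

```c
#include <stdbool.h>
#include <stdint.h>

#define WF_CLIENTS      13    /* concurrent waterfall users, from the text */
#define WF_LINE_SAMPLES 8192  /* samples moved per transfer, from the text */

/* Hypothetical per-client waterfall settings. */
typedef struct {
    bool     active;      /* client is currently connected */
    uint32_t center_hz;   /* frequency to program for this client */
    uint32_t decimation;  /* decimation setting for this client's zoom */
} wf_client_t;

/* Return the index of the next active client after `last`, wrapping
 * around, or -1 if no client is active. Pass -1 as `last` to start. */
int wf_next_client(const wf_client_t clients[WF_CLIENTS], int last)
{
    for (int step = 1; step <= WF_CLIENTS; step++) {
        int idx = (last + step) % WF_CLIENTS;
        if (clients[idx].active)
            return idx;
    }
    return -1;
}
```

Each DMA completion would call `wf_next_client` with the most recently served index, so two engines sharing one cursor naturally split the active clients between them.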

Tunable ADC Sampling Rate

The ADC's clock is not directly connected to a TCXO; instead, it is connected to CLK0 of the Si5351. This design is inherited from RX-888, where it has proven to offer a good balance of flexibility and quality.

We chose 122.88MHz as the sampling rate for the HF band. With the reference TCXO of 24.576MHz, the Si5351 can produce it exactly:

Item	Value
Wanted frequency	122,880,000.000 Hz
Crystal frequency	24,576,000.000 Hz
VCO frequency	860,160,000.000 Hz
True frequency	122,880,000.000 Hz
Deviation	0 Hz / 0 PPB
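
The values in the table can be reproduced with integer arithmetic. In the sketch below, the PLL multiplier of 35 and the output divider of 7 are inferred from the table (860.16MHz / 24.576MHz and 860.16MHz / 122.88MHz); they are not quoted from the Web-888 bootloader, and the function names are hypothetical.

```c
#include <stdint.h>

/* VCO frequency produced by multiplying the reference. */
uint64_t pll_vco_hz(uint64_t xtal_hz, uint32_t pll_mult)
{
    return xtal_hz * pll_mult;
}

/* Output frequency after dividing the VCO down. */
uint64_t clk_out_hz(uint64_t vco_hz, uint32_t out_div)
{
    return vco_hz / out_div;
}
```

With a 24,576,000 Hz reference, a multiplier of 35 gives a VCO of 860,160,000 Hz, and dividing by 7 gives 122,880,000 Hz. Both ratios are integers, which is why the deviation is exactly 0 Hz.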

Inside the bootloader of Web-888, the Si5351 is configured to 122.88MHz. The Si5351 can be configured for other frequencies for different purposes.

VHF Support with Undersampling

We decided to use undersampling to support VHF bands, in particular because it lets us cover the whole air-band, which is a common use case on the Internet.

However, we found that a 122.88MHz sampling rate is not a good choice for the air-band. The air-band starts at 118MHz, which is below 122.88MHz, so the band straddles the sampling frequency. After undersampling, 118MHz is moved to 4.88MHz, but 127.76MHz is also moved to 4.88MHz, so the two signals land on top of each other.

Thanks to the Si5351, we can choose other sampling rates here like 101MHz, 98MHz, etc.
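
The folding behaviour can be checked with a few lines of integer arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 118-137MHz air-band limits used in the note after the code are an assumption (the text above only gives the 118MHz lower edge).

```c
#include <stdbool.h>
#include <stdint.h>

/* Frequency at which an RF tone appears after sampling at fs_hz,
 * folded into the first Nyquist zone (0 .. fs/2). */
uint64_t alias_hz(uint64_t rf_hz, uint64_t fs_hz)
{
    uint64_t r = rf_hz % fs_hz;
    return (2 * r > fs_hz) ? fs_hz - r : r;
}

/* True if the band [lo_hz, hi_hz] lies inside a single Nyquist zone,
 * i.e. no multiple of fs/2 falls inside the band, so no two in-band
 * frequencies fold onto each other. */
bool band_in_one_zone(uint64_t lo_hz, uint64_t hi_hz, uint64_t fs_hz)
{
    return (2 * lo_hz) / fs_hz == (2 * hi_hz) / fs_hz;
}
```

At 122.88MHz, `alias_hz` maps both 118MHz and 127.76MHz to 4.88MHz, and `band_in_one_zone` reports that a 118-137MHz band does not fit in one zone. At 101MHz or 98MHz the same band does fit, which is why those rates avoid the overlap.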

Poor Man's GPSDO

The PPS signal from GPS is used to calculate the ADC frequency. This is a feature from KiwiSDR that was used to correct the displayed frequency.

Since we can tune the Si5351, we decided to use the PPS signal to discipline the Si5351 output frequency. A PID algorithm takes the tick count between two PPS pulses and uses it to correct the Si5351.

The PID parameters are below:

static const float32_t Kp = 1.4f;  // Proportional gain
static const float32_t Ki = 0.15f; // Integral gain
static const float32_t Kd = 0.01f; // Derivative gain
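
A PID update using these gains might look like the following. This is only a sketch under assumptions: the structure `pps_pid_t`, the function `pps_pid_step`, the error definition (target ticks minus measured ticks per PPS interval), and the absence of output clamping or anti-windup are illustrative and are not taken from the Web-888 firmware. The `float32_t` typedef is included only to make the snippet self-contained.

```c
#include <stdint.h>

typedef float float32_t;  /* assumption: stand-in for the firmware's type */

static const float32_t Kp = 1.4f;  /* Proportional gain */
static const float32_t Ki = 0.15f; /* Integral gain */
static const float32_t Kd = 0.01f; /* Derivative gain */

/* Hypothetical controller state carried between PPS pulses. */
typedef struct {
    float32_t integral;    /* accumulated error */
    float32_t prev_error;  /* error from the previous PPS interval */
} pps_pid_t;

/* One controller step per PPS pulse.
 * measured_ticks: ADC clock ticks counted between two PPS pulses.
 * target_ticks:   ticks expected at the nominal sampling rate.
 * Returns a correction, in the same units as the error, that the
 * caller would translate into a new Si5351 setting. */
float32_t pps_pid_step(pps_pid_t *s, uint32_t measured_ticks,
                       uint32_t target_ticks)
{
    float32_t error =
        (float32_t)((int64_t)target_ticks - (int64_t)measured_ticks);
    float32_t derivative = error - s->prev_error;

    s->integral += error;
    s->prev_error = error;

    return Kp * error + Ki * s->integral + Kd * derivative;
}
```

For example, if the first interval measured after start-up is 10 ticks short of a 122,880,000-tick target, the step returns roughly 1.4 * 10 + 0.15 * 10 + 0.01 * 10 = 15.6.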

Since the PLL inside the Si5351 is disciplined this way, we can use the same PLL to output a clock signal externally. This is the output of the clock-out SMA connector.

Warning

Enabling this feature will impact the phase noise of the Si5351. We therefore provide an option in the Admin Config Tab to control it, and it is disabled by default.

Thermal Management

A metal box gives the product not only a solid, high-quality feel but also good RF shielding. However, it makes it harder to keep the Zynq7010 cool.

We use several methods to optimize this:

Optimize CPU usage to reduce heat from the CPU.
Optimize FPGA following low-power design.
Design the airflow to draw air in from two sides and carry the heat away.

However, none of these measures satisfied us on their own, so we finally added a fan to the product. To reduce noise, we use a larger fan running at a low RPM. On our test samples, this keeps the temperature under 50 degrees Celsius.