
Think Horses, not Zebras (Part 2)

There is a popular quote in medical circles:

When you hear hoofbeats, think of horses, not zebras. — Dr. Theodore Woodward

I recently posted several debugging experiences where it was beneficial to examine simple scenarios before complex ones. Below are a few more …

NVMe slot not working

An NVMe slot is fairly simple: it has a clock and one to four PCIe lanes. However, in a recent new hardware design, the slot was not working. We reviewed the kernel boot logs, scoped the clock signals, etc. We did notice the clock amplitude was lower than in other designs (see below). The initial suspect was signal integrity, as the PCIe data lines run at very high speeds, so we reviewed the routing, board stackup, impedance calculations, etc. Eventually, I compared the PCIe connector pinout to another design, and it appeared that the RX and TX data signals were swapped. RX vs TX (receive vs transmit) naming can be confusing because it depends on which perspective the signals are referenced to (the host system or the target device). In the case of PCIe, it appears the signals are all named from the perspective of the host. After we reworked the board to swap the RX and TX signals, the NVMe slot worked fine, even though the modification wires drastically violated signal integrity and impedance requirements. PCIe must be a fairly robust transport!
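The pinout comparison that found this problem can be partly automated. Below is a minimal Python sketch, assuming the connector pinouts of both designs have been exported as pin-to-net mappings (for example, from a netlist). The pin numbers and net names are hypothetical and are not taken from the actual design.

```python
def diff_pinouts(reference, candidate):
    """Return {pin: (reference_net, candidate_net)} for every pin that differs."""
    mismatches = {}
    for pin in sorted(set(reference) | set(candidate)):
        ref_net = reference.get(pin)
        cand_net = candidate.get(pin)
        if ref_net != cand_net:
            mismatches[pin] = (ref_net, cand_net)
    return mismatches


# Hypothetical pin numbers and net names, for illustration only.
working_design = {
    "A1": "PCIE_TX0_P",
    "A2": "PCIE_TX0_N",
    "B1": "PCIE_RX0_P",
    "B2": "PCIE_RX0_N",
    "C1": "REFCLK_P",
}
new_design = {
    "A1": "PCIE_RX0_P",  # swapped relative to the working design
    "A2": "PCIE_RX0_N",
    "B1": "PCIE_TX0_P",
    "B2": "PCIE_TX0_N",
    "C1": "REFCLK_P",
}

for pin, (ref_net, cand_net) in diff_pinouts(working_design, new_design).items():
    print(f"pin {pin}: working design has {ref_net}, new design has {cand_net}")
```

A report like this does not say which design is right, but it points at the pins worth checking against the connector specification.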

PCIe clock amplitude

While verifying the hardware of a design, we noticed that the PCIe clock was at a lower amplitude than the specification required:

What we were measuring:

The Vcross was about 170 mV, and the spec required it to be between 250 and 500 mV. To verify our measurement technique, we measured the PCIe clock on another system and it looked correct. The default output of the clock chip should have given us the correct amplitude. The clock chip was connected via an I2C bus, so we read back the registers to make sure they were set to the defaults. We then set the amplitude to the maximum the chip would output, and it still was not within range. Finally, we noticed the following in the datasheet:

Sure enough, we had termination resistors for these signals on the PCB. After the resistors were removed, the clock amplitude was correct.
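A measurement like this is easy to record as an explicit pass/fail check in a bring-up script, so an out-of-range value is not overlooked. The sketch below is a minimal Python example that uses the Vcross limits quoted above. The function name and report format are assumptions made for illustration.

```python
VCROSS_MIN_MV = 250  # lower limit quoted above
VCROSS_MAX_MV = 500  # upper limit quoted above


def check_in_range(name, measured, low, high, unit="mV"):
    """Return (ok, report) for a measurement checked against inclusive limits."""
    ok = low <= measured <= high
    status = "PASS" if ok else "FAIL"
    return ok, f"{name}: {measured} {unit} (spec {low} to {high} {unit}) {status}"


# 170 mV is the value measured on the failing board.
ok, report = check_in_range("Vcross", 170, VCROSS_MIN_MV, VCROSS_MAX_MV)
print(report)
```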

Cellular connection problems

Cellular IoT systems can be tricky to troubleshoot. They are often installed in harsh environments (rain, cold, hot, damp, caustic sewer gasses, etc.). Occasionally a modem goes bad and connections fail. In one recent case, we replaced a modem and it still would not connect. The antenna, cable, and connections were the next suspects. After spending considerable time swapping components and not making any progress, I reviewed the configuration and noticed the APN was not set. After setting the cellular APN, it worked.
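A cheap safeguard is to validate the configuration before suspecting hardware. The following is a minimal Python sketch, assuming the device configuration is available as a dictionary. The key name `apn` and the example value are hypothetical, and a real system may store its settings differently.

```python
REQUIRED_SETTINGS = ("apn",)  # hypothetical key names; extend as needed


def missing_settings(config, required=REQUIRED_SETTINGS):
    """Return the required keys that are absent, None, or blank in config."""
    return [key for key in required if not str(config.get(key) or "").strip()]


# Example: a configuration in which the APN was never filled in.
device_config = {"apn": ""}
problems = missing_settings(device_config)
if problems:
    print("Check configuration first, missing settings:", ", ".join(problems))
```

Running a check like this at startup, or as the first troubleshooting step, takes seconds compared to swapping modems and antennas.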

Zephyr Nanopb build error

We recently updated the version of Zephyr we were using in a project. In the process, protobuf code generation broke, and another developer found a solution: nanopb was now an optional dependency that needed to be activated. However, the build still failed in my workspace. We discussed diffing our build workspaces, but in the back of my mind I was thinking that there had to be a simple reason. While reading the documentation, it occurred to me to check the west version. Sure enough, I was running 1.0 and the latest was 1.2. After updating, it worked.
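A tool version check is simple enough to script. Below is a minimal Python sketch that extracts a dotted version number from a tool's version string and compares it to a required minimum. The sample strings and the minimum of 1.2 are assumptions based on the numbers above, not a requirement documented by Zephyr.

```python
import re


def parse_version(text):
    """Extract the first dotted version number in text as a 3-part tuple."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    parts = [int(part) for part in match.group(1).split(".")]
    parts += [0] * (3 - len(parts))  # pad so that "1.2" compares equal to "1.2.0"
    return tuple(parts[:3])


def meets_minimum(installed_text, minimum_text):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed_text) >= parse_version(minimum_text)


# The versions from the story: 1.0 installed, 1.2 assumed as the minimum.
print(meets_minimum("1.0", "1.2"))
```

In practice, the installed version string could come from running `west --version` with `subprocess.run` at the start of a build script. The exact output format of that command is not assumed here, which is why the parser only looks for the first dotted number.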

What can we learn?

Again, in the above cases, the initial assumptions were more complex than the actual problem. Suggestions from the previous article are relevant here:

  • Double-check connector and component pinouts.
  • Compare to working designs.
  • Re-read the entire datasheet carefully.

Additional suggestions learned from these examples include:

  • Verify that the system configuration is correct; never assume it is.
  • Make sure you have the correct version of the tools installed (ideally your build system would check this for you).
  • Re-read documentation for tools and software components. This will often spark ideas on where to look for problems.
