(*This is the English version of an article posted on eSOL’s Japanese blog in August.)
Hello! I am Y.K., and I work in the Product Support Department of eSOL’s Engineering Division.
At eSOL, I’m in charge of customer support. We receive all types of inquiries at the help desk from customers about how to use eSOL products they’ve purchased, how to check specifications, and more.
We sometimes receive troubleshooting requests like, “It’s not working as I thought it would.” When troubleshooting, we follow a standard procedure. We start by observing the issue the customer is facing. We then estimate what the root cause could be, build a hypothesis, gather additional information in line with that hypothesis, and finally confirm it. To build a hypothesis, we must be well-versed in the implementation of the software. With embedded software, we sometimes must also build a hypothesis based on how the hardware may be behaving.
In some cases, it is difficult to guess the cause of the trouble just from a description of the symptoms. Investigating such issues is a customer support engineer’s chance to show their skills: they build various hypotheses and gather additional information to narrow down the possible causes.
Working alongside veteran customer support engineers, I have come to appreciate the depth of experience and knowledge behind their work. They are knowledgeable not only about software design but also about hardware, and they have a good understanding of the points that tend to be involved in software issues. I believe this knowledge is something an engineer can normally master only through countless troubleshooting experiences.
However, I also think it is possible to acquire this skill without personally going through a trying troubleshooting ordeal, by sharing examples of troubleshooting and working through simulations based on other engineers’ experiences.
So, in this article, I want to introduce one troubleshooting case that our veteran customer support engineers handled. I’ll also review how they were able to resolve the issue and study which points to focus on in order to investigate issues the way a veteran customer support engineer would.
Troubleshooting Example: A corrupted FAT on an SD memory card
The example I will be examining today is a help request we received from a customer regarding a corrupted FAT on an SD memory card.
FAT (File Allocation Table) is a type of metadata in the FAT file system: a table for managing the free space on an SD memory card. When an entry in this table has a value of 0, the corresponding segment of the SD card is free; when it has a value other than 0, the segment is in use. The issue the customer was experiencing was that a segment was being allocated for another use even though the FAT indicated that it was already in use. We checked the SD analyzer’s log and found that when the FAT was read from the SD card, the entry had a value other than 0 (meaning the segment was in use), yet when the FAT was written back to the SD card, the entry had been replaced by a different value (meaning the segment had been allocated for another purpose).
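To make the "0 means free, anything else means in use" rule concrete, here is a minimal Python sketch. It is only an illustration, not eSOL's implementation: the article does not say which FAT variant the customer used, so the sketch assumes FAT32-style 32-bit little-endian entries, and the names `parse_fat`, `is_free`, and `allocate` are hypothetical.

```python
import struct

ENTRY_SIZE = 4            # assumption: FAT32-style 32-bit little-endian entries
ENTRY_MASK = 0x0FFFFFFF   # FAT32 uses only the low 28 bits of each entry


def parse_fat(fat_bytes):
    """Decode raw FAT bytes into a list of entry values."""
    count = len(fat_bytes) // ENTRY_SIZE
    raw = struct.unpack("<%dI" % count, fat_bytes[:count * ENTRY_SIZE])
    return [value & ENTRY_MASK for value in raw]


def is_free(entries, index):
    """An entry of 0 means the corresponding segment is free."""
    return entries[index] == 0


def allocate(entries, index, value):
    """Hypothetical allocator: refuse a segment the table marks as in use."""
    if not is_free(entries, index):
        raise ValueError("segment %d is already in use" % index)
    entries[index] = value
```

The point of the sketch is that the allocator trusts the table completely. If an in-use entry ever reads back as 0, `allocate` will hand the same segment out a second time, which matches the symptom the customer reported.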
Figure 1: The issue at hand
My hypothesis was as follows: while the device read the FAT from the SD memory card and passed it to the file system, one 32-byte segment of the FAT was for some reason being overwritten with 0, which led the file system to treat the corresponding area as free and allocate it for another use.
Figure 2: My hypothesis
I decided to find out at what point the FAT was being corrupted by embedding trap code in the SD memory card driver and the file system. These traps halt processing if only a 32-byte segment has turned to zero in the FAT data as it is used at each stage. However, none of the traps fired. I also inserted trap code in the write path of the SD memory card driver, but that did not fire either, and my investigation came to a standstill.
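A check of this kind can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual trap code, which would live inside the driver and file system on the target. It assumes that a known-good copy of the FAT data is available for comparison and that the suspect segment is 32-byte aligned; `FatTrap`, `zeroed_chunks`, and `trap_check` are hypothetical names.

```python
CHUNK = 32  # size of the suspect segment, taken from the hypothesis


class FatTrap(Exception):
    """Stand-in for halting the target at the point of corruption."""


def zeroed_chunks(expected, actual, chunk=CHUNK):
    """Return offsets of aligned chunks that were non-zero in `expected`
    but are entirely zero in `actual`."""
    if len(expected) != len(actual):
        raise ValueError("buffers must be the same length")
    zero = bytes(chunk)
    offsets = []
    for off in range(0, len(actual) - chunk + 1, chunk):
        if actual[off:off + chunk] == zero and expected[off:off + chunk] != zero:
            offsets.append(off)
    return offsets


def trap_check(stage, expected, actual):
    """Raise FatTrap if any 32-byte chunk has been zeroed at this stage."""
    offsets = zeroed_chunks(expected, actual)
    if offsets:
        raise FatTrap("%s: zeroed 32-byte segment at offsets %s" % (stage, offsets))
```

Calling something like `trap_check("driver read", reference, buffer)` at each hand-off point would narrow down where the data changes. In the investigation described here, checks along these lines never fired, which is itself a clue that the corruption was not happening where the hypothesis placed it.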
Figure 3: Testing my hypothesis via a trap code
When I took this issue to one of my seniors, a veteran support engineer, they came up with a scenario that would explain the situation. I then found that the issue was occurring exactly as they had hypothesized, and I was able to resolve it.
What do you think? What hypothesis would you form based on this information? In the second part, I’ll explain in detail the hypothesis my senior formulated. I’ll also review why they were able to build that hypothesis and study which points to focus on so that we can investigate issues the way a veteran customer support engineer would.
Stay tuned for the second part of this blog post (coming soon)!