A guide to avoiding time-consuming debugging in embedded programming, using Armv8-A processor architectures as an example.
When porting an OS from one target board to another, one of the tasks is to set up the “translation tables”. Translation tables map logical addresses to physical addresses, and the Memory Management Unit (MMU) of the chip operates according to these settings.
Since the size of memory and its allocation to physical addresses vary from board to board, these tables need to be updated for every port.
Configuration
In the Armv8-A architecture, the translation tables not only associate logical addresses with physical addresses but also set access rights and memory attributes. There are two main types of memory attributes: Normal and Device.
For standard RAM and ROM one usually uses the attribute Normal, and for memory-mapped peripheral registers the attribute Device (figure 1).
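As a rough illustration of how the two attributes end up in the tables, the following C sketch assumes the AArch64 stage 1 format, where a descriptor selects one of eight attribute slots in MAIR_EL1 through its AttrIndx field (bits [4:2]). The slot assignment and the helper names are assumptions made for this example, not part of any particular OS.

```c
#include <assert.h>
#include <stdint.h>

/* MAIR_EL1 holds eight 8-bit attribute encodings. */
#define MAIR_ATTR_DEVICE_nGnRnE 0x00u /* Device, most restrictive variant */
#define MAIR_ATTR_NORMAL_WB     0xFFu /* Normal, inner/outer write-back cacheable */

/* Hypothetical slot assignment chosen for this sketch. */
enum { ATTR_IDX_DEVICE = 0, ATTR_IDX_NORMAL = 1 };

/* Build the MAIR_EL1 value for the two slots used above. */
static inline uint64_t mair_value(void)
{
    return ((uint64_t)MAIR_ATTR_DEVICE_nGnRnE << (8 * ATTR_IDX_DEVICE)) |
           ((uint64_t)MAIR_ATTR_NORMAL_WB     << (8 * ATTR_IDX_NORMAL));
}

/* Place an attribute slot number into AttrIndx, bits [4:2] of a descriptor. */
static inline uint64_t desc_attr_index(unsigned idx)
{
    assert(idx < 8); /* only eight slots exist in MAIR_EL1 */
    return (uint64_t)idx << 2;
}

/* Read the attribute slot number back out of a descriptor. */
static inline unsigned desc_get_attr_index(uint64_t desc)
{
    return (unsigned)((desc >> 2) & 0x7u);
}
```

With this assignment, a descriptor for RAM would carry desc_attr_index(ATTR_IDX_NORMAL) and a descriptor for peripheral registers desc_attr_index(ATTR_IDX_DEVICE). Mixing up the two indices is exactly the kind of mistake the following sections describe.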
Figure 1: Armv8-A Normal and Device configuration
Whenever these attribute settings are changed, they must be thoroughly desk-checked, because it is difficult to trace an incorrectly set attribute back from the symptoms it causes.
What follows are three symptoms that can occur when memory attributes are incorrectly set.
Wrong Memory Category Configuration
Memory accesses are either aligned or non-aligned. Compilers gain some optimization possibilities if they can use non-aligned accesses (e.g. when storing data with smaller bit sizes in RAM). Ordinary physical ROM and RAM usually allow non-aligned accesses.
However, embedded chips also contain memory, such as memory-mapped registers, where non-aligned access is forbidden by the hardware specification. The memory type must be set in the translation table to match the physical memory.
In the translation table, "Normal" allows non-aligned address access, while "Device" allows only aligned address access.
If "Device" is incorrectly set in the translation table for ordinary RAM or ROM and the compiler generates non-aligned accesses, those accesses will trigger hardware exceptions even though the program itself contains no error.
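The rule can be written down as a deliberately simplified model in C. It ignores details such as alignment checking that the OS may enable separately and instructions that always require alignment; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MEM_NORMAL, MEM_DEVICE } mem_type_t;

/* True if addr is a multiple of the access size (size must be a power of two). */
static bool is_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Simplified model: only a non-aligned access to Device memory faults. */
static bool access_faults(mem_type_t type, uintptr_t addr, size_t size)
{
    return type == MEM_DEVICE && !is_aligned(addr, size);
}
```

In this model, a 4-byte access to address 0x1001 is harmless while the region is marked Normal, and the very same access faults as soon as the region is marked Device. The program did not change; only the table did.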
Figure 2: Status register DFSR
An engineer familiar with the Armv8-A architecture may be able to determine the cause by looking at the DFSR (Data Fault Status Register, Figure 2) after an exception occurs and seeing that the FS bits (fault status bits) indicate an alignment fault. Without this knowledge, however, the exception is hard to resolve: it appears to be caused by ordinary software behavior, and nothing else points directly to the root cause.
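A fault handler could perform this check automatically. The sketch below decodes a DFSR value that has already been read from the register. It assumes the AArch32 layout as I understand it: bit 9 selects the long-descriptor format, where the status is in bits [5:0], and otherwise the status is formed from bit 10 and bits [3:0]. The function name is hypothetical, and the encodings should be verified against the Arm Architecture Reference Manual before use. In AArch64 state the corresponding information is reported in ESR_ELx instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the given DFSR value reports an alignment fault. */
static bool dfsr_is_alignment_fault(uint32_t dfsr)
{
    if (dfsr & (1u << 9)) {
        /* Long-descriptor format: STATUS is bits [5:0], alignment = 0b100001. */
        return (dfsr & 0x3Fu) == 0x21u;
    }
    /* Short-descriptor format: FS[4] is bit 10, FS[3:0] is bits [3:0]. */
    uint32_t fs = ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu);
    return fs == 0x01u; /* alignment = 0b00001 */
}
```

Logging the result of such a check in the data abort handler gives a direct hint that the memory attributes, rather than the program logic, deserve a second look.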
Wrong Normal Attribute Setting
The second example is a problem that occurs when the Normal attribute is set for an address where no physical memory is available at all.
The translation tables of the Armv8-A architecture support several granule sizes: 4 KB, 16 KB and 64 KB. The following error situation may occur, especially in the initial phase of OS porting, if only a single granularity, e.g. 16 KB, is used in the translation table (Figure 3).
Figure 3: Wrong Normal attribute granularity setting
For example, if a small on-chip SRAM of only 4 KB is given the Normal attribute in 16 KB units of the translation tables, the 12 KB before or after the SRAM will also have the Normal attribute, even though no physical memory exists at those physical addresses.
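The arithmetic behind this example can be checked with a few lines of C. The sketch rounds a region outward to the granule size and reports how many mapped bytes have no memory behind them. The helper names are hypothetical, as is any concrete SRAM base address used with them.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t start; uint64_t end; } range_t; /* [start, end) */

/* Range actually covered when [base, base + size) is mapped in whole granules. */
static range_t mapped_range(uint64_t base, uint64_t size, uint64_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0); /* power of two */
    range_t r;
    r.start = base & ~(granule - 1);                        /* round down */
    r.end   = (base + size + granule - 1) & ~(granule - 1); /* round up */
    return r;
}

/* Number of mapped bytes that lie outside the real memory region. */
static uint64_t unbacked_bytes(uint64_t base, uint64_t size, uint64_t granule)
{
    range_t r = mapped_range(base, size, granule);
    return (r.end - r.start) - size;
}
```

For a 4 KB SRAM at an assumed base of 0x10000000, unbacked_bytes(0x10000000, 0x1000, 0x4000) yields 0x3000, the 12 KB mentioned above, while the same call with a 4 KB granule yields 0.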
If a program accidentally accesses an address to which no memory is allocated, the expected behavior is that a CPU exception is generated and the OS can detect the abnormal behavior of the program. However, if the address is set as normally accessible in the translation table, no CPU exception is raised, and the memory access is performed.
In this case, however, the result of the access is undefined. In many cases the CPU ends up in a hang-like state, and reading log information or debugging over JTAG is no longer possible. This makes it difficult to capture what happened just before the problem occurred, and therefore difficult to get to its cause.
Wrong Normal Attribute Setting Combined With Speculative Access
The last example is similar to the second one, but complicated by “speculative access”.
Speculative access is a prefetch mechanism of Armv8-A CPUs in which memory accesses are initiated in anticipation of instruction execution. Starting slow memory accesses before the CPU executes the instruction improves performance.
However, this look-ahead is speculative, and the chosen memory address to be accessed may not be used by the CPU. Put another way, accesses to addresses that should not be accessed programmatically may occur.
Speculative data accesses are performed only to memory areas with the Normal attribute. Therefore, a wrong prediction usually causes no problem: the prefetched data are simply not used by the program. (Speculative instruction fetches are a different matter; more on this another time.)
However, if a physical address to which no memory is assigned is set to Normal, speculative accesses to it can occur. In this case, as in the second example, the CPU may hang. It is now even more difficult to identify the cause, because the hang results from memory accesses that hardware mechanisms perform in the background and that are unrelated to the program execution visible in the debugger.
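Because such a hang can hardly be debugged after the fact, it may be worth checking the planned mappings before they are ever loaded. The following C sketch is one possible form of that check, under stated assumptions: the board layout and table contents are invented for illustration, all names are hypothetical, and each mapping is expected to fit inside a single backed range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, end; } phys_range_t;           /* [start, end) */
typedef struct { uint64_t start, end; bool normal; } mapping_t; /* [start, end) */

/* Hypothetical board: 1 GB of DRAM and a 4 KB on-chip SRAM. */
static const phys_range_t example_mem[] = {
    { 0x80000000u, 0xC0000000u },
    { 0x10000000u, 0x10001000u },
};

/* Hypothetical table contents; entry 1 maps 16 KB over the 4 KB SRAM. */
static const mapping_t example_maps[] = {
    { 0x80000000u, 0xC0000000u, true  }, /* DRAM, Normal */
    { 0x10000000u, 0x10004000u, true  }, /* SRAM, Normal, too large */
    { 0x09000000u, 0x09001000u, false }, /* peripheral, Device */
};

/* True if [start, end) lies wholly inside one backed physical range. */
static bool is_backed(uint64_t start, uint64_t end,
                      const phys_range_t *mem, size_t nmem)
{
    for (size_t i = 0; i < nmem; i++) {
        if (start >= mem[i].start && end <= mem[i].end)
            return true;
    }
    return false;
}

/* Index of the first Normal mapping that covers unbacked addresses, or -1. */
static int first_unbacked_normal(const mapping_t *maps, size_t nmaps,
                                 const phys_range_t *mem, size_t nmem)
{
    assert(maps != NULL && mem != NULL);
    for (size_t i = 0; i < nmaps; i++) {
        if (maps[i].normal && !is_backed(maps[i].start, maps[i].end, mem, nmem))
            return (int)i;
    }
    return -1;
}
```

Run against the example data, the check reports entry 1, the oversized SRAM mapping. Only Normal mappings are examined here, since those are the ones open to speculative data accesses.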
Conclusion
Setting up the translation tables is therefore one of the tasks that must be performed with special care when porting an operating system, because it is difficult to trace a wrong setting back from the symptoms it causes.
OS porting personnel and application developers alike should keep in mind the phenomena listed above. Unexplained hangs or exceptions may also be caused by the translation table settings. If you suspect this is the case, it is wise to consult with the OS porting staff.
Yusuke Kubo,
Technology Director, Product Support, Software Engineering
About Yusuke: Yusuke joined eSOL in 2000 and has since been involved in the development and service offering of eSOL's RTOS platform products as an embedded engineer for over 20 years. He is now in charge of the company’s Software Product Support Department.