Wednesday, May 26, 2010

Deadlock and how to avoid it

Deadlock is a situation where two processes are each waiting for the other to release a resource it holds, or where more than two processes wait for resources in a circular chain, so that none of them can get the resource it needs and all of them stop running.

Deadlock can happen between processes, threads, or tasks (in VxWorks), and it can happen on any kind of shared resource. I use "process" here for the discussion.

There are three conditions that must hold for a deadlock to happen:
1. Each of the processes involved accesses multiple shared resources.
2. Each of the processes involved holds some shared resources while requesting other shared resources.
3. A circular waiting chain is possible.

To handle deadlock, we basically have to break one or more of the conditions above. There are four ways to avoid it, as far as I know.
1. All the processes apply the same coding pattern: every process acquires the shared resources (with semTake(), for example) in the same order, so that no circular chain can be formed.
This method is suitable for smaller-scale applications, where all the shared resources can be listed and ordered.

2. Back-off algorithm. Each process either holds all the shared resources it needs before proceeding, or none of them. In practice, the first semTake() can use WAIT_FOREVER and the subsequent semTake() calls use NO_WAIT. If one of the subsequent semTake() calls fails, the process backs off and releases all the resources it already holds.
This method increases the coding complexity. It's also not suitable for real-time systems, as the back-off takes an unpredictable amount of time.

3. Avoid having processes access multiple resources. This can be done by redesigning the software's structure, algorithms, or data structures. For example, a client-server model can be used so that only the server manages the shared resources, while the clients access the resources through the server.

4. Use some sort of watchdog to monitor the processes, and if a deadlock is detected, reset the system or the processes.
Since deadlock rarely happens, a third-party monitor may be suitable when a built-in mechanism is too expensive. For example, the Linux kernel largely ignores deadlock between user processes and pretends it will never happen! (Surprising, isn't it? But it's real. When such a deadlock does happen, the stuck processes just hang until someone kills them or reboots the system.)
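
Methods 1 and 2 above can be sketched with POSIX mutexes instead of VxWorks semaphores. This is only an illustration under that assumption: the function names lock_pair_ordered, lock_pair_backoff and unlock_pair are made up for this post, and ordering the locks by their address is just one possible way to define "the same order".

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Method 1: always take the two locks in one global order
 * (here: by address), so no circular wait can form. */
void lock_pair_ordered(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Method 2: block on the first lock, only *try* the second one.
 * On failure, release everything and start over. */
void lock_pair_backoff(pthread_mutex_t *first, pthread_mutex_t *second)
{
    assert(first != second);
    for (;;) {
        pthread_mutex_lock(first);               /* like semTake(..., WAIT_FOREVER) */
        if (pthread_mutex_trylock(second) == 0)  /* like semTake(..., NO_WAIT) */
            return;                              /* now holding both locks */
        pthread_mutex_unlock(first);             /* back off: hold nothing */
        sched_yield();                           /* give the other side a chance */
    }
}

/* Release both locks; the release order does not matter for deadlock. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With the ordered version, two threads may name the locks in opposite order in their calls and still acquire them in the same order. The retry loop in the back-off version is exactly where the unpredictable timing mentioned above comes from.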

Thursday, May 13, 2010

Embedded linux application remote debugging using GDB

As we all know, GDB is a pretty good debugging tool. Using it to debug an application running on the same Linux machine as the debugger is somewhat different from debugging an application running on a different architecture. The latter is called remote debugging, i.e., the application runs on a Linux-based embedded system (with a different CPU) and the GDB debugger runs on the development host machine.

Starting from version 5.3 (am I right?), GDB supports remote debugging (even of multi-threaded programs).

Here is how to set up GDB for remote debugging. I'm using m68k as an example. (I know it's pretty old, but for other architectures the process is pretty much the same.)

1. Build and install GDB cross debugger
Download GDB from http://ftp.gnu.org/gnu/gdb, then make and install it.

tar zxvf gdb-6.6.tar.gz
cd gdb-6.6/
mkdir build
cd build/
../configure --program-prefix=m68k-uclinux- --target=m68k-elf --disable-werror
make
sudo make install

2. Build and install gdbserver. (Some Linux distributions ship gdbserver as an application in the rootfs, and some toolchains include it in the package. If it's not available there, you have to cross-compile it.)
gdbserver comes with the GDB package, so to cross-compile it you can do:

$ cd gdb-6.6/gdb/gdbserver
$ export CC=m68k-uclinux-gcc   # if required
$ export LD=m68k-uclinux-ld    # if required
$ ./configure --host=m68k-uclinux --target=m68k-uclinux
$ make

3. Build the application with proper CFLAGS, LDFLAGS etc.
For CFLAGS, I believe you need at least -g. If possible, also give -O0 to disable optimization. An example for mytest.c:

m68k-uclinux-gcc -g -O0 -Wall -o mytest mytest.c

Two executables are generated: mytest, which is stripped of symbols, and mytest.gdb, which keeps all the symbols. The debugger uses the latter to load the symbol table.

4. Copy gdbserver and mytest over to the target (using NFS, for example).

5. On the target side, run the application under gdbserver. I use the network as the communication interface, but you can use a serial port too.

gdbserver :3000 mytest

The 3000 is the port number on which gdbserver is listening for gdb commands.

6. On the host side, run the cross debugger gdb on the file with symbols, then connect to gdbserver. Once connected, you can use any gdb commands for debugging.

m68k-uclinux-gdb ./mytest.gdb
(gdb) target remote 192.168.1.100:3000

192.168.1.100, as you can imagine, is the target's IP address. If you want to be able to "list" source code in gdb, put the source code (mytest.c) in the same directory as mytest.gdb. (If you really want to keep the source code in another directory, you can point gdb at it with the directory command.)

And that's it. There are some details I skipped here; they can be found online or through gdb's help command.

Monday, May 10, 2010

Linux System.map file and its use.

The System.map file is the kernel's symbol table, created every time the kernel is compiled. It holds the addresses of all the symbols (variables, functions, etc.) used in the kernel. This address information can be useful for troubleshooting and debugging. For example, the last time my uClinux kernel didn't boot, I used System.map to find out that the RAM end address was not set correctly, and from there I finally solved the problem.

Here is a good article about System.map. It's from http://rlworkman.net/system.map/, but I copy it here for ease of access.

---------------------------------------------------------------------------------
This page is a mirror of Peter Jay Salzman's System.map Explanation, and the only modification made by me is the addition of this note. --rworkman

The system.map File
There seems to be a dearth of information about the System.map file. It's really nothing mysterious, and in the scheme of things, it's really not that important. But a lack of documentation makes it shady. It's like an earlobe; we all have one, but nobody really knows why. This is a little web page I cooked up that explains the why.

Note, I'm not out to be 100% correct. For instance, it's possible for a system to not have /proc filesystem support, but most systems do. I'm going to assume you "go with the flow" and have a fairly typical system.

Some of the stuff on oopses comes from Alessandro Rubini's "Linux Device Drivers" which is where I learned most of what I know about kernel programming.

What Are Symbols?
In the context of programming, a symbol is the building block of a program: it is a variable name or a function name. It should be of no surprise that the kernel has symbols, just like the programs you write. The difference is, of course, that the kernel is a very complicated piece of coding and has many, many global symbols.

What Is The Kernel Symbol Table?
The kernel doesn't use symbol names like BytesRead(). It's much happier knowing a variable or function name by the variable or function's address, like c0343f20. Humans, on the other hand, do not appreciate addresses like c0343f20. We prefer to use symbol names like BytesRead(). Normally, this doesn't present much of a problem. The kernel is mainly written in C, so the compiler/linker allows us to use symbol names when we code and allows the kernel to use addresses when it runs. Everyone is happy.

There are situations, however, where we need to know the address of a symbol (or the symbol for an address). This is done by a symbol table, and is very similar to how gdb can give you the function name from an address (or an address from a function name). A symbol table is a listing of all symbols along with their address. Here is an example of a symbol table:

c03441a0 B dmi_broken
c03441a4 B is_sony_vaio_laptop
c03441c0 b dmi_ident
c0344200 b pci_bios_present
c0344204 b pirq_table
c0344208 b pirq_router
c034420c b pirq_router_dev
c0344220 b ascii_buffer
c0344224 b ascii_buf_bytes

You can see that the variable named dmi_broken is at the kernel address c03441a0.
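
Each line of such a table has three fields: a hexadecimal address, a one-letter type and the symbol name. As a rough sketch of how a tool could read one line (struct ksym and parse_symbol_line are names invented for this example, not taken from any real program):

```c
#include <assert.h>
#include <stdio.h>

struct ksym {
    unsigned long addr;   /* symbol address */
    char type;            /* type letter, e.g. 'B' or 'b' in the listing above */
    char name[128];       /* symbol name (the size limit is an assumption) */
};

/* Parse one "address type name" line.
 * Returns 1 on success, 0 if the line does not have three fields. */
int parse_symbol_line(const char *line, struct ksym *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%lx %c %127s",
                  &out->addr, &out->type, out->name) == 3;
}
```

Feeding it the first line of the listing, "c03441a0 B dmi_broken", gives the address 0xc03441a0, the type 'B' and the name dmi_broken.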

What Is The System.map File?
There are 2 files that are used as a kernel symbol table:

/proc/kallsyms
System.map
There. You now know what the System.map file is.

Every time you compile a new kernel, the addresses of various symbol names are bound to change.

/proc/kallsyms is a "proc file" that is created on the fly when a kernel boots up. Actually, it's not really a disk file; it's a representation of kernel data which is given the illusion of being a disk file. If you don't believe me, try finding the filesize of /proc/kallsyms. Therefore, it will always be correct for the kernel that is currently running.

However, System.map is an actual file on your filesystem. When you compile a new kernel, your old System.map has wrong symbol information. A new System.map is generated with each kernel compile and you need to replace the old copy with your new copy.

What Is An Oops?
What is the most common bug in your homebrewed programs? The segfault. Good ol' signal 11.

What is the most common bug in the Linux kernel? The segfault. Except here, the notion of a segfault is much more complicated and can be, as you can imagine, much more serious. When the kernel dereferences an invalid pointer, it's not called a segfault -- it's called an "oops". An oops indicates a kernel bug and should always be reported and fixed.

Note that an oops is not the same thing as a segfault. Your program (usually) cannot recover from a segfault. The kernel doesn't necessarily have to be in an unstable state when an oops occurs. The Linux kernel is very robust; the oops may just kill the current process and leave the rest of the kernel in a good, solid state.

An oops is not a kernel panic. In a panic, the kernel cannot continue; the system grinds to a halt and must be restarted. An oops may cause a panic if a vital part of the system is destroyed. An oops in a device driver, for example, will almost never cause a panic.

When an oops occurs, the system will print out information that is relevant to debugging the problem, like the contents of all the CPU registers, and the location of page descriptor tables. In particular, the contents of the EIP (instruction pointer) is printed. Like this:

EIP: 0010:[<00000000>]
Call Trace: [<c010b860>]

What Does An Oops Have To Do With System.map?
The information given in EIP and Call Trace is not very informative. Since a kernel symbol doesn't have a fixed address until after the kernel is booted, c010b860 can point to any kernel symbol. Kernel developers wouldn't have the faintest clue where to begin looking for the bug if you simply reported an address. They need a symbol name to begin hunting for the bug.

To help understand cryptic oops output, a daemon called klogd, the kernel logging daemon, is used to perform symbol-address translation. When an oops occurs, klogd intercepts the oops report, translates addresses into symbol names (e.g. translating c010b860 into BytesRead()), and logs the event with the system logger, usually syslogd.

To perform kernel symbol-address resolution, klogd uses System.map.

There. Now you know what an oops has to do with System.map.
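
The translation itself boils down to finding the symbol with the largest address that is not above the address in the oops. The sketch below shows that idea with a binary search over a table sorted by address. It is not klogd's actual code; struct sym_entry and resolve_addr are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

struct sym_entry {
    unsigned long addr;   /* start address of the symbol */
    const char *name;     /* symbol name */
};

/* Return the entry with the largest address <= addr, or NULL if addr
 * lies below the first symbol. The table must be sorted by address. */
const struct sym_entry *resolve_addr(const struct sym_entry *tab, size_t n,
                                     unsigned long addr)
{
    size_t lo = 0, hi = n;   /* search window is [lo, hi) */

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;    /* candidate found, look further right */
        else
            hi = mid;        /* too high, look left */
    }
    /* lo is now the number of entries with address <= addr */
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

An address that falls between two symbols resolves to the lower one, which is how an address in the middle of a function still maps back to that function's name.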

Fine print:
There are actually two types of address resolutions performed by klogd.

Static translation, which uses the System.map file.
Dynamic translation, which is used with loadable modules. These translations don't use System.map and are therefore not relevant to this discussion, but I'll describe them briefly anyhow:
Klogd Dynamic Translation
Suppose you load a kernel module which generates an oops. An oops message is generated, and klogd intercepts it. It is found that the oops occurred at d00cf810. Since this address belongs to a dynamically loaded module, it has no entry in the System.map file. klogd will search for it, find nothing, and conclude that a loadable module must have generated the oops. klogd then queries the kernel for symbols that were exported by loadable modules. Even if the module author didn't export his symbols, at the very least, klogd will know what module generated the oops, which is better than knowing nothing about the oops at all.

Where Should System.map Be Located?
System.map should be located wherever the software that uses it looks for it. It's the only answer possible until some standards board (or someone of clear authority) mandates exactly where System.map should be located and what its name should be. With that in mind, let's look at some software packages and where they expect System.map to be.

klogd
If klogd isn't given the location of System.map as a command line option with the -k switch, it uses the following string array (as of version 1.4.1) to search for it (see source code file ksym.c):

static char *system_maps[] =
{
"/boot/System.map",
"/System.map",
#if defined(TEST)
"./System.map",
#endif
(char *) 0
};
klogd looks for both "System.map" and "System.map-release" in these directories where "-release" is your kernel version. This is an intelligent search: if klogd finds a System.map for a kernel version that is different from the currently running kernel, it'll keep searching.

Although the klogd man pages and source code comments claim that /usr/src/linux is in the search path, I can't find any reference to it. I've reported this to Debian BTS and to Dr. G.W. Wettstein (the author of ksym.c).

Device Drivers
System.map isn't just useful for debugging kernel oopses. A few drivers need System.map to resolve symbols (since they're linked against kernel headers instead of glibc). They won't work correctly without the System.map for the particular kernel currently running. This is NOT the same thing as a module not loading because of a kernel version mismatch, which has to do with the kernel version, not the kernel symbol table which changes between kernels of the same version!

ps
ps uses a different (more general) search array than klogd:

*sysmap_paths[] = {
"/boot/System.map-%s",
"/boot/System.map",
"/lib/modules/%s/System.map",
"/usr/src/linux/System.map",
"/System.map",
NULL
};
where %s gets replaced by the currently running kernel version.
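
To make the %s substitution concrete, here is a small sketch that expands one entry of such a search array. It is not the code ps uses; expand_sysmap_path is a hypothetical helper, and in a real program the release string would typically come from uname(2).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy tmpl into out, replacing the first "%s" with release.
 * Returns 0 on success, -1 if the result does not fit into out. */
int expand_sysmap_path(const char *tmpl, const char *release,
                       char *out, size_t outlen)
{
    const char *mark;
    int n;

    assert(tmpl != NULL && release != NULL && out != NULL);
    mark = strstr(tmpl, "%s");
    if (mark == NULL)
        n = snprintf(out, outlen, "%s", tmpl);          /* nothing to replace */
    else
        n = snprintf(out, outlen, "%.*s%s%s",
                     (int)(mark - tmpl), tmpl,          /* text before "%s" */
                     release,                           /* kernel version   */
                     mark + 2);                         /* text after "%s"  */
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For a running kernel whose release is, say, 2.6.32, the first entry expands to /boot/System.map-2.6.32, while entries without %s such as /boot/System.map are copied unchanged.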

What else uses (or doesn't use) the System.map
At one point (May 2003), I thought that lsof and dosemu used System.map, but from looking at the source code (May 2007) they don't appear to anymore (or perhaps I was mistaken).

What Happens If I Don't Have A Healthy System.map?
Suppose you have multiple kernels on the same machine. You need a separate System.map file for each kernel. If you run a kernel with no (or an incorrect) System.map, you'll periodically see annoying warnings like:

System.map does not match actual kernel
every time you use ps. Also, your klogd or ksymoops output will not be reliable in case of a kernel oops.

TODO
Look at ps more closely to determine if the man page is really in error, and if so, report it to Debian BTS and the ps maintainers.
Fix the CSS: my markup sucks ass. I'm really bad at webpage design. Need a better way to distinguish between sections and subsections. Fixed font content should have a smaller font size.


Acknowledgements
Rickey Page (28 May 2003) for doing my heart good.
Mauro Giachero (May 2007): Read the klogd source code and determined that the klogd man page and source code comments are wrong about the System.map man page search path. He also provided information about the System.map usage (or lack thereof) of lsof, ps, and dosemu.

Wednesday, May 5, 2010

uClinux 2.6.x ROM based image with only data, init and bss in RAM

Recently I've been trying to build a uClinux (2.6.x kernel) ROM-based image for the Coldfire uc5272, with the kernel executing in place (XIP) from ROM, the .data, .init and .bss segments in RAM, and ROMfs as the rootfs in ROM. This arrangement saves memory, as only the changeable parts of the system are in RAM.

There are three files to change to build such an image:
1. vmlinux.lds.S
This is the linker script file for 2.6.x kernel. It describes the memory layout of the image. The .romvec and .text segments are defined in ROM, and .data, .init, .bss segments are defined in RAM.

2. head.S
This is the second-stage boot code for the 2.6.x kernel. It's the entry point of the kernel. In this file, some code is needed to move .data and .init into RAM and to clear the .bss segment.

3. uclinux.c
This is the mapping driver for ROMFS partition support. Here the starting address of romfs has to be given.

I made all the changes, loaded the image into ROM, and issued a "go" command. The first thing I got was:

go 0x10c20000 ...

Then it just stopped there. After digging for a while, I found the problem: the _ramend variable was 0, while it's supposed to hold the end address of the RAM. But I clearly set a value for it in head.S. Here is the code snippet:

GET_MEM_SIZE /* macro code determines size */
addl %a7,%d0
movel %d0,_ramend /* set end ram addr */

And here is the code for moving .data, .init into RAM:

lea _etext, %a0
lea _sdata, %a1
lea _sbss, %a2
_copy_data:
movel (%a0)+, (%a1)+
cmpal %a1, %a2
bhi _copy_data

How come _ramend is 0? By the way, the logs at __log_buf (0x3df44 of RAM in my case) indicated that there was a memory access violation, which is consistent with _ramend having a wrong value.

The answer is really simple, but finding it took me some time. It turned out that the order of setting _ramend and moving .data into RAM is significant, and I did it in the wrong order.

The right order is to move .data into RAM first and then set _ramend. Why? Because when the image is burnt into ROM, the .data segment sits in ROM, with every variable in it holding its initial value. For _ramend, that is 0. But _ramend's run-time address is actually in RAM (0x2000c in my case). So if _ramend is set before .data is moved, the value just set is overwritten by the initial value copied from ROM. If .data is moved into RAM first and _ramend is set afterwards, the value is kept for later use.
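
The effect is easy to reproduce in plain C. The following is only a model of what head.S does, not kernel code: struct data_seg stands in for the start of the .data segment, rom_image for its copy in ROM, and the two boot_* functions are hypothetical names for the two orderings.

```c
#include <assert.h>
#include <string.h>

/* Model of the first variables in .data. */
struct data_seg {
    unsigned long rambase;
    unsigned long ramvec;
    unsigned long ramstart;
    unsigned long ramend;
};

/* The copy of .data stored in ROM: every variable holds its initial value. */
static const struct data_seg rom_image = { 0, 0, 0, 0 };

/* Wrong order: set _ramend, then copy .data from ROM on top of it. */
unsigned long boot_set_then_copy(struct data_seg *ram, unsigned long ram_end)
{
    assert(ram != NULL);
    ram->ramend = ram_end;                   /* movel %d0,_ramend   */
    memcpy(ram, &rom_image, sizeof *ram);    /* the _copy_data loop */
    return ram->ramend;                      /* initial value again */
}

/* Right order: copy .data first, then set _ramend. */
unsigned long boot_copy_then_set(struct data_seg *ram, unsigned long ram_end)
{
    assert(ram != NULL);
    memcpy(ram, &rom_image, sizeof *ram);    /* the _copy_data loop */
    ram->ramend = ram_end;                   /* movel %d0,_ramend   */
    return ram->ramend;                      /* the value survives  */
}
```

Calling the first function with any non-zero RAM end address returns 0, which is exactly the symptom I saw. The second one returns the address that was passed in.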

How did I find the address of _ramend? I found it in System.map. The first five symbols in the .data segment are:

00020000 D _rambase
00020000 D _sdata
00020004 D _ramvec
00020008 D _ramstart
0002000c D _ramend