Thursday, April 22, 2010

Linux kernel and glibc

I have had this question for a while: what is the dependency between the kernel and system libraries such as glibc? Can kernel code call functions in glibc? Do we need glibc to compile the kernel? Or do we need kernel code to compile glibc?

This is what I found from http://kernelnewbies.org/FAQ/LibraryFunctionsInKernel:

Q: Can I use library functions in the kernel ?
A: System libraries (such as glibc, libreadline, libproplist, whatever) that are typically available to userspace programmers are unavailable to kernel programmers. When a process is being loaded the loader will automatically load any dependent libraries into the address space of the process. None of this mechanism is available to kernel programmers: forget about ISO C libraries, the only things available is what is already implemented (and exported) in the kernel and what you can implement yourself.

Note that it is possible to "convert" libraries to work in the kernel; however, they won't fit well, the process is tedious and error-prone, and there might be significant problems with stack handling (the kernel is limited to a small amount of stack space, while userspace programs don't have this limitation) causing random memory corruption.

Many of the commonly requested functions have already been implemented in the kernel, sometimes in "lightweight" versions that aren't as featureful as their userland counterparts. Be sure to grep the headers for any functions you might be able to use before writing your own version from scratch. Some of the most commonly used ones are in include/linux/string.h.

Whenever you feel you need a library function, you should consider your design, and ask yourself if you could move some or all the code into user-space instead.

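So the idea is that the kernel doesn't (and actually can't) use external libraries, and the kernel and glibc can be compiled separately. glibc, however, has to be compiled against a particular set of kernel headers. Why? Because it uses the kernel headers that define the user-space interface (system call numbers, structures, constants). Also, when a new kernel comes out, its new features (new system calls, for example) can't be reached through the old library APIs; an updated glibc, compiled against the newer kernel headers, is needed to expose them. (The kernel keeps its system call interface backward compatible, so an older glibc generally keeps working on a newer kernel; it just doesn't know about the new features.)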

Tuesday, April 20, 2010

An endianness problem with uClinux on uc5272

I have been struggling for the past few days trying to run a uClinux cramfs image on a Coldfire uc5272 board. I know, the uc5272 is pretty old, with its uCbootloader at version 1.7.7. I'm trying to run a 2.6.x kernel image on it, so it's interesting to see how a fairly new kernel runs on fairly old hardware.

uCbootloader has special support for cramfs images. Unlike u-boot and most other bootloaders, uCbootloader understands cramfs: it can "ls" the image's contents and "cat" files in it. The actual kernel file, linux.bin, is embedded in the root "/" directory of the rootfs. uCbootloader uncompresses the image, retrieves linux.bin from the rootfs and runs it, and the kernel in turn mounts the file system.

u-boot and most other bootloaders require the kernel file to sit outside the rootfs, usually placed ahead of the rootfs in the image. The kernel is loaded first, and then the rootfs is mounted by the kernel.

I tried a few builds but none of them worked. "ls" showed the error "Not a valid file system", and "go" didn't uncompress the image. After a while I realized that either the image was invalid or the uCbootloader somehow didn't like it. It had nothing to do with the kernel, because the kernel wasn't even running yet.

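I started to doubt the endianness of the image. I used "od" to display the magic number of the image: it was 0x28cd3d45 stored little endian, which is the host's endianness. That seemed right, since I believed the convention was that a cramfs image is always little endian, even for a big endian CPU, with the kernel doing the byte swapping. (The kernel's cramfs documentation, however, says all data is in host-endian format and that neither mkcramfs nor the kernel does any byte swapping, which would explain what happened next.)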

So what was going wrong? What about the uCbootloader? It didn't complain about a "wrong magic number" or anything similar, but it's old, right? So I did a little experiment: I reversed the order of the first 4 bytes of the image to make it look like a big endian image (read as a 32-bit value, the first 4 bytes are the cramfs magic number). And "ls" worked! The listing was all garbage, but the bootloader seemed to accept the image as cramfs.

Then I used /sbin/mkfs.cramfs on my Ubuntu 9.10 to build a big endian image. This version supports choosing the endianness with the "-N" option. I downloaded the image and programmed it into flash, and "ls" showed its contents correctly. Beautiful! The "go" command then uncompressed the image, loaded the kernel from the "/" directory, and ran it. It stopped there, though, complaining about illegal instructions. This is not surprising, because the kernel was built little endian.

So it was a good exercise, at least. There are two things I can try next: one is to update the uCbootloader to 1.7.8, and the other is to build a big endian kernel.

Wednesday, April 14, 2010

"static" in C and C++

This post will talk a little bit about the C/C++ keyword "static" and its use.

As a keyword, static can be used in three distinct situations in C:

1. A static local variable declared within a function. It is initialized only once, and its value persists between function invocations.
2. A static variable declared within a module (a source file, for example) but outside any function body. Such a variable can be accessed only by functions within that module, not by functions in other modules. It's a localized global variable.
3. A static function declared in a module. Such a function has module scope: only functions in the module in which it's declared can call it.

On embedded Linux (and Linux in general), a program's static variables (and global variables) live in the program's data segment and exist until the program terminates. (Because the compiler and linker know about them and their types at build time, their storage is allocated in advance.) To be more precise, those with nonzero initial values are in the data segment, while those without initial values are in the program's BSS segment, which is filled with zeros when the program starts.

Local variables declared in a function (automatic variables) and parameters passed to functions are different: they get their storage on the thread's stack (every thread of the program has its own stack), although the calling convention may pass some parameters in registers. Memory allocated by malloc() comes from the program's heap (typically there is one heap for the program).

Now for static in C++. On top of the C uses above, there are:

1. Static data members of a class. A static data member is associated with the class instead of with an object of the class. There is only one copy of a static data member in memory, shared by all objects.
2. Static methods. Again, a static method is associated with the class instead of an object. It has no this pointer, so it behaves much like a regular function and cannot use non-static members without an object. The difference is that, as a member of the class, it can access the class's private and protected members, such as its private static data members. If a static method is public, it can be called just like a function, by prefixing it with the class name and the scope resolution operator "::" (for example, ClassName::method()).

Friday, April 9, 2010

Linux/Unix file system inode

Every object (file, directory, etc.) in a Unix-style file system is represented by an inode. An inode is a data structure that contains information about the file object, such as user and group ownership, access mode (read, write, and execute permissions), and file type. Each inode is identified by an inode number, which serves as its index.

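A hard drive or partition containing a file system usually has three parts: a superblock, an area for inodes, and, the largest part, blocks for the actual file contents. (This is a simplified view; ext2 and ext3, for example, spread inodes and data blocks across several block groups.)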

The inode number indexes a table of inodes in a known location on the device. From the inode number, the kernel can access the contents of the inode, including the data pointers, and so the contents of the file.

An inode doesn't store the file's name, and the kernel doesn't really need the name to reach a file's data: all it needs is the inode. Directories (which are file objects themselves) translate file names to inodes: a directory's contents are a table mapping file names to their corresponding inode numbers.

Some commands can be used to see the inode of a file.

ls -i
example:
# ls -i /etc/passwd
32820 /etc/passwd

stat
example:
# stat /etc/passwd
  File: `/etc/passwd'
  Size: 1988            Blocks: 8          IO Block: 4096   regular file
Device: 341h/833d       Inode: 32820       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2005-11-10 01:26:01.000000000 +0530
Modify: 2005-10-27 13:26:56.000000000 +0530
Change: 2005-10-27 13:26:56.000000000 +0530

Again, a file name is just a human-readable label for the file. If a hard link is created to a file, "ls -l" will show that the file has 2 links, meaning the file now has 2 names, both pointing to the same inode. If a symbolic link is created to a file, "ls -l" will show that the file still has only 1 link. The symbolic link itself is a newly created file whose contents are the path of the original file, and it has an inode of its own.

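There are also APIs for working with files through their inodes, such as the stat() family of system calls, which return a file's inode number and other inode metadata. These are mainly used in system and application development.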