Saturday, June 28, 2014

Interacting with kernel using sysfs

। जय श्री भगवान् ।
In the last post I talked about how to add a system call to an x86 or x86_64 system. There are several reasons a user space application might want to interact with the kernel: I/O operations, performance statistics, or a special device with its own set of ioctl calls that the program wants to use. We saw one way to do this, a system call, but it is neither practical nor necessary to add a new system call for every such need.

We'll take a look at one of the simplest interfaces for interacting with the kernel here, which is sysfs. It's not quite as simple as you may think, but we can leave out a lot of things like locking, memory allocation and file operations, and focus on one thing: the easiest way to get data into and out of the kernel. Sysfs is usually used for interacting with modules and exporting device-specific information, but it can be used for almost anything you want to accomplish. So let's dive into the basics first: what exactly is sysfs?

The basic idea of having a Sysfs

Sysfs was created mainly for devices and kernel modules wishing to export/import information. A single sysfs file can't export more than PAGE_SIZE (usually 4 KiB) of data, although depending on how you implement your sysfs files you can accomplish quite a lot more. The basic idea is this: suppose a module wants interaction from user space, for example a device that the root user can turn off by writing a specific command into one of the device's registers. The user shouldn't have to write a program doing ioctls for that. Instead, the device's driver module can create sysfs entries, allowing the root user to just do echo <command_value> > /sys/<sysfs_file>, which takes care of everything.

The whole of sysfs is based on the idea of a kobject, which represents some kind of entity. A kobject may have a parent and may have many children, which are kobjects too. The basic idea is shown below.
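The same write can be issued from a C program instead of the shell. The following is a minimal user space sketch, not part of the original module; the helper name write_attr is made up for illustration, and it assumes the target file already exists, as a sysfs file would.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `value` to the file at `path`, roughly what
 * `echo value > path` does. Returns 0 on success, -1 on failure. */
int write_attr(const char *path, const char *value)
{
        size_t len = strlen(value);
        ssize_t n;
        int fd = open(path, O_WRONLY);

        if (fd < 0)
                return -1;
        /* Each write() on a sysfs file results in one call to the store
         * method, so the whole value is sent in a single write(). */
        n = write(fd, value, len);
        close(fd);
        return (n == (ssize_t)len) ? 0 : -1;
}
```

A caller would pass the sysfs file path and the command text, for example write_attr("/sys/<sysfs_file>", "1\n").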

[Figure: Sysfs structure]
So the idea is to group kobjects of a certain type and put them under that type. By default you can see the sysfs entries in /sys, and depending on the type of kobject you wish to implement, it could be added under one of these. Since this is a very gentle introduction to kobjects, we won't use any parent and will add our kobject directly under /sys. So let's see what we need to know in order to do this.

The following is the listing of the kobject structure,

struct kobject {
        const char              *name;
        struct list_head        entry;
        struct kobject          *parent;
        struct kset             *kset;
        struct kobj_type        *ktype;
        struct sysfs_dirent     *sd;
        struct kref             kref;
        unsigned int state_initialized:1;
        unsigned int state_in_sysfs:1;
        unsigned int state_add_uevent_sent:1;
        unsigned int state_remove_uevent_sent:1;
        unsigned int uevent_suppress:1;
};

The above structure seems daunting, and there's a lot going on in there, but we don't need to bother about most of it right now; we just need the name, the kref and the parent. The rest is used for internal kobject maintenance. The leaves of this tree are where the real thing happens. They are implemented by the module writer using attributes, or more specifically kobj_attribute. The following shows the listing for both,

struct attribute {
        const char              *name;
        umode_t                 mode;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        bool                    ignore_lockdep:1;
        struct lock_class_key   *key;
        struct lock_class_key   skey;
#endif
};

struct kobj_attribute {
        struct attribute attr;
        ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
                        char *buf);
        ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr,
                         const char *buf, size_t count);
};

As you can see, kobj_attribute embeds the attribute structure and additionally provides the show and store methods, which move information from the kernel to user space and from user space to the kernel respectively. It all boils down to the following steps:

  1. Create a parent for our attributes. This is required since we don't want our attributes to appear directly under /sys.
  2. Create some kobj_attribute structures and set the store and show methods on them.
The following module shows how to accomplish this. The module itself is heavily commented, but we'll take a look at some interesting pieces.

Kernel Module using Sysfs, Kobjects and kobj_attribute

The idea of this module is to have
  1. A parent directory, that is, a parent kobject.
  2. Two attributes that store their information in a static array.
 
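#include <common.h>             /* local header, not shown in this post */
#include <linux/module.h>       /* module_init, module_exit */
#include <linux/init.h>         /* __init, __exit */
#include <linux/kobject.h>      /* struct kobject, struct kobj_attribute */
#include <linux/sysfs.h>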

#define ROOT_KOBJ_NAME          "pks_kobj"
#define ROOT_ATTR1_NAME         "pks_kobj_attr1"
#define ROOT_ATTR2_NAME         "pks_kobj_attr2"
#define ROOT_ATTRS_COUNT        2

ssize_t rootfs_show(struct kobject *kobj, struct kobj_attribute *attr,
                        char *buf);

ssize_t rootfs_store(struct kobject *kobj, struct kobj_attribute *attr,
                        const char *buf, size_t count);


/* 1 Word of storage per attribute */
#define ROOT_ATTR_STORAGE_SIZE  (ROOT_ATTRS_COUNT * sizeof(unsigned long))
/*
 * This is our directory sort of for our sysfs files
 */
struct kobject *root_kobj;

/*
 * These are the files under pks_kobj we'll see. 
 * */
struct kobj_attribute root_kobj_attr1 = __ATTR(root_kobj_attr1, S_IWUSR|S_IRUGO,
                                        rootfs_show, rootfs_store);

struct kobj_attribute root_kobj_attr2 = __ATTR(root_kobj_attr2, S_IWUSR|S_IRUGO,
                                        rootfs_show, rootfs_store);

const struct attribute *root_kobj_attr[] = {    &root_kobj_attr1.attr,
                                        &root_kobj_attr2.attr, NULL};
/*
 * We need storage to get/put data from/to user land. Let's just create 
 * a static array for this.
 */
static char attribute_storage[ROOT_ATTR_STORAGE_SIZE];

static int __init init_sysfs_objs(struct kobject *root_kobj_parent)
{
        int err = 0;
        root_kobj = kobject_create_and_add(ROOT_KOBJ_NAME, root_kobj_parent);
        if (!root_kobj) {
                err = -ENOMEM;
                goto no_root_kobj;
        }
        err = sysfs_create_files(root_kobj, root_kobj_attr);
        if (err)
                goto err_create_files;
        return 0;

err_create_files:
        kobject_put(root_kobj);
no_root_kobj:
        return err;
}
static int __init load_module(void)
{
        return init_sysfs_objs(NULL);
}
static void __exit cleanup_sysfs_objs(void)
{
        sysfs_remove_files(root_kobj, root_kobj_attr);
        kobject_put(root_kobj);
}

static void __exit unload_module(void)
{
        cleanup_sysfs_objs();
}

ssize_t rootfs_show(struct kobject *kobj, struct kobj_attribute *attr,
                        char *buf) {
        unsigned long *storage = (unsigned long*)attribute_storage;
        //pr_debug("Copying to user space from attribute %s\n", attr->attr.name);
        if (attr == &root_kobj_attr1) {
        }
        else if (attr == &root_kobj_attr2) {
                storage++;
        }
        *( (unsigned long*)buf) = *storage;
        return sizeof(unsigned long);
}

ssize_t rootfs_store(struct kobject *kobj, struct kobj_attribute *attr,
                        const char *buf, size_t count)
{
        unsigned long *storage = (unsigned long*)attribute_storage;
        unsigned long value;

        /* Refuse writes too short to hold one unsigned long */
        if (count < sizeof(unsigned long))
                return -EINVAL;
        //pr_debug("Copying from user space to attribute %s\n", attr->attr.name);
        /* attr1 uses the first word of storage, attr2 the second */
        if (attr == &root_kobj_attr2)
                storage++;
        value = *( (const unsigned long*)buf);
        pr_debug("Changing from %lu to %lu\n", *storage, value);
        *storage = value;
        /* Report the whole write as consumed so user space doesn't retry */
        return count;
}

module_init(load_module);
module_exit(unload_module);
/* kobject_create_and_add is exported to GPL modules only */
MODULE_LICENSE("GPL");


Creating the root directory for our kobj_attributes

In the above listing, we've created one directory, represented by our root_kobj. Kobjects should be created dynamically, not statically, so we've used the function kobject_create_and_add for this purpose. The final argument we supplied to that function is NULL, which means this kobject doesn't have a parent and will thus appear directly under /sys.
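To make the parent rule concrete, here is a small user space sketch of how a directory path follows from the chain of parents. It is only an illustration: struct fake_kobj and fake_kobj_path are hypothetical stand-ins and not kernel APIs, and the real kernel builds its paths differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kobject: just a name and a parent link. */
struct fake_kobj {
        const char *name;
        struct fake_kobj *parent;   /* NULL means "directly under /sys" */
};

/* Build the sysfs-style path for `k` into `out` (capacity `len`).
 * Returns the length written, or -1 if the path does not fit. */
int fake_kobj_path(const struct fake_kobj *k, char *out, size_t len)
{
        int n, m;

        if (!k) {
                /* No kobject left in the chain: we are at the sysfs root */
                n = snprintf(out, len, "/sys");
                return (n < 0 || (size_t)n >= len) ? -1 : n;
        }
        /* Emit the parent's path first, then append our own name */
        n = fake_kobj_path(k->parent, out, len);
        if (n < 0)
                return -1;
        m = snprintf(out + n, len - n, "/%s", k->name);
        if (m < 0 || (size_t)m >= len - n)
                return -1;
        return n + m;
}
```

With a NULL parent the result for a kobject named pks_kobj is /sys/pks_kobj, which is where the module's directory shows up.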

Creating the kobj_attributes

The attributes you would like to show will almost always be declared statically, since you know in advance what you want to expose in sysfs for your device, or for whatever purpose you are creating those entries. To facilitate this the kernel provides the macro __ATTR for initializing a kobj_attribute. The macro takes the attribute name as its first argument and stringifies it to produce the file name, so we don't even need the names defined at the top; the files here will be called root_kobj_attr1 and root_kobj_attr2.

Another important thing to note is that for each attribute you have to specify a show and a store method. Most of the time you'll have some common code to execute, so there are two ways to do this:
  1. Provide a common routine and check which attribute was passed in by comparing the pointer against your statically defined kobj_attribute structures.
  2. Wrap kobj_attribute in your own structure and use container_of to get from the attribute back to the containing structure. This requires a bit more work and you may not even want it.
In this simple example we'll take approach 1.
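For the curious, approach 2 can be sketched in plain user space C. Everything here is an assumption made for illustration: struct fake_attr stands in for kobj_attribute, struct pks_attr is a made-up wrapper, and container_of is re-created with offsetof rather than taken from the kernel headers.

```c
#include <stddef.h>

/* Same idea as the kernel's container_of(), written with standard C only. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the generic attribute structure. */
struct fake_attr {
        const char *name;
};

/* The wrapper: per-attribute private data plus the embedded attribute.
 * The attribute is deliberately not the first member, so the pointer
 * adjustment done by container_of is a real, non-zero offset. */
struct pks_attr {
        unsigned long value;
        struct fake_attr attr;
};

/* A common "show" that needs no pointer comparisons: it recovers the
 * wrapper from the embedded attribute and reads that wrapper's value. */
unsigned long pks_show(struct fake_attr *attr)
{
        struct pks_attr *p = container_of(attr, struct pks_attr, attr);

        return p->value;
}
```

The design trade-off is that each attribute carries its own storage, so the common routine never has to know how many attributes exist.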

We've used the available wrapper function sysfs_create_files. Its first argument is the kobject under which the attributes will be created, and the second is an array of pointers to attributes. Note how we've specified NULL at the end of this array. This is mandatory: no length is supplied, so the function iterates over the array until it finds a NULL entry.
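The NULL terminator convention is easy to see in isolation. This is a tiny user space sketch of walking such an array; count_entries is a hypothetical helper and is not how sysfs_create_files itself is written.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated array of pointers. Without the
 * trailing NULL this loop would run off the end of the array. */
size_t count_entries(const char *const *list)
{
        size_t n = 0;

        while (list[n] != NULL)
                n++;
        return n;
}
```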

Copying Data to/from user space

The store method implies that
  • You are copying data from user land to the kernel.
  • You return how much data you've consumed. Usually you just return the same amount as was passed in, but copy only the amount you really want.
The show method implies that
  • You are copying data from the kernel to user land.
  • You return how much data you've copied into the buffer.
The buffer pointer passed in as buf in the above code is actually a page mapped within the kernel, so you can memcpy into it or just assign directly as I've done above. Remember that the buffer is exactly PAGE_SIZE, so don't go above that limit.

The internal buffer is just an array. It holds the value as an unsigned long for each of the attributes.
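The module above moves the raw bytes of an unsigned long, which keeps the code short. Most sysfs files in the kernel tree exchange ASCII text instead, so that cat and echo work on them directly. Below is a user space sketch of that text convention, under stated assumptions: text_show, text_store and FAKE_PAGE_SIZE are made-up names, buf is assumed to be NUL-terminated, and inside the kernel you would reach for helpers such as scnprintf and kstrtoul rather than the libc calls used here.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define FAKE_PAGE_SIZE 4096     /* assumption: stands in for PAGE_SIZE */

/* "show": format the value as decimal text plus a newline into buf.
 * Returns the number of bytes placed in buf. */
long text_show(unsigned long value, char *buf)
{
        return snprintf(buf, FAKE_PAGE_SIZE, "%lu\n", value);
}

/* "store": parse decimal text from buf into *value.
 * Returns count on success, so the caller sees the whole write as
 * consumed, or -EINVAL if the text is not a plain number. */
long text_store(unsigned long *value, const char *buf, size_t count)
{
        char *end;
        unsigned long v;

        errno = 0;
        v = strtoul(buf, &end, 10);
        /* Accept only "<digits>" optionally followed by one newline */
        if (errno || end == buf || (*end != '\n' && *end != '\0'))
                return -EINVAL;
        *value = v;
        return (long)count;
}
```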

Cleaning up

You'll need to remove the files you created the same way you added them. Just be sure to do it in reverse order: first remove the files, then remove the parent kobject. To remove root_kobj all you need to do is call kobject_put. This decrements the kobject's reference count, and when the count reaches 0 the kobject is cleaned up. This is why you should remove the files first and only then drop the parent kobject.
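The get/put counting behind kobject_put can be sketched in user space as follows. This is a simplified illustration only: struct fake_obj and its functions are hypothetical, and the counter here is a plain int, whereas the kernel's kref uses atomic operations.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object, loosely modelled on kref. */
struct fake_obj {
        int refcount;
        int *released;  /* set to 1 when the object is freed, for illustration */
};

/* Create an object holding one reference, owned by the caller. */
struct fake_obj *fake_obj_create(int *released)
{
        struct fake_obj *o = malloc(sizeof(*o));

        if (!o)
                return NULL;
        o->refcount = 1;
        o->released = released;
        return o;
}

/* Take an additional reference. */
void fake_obj_get(struct fake_obj *o)
{
        o->refcount++;
}

/* Drop one reference; free the object when the last one goes away.
 * Returns 1 if the object was freed, 0 otherwise. */
int fake_obj_put(struct fake_obj *o)
{
        if (--o->refcount > 0)
                return 0;
        *o->released = 1;
        free(o);
        return 1;
}
```

The object only disappears on the final put, which is why anything still using it has to be gone before that call.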

Exercises

  1. Modify the above module so that the first byte of each attribute's storage area holds 8 flag bits. That is, data can be stored in only 3 of the 4 bytes on a 32 bit computer and 7 of the 8 bytes on a 64 bit computer.
  2. Write two test programs, a producer and a consumer, that write and read data respectively. Use the flag byte for any synchronization you may need. If the buffer is already full and the consumer hasn't consumed it yet, check the flag byte to see whether the data may be overwritten. This flag will be set/unset randomly by your producer on each write. If the data can't be overwritten, return an error code, or just 0 to convey that nothing was written.
  3. Try implementing a wrapper over attributes and see how you can use container_of to accomplish the same. Think about what you'll need in your wrapper structure.
We'll certainly visit sysfs again later on when we dive into device drivers.

Sunday, June 15, 2014

Adding a new System Call in Linux (x86 and x86_64)

। जय श्री भगवान् ।
In the previous post we saw how we can modify the running kernel by dynamically adding or removing modules. In this post I'll talk about how to add a new system call to an x86 or x86_64 system. Most of the material available on the internet corresponds to 2.6 kernels; since then some files have been moved in order to make the required changes simpler.

So let's dive into how we'll write a new system call. First you will need the kernel sources, and then you'll need to change the following file(s). I've tested this on a 32 bit system, but 64 bit should be similar.

Making changes to the sources 

 

  1. Locate the directory arch/x86/syscalls. This directory holds the syscall table files for both 32 and 64 bit.
  2. Open the file syscall_32.tbl for 32 bit, or syscall_64.tbl for 64 bit.
  3. All the system calls are listed here with their number on the extreme left. The format is shown in the first line of the file. We won't discuss which ABI to use, but for 32 bit you can use i386.
  4. Go to the end of this table (the last entry listed would be #350, sys_finit_module) and add your system call there. Remember to set the correct number and name. You won't need to supply a compat_ version. The end of the table will then look as shown below.


348     i386    process_vm_writev       sys_process_vm_writev           compat_sys_process_vm_writev
349     i386    kcmp                    sys_kcmp
350     i386    finit_module            sys_finit_module
351     i386    pks_first_call          sys_pks_first_call


The above change shows that I added a system call named pks_first_call, while its entry point is sys_pks_first_call. The SYSCALL_DEFINE* macros prepend sys_ to the name of the function, which is why the entry point is named that way.
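The sys_ prefix comes from token pasting in the preprocessor. Here is a simplified user space imitation to show the mechanism; MY_SYSCALL_DEFINE0 is a made-up macro, and the real SYSCALL_DEFINE0 does more than this and differs between kernel versions.

```c
/* Hypothetical, simplified imitation of SYSCALL_DEFINE0: paste "sys_"
 * in front of the given name to form the function name. */
#define MY_SYSCALL_DEFINE0(name) long sys_##name(void)

/* Expands to: long sys_pks_first_call(void) { ... } */
MY_SYSCALL_DEFINE0(pks_first_call)
{
        return 0;
}
```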

Adding the system call code

To add the system call code, we'll create a new directory in the arch/x86 directory and then modify the top-level Kbuild file to compile our new system call. Let's start by creating the required files.

mkdir /usr/src/linux/arch/x86/pks_first

Now create a new file in this directory. I named the file after my system call, but you can choose anything you want. My file looks as shown below,


#include <linux/kernel.h>
#include <linux/syscalls.h>

SYSCALL_DEFINE0(pks_first_call)
{
        /* Trailing newline so the message is flushed to the log at once */
        printk(KERN_INFO "Inside %s\n", __func__);
        return 0;
}

As you can see I've used SYSCALL_DEFINE0 since our system call takes no arguments. There are other versions of SYSCALL_DEFINE* for calls that take arguments. You are encouraged to use these macros since they also create the ftrace metadata for the call when CONFIG_FTRACE_SYSCALLS is enabled.

In the same directory you'll also need a Makefile, which does nothing but say which files need to be compiled, so it's very simple:


obj-y           += pks_first_call.o

Changing the Top-Level Kbuild

All we need to do now is change the top-level Kbuild file, located at arch/x86/Kbuild. The listing is as shown below,

ifneq ($(CONFIG_XEN),y)
obj-y += realmode/
endif
obj-y += kernel/
obj-y += mm/

obj-y += crypto/
obj-y += vdso/
obj-$(CONFIG_IA32_EMULATION) += ia32/

obj-y += platform/
obj-y += net/
obj-y += pks_first/

I added my directory at the end of the top-level Kbuild. The final change is to let the kernel know about the system call's prototype, so we need to change the file include/linux/syscalls.h. The change looks as shown below,

asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
                         unsigned long idx1, unsigned long idx2);
asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
asmlinkage long sys_pks_first_call(void);
#endif  

In the above listing I've added my system call at the end, right before #endif. We don't need to change __NR_syscalls; the build system takes care of it for x86. You can now build and install your new kernel and check your new system call with a simple program as shown below.


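#include <stdio.h>
#include <unistd.h>             /* declares syscall() */
#include <syscall.h>
#include <errno.h>
#define NR_pks_first_call 351   /* must match the number added to the table */
int main(void)
{
        /* syscall() returns -1 and sets errno if the call fails */
        if (syscall(NR_pks_first_call)) {
                perror("OOPS");
        }
        return 0;
}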

When you run the above program you shouldn't see any message, but if you look at the dmesg output or your system log file (usually /var/log/messages) you should see the message posted by our system call. This was quite a lot of information; next time we'll see other ways of interacting with the kernel using something simpler than recompiling and installing the whole kernel.
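If you want to try the syscall() mechanism before building a custom kernel, the same pattern works with a call that already exists. This sketch assumes Linux with glibc; raw_getpid is a made-up helper that invokes getpid by number through syscall(), just as the test program does for number 351.

```c
#define _GNU_SOURCE             /* make sure syscall() is declared */
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke getpid by its syscall number instead of through the libc wrapper. */
long raw_getpid(void)
{
        return syscall(SYS_getpid);
}
```

The value returned should match what getpid() reports for the same process.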