Diving into the Linux kernel and making a kernel module

My goal for the next few days/weeks is to dive deeper into the Linux kernel and learn how it works, and I will document whatever I make here. Why? Because I can!

So, first of all, what is the kernel? The kernel is a little program, usually named **vmlinuz-[version number]**. It is around 5 MB and resides comfortably in your `/boot` directory, if you are using Linux. This program gets loaded by a bootloader (one of the most popular is **GRUB**), which I will learn more about later on. The bootloader will pass parameters to the kernel, and in return, the kernel will provide an API, to which we can make System Calls - usually done through the Standard C Library (in another post I built the GNU C Library from source code, if you want to see it). Another way to interact with the kernel, aside from its API, is through a virtual file system - I am planning to make another post about this later on.

So, what does the kernel do anyway? The kernel is a layer that sits between the hardware and the Standard C Library. It helps us interact with the hardware and peripheral devices, allocate memory, and so on. It also enforces privileges in order to tell if an operation is allowed or not. Some CPU instructions can only be executed by the kernel, and not by any software that sits on top of it. Putting it in simple terms, the kernel is an abstraction layer on top of the hardware: an application calls a function in the Standard C Library, which calls the kernel, which interacts with the hardware.

Another important detail is that the kernel is not a huge program: at around 5 MB, it has only the essential pieces to support the Operating System. However, we can expand its functionality through modules - **Loadable Kernel Modules** (**LKM**s). These modules are normally used by device drivers; they get "linked" into the kernel, so they run in the same address space and with the same privileges as the kernel itself. We can add and remove modules in the kernel at runtime, and this is what I am going to do in this post.
Modules are built for a specific kernel version, and are conveniently installed in `/lib/modules/[your kernel version]`. If you use the command `uname -r`, you can get the release (version) of the kernel you are running. Before I show you the module, some useful commands:

##### Useful commands

+ **lsmod** Will give you a list of all the modules you have loaded. This is one of the lines I got:

  ```
  usbcore 208896 9 uvcvideo,usbhid,snd_usb_audio
  ```

  These columns are, respectively: the name of the module, its size in bytes, how many users currently hold a reference to it, and the names of the modules that use it. So the last column lists the modules that depend on usbcore, not the other way around (there were more names there, but I removed some)
+ **modinfo** Will give you a detailed description of a module. This is the description for usbcore:

  ```shell
  filename:       /lib/modules/4.11.9-1-ARCH/kernel/drivers/usb/core/usbcore.ko.gz
  license:        GPL
  alias:          usb:v*p*d*dc*dsc*dp*ic09isc*ip*in*
  alias:          usb:v*p*d*dc09dsc*dp*ic*isc*ip*in*
  alias:          usb:v05E3p*d*dc*dsc*dp*ic09isc*ip*in*
  depends:        usb-common
  intree:         Y
  vermagic:       4.11.9-1-ARCH SMP preempt mod_unload modversions
  parm:           usbfs_snoop:true to log all usbfs traffic (bool)
  parm:           usbfs_snoop_max:maximum number of bytes to print while snooping (uint)
  parm:           usbfs_memory_mb:maximum MB allowed for usbfs buffers (0 = no limit) (uint)
  parm:           authorized_default:Default USB device authorization: 0 is not authorized, 1 is authorized, -1 is authorized except for wireless USB (default, old behaviour (int)
  parm:           blinkenlights:true to cycle leds on hubs (bool)
  parm:           initial_descriptor_timeout:initial 64-byte descriptor request timeout in milliseconds (default 5000 - 5.0 seconds) (int)
  parm:           old_scheme_first:start with the old device initialization scheme (bool)
  parm:           use_both_schemes:try the other device initialization scheme if the first one fails (bool)
  parm:           nousb:bool
  parm:           autosuspend:default autosuspend delay (int)
  ```
+ **insmod** Used to insert (load) modules from a path
+ **rmmod** Used to remove modules
+ **modprobe** Used to load/unload modules.
It is more powerful than insmod and rmmod, and can handle dependencies

##### Making the module

A module is nothing more than a C program. It basically starts as two functions: one for initialising, and one for exiting (in case you have some cleanup to do).

```c
#include <linux/init.h>
#include <linux/module.h>

/* Runs when the module is loaded; returning 0 means success */
static int initmodule(void)
{
    return 0;
}

/* Runs when the module is removed */
static void exitmodule(void)
{
}

/* Tell the kernel which functions to use as entry and exit points */
module_init(initmodule);
module_exit(exitmodule);
```

This is the basic structure. Notice how we are including two headers: **linux/init.h** and **linux/module.h**. These headers give us some parts of the API that we can use inside the kernel! Another interesting thing is: why am I not including **stdio.h**? Well, this is because we are in a different layer: this program will not sit on top of the kernel and the Standard C Library - it is executed WITH the kernel, so there is no stdio.h!

Alright. I'm going to add a little more stuff to this code now. How about some documentation about the module?

```c
MODULE_AUTHOR("Henrique S. Coelho");
MODULE_DESCRIPTION("A completely useless kernel module");
MODULE_LICENSE("GPL");
```

It would also be nice to add some functionality to it. How about this:

+ The module will ask your name, and will greet you like **Hello Joe!**; if no name is provided we will assume the name "there" so it will be displayed as **Hello there!** - hacky, I know! I love it.
+ The module will ask how many times you want this message to be printed. The default will be 5
+ The module will ask if it should say "goodbye" when it exits. The default is true

These options will be passed as arguments to the module.
```c
// Default arguments
static int repeats = 5;
static char *name = "there";
static bool saybye = true;

// variable name, type, and permissions
// S_IRUGO = everyone can read the value (through sysfs), nobody can change it
module_param(repeats, int, S_IRUGO);
module_param(name, charp, S_IRUGO);
module_param(saybye, bool, S_IRUGO);
```

After adding the logic, this is what our module looks like:

```c
// mymodule.c
#include <linux/init.h>
#include <linux/module.h>

static int repeats = 5;
static char *name = "there";
static bool saybye = true;

// S_IRUGO = everyone can read the value (through sysfs), nobody can change it
module_param(repeats, int, S_IRUGO);
module_param(name, charp, S_IRUGO);
module_param(saybye, bool, S_IRUGO);

MODULE_AUTHOR("Henrique S. Coelho");
MODULE_DESCRIPTION("A completely useless kernel module");
MODULE_LICENSE("GPL");

static int initmodule(void)
{
    int i; // same type as repeats, so the comparison cannot wrap around

    for (i = 0; i < repeats; i++)
        printk("Hello %s!\n", name);
    return 0;
}

static void exitmodule(void)
{
    if (saybye)
        printk("Bye bye!\n");
}

module_init(initmodule);
module_exit(exitmodule);
```

Another detail you may have noticed: what is **printk**? This is the function used inside the kernel to print messages (no, no **printf** here). These messages go to the kernel's log buffer.

Awesome! Now, how do we compile this? To compile this thing, we will make a **Makefile** in this directory. It should contain this line:

```makefile
obj-m := mymodule.o
```

**mymodule.o** is the name of my module (conveniently called mymodule.c) after it is compiled. We will not run this Makefile directly - the kernel's build system will. We will use a Makefile from the kernel, which will use this Makefile to compile the module. Confusing? Yes, it is. This is how we call the Makefile of the kernel:

```shell
$ make -C /lib/modules/`uname -r`/build M=$PWD modules
```

Some explanation:

+ **-C /lib/modules/`uname -r`/build** Tells `make` where the Makefile is.
The makefile for the kernel is located in `/lib/modules/[my kernel version]/build` - I used the command `uname -r` as a shortcut to get the version of my kernel
+ **M=$PWD** Tells the kernel's build system where the source of our module is, which is also where it will be built. In this case: my current directory. It is a variable passed to `make`, not a flag, so there is no dash in front of it (inside a Makefile the same thing is written `$(PWD)`)
+ **modules** Tells `make` which target of the Makefile to run (remember `make install`?). We are telling `make` to build modules

So, again: we are executing the kernel's Makefile, which will execute the Makefile of our project. Now, I like to automate things, so I made this Makefile instead:

```makefile
all: mymodule.c
	make -C /lib/modules/`uname -r`/build M=$(PWD) modules

clean:
	make -C /lib/modules/`uname -r`/build M=$(PWD) clean

obj-m := mymodule.o
```

Nice. I can just call `make` and it builds the module. `make clean` will clean the directory. After running it, I get this lovely output:

```shell
make -C /lib/modules/`uname -r`/build M=/home/hscasn/Desktop/kmodule modules
make[1]: Entering directory '/usr/lib/modules/4.11.9-1-ARCH/build'
  CC [M]  /home/hscasn/Desktop/kmodule/mymodule.o
  Building modules, stage 2.
  MODPOST 1 modules
  LD [M]  /home/hscasn/Desktop/kmodule/mymodule.ko
make[1]: Leaving directory '/usr/lib/modules/4.11.9-1-ARCH/build'
```

No erros, unlike this sentence! Cool. So, we got a **.ko** file. This stands for **Kernel Object**, and this is our module. Before I load it, I will open a terminal and type the following command:

```shell
$ dmesg -w
```

This command prints the message buffer from the kernel (our messages will be there). The **-w** option will make the command wait and print new lines as they come. Now we can finally load the module:

```shell
$ sudo insmod ./mymodule.ko
```

Immediately, this pops up in my other terminal (with `dmesg` running):

```
[23430.693089] Hello there!
[23430.693090] Hello there!
[23430.693090] Hello there!
[23430.693090] Hello there!
[23430.693091] Hello there!
```

It is alive!
Let's try unloading the module:

```bash
~/kmodule $ sudo rmmod mymodule
```

Result:

```
[23434.868493] Bye bye!
```

Now I will try with other arguments: this time my name will be "Joe", I want the message to be printed one time, and I do not want the module to say goodbye:

```bash
~/kmodule $ sudo insmod ./mymodule.ko repeats=1 name=Joe saybye=0
~/kmodule $ sudo rmmod mymodule
```

Output:

```
[23448.108956] Hello Joe!
```

Again. This time, my name will be David, I want the message to be repeated 3 times, and I want a goodbye message:

```bash
~/kmodule $ sudo insmod ./mymodule.ko name=David repeats=3 saybye=1
~/kmodule $ sudo rmmod mymodule
```

And the output:

```
[23468.661605] Hello David!
[23468.661605] Hello David!
[23468.661605] Hello David!
[23472.709457] Bye bye!
```

Today was a good day.