I started writing the program in february 2004. Back then, I had two months to get a working Teredo server and relay implementation for GNU/Linux, while I was otherwise attending the university, and deeply involved in management of my campus computer network.
At that time, the only published open-source implementation was ng_teredo, running on FreeBSD 4, written by Konstantin Kabassanov (LIP6) and Vincent Jardin (6WIND). Since it depends heavily on netgraph, I saw no point in trying to port it - it would have been as much difficult if not more that starting from scratch.
There was actually another implementation, NICI Teredo, for the Linux kernel version 2.4, written by S.-M. Huang, Q.-C. Wu, and Y.-B. Lin (NTCU), but it wasn't published or even clearly advertised then. They made their first public release ony in December 2004, 10 months later - in the mean time, I would have had released 7 different versions of Miredo.
It was obvious that writing a userland implementation was the only sensible option, given I had hardly any experience of kernel programming, and that it would have involved extra steps to run and debugging. The focus was on getting working code at all, not on performance or anything fancy.
Nowadays, I have no plan on moving Miredo into kernel space. There are several advantages to userland. Some of theses below are well-known at a time when the monolithic-vs-micro kernels debate is hot. Some might be more specific. Of course, if someone wants to make the effort to move Miredo into kernel space, (s)he will still be welcome, but that will not be me.
Userland is safer. If a program crashes, it will only kill the Teredo tunnel (and it should actually automatically restart and recover in most cases). And if it enters an infinite loop, it will not crash the whole system.
Userland allows running the program with very restricted permission, which limits the potential impact of any security compromise. Miredo can run setuid to a non privileged user; it also has support for chroot and POSIX capabilities.
If Miredo has a remote code execution flaw (and is properly chrooted), exploitation could at best compromise the IPv6 connectivity, and the system service. Bad guys will not be able to access or alter data, they will neither be able to install a rootkit, unless of course, there is a local vulnerability in another software.
It is way more portable. Only one component, libtun6, is not fully independant of the underlying OS. It is a wrapper around the OS layer 3 userland tunnel driver. Even then, a certain chunk of that code is common to all kernels, and the rest is mostly only implemented once for Linux, and once for all BSD variants. At the moment, Miredo can run on :
A kernel implementation would be highly dependant on the kernel and its major version. ndeed ng_teredo supports FreeBSD 4. FreeBSD 5 support was on the way, but it apparently never got into a public release.
NICI Teredo only runs on Linux 2.4. It is officially written for version 2.4.22 (as of version 0.5).
It is easier to install. On most of the suppported platform, “./configure; make; su -c "make install; miredo"” will start the program. Packaging binary builds of userland programs is also much easier, since it is not dependant only on the build architecture, rather the kernel version.
NICI Teredo also claims ease of use. It is a Linux kernel module (LKM), so it can safely be compiled separately from the kernel. However, it is not compatible with kernel version 2.6, which is the most common nowadays. Also, it will remain challenging to distribute as a binary package, because LKMs depend on a specific kernel version, when compiled.
I have not been able to compile it against the recommended kernel version (2.4.22) so far.
Teredo is not as trivial a protocol as 6to4 and most point-to-point IPv6-over-IPv4 tunnels. Since August 2004, Miredo can run as a Teredo client; none of the other existing open-source (kernel) implementations support this.
Teredo is a stateful protocol. The tunneling software has to maintain a list of peers, with contact informations, expiration timers, etc. Unfortunately, kernel memory cannot be wasted, which limits the number of simultaneous peers.
At the moment, ng_teredo (1.13) has an hard-coded limit of 2048 peers stored in a fixed-size table. For each processed packets, it has to look up through the table, until a match is found.
NICI Teredo has an hard-coded limit of 256 peers. It uses hashing to look up entries fast. Unfortuntaly, as currently implemented, that will not work properly if two entries collide (namely, if the last byte of their IPv6 is identical). As such, statistically, at any given time, there is 50% chance that it breaks with 129 simultaneously peers, 5% chance with only 14, (peers expires every 30 seconds), and obviously 100% chance with 257 peers.
Following on actual stress tests (libterdo/test/libteredo-stresslist), the hard-coded limit of Miredo is currently one million entries (given enough memory on the system - 64Mbytes are ok), and uses libjudy for extremely fast lookups, even with huge numbers of peers, and independant of any lame hashing algorithm. Memory usage is also adjusted depending on run-time requirements. When libjudy is not available, it falls back to a chained-list lookup, which is similar to the mechanism used by ng_teredo.
Of course, kernel space implementations could be improved, and will be improved if people actually use them. But they have no chance of competing with userland limitations with regard to available memory.
Running a tunnel in userland implies extra overhead in packets handling, while the kernel copies packets from kernel to user memory, and back.
In practice, Miredo can handle more than 317 Mbits/s both ways
simultaneously on my 1.3 GHz PC. This represents 634 Mbits/s IPv6 traffic
and even more IPv4 encapsulated traffic, or more than 30,900 packets/s in
each direction. In other words, it can handle about
634 Mbits/s or 61,800 packets/s (Teredo limits packet size to 1280 bytes).
Also, the packets size has some impact: With 48 bytes packets (smallest possible IPv6 ping), packets/s throughput rises by 34% (up to 40,000 pkt/s), while effective bandwidth collapses. That really means most of the time is spent in per-packet processing, which has constant cost per packet, rather than packet memory copying whose duration is proportional to the size of each packet.
It should also be noted that during these tests, the stress test program (also userland) was consuming around 60% of CPU cycles, while miredo was left with the rest. As such, Miredo should be able to exchange much more real traffic on the same host.
Ideally, the test would be done with the stress test program on a separate more powerful machine, whose CPU would not be saturated, and that additionnal machine would be attached via a Gigabit link (so that network would not be the bottleneck) to the running system. That system would run alternatively Miredo and some kernel-only stuff. Unfortunately, I do not have such hardware at hand, but I am definitely interested in the results.
As a reference, I have also ran some stress test with ping6 from iputils on that same PC. It would switch about 25,500 packets (both ways) of 1280 bytes each, or 28,500 packets of 48 bytes per seconds (with more than 95% CPU dedicated to it). But given ping6 is itself running in userland on the same host, this test is somewhat biased.
The authors of NICI Teredo have published a paper on fast Teredo relaying through IEEE. The paper might now finally be freely available. However, I suspect the tests where done with obsolete an version of Miredo; older versions would perform some extra userland memcpy’s, that were eliminated since then. In fact, running the same test with fairly obsolete miredo version 0.4.2, yields figures about 20% below the ones above.
Note that this IEEE paper is based on latency, rather than actual packet switching or bandwidth, which I find a bit weird: In my opinion, an additionnal delay of a few microseconds is completely neglectible when the actual round-trip time between Internet hosts is typically the order of a few to several hundreds milliseconds.
As for the difference between their (NICI Teredo) implementation and ng_teredo, it might be partially due to NICI Teredo’s as fast as lame hashing algorithm, or to varying degree of optimization of the Linux and FreeBSD kernels.
It is obvious that both ng_teredo and NICI Teredo need a serious rework of their respective way too simplistic peers list handling, so that they are actually usable. That the most polished open-source implementation is the userland one, might be a bit surprising; on the other hand, it is also the most recent one, and the only one that still appears to be actively maintained as of May 2006. Similarly any new would-be in-kernel implementation would definitely have to handle its peers list in a more elaborate manner (and no, you cannot get rid of the peers list - I have been thinking a long time about ways to do this, but it is not really possible within the Teredo architecture).
It would also be nice to have NICI Teredo run on Linux 2.6.
Miredo probably deserves a finer grained locking, and some profiling (but this has proved to be quite difficult) to reduce its latency.
And obviously, a more comprehensive and up-to-date study than this one and that of NTCU (NICI guys) would really be cool.