Thread-safe? Really?

On Linux (and probably other platforms), VLC media player has been plagued by thread-safety issues. First, the VLC code base was full of race conditions. We put a lot of effort into fixing most of those issues in VLC 0.9, and have kept going since then.

With VLC 1.1, we finally got rid of the Xlib-based rendering in favor of XCB, and also fixed the (still Xlib-based) skin engine. In Ubuntu alone, this should address dozens of bug reports.

Now, I have started looking at our underlying libraries as well...

...and it is not a pretty sight.

Operational mode

I could have used a powerful race-condition analysis tool such as Valgrind's Helgrind. Instead, I simply looked for calls to a few well-known unsafe functions. This is actually quite trivial to do. Given the way symbol resolution works with the ELF executable file format, I simply needed to define those functions in the main program, in this case the vlc binary. Using LD_PRELOAD would be a more involved but reusable approach. A simpler alternative is to run the program under a debugger and set breakpoints, but that is not so convenient.
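To illustrate the idea, here is a minimal sketch of such an override for setenv(), assuming a dynamically linked program. The unsafe_calls counter and the message format are hypothetical, not what VLC actually uses. Calls that a library binds internally (for instance when linked with -Bsymbolic) would not be caught this way.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter, only here so that the effect can be observed. */
static int unsafe_calls;

/*
 * Same prototype as the libc function. The main program comes first in the
 * ELF symbol lookup order, so calls made from shared libraries land here too.
 */
int setenv(const char *name, const char *value, int overwrite)
{
    (void)value;
    (void)overwrite;

    unsafe_calls++;
    fprintf(stderr, "thread-unsafe call: setenv(\"%s\")\n",
            name != NULL ? name : "(null)");

    /* Calling abort() here instead would yield a core dump with the caller. */
    errno = ENOTSUP;
    return -1;
}
```

With this definition in the executable, any setenv() call fails, leaves the environment untouched and prints a line on the standard error output.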

The faulty functions

So far, I have only checked a limited set of functions:

putenv, setenv, unsetenv
The environment is shared with all threads. Any modification can potentially cause a crash or incorrect behavior. Really, there is no excuse for libraries to do this. If you need to change the environment for a child process, you can do it after fork() and before exec*(). If you need to retain a value process-wide, global variables (and a mutex) are safer and faster.
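The fork()/exec*() approach could look like the following sketch. The run_with_env() helper and its use of /bin/sh are assumptions made for the example. Note that in a multi-threaded parent, POSIX only guarantees async-signal-safe functions between fork() and exec*(), and setenv() is not one of them; a stricter variant would build the environment array before fork() and pass it to execve().

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs `sh -c command` with name=value set in the child's environment only.
 * Returns the child's exit status, or -1 on failure.
 */
int run_with_env(const char *name, const char *value, const char *command)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: this copy of the environment is private to the new process. */
        if (setenv(name, value, 1) != 0)
            _exit(127);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: blocking waitpid(), retried if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent's environment is never modified, so other threads are unaffected.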
setlocale
Changing the locale with setlocale() is not thread-safe. It modifies shared process data that is used by various C run-time functions, such as error messages, time/date formatting, translation (gettext), floating point number formatting (printf) and parsing (scanf), character types (<ctype.h>) and the wide to multi-byte character conversion functions.
To change locale parameters in a thread-safe fashion, POSIX defines the newlocale(), uselocale() and freelocale() functions. They are found in <locale.h>, except on MacOS/Darwin where they are in <xlocale.h>. As an example, this can be useful to parse or format floating point numbers in American format regardless of the user-selected locale.
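As a sketch of the floating point case, the hypothetical parse_double_c() helper below switches the calling thread to the "C" locale just long enough to run strtod(). Only LC_NUMERIC is overridden here, which is an assumption about what the caller needs.

```c
#define _XOPEN_SOURCE 700
#include <locale.h>
#ifdef __APPLE__
# include <xlocale.h>
#endif
#include <stdlib.h>

/*
 * Parses a number written with a '.' decimal separator, whatever the
 * user-selected locale is. Returns 0 on success, -1 on failure.
 */
int parse_double_c(const char *str, double *out)
{
    locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    locale_t old = uselocale(c_loc); /* affects the calling thread only */
    char *end;
    double value = strtod(str, &end);
    uselocale(old);                  /* restore the previous thread locale */
    freelocale(c_loc);

    if (end == str)
        return -1;                   /* no digits were consumed */
    *out = value;
    return 0;
}
```

Creating the locale object once and reusing it would avoid the newlocale()/freelocale() cost on every call.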
strerror
While some systems make strerror() thread-safe, GNU/libc does not. This can lead to crashes - been there, done that.
strerror_r() must be used instead. In some cases, perror() is simpler. Also note that syslog() supports a special '%m' conversion specifier that expands to strerror(errno) but is thread-safe. With GNU/libc, all printf-like functions also support '%m'.
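One practical wrinkle is that strerror_r() exists in two flavors: the XSI one returns an int, while the GNU one (selected when _GNU_SOURCE is defined) returns a char pointer. The hypothetical error_text() wrapper below is a sketch that copes with both.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Writes the message for errnum into the caller's buffer and returns it. */
const char *error_text(int errnum, char *buf, size_t len)
{
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU variant: the returned message may or may not be stored in buf. */
    const char *msg = strerror_r(errnum, buf, len);
    if (msg != buf)
        snprintf(buf, len, "%s", msg);
#else
    /* XSI variant: returns 0 on success. */
    if (strerror_r(errnum, buf, len) != 0)
        snprintf(buf, len, "Unknown error %d", errnum);
#endif
    return buf;
}
```

Since the buffer belongs to the caller, two threads formatting error messages at the same time cannot step on each other.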
signal, sigaction
Changing the signal handlers is thread-safe technically speaking, but there is no sane way to do it, at least from a library. If another thread changes the signal handlers, a conflict occurs and incorrect behavior ensues. Only the main program should modify signal handlers.
To catch SIGCHLD, waitpid() can be used in blocking mode. In non-blocking mode, in an event loop, a dedicated thread will need to call waitpid().
To avoid SIGPIPE when writing to sockets, the MSG_NOSIGNAL flag can be passed to send() instead of installing a signal handler.
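A minimal sketch of the MSG_NOSIGNAL approach follows. Both function names are made up for the example, and demo_closed_peer() only exists to show the resulting error code on a socket pair whose other end is already closed.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Like write() on a socket, but a closed peer yields -1/EPIPE, not SIGPIPE. */
ssize_t send_quiet(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_NOSIGNAL);
}

/*
 * Sends one byte to a peer that has already been closed.
 * Returns the errno value observed, or 0 if the send succeeded.
 */
int demo_closed_peer(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return errno;
    close(sv[1]);

    int err = 0;
    if (send_quiet(sv[0], "x", 1) == -1)
        err = errno;
    close(sv[0]);
    return err;
}
```

No signal disposition is touched anywhere, so the main program remains free to handle SIGPIPE however it likes.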
rand, srand
The ISO C pseudo-random number generator is not thread-safe. rand_r() is the most direct thread-safe substitute, but POSIX marked it as obsolescent. random() (and srandom()), or nrand48(), constitute better alternatives.
drand48*, lrand48*, mrand48*, seed48*
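These functions share one hidden, process-wide generator state. erand48(), nrand48() and jrand48() can be used instead, as they operate on a state buffer supplied by the caller.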
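The sketch below wraps nrand48() and erand48() around a caller-owned state, one instance per thread. The struct prng type and the seeding scheme are arbitrary choices for the example, not a recommendation for anything security-sensitive.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>

/* Caller-owned generator state: keep one per thread. */
struct prng {
    unsigned short state[3];
};

/* Arbitrary seeding scheme: spreads the seed over the 48-bit state. */
void prng_seed(struct prng *g, unsigned long seed)
{
    g->state[0] = 0x330e;
    g->state[1] = (unsigned short)(seed & 0xffff);
    g->state[2] = (unsigned short)((seed >> 16) & 0xffff);
}

/* Stand-in for rand() or lrand48(): uniform in [0, 2^31). */
long prng_long(struct prng *g)
{
    return nrand48(g->state);
}

/* Stand-in for drand48(): uniform in [0.0, 1.0). */
double prng_double(struct prng *g)
{
    return erand48(g->state);
}
```

Two generators seeded with the same value produce the same sequence, independently of what any other thread does.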
getpwnam*, getpwuid*, getgrnam*, getgrgid*
All those functions use a static per-process buffer. All four of them have safe variants ending in _r.
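For instance, a lookup with getpwuid_r() could be sketched as follows. The user_name_for_uid() name and the fixed 4096-byte scratch buffer are assumptions; a more careful version would size the buffer from sysconf(_SC_GETPW_R_SIZE_MAX) and retry on ERANGE.

```c
#define _XOPEN_SOURCE 700
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Copies the login name for uid into out.
 * Returns 0 on success, -1 if the user is unknown or on error.
 */
int user_name_for_uid(uid_t uid, char *out, size_t len)
{
    struct passwd pw;
    struct passwd *res = NULL;
    char scratch[4096]; /* assumed large enough for one passwd entry */

    /* All strings in pw point into scratch, not into static storage. */
    if (getpwuid_r(uid, &pw, scratch, sizeof (scratch), &res) != 0
     || res == NULL)
        return -1;

    int n = snprintf(out, len, "%s", res->pw_name);
    if (n < 0 || (size_t)n >= len)
        return -1; /* name did not fit */
    return 0;
}
```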
dlerror
Unfortunately, there is no thread-safe replacement for this function. To mitigate the problem, it really should only be called when an actual error occurs, i.e. when dlopen or dlsym returned NULL. But even that would not be completely safe.
gethostbyname*, gethostbyaddr*, getservbyname*, getservbyport*
Use getaddrinfo and getnameinfo instead.
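As a rough sketch of the replacement pair, the hypothetical first_address() helper below resolves a host name with getaddrinfo() and converts the first result back to text with getnameinfo(). All results live in caller-owned or freshly allocated memory rather than in a static buffer.

```c
#define _XOPEN_SOURCE 700
#include <netdb.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Resolves host and writes its first address, in numeric form, to out.
 * Returns 0 on success, or a getaddrinfo()/getnameinfo() error code
 * suitable for gai_strerror().
 */
int first_address(const char *host, char *out, size_t len)
{
    struct addrinfo hints;
    struct addrinfo *res;

    memset(&hints, 0, sizeof (hints));
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    int err = getaddrinfo(host, NULL, &hints, &res);
    if (err != 0)
        return err;

    err = getnameinfo(res->ai_addr, res->ai_addrlen, out, (socklen_t)len,
                      NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return err;
}
```

As a bonus over gethostbyname(), the same code handles IPv6 without modification.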

*: not yet checked in VLC.

Faulty libraries

Only a few libraries have not exhibited any obvious problem so far. This does not imply that they have no issues, only that I found none:

And there were quite a few nasty surprises:

N.B.: Those lists are very incomplete.