A Better Shell One-Liner For Most Common Includes

After reading some comments on my previous post, I decided to write a better version that counts includes per compilation unit instead of per file.

It turns out that gcc/cpp has an option -H which dumps out a tree of includes in a compilation unit, which means that we don't have to do any parsing ourselves.

The output looks like this:

$ cpp -H main.c 2>&1 >/dev/null
. /usr/include/stdio.h
.. /usr/include/bits/libc-header-start.h
... /usr/include/features.h
[[ more header files... ]]
. hello.h
Multiple include guards may be useful for:
/usr/include/bits/libc-header-start.h
[[ more header files... ]]

The -H info is printed to stderr, so we redirect stderr to stdout first and then throw away the normal cpp output (the order of the two redirections matters). We can filter out the extra info using awk:

$ cpp -H main.c 2>&1 >/dev/null \
 | awk '/^\.+/ { print $2 }'
/usr/include/stdio.h
/usr/include/bits/libc-header-start.h
/usr/include/features.h
[[ more header files... ]]
hello.h
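
To see why the order of the two redirections matters without needing a real source file, here is a small sketch. fake_cpp is a made-up stand-in for cpp -H (it is not a real command): it writes one line of "source" to stdout and a shortened include tree to stderr.

```shell
# Made-up stand-in for `cpp -H main.c`:
# preprocessed source goes to stdout, the include tree goes to stderr.
fake_cpp() {
    echo 'int main(void) { return 0; }'
    {
        echo '. /usr/include/stdio.h'
        echo '.. /usr/include/features.h'
        echo '. hello.h'
        echo 'Multiple include guards may be useful for:'
        echo '/usr/include/features.h'
    } >&2
}

# stderr is pointed at the pipe first, then stdout is discarded.
headers=$(fake_cpp 2>&1 >/dev/null | awk '/^\.+/ { print $2 }')

# Swapped order: stdout goes to /dev/null, then stderr follows it there.
swapped=$(fake_cpp >/dev/null 2>&1 | awk '/^\.+/ { print $2 }')

printf 'kept:\n%s\n' "$headers"
printf 'swapped: [%s]\n' "$swapped"
```

Only the first form keeps the include tree. With the redirections swapped, stderr follows stdout into /dev/null and awk sees nothing. The awk pattern also drops the "Multiple include guards" section, because none of its lines start with a dot.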

Combining that with the script from a couple of weeks ago looks like this (gcc -E runs only the preprocessor, so nothing gets compiled or linked):

find . -name '*.c' \
 | xargs gcc -E -H 2>&1 >/dev/null \
 | awk '/^\.+/ { print $2 }' \
 | sort \
 | uniq -c \
 | sort -nr
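
One caveat: as far as I can tell, -H lists a header every time it is actually read, so a header without include guards that is pulled in twice by the same .c file gets counted twice by the pipeline above. If the goal is "how many compilation units include this header", a possible variation is to preprocess one file at a time and de-duplicate with sort -u before counting. This is a sketch rather than a drop-in replacement: count_headers is a hypothetical helper name, and the CC override is an assumption of this sketch so that a different compiler driver can be swapped in.

```shell
# Hypothetical helper: for each header, count how many .c files under
# the given directory (default: .) include it at least once.
count_headers() {
    find "${1:-.}" -name '*.c' | while IFS= read -r src; do
        # One compilation unit at a time; sort -u collapses repeated
        # headers so each one counts once per unit.
        "${CC:-gcc}" -E -H "$src" 2>&1 >/dev/null \
            | awk '/^\.+/ { print $2 }' \
            | sort -u
    done | sort | uniq -c | sort -nr
}
```

Calling count_headers with no arguments scans the current directory. It is slower than the xargs version because it starts the preprocessor once per file.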

One new problem with this approach is that gcc needs to know where to look for header files, so header directories outside of places like /usr/include need to be passed in with -I. A possible solution is to use a compile_commands.json file, which is what clang tools do, but that's a problem for another day :)
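
In the meantime, passing the extra directories by hand could look roughly like the sketch below. most_included is a hypothetical helper name, and the CC override is an assumption of this sketch. The function turns each argument into a -I flag and otherwise runs the same pipeline as before.

```shell
# Hypothetical helper: most_included DIR... runs the pipeline with
# -IDIR added for each DIR.
most_included() {
    # Rewrite each positional parameter "dir" into "-Idir", keeping order.
    for dir in "$@"; do
        set -- "$@" "-I$dir"
        shift
    done

    # ${CC:-gcc} lets a different compiler driver be swapped in.
    find . -name '*.c' \
        | xargs "${CC:-gcc}" -E -H "$@" 2>&1 >/dev/null \
        | awk '/^\.+/ { print $2 }' \
        | sort \
        | uniq -c \
        | sort -nr
}
```

For example, most_included include third_party/include, where both directory names are placeholders for wherever the project keeps its headers.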

#bash #c #shell