Counting the bits set: processor can do best

Processor can do best

This is where we address the fundamental point, number 1 on the list of issues with a naive implementaion for counting the bits set. The processor/hardware can do that best, because it’s simpler circuitry compared with other operations, even compared with just simple addition of two numbers.

AMD/Intel processors have a POPCNT instruction since arround 2008.

C/C++ compilers have intrinsics that map to the assembly instruction if available or otherwise to some optimized algorithm as we’ve seen in the previous posts. The Microsoft compiler has a __popcnt intrinsic, Java has a bitCount method with similar behaviour.

Here is the gcc intrinsic:

1
2
3
int count_bits_6(unsigned int x) {
  return __builtin_popcount(x);
}

Interestingly some compilers recognise patterns like the Kernigan loop and replace it with POPCNT

Usage

Counting the bits set is a very niche task, I think. For example it’s useful for some interesting data structures. But I never had to actually do it, except in an interview scenario where the interviewer was looking for the naive loop. In that situation my first reaction was “you google for a solution” which made the interviewer believe I tried to avoid the answer.

References

CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler’s Lid”
https://www.youtube.com/watch?v=bSkpMdDe4g4

C++Now 2017: Phil Nash “The Holy Grail!? A Persistent Hash-Array-Mapped Trie for C++”
https://www.youtube.com/watch?v=WT9kmIE3Uis

Sean Eron Anderson “Bit Twiddling Hacks”
https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive

Microsoft C++ compiler intrinsic __popcnt
https://msdn.microsoft.com/en-us/library/bb385231.aspx

Java intrinsic Integer.bitCount
https://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#bitCount%28int%29