algorithm - How to count the number of set bits in a 32-bit integer?


The 8 bits representing the number 7 look like this:

00000111

Three bits are set.

What algorithms determine the number of set bits in a 32-bit integer?

This is known as the 'Hamming weight', 'popcount' or 'sideways addition'.

The 'best' algorithm really depends on which CPU you are on and what your usage pattern is.

Some CPUs have a single built-in instruction to do it, and others have parallel instructions which act on bit vectors. A hardware instruction (like x86's popcnt, on CPUs where it's supported) will almost certainly be fastest. Some other architectures may have a slow instruction implemented with a microcoded loop that tests a bit per cycle (citation needed).
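As a rough sketch of how you might reach the hardware instruction from C: GCC and Clang provide the `__builtin_popcount` builtin, which the compiler can turn into a single popcnt when the target supports it (e.g. when built with `-mpopcnt` or `-march=native`) and into a software routine otherwise. The function name `popcount_hw` is just an illustrative choice, and the fallback loop for other compilers is an assumption of mine, not part of the answer above.

```c
#include <stdint.h>

/* Illustrative wrapper (hypothetical name): uses the GCC/Clang builtin
 * when available, otherwise falls back to a simple bit-at-a-time loop. */
unsigned popcount_hw(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* May compile to one popcnt instruction if the target allows it. */
    return (unsigned)__builtin_popcount(x);
#else
    /* Portable fallback: test one bit per iteration. */
    unsigned count = 0;
    for (; x != 0; x >>= 1)
        count += x & 1u;
    return count;
#endif
}
```

Calling `popcount_hw(7)` gives 3, matching the 00000111 example in the question.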

A pre-populated table lookup method can be very fast if your CPU has a large cache and/or you are doing lots of these operations in a tight loop. However, it can suffer because of the expense of a 'cache miss', where the CPU has to fetch some of the table from main memory.
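A minimal sketch of the table approach, assuming a 256-entry table indexed by one byte at a time (other table sizes are possible; this one is just a common choice). The macro trick for filling the table at compile time and the names `BITS_IN_BYTE` and `popcount_table` are illustrative, not something the answer prescribes.

```c
#include <stdint.h>

/* Build the 256-entry table at compile time: each macro level
 * appends two more bits' worth of counts. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)

/* BITS_IN_BYTE[b] holds the number of set bits in the byte value b. */
static const uint8_t BITS_IN_BYTE[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Four lookups, one per byte of the 32-bit input. */
unsigned popcount_table(uint32_t x)
{
    return (unsigned)BITS_IN_BYTE[x & 0xFFu]
         + (unsigned)BITS_IN_BYTE[(x >> 8) & 0xFFu]
         + (unsigned)BITS_IN_BYTE[(x >> 16) & 0xFFu]
         + (unsigned)BITS_IN_BYTE[x >> 24];
}
```

The table is only 256 bytes, so it fits in a few cache lines; whether those lines stay hot depends on what else the loop touches.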

If you know that your bytes will be mostly 0's or mostly 1's then there are very efficient algorithms for these scenarios.
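One well-known technique for the sparse case (often attributed to Brian Kernighan, and not named in the answer itself) is to clear the lowest set bit in a loop until the value is zero, so the loop runs once per set bit. The sketch below assumes that technique; for the mostly-1's case it simply inverts the input first. Both function names are made up for illustration.

```c
#include <stdint.h>

/* Runs one iteration per set bit, so it is cheap when few bits are set. */
unsigned popcount_sparse(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clears the lowest set bit */
        count++;
    }
    return count;
}

/* For inputs that are mostly 1's: count the zeros instead. */
unsigned popcount_dense(uint32_t x)
{
    return 32u - popcount_sparse(~x);
}
```

The worst case is 32 iterations, which is why this only pays off when you know something about the distribution of your inputs.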

I believe a very good general purpose algorithm is the following, known as 'parallel' or 'variable-precision SWAR algorithm'. I have expressed this in a C-like pseudo language, so you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):

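    int numberOfSetBits(int i)
    {
        // Java: use >>> instead of >>
        // C or C++: use uint32_t
        i = i - ((i >> 1) & 0x55555555);                 // count bits in each 2-bit pair
        i = (i & 0x33333333) + ((i >> 2) & 0x33333333);  // sum pairs into 4-bit nibbles
        // sum nibbles into bytes, then add the four bytes via the multiply
        return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    }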

This has the best worst-case behaviour of any of the algorithms discussed, so it will efficiently deal with any usage pattern or values you throw at it.
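If you want something you can compile directly as C, here is one way the pseudo-code might be rendered with `uint32_t`, so that the shifts are logical and the multiply wraps in a well-defined way. The name `popcount_swar32` and the split of the last line into two statements are my own choices for readability.

```c
#include <stdint.h>

/* SWAR popcount on an unsigned 32-bit value (hypothetical name). */
unsigned popcount_swar32(uint32_t x)
{
    /* Each 2-bit field now holds the count of bits it contained (0..2). */
    x = x - ((x >> 1) & 0x55555555u);
    /* Add neighbouring 2-bit counts: each 4-bit field holds 0..4. */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* Add neighbouring nibbles: each byte holds 0..8. */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    /* The multiply sums all four bytes into the top byte. */
    return (unsigned)((x * 0x01010101u) >> 24);
}
```

For the input 7, the value of `x` after the three reduction steps is 6, 3 and 3, and the function returns 3. There are no branches and no memory accesses, which is where the flat worst-case behaviour comes from.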


This bitwise-SWAR algorithm could be parallelized to run on multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later).
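To make that concrete, here is a hedged sketch that applies the same SWAR steps to four 32-bit lanes at once using SSE2 intrinsics, which are baseline on x86-64. The function name `popcount4_u32` is hypothetical, the final multiply is replaced by shifts and adds because SSE2 has no 32-bit lane multiply, and the scalar fallback for non-SSE2 builds is my own addition so the code compiles elsewhere.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Writes the popcount of each in[i] to out[i], for i = 0..3. */
void popcount4_u32(const uint32_t in[4], uint32_t out[4])
{
#if defined(__SSE2__)
    const __m128i m1 = _mm_set1_epi32(0x55555555);
    const __m128i m2 = _mm_set1_epi32(0x33333333);
    const __m128i m4 = _mm_set1_epi32(0x0F0F0F0F);
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Same three reduction steps as the scalar version, per lane. */
    v = _mm_sub_epi32(v, _mm_and_si128(_mm_srli_epi32(v, 1), m1));
    v = _mm_add_epi32(_mm_and_si128(v, m2),
                      _mm_and_si128(_mm_srli_epi32(v, 2), m2));
    v = _mm_and_si128(_mm_add_epi32(v, _mm_srli_epi32(v, 4)), m4);

    /* Sum the four byte counts of each lane into its low byte. */
    v = _mm_add_epi32(v, _mm_srli_epi32(v, 8));
    v = _mm_add_epi32(v, _mm_srli_epi32(v, 16));
    v = _mm_and_si128(v, _mm_set1_epi32(0x3F));

    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Fallback: the scalar SWAR steps, one element at a time. */
    for (int i = 0; i < 4; i++) {
        uint32_t x = in[i];
        x = x - ((x >> 1) & 0x55555555u);
        x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
        x = (x + (x >> 4)) & 0x0F0F0F0Fu;
        out[i] = (x * 0x01010101u) >> 24;
    }
#endif
}
```

In a real loop over a large buffer you would typically keep the per-lane counts in a vector accumulator and only reduce at the end, rather than storing after every four elements.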

However, the best way to use vector instructions for popcount is usually a variable-shuffle that does a table lookup on 4 bits at a time of each byte, in parallel. (The 4 bits index a 16-entry table held in a vector register.)
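Here is a minimal sketch of that nibble-lookup idea, assuming SSSE3's `_mm_shuffle_epi8` (pshufb) as the variable shuffle. It produces a per-byte count for 16 input bytes; summing those counts is left out. The names `NIBBLE_COUNTS` and `popcount_bytes16` are illustrative, and the scalar branch is only there so the code still builds without `-mssse3`; it performs the same two 4-bit lookups per byte, one byte at a time.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h>
#endif

/* Number of set bits in each 4-bit value 0..15. */
static const uint8_t NIBBLE_COUNTS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* out[i] = number of set bits in in[i], for i = 0..15. */
void popcount_bytes16(const uint8_t in[16], uint8_t out[16])
{
#if defined(__SSSE3__)
    const __m128i table = _mm_loadu_si128((const __m128i *)NIBBLE_COUNTS);
    const __m128i low_mask = _mm_set1_epi8(0x0F);
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Split every byte into its low and high nibble. */
    __m128i lo = _mm_and_si128(v, low_mask);
    __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask);

    /* pshufb uses each nibble as an index into the 16-entry table. */
    __m128i counts = _mm_add_epi8(_mm_shuffle_epi8(table, lo),
                                  _mm_shuffle_epi8(table, hi));
    _mm_storeu_si128((__m128i *)out, counts);
#else
    for (size_t i = 0; i < 16; i++)
        out[i] = (uint8_t)(NIBBLE_COUNTS[in[i] & 0x0F] +
                           NIBBLE_COUNTS[in[i] >> 4]);
#endif
}
```

Each output byte is at most 8, so several rounds of counts can be added together as bytes before they need to be widened.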

On Intel CPUs, the hardware 64-bit popcnt instruction can outperform an SSSE3 PSHUFB bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency problem on Intel.

references:

https://graphics.stanford.edu/~seander/bithacks.html

https://en.wikipedia.org/wiki/hamming_weight

http://gurmeet.net/puzzles/fast-bit-counting-routines/

http://aggregate.ee.engr.uky.edu/magic/#population%20count%20(ones%20count)

