I decided to publish the source code of my camshift implementation via the Spark project community. Many thanks to Yossy and all the team. Editing wiki pages with a Japanese interface was a fun but also slightly stressful experience ;-) . So you can find the sources here. I called it FaceIt ;-) . I will try to improve it little by little; there are a lot of optimizations that could be done.

I will also try to explain here a bit how it works, for those who want a better understanding of the algorithm, or just for those who have nothing to do today ;-) .

As I said in my first post, Camshift is a color-based object tracking algorithm. At each frame of an image sequence, a search window (a rectangular area) tries to find and follow the tracked object according to the position found in the previous frame. The main idea is to compare two color histograms: that of the tracked object (the model) and that of the area defined by the search window on the current image.

A color histogram is just a way to represent the color distribution of an image: you divide the color space into ranges called bins and count how many pixels fall into each bin according to their color. I used a 4096-bin histogram for my implementation, simply dividing each color channel by 16 (256/16 = 16, and 16×16×16 = 4096, the count is good! :-) ).
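
To make this more concrete, here is a minimal AS3 sketch of how such a histogram can be built. The function name is illustrative, and this is not the actual FaceIt code (which currently uses Array rather than Vector):

```actionscript
import flash.display.BitmapData;

// Build a 4096-bin color histogram: each 8-bit channel is
// quantized to 16 levels (256 / 16), giving 16x16x16 bins.
function computeHistogram(image:BitmapData):Vector.<int>
{
    var hist:Vector.<int> = new Vector.<int>(4096, true);
    for (var y:int = 0; y < image.height; y++)
    {
        for (var x:int = 0; x < image.width; x++)
        {
            var rgb:uint = image.getPixel(x, y);
            var r:int = ((rgb >> 16) & 0xFF) >> 4; // 0..15
            var g:int = ((rgb >> 8) & 0xFF) >> 4;  // 0..15
            var b:int = (rgb & 0xFF) >> 4;         // 0..15
            hist[(r << 8) | (g << 4) | b]++;       // bin index 0..4095
        }
    }
    return hist;
}
```
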
Then, to achieve the histogram comparison, camshift uses a method called histogram back projection. The process estimates the probability that a color range "belongs" to the tracked object. The probability for each histogram bin comes from the following ratio (capped at 1 so it stays a probability):

$$P(i) = \min\!\left(\frac{q(i)}{p(i)},\ 1\right)$$

where $q(i)$ is the number of model pixels falling into bin $i$ and $p(i)$ is the number of pixels of the currently analyzed image falling into bin $i$.
Again, a wonderful math formula which is actually not that hard to understand. For each histogram bin i, we take the ratio between the number of pixels of the model and the number of pixels of the currently analyzed image that fall into that bin. So the more pixels of the current image belong to a given bin, the closer the ratio gets to zero and the lower the probability: when a lot of pixels could "belong" to the tracked object, you cannot really tell which ones actually do with 100% certainty. You can make the same kind of interpretation for high probabilities. After computing the probabilities, you "back project" them: each pixel of the search area receives, according to its color, the probability value of its bin.
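
Here is a rough AS3 sketch of those two steps (per-bin probabilities, then back projection into a per-pixel weight array); the function names are illustrative, not the FaceIt API:

```actionscript
import flash.display.BitmapData;

// P(i) = min(q(i) / p(i), 1): ratio of model pixels to current
// image pixels for each histogram bin.
function computeBinProbabilities(modelHist:Vector.<int>,
                                 currentHist:Vector.<int>):Vector.<Number>
{
    var probs:Vector.<Number> = new Vector.<Number>(modelHist.length, true);
    for (var i:int = 0; i < modelHist.length; i++)
    {
        // An empty bin in the current image cannot contribute.
        probs[i] = (currentHist[i] > 0)
            ? Math.min(modelHist[i] / currentHist[i], 1)
            : 0;
    }
    return probs;
}

// "Back project": give each pixel the probability of its bin,
// producing a per-pixel weight map over the whole image.
function backProject(image:BitmapData, probs:Vector.<Number>):Vector.<Number>
{
    var weights:Vector.<Number> = new Vector.<Number>(image.width * image.height, true);
    for (var y:int = 0; y < image.height; y++)
    {
        for (var x:int = 0; x < image.width; x++)
        {
            var rgb:uint = image.getPixel(x, y);
            // Same quantization as the histogram: 16 levels per channel.
            var bin:int = ((((rgb >> 16) & 0xFF) >> 4) << 8)
                        | ((((rgb >> 8) & 0xFF) >> 4) << 4)
                        | ((rgb & 0xFF) >> 4);
            weights[y * image.width + x] = probs[bin];
        }
    }
    return weights;
}
```
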
This gives the kind of gray-scale image below, which shows for each pixel a kind of weight of "belonging" to the tracked object:

[Back projection image: gray-scale map of the per-pixel probabilities]
In the back projection image, the closer a pixel is to white, the more it probably "belongs" to the tracked object. Note that for my part I did not use a gray-scale BitmapData, just an array structure to represent the probability distribution. After computing the probability distribution, the algorithm tries to locate its maximum of density; to put it simply, it tries to locate the center of the biggest "light grey" area. To achieve this, camshift uses the meanshift algorithm. Meanshift is an iterative method which converges using the results of the previous iteration: at each iteration you find the center of the probability distribution inside the search window, then you move the search window to that center, and so on. The starting position of the search window is its position in the last analyzed image.
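
A simplified AS3 sketch of that loop, operating on the weight map produced by the back projection (bounds clamping and other details omitted; this is not the actual FaceIt source):

```actionscript
import flash.geom.Rectangle;

// Iteratively move the search window onto the center of mass of
// the probability distribution it contains.
function meanShift(weights:Vector.<Number>, imageWidth:int,
                   window:Rectangle, maxIterations:int = 10):Rectangle
{
    for (var iter:int = 0; iter < maxIterations; iter++)
    {
        // Zeroth and first order moments inside the search window.
        var m00:Number = 0, m10:Number = 0, m01:Number = 0;
        for (var y:int = window.y; y < window.y + window.height; y++)
        {
            for (var x:int = window.x; x < window.x + window.width; x++)
            {
                var w:Number = weights[y * imageWidth + x];
                m00 += w;
                m10 += x * w;
                m01 += y * w;
            }
        }
        if (m00 == 0) break; // no probability mass inside the window

        // Center of mass of the distribution inside the window.
        var cx:Number = m10 / m00;
        var cy:Number = m01 / m00;

        // Move the window so it is centered on (cx, cy).
        var dx:Number = cx - (window.x + window.width / 2);
        var dy:Number = cy - (window.y + window.height / 2);
        window.x += Math.round(dx);
        window.y += Math.round(dy);

        if (Math.abs(dx) < 1 && Math.abs(dy) < 1) break; // converged
    }
    return window;
}
```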

I made this little animation to show the meanshift iterations:

[Flash animation: the search window converging on the center of the probability distribution]

After locating the maximum of density of the probability distribution, Camshift uses the second-order moments to retrieve the size and the orientation of the distribution. Camshift is actually almost the same algorithm as Meanshift, except that Bradski added a smart step at the end: at each new frame, the search window is rescaled according to the size of the distribution found just before. That is why camshift means Continuously Adaptive Meanshift. This makes the tracking accuracy much better, because during an image sequence the tracked object can appear smaller or bigger than the original model depending on its position relative to the camera. In the original camshift, Bradski uses the area of the shape to find the new search window size; for my part, I just take a 10% bigger window.
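
For reference, the standard second-order moment formulas from Bradski's paper look like this; I do not use them myself (I just grow the window), but they give the orientation and the two axis lengths of the distribution:

```latex
% Second-order moments of the back projection inside the window:
% M20 = sum(x^2 w), M02 = sum(y^2 w), M11 = sum(x y w),
% with (x_c, y_c) the center of mass found by meanshift.
a = \frac{M_{20}}{M_{00}} - x_c^2, \qquad
b = 2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right), \qquad
c = \frac{M_{02}}{M_{00}} - y_c^2

\theta = \frac{1}{2}\arctan\!\left(\frac{b}{a - c}\right), \qquad
l,\ w = \sqrt{\frac{(a + c) \pm \sqrt{b^2 + (a - c)^2}}{2}}
```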

Note that in order to make it computationally efficient, I had to reduce my source image size by 4, because otherwise the 2D loops I used killed the CPU. I guess I lose a lot of accuracy that way, even if the result is not that bad.
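
For those wondering how the downscaling can be done in Flash, something like this works (one possible way; not necessarily how FaceIt does it internally):

```actionscript
import flash.display.BitmapData;
import flash.geom.Matrix;

// Shrink the source image by a factor of 4 with a scaled draw().
function downscaleBy4(source:BitmapData):BitmapData
{
    var small:BitmapData = new BitmapData(int(source.width / 4),
                                          int(source.height / 4), false, 0);
    small.draw(source, new Matrix(0.25, 0, 0, 0.25), null, null, null, true);
    return small;
}
```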

For the next version, I will try to make the following improvements:

  • Using the HSV color space instead of RGB, which I guess will handle brightness changes better (see the sketch after this list).
  • Separating contour pixels in the histogram to give them a different weight in the probability distribution, for a better differentiation between the tracked object and the background.
  • Maybe using Vector instead of Array. I talked a bit with Thibault about it; he did a nice benchmark on this topic.
  • Doing some pre-processing, like blurring the analyzed image or the back projection image.
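
As a starting point for the first item, here is a sketch of an RGB-to-hue conversion that could feed an HSV-based histogram (illustrative code, not part of the current release):

```actionscript
// Convert an RGB pixel to its HSV hue, in degrees [0, 360).
// Binning on hue is more robust to brightness changes than RGB.
function rgbToHue(rgb:uint):Number
{
    var r:Number = ((rgb >> 16) & 0xFF) / 255;
    var g:Number = ((rgb >> 8) & 0xFF) / 255;
    var b:Number = (rgb & 0xFF) / 255;
    var max:Number = Math.max(r, Math.max(g, b));
    var min:Number = Math.min(r, Math.min(g, b));
    var delta:Number = max - min;
    if (delta == 0) return 0; // achromatic pixel: hue is undefined
    var h:Number;
    if (max == r)      h = ((g - b) / delta) % 6;
    else if (max == g) h = (b - r) / delta + 2;
    else               h = (r - g) / delta + 4;
    h *= 60;
    return h < 0 ? h + 360 : h;
}
```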

Maybe some of you will have good ideas or will find a better way to implement it. I am open to all suggestions, and I hope Thibault will be able to dance faster and better in the next release ;-) .

11 Responses to “Camshift: going to the source”
  1. alex says:

    You saved my day. I almost started writing the algorithm myself, and then I found yours. Really great work! Thanks for sharing. There are some performance tricks you could use to rapidly increase the performance, e.g. linked lists instead of arrays, while instead of for loops, type casting... Joa Ebert is an expert on that: http://wiki.joa-ebert.com/index.php/Main_Page

    once again thanks for sharing!

  2. Benjamin says:

    @Alex: Thanks Alex. Well, I have already read Joa's awesome work on loop optimization. Using linked lists could be fast, but I don't really have a static data structure; still, maybe it is worth trying. I know that Joa found a way to avoid 2D loops, which as I already said are real CPU killers in the FaceIt library, by converting them into 1D loops. I externalized all the variables I used inside the loops, and it was really faster. I am not sure the difference between while and for loops would be really visible, but it is true I have not tried yet. Next time :-)
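
    The idea, roughly, is to turn the nested (x, y) loop into a single loop over the pixel index, something like this (just a sketch):

```actionscript
var width:int = 320, height:int = 240; // e.g. the analyzed image size
var n:int = width * height;
for (var i:int = 0; i < n; i++)
{
    var x:int = i % width; // column
    var y:int = i / width; // row (implicit int truncation)
    // ... process pixel (x, y) through the 1D buffer index i ...
}
```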

  3. igor baloste says:


    I'm looking for AS3 (for Flash) or XML (for Max/MSP) hand-detection files.

    Do you know where I can find them?

    Thanks a lot, and "bravo" for your work!


  4. Akshay says:

    Man.. you rock.. it's really cool. I'm trying to change the window size as a function of the zeroth moment.

  5. Akshay says:

    I tried to use the HSV color model, but the result is not so good.

  6. Valery says:


    Nice job! I have implemented a game with your library. You can play it here: http://vapes.na.by/fish/
    The library is still unstable... sometimes I get errors and sometimes the object gets lost. But I believe you will make this library better ;)


  7. Valery says:

    Look at this video: http://www.youtube.com/watch?v=pMJS31S_eE4
    I've connected your library with the game.

  8. [...] (Papervision 3D was also ruled out due to performance issues), so I kept looking and stumbled onto a Flash port of the CAMSHIFT or MEANSHIFT algorithm.  With it, you sampled a region of color, and it could track it as a blob that moved around [...]

  9. [...] a little further research, I stumbled onto a Flash port of CAMSHIFT, where you sampled a region of color by drawing a box around it, and it could track that color [...]
