I decided to publish the source code of my CamShift implementation via the Spark project community. Many thanks to Yossy and all the team. Editing wiki pages with a Japanese interface was a fun but also slightly stressful experience. You can find the sources here. I called it FaceIt. I will try to improve it little by little; there are a lot of optimizations that could still be done.
I will also try to explain here how it works, for those who want a better understanding of the algorithm, or just for those who have nothing to do today.
As I said in my first post, CamShift is a color-based object tracking algorithm. At each frame of an image sequence, a search window (a rectangular area) tries to find and follow the tracked object according to its position in the previous frame. The main idea is to compare two color histograms: that of the tracked object and that of the area covered by the search window in the current image.
A color histogram is just a way to represent the distribution of colors in an image. You divide the color space into ranges called bins and count how many pixels fall into each bin according to their color. I used a 4096-bin histogram for my implementation: I simply divided each color channel by 16 (256/16 = 16 and 16×16×16 = 4096, the count is right!). To compare the histograms, CamShift then uses a method called histogram back projection. The process estimates the probability that a color range “belongs” to the tracked object. The probability for each histogram bin is given by the following formula, which I used:
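(The original formula image is not reproduced here, but from the description it is most likely the standard ratio histogram used in back projection: for each bin i, the probability is min(model_i / frame_i, 1). A minimal Python sketch of the binning and ratio steps — function names are mine, not from the FaceIt sources:)

```python
def bin_index(r, g, b):
    # Quantize each 8-bit channel into 16 levels: 16 * 16 * 16 = 4096 bins.
    return (r // 16) * 256 + (g // 16) * 16 + (b // 16)

def ratio_histogram(model_hist, frame_hist, n_bins=4096):
    # For each bin, the probability that a color "belongs" to the tracked
    # object: ratio of model pixels to frame pixels, capped at 1.
    probs = [0.0] * n_bins
    for i in range(n_bins):
        if frame_hist[i] > 0:
            probs[i] = min(model_hist[i] / frame_hist[i], 1.0)
    return probs
```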
Again, a wonderful math formula which is actually not that hard to understand. For each histogram bin i, we take the ratio between the number of pixels of the model and the number of pixels of the currently analyzed image. So the more pixels of the current image fall into a given bin, the closer the ratio gets to zero and the lower the probability: a lot of pixels could “belong” to the tracked object, and you cannot really tell which ones with 100% certainty. The same kind of interpretation works for high probabilities. After computing the probabilities, for each pixel in the search area and according to its color, you “back project” the probability value you just found. This gives the kind of gray-scale image below, which shows for each pixel a sort of weight for “belonging” to the tracked object:
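The back projection step itself is just a per-pixel lookup. A small sketch, assuming the same hypothetical 4096-bin quantization as above and a 2-D list of (r, g, b) tuples for the search area:

```python
def bin_index(r, g, b):
    # Same 4096-bin quantization: each channel reduced to 16 levels.
    return (r // 16) * 256 + (g // 16) * 16 + (b // 16)

def back_project(pixels, probs):
    # Replace every pixel of the search area by the probability of its
    # histogram bin, giving a per-pixel "belongs to the object" weight.
    return [[probs[bin_index(r, g, b)] for (r, g, b) in row]
            for row in pixels]
```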
In the back projection image, the closer a pixel is to white, the more it probably “belongs” to the tracked object. Note that for my part I did not use a gray-scale BitmapData but just an array structure to represent the probability distribution. After computing the probability distribution, the algorithm tries to locate its maximum of density. Put simply, it tries to locate the center of the biggest “light gray” area. To achieve this, CamShift uses the MeanShift algorithm. MeanShift is an iterative method which converges on something using the results of the previous iteration. I made this little animation to show the MeanShift iterations: at each iteration you find the center of the probability distribution inside the search window, then you move the search window to that center, and so on. The starting position of the search window is its position in the last analyzed image.
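The iteration described above can be sketched as follows — a toy Python version operating on the back-projected probability map, not the actual FaceIt code:

```python
def mean_shift(prob_map, window, max_iter=10, eps=1.0):
    # window = (x, y, w, h); prob_map[y][x] is the back-projected weight.
    x, y, w, h = window
    H, W = len(prob_map), len(prob_map[0])
    for _ in range(max_iter):
        # Zeroth and first moments of the distribution inside the window.
        m00 = m10 = m01 = 0.0
        for j in range(max(y, 0), min(y + h, H)):
            for i in range(max(x, 0), min(x + w, W)):
                p = prob_map[j][i]
                m00 += p
                m10 += i * p
                m01 += j * p
        if m00 == 0:
            break  # no probability mass in the window
        cx, cy = m10 / m00, m01 / m00
        # Re-center the window on the centroid found this iteration.
        nx, ny = int(round(cx - w / 2)), int(round(cy - h / 2))
        if abs(nx - x) < eps and abs(ny - y) < eps:
            break  # converged
        x, y = nx, ny
    return (x, y, w, h)
```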
After locating the maximum of density of the probability distribution, CamShift uses the second-order moments to retrieve the size and orientation of the distribution. Actually CamShift is almost the same algorithm as MeanShift, except that Bradski added a smart step at the end: at each new frame, the search window is rescaled according to the size of the distribution found just before. That is why CamShift means Continuously Adaptive MeanShift. This makes the tracking much more accurate, because over an image sequence the tracked object can appear smaller or bigger than the original model depending on its position relative to the camera. In the original CamShift, Bradski uses the area of the shape to find the new search window size. For my part, I just take a 10% bigger window.
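For reference, the second-order-moment step mentioned above can be sketched like this (following the equivalent-ellipse formulas from Bradski's paper; this illustrative helper is mine, not part of FaceIt):

```python
import math

def orientation_and_size(prob_map, window):
    # Second-order moments of the distribution inside the window give
    # its orientation and spread, as in Bradski's CamShift paper.
    x, y, w, h = window
    m00 = m10 = m01 = m20 = m02 = m11 = 0.0
    for j in range(y, y + h):
        for i in range(x, x + w):
            p = prob_map[j][i]
            m00 += p
            m10 += i * p
            m01 += j * p
            m20 += i * i * p
            m02 += j * j * p
            m11 += i * j * p
    cx, cy = m10 / m00, m01 / m00
    a = m20 / m00 - cx * cx           # central second moments
    b = 2 * (m11 / m00 - cx * cy)
    c = m02 / m00 - cy * cy
    theta = 0.5 * math.atan2(b, a - c)  # major-axis angle
    # Axis lengths of the equivalent ellipse of the distribution.
    common = math.sqrt(b * b + (a - c) ** 2)
    major = 2 * math.sqrt((a + c + common) / 2)
    minor = 2 * math.sqrt((a + c - common) / 2)
    return theta, major, minor
```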
Note that in order to make it computationally efficient I had to reduce my source image size by a factor of 4, because otherwise the 2D loop I used killed the CPU. I guess I lose a lot of accuracy that way, even if the result is not that bad.
For the next version, I will try to make these improvements:
- Using the HSV color space instead of RGB, which I guess handles brightness changes better.
- Separating contour pixels in the histogram to give them a different weight in the probability distribution, for better differentiation between the tracked object and the background.
- Maybe using Vector instead of Array. I talked a bit with Thibault about it; he did a nice benchmark on this topic.
- Doing some pre-processing, like blurring the analyzed image or the back projection image.
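To illustrate the first idea in the list: binning on hue alone largely discards brightness, so a darker and a lighter shade of the same color land in the same bin. A quick Python sketch (bin count and function name are my own choices):

```python
import colorsys

def hue_bin(r, g, b, n_bins=16):
    # Quantize only the hue channel; hue is largely invariant to
    # brightness changes, unlike raw RGB bins.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return min(int(h * n_bins), n_bins - 1)
```

A full HSV histogram would of course also use saturation and value, but weighting hue most heavily is what makes the tracking robust to lighting.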
Maybe some of you will have good ideas or will find a better way to implement it. I am open to all suggestions, and I hope Thibault will be able to dance faster and better in the next release.