I've got my own C++ OpenCV tracking and counting libraries (OpenCV versions 2.4 through 3.1). Most of what you are asking for is already in place. Facial recognition would be the major missing feature to add.
My trackers and counters use many different techniques to do the job. For example, I've got low-CPU counters that count using pixel flow, and others that count using various kinds of motion-blob tracking. The trackers use a multi-technique approach: MIL, TLD, compressive tracking, and more. My libraries can also use detector-based counting techniques. This variety is needed because videos differ so much in resolution, frame rate, camera angle, and lighting.
I've done many OpenCV C++ counting contracts over the past few years.
Send me a video with people and I can demo my counters for you...