GMS - Global Monitoring System

An ounce of knowledge is worth more than a ton of data.

Overview

GMS is a fully automated system that discovers camera IoT devices by IP address, samples and scores the discovered devices using object recognition, and annotates camera content. The application is face recognition on cameras found in the wild (millions of cameras distributed worldwide).

The Problem

Standard approaches (e.g., Deep Learning, CNN, RNN) fail at this task, even though small children (ages 3-5) can already tell whether an image contains a face and point out its location. For more information on why PAC theory and its derivatives will not cope with this and similar complex tasks, see "Towards a Practical Estimate of Training Sample Size". To demonstrate this, consider the following 11 images:

Previous Approaches

Face recognition APIs from the following vendors were applied to these images:

  1. https://www.betafaceapi.com/demo.html
  2. https://facedetection.com/online-reverse-image-search
  3. http://www.elgom3a.com/post/online-face-detection
  4. https://how-old.net/
  5. https://www.twinsornot.net
  6. https://howhot.io
  7. https://www.faceplusplus.com/face-detection
  8. http://liuliu.me/ccv/js/nss
  9. http://www.codejungle.org/facedetection
  10. http://PicWiser.com
  11. http://facedetection.kuznech.com/
  12. http://www.facesearch.com
  13. https://skybiometry.com/demo/face-detect
  14. https://azure.microsoft.com/en-us/services/cognitive-services/face
  15. https://www.kairos.com/demos
  16. https://cmusatyalab.github.io/openface
  17. https://findface.pro/en/#free-demo
  18. http://www.bioenabletech.com/bioenable
  19. https://www.faceblind.org/facetests
  20. Google
  21. IBL Classifier

Only one of the first 19 vendor APIs detects a face, and it does so in just one of the images. The other 18 fail to detect a face in any image. Importantly, these images are not exceptions: based on a representative sample from one US metropolitan area, there are large numbers of images for which false negatives or false positives are to be expected.
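A comparison like the one above comes down to a per-detector tally of hits and misses over a fixed image set. The following is a minimal sketch of that bookkeeping, not the evaluation code used for GMS. The function names, the detector names, and the input values are hypothetical placeholders, and it assumes that every image in the set is known to contain a face.

```python
from typing import Dict, List


def tally_detections(results: Dict[str, List[bool]]) -> Dict[str, int]:
    """Count, per detector, the images on which a face was reported."""
    return {name: sum(flags) for name, flags in results.items()}


def false_negative_rate(flags: List[bool]) -> float:
    """Fraction of face-containing images on which no face was reported.

    Assumes every evaluated image really contains a face, so each
    False entry is a false negative.
    """
    if not flags:
        raise ValueError("no images evaluated")
    return 1.0 - sum(flags) / len(flags)


# Placeholder inputs for illustration only, NOT measured results:
# two hypothetical detectors, each run on the same 11 images.
example = {
    "detector_a": [False] * 11,
    "detector_b": [True] + [False] * 10,
}

hits = tally_detections(example)
for name, flags in example.items():
    print(name, hits[name], round(false_negative_rate(flags), 3))
```

Keeping the raw per-image flags rather than only the totals makes it possible to see later which images every detector misses.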

Why retraining existing models on these images is not an option

When dealing with the output of millions of cameras (not including smartphones and cell phones), the number of images to test and possibly retrain on makes retraining computationally infeasible. Furthermore, a child aged 3-5 sees only a very limited number of face images, yet very quickly attains the required proficiency for this cognitive task, i.e., dealing with new, unseen images (no retraining necessary).

Some more evaluations...

In the meantime, a few more of the large international search engines were evaluated. Baidu fares no better than Google. For the first set of images (the child), the recommended similar images have little to nothing in common with the target images. For the second set of images, the Baidu search never completes and delivers no similar-image suggestions. Yandex fares best of all: for some images of the first type, it finds a few somewhat similar images. For the second set, it also fails. It appears that the DoG algorithms used by the Yandex image recognition engine provide somewhat better results than the deep learning approaches espoused by Google, Baidu, etc. None of the search engines comes anywhere near being a reliable tool for annotating image content.

© , Steve Romaniuk.