Capture, Train, Evaluate, Deploy and Demo a Custom CV Model for use with Cisco Meraki MV Cameras from scratch using Python and Google Colab.
Capturing and preparing the dataset
Exporting in TFLite format and deploying on camera
This tutorial guides you through the process of:
- Capturing an image dataset using a Meraki MV Camera
- Training a TensorFlow model on Google Colab from said dataset
- Evaluating the model
- Exporting the model to TFLite and deploying it on your camera
- Using a simple app with TKinter to demo your model
You will need the following:
- Cisco Meraki MV 2nd Generation Camera (MV2, MV12, MV32, MV22, MV72)
- Software version 4.18 on the camera
- A Google account (to be able to use Colab)
- A laptop with a working installation of Python 3 and Docker
- A working installation of LabelImg
The first part of the process is capturing images for your dataset. Clone the project to your working directory and navigate to the folder containing the Python scripts.
The first script we will use is cvSnapper.py. This script will use a Meraki camera you have in your environment to take snapshots of the objects you want the model to learn to identify. Before using it, you must set the following values:
In the config.py file:
- api_key is your Meraki Dashboard API key. If you don't know how to generate one, follow the instructions in this article: https://documentation.meraki.com/General_Administration/Other_Topics/Cisco_Meraki_Dashboard_API
- camera_serial is the serial number of the camera you will be working with
- desired_snaps is the number of snapshots you wish to take to build your dataset. The more the better, especially if you want to be able to recognize many different classes, but I managed to obtain decent accuracy with around 100 images and 4 classes total.
- dataset_name is the name you want to give your dataset. I called mine meraki_devices
- train_test_split is the percentage of images in the dataset that will be used for training, while the remaining ones will be used for testing
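As a rough illustration of how these values fit together, a filled-in config.py might look something like the sketch below. The variable names follow the descriptions above and the values are placeholders; double-check them against the file that ships with the repo.

```python
# config.py -- illustrative placeholder values only

api_key = "0123456789abcdef0123456789abcdef01234567"  # your Meraki Dashboard API key
camera_serial = "Q2XX-XXXX-XXXX"                       # serial of the MV camera to snapshot from
desired_snaps = 100                                    # total snapshots to capture for the dataset
dataset_name = "meraki_devices"                        # name for your dataset
train_test_split = 0.8                                 # share of images that go to images/train (check whether the script expects a fraction or a percentage)

# Values used later by the demo dashboard (see the demo section below)
max_classes = 8            # how many classes the Tkinter GUI tracks
mqtt_server = "127.0.0.1"  # broker address; localhost if mosquitto runs on this machine
det_threshold = 0.8        # ignore detections below this confidence
```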
In line 12 of the cvSnapper.py script, change the win_or_mac variable to mac or win depending on your environment.
Also, run pip install -r requirements.txt to install the required Python libraries before proceeding.
How large does my dataset need to be?
- The more, the better, as long as you have roughly the same number of samples for each class. Try to keep the ratio between the most common and least common class below 2 to 1.
- The more classes you have, the more individual samples per class you will need
- As a rule of thumb, you should aim for at least 100 training samples per class if your classes are very different and few, and more than 200 if your classes have similarities between them and you have more than 50 classes
- If your model is to be used in many different settings, capture samples that are representative of all of them (for example, person samples taken only in a well-lit room will not transfer well if the model is deployed in a dark room)
- Your samples should show the objects in many different orientations, positions and locations, and in the presence of other objects that you both do and do not want to detect. It can also be worth including samples in which your objects are slightly obscured (partially blocked by another object)
For more recommendations on putting your dataset together, see: https://cloud.google.com/vision/automl/object-detection/docs/prepare
When you're ready and you have positioned the camera you wish to use in the desired location, run the script with python cvSnapper.py. The script will do the following:
- It will create a directory structure in the form:
images/
|_train/
|_test/
- It will prompt you with Press Enter to capture snapshot:, and after you press Enter (with no other input), it will generate a snapshot from the camera you chose (see the sketch after this list for what this call looks like under the hood). It can be useful to have a Meraki Dashboard video stream of the camera open side by side, so you know what the camera is looking at before you request the snapshot and can make any necessary adjustments.
- It will then ask you Keep image? (Y/N):. If you answer Y, it will increase the snap counter and save the image as snap_{snap_count}.jpg in the images folder.
- If you answer N, it will discard the image and ask whether you wish to continue. If you answer Y, it will prompt you again to take a new snapshot; if you answer N, it will exit the program.
- The script will continue fetching snapshots until the counter reaches your desired_snaps setting and you have the desired number of images.
- After this, the script will randomly split your captured images according to your specified train_test_split and place them in the train and test folders.
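For reference, the snapshot step above maps onto the Meraki Dashboard API's generateSnapshot endpoint. The sketch below is an assumption about how that call works in isolation, not the repo's actual cvSnapper.py code: it requests a snapshot URL for the camera and then downloads the image, retrying until it is ready.

```python
# Minimal snapshot-capture sketch using the Meraki Dashboard API (illustrative only)
import time
import requests

from config import api_key, camera_serial  # values described earlier

def capture_snapshot(out_path: str) -> None:
    # Ask Dashboard to generate a snapshot for this camera
    resp = requests.post(
        f"https://api.meraki.com/api/v1/devices/{camera_serial}/camera/generateSnapshot",
        headers={"X-Cisco-Meraki-API-Key": api_key},
    )
    resp.raise_for_status()
    snapshot_url = resp.json()["url"]

    # The image can take a few seconds to become available, so retry the download
    for _ in range(10):
        image = requests.get(snapshot_url)
        if image.status_code == 200:
            with open(out_path, "wb") as f:
                f.write(image.content)
            return
        time.sleep(2)
    raise RuntimeError("Snapshot never became available for download")

capture_snapshot("images/snap_0.jpg")
```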
For labeling our dataset, we will use an open-source tool called LabelImg, which you can find in this GitHub repo: https://github.com/tzutalin/labelImg
Follow the installation instructions for your operating system, and run LabelImg.
- Click on Open Dir and navigate to your /images/train folder
- For every image, draw a bounding box around the object you wish to identify, and give it a class name. This class name MUST be consistent across ALL of the images depicting this object. If an image has more than one object you wish to identify, draw bounding boxes around each of them and assign a label to each bounding box. Keep your labels consistent!
- After you draw a bounding box and assign a label, click the button under the Save button until it says PascalVOC. After this, click Save, which will generate a .xml file with the same name as the image in the same folder
- Do this for ALL of your training images
- After you finish with all of the training images, saving each one, click Open Dir, select the /images/test folder, and repeat the exact same process
Once you're done labeling, zip the whole images folder (the images folder itself, not just the train and test subfolders) into a .zip file.
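Before moving on, it can be worth sanity-checking your annotations, since a single typo in a label effectively creates a new class. The sketch below tallies the class names found in the Pascal VOC .xml files LabelImg produces (assuming the standard object/name structure):

```python
# Count bounding boxes per label across the train and test annotations
import glob
import xml.etree.ElementTree as ET
from collections import Counter

counts = Counter()
for xml_file in glob.glob("images/**/*.xml", recursive=True):
    root = ET.parse(xml_file).getroot()
    for obj in root.findall("object"):
        counts[obj.findtext("name")] += 1

for label, n in counts.most_common():
    print(f"{label}: {n} boxes")
```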
The following sections are performed in Google Colab. Colab is a web-based environment where you have access to compute and GPU resources and can run scripts in the form of notebooks on demand. It's a nice free alternative for training machine learning models without purchasing devices with GPUs or spending heavily on cloud platforms like AWS, Google Cloud or Azure and their machine learning services, and it works well for prototyping solutions in general. For more professional-grade use cases, you will likely want to leverage paid offerings, or a Meraki ecosystem partner like Cogniac: https://apps.meraki.io/details/cogniac-packaging-kitting
- Go to https://colab.research.google.com
- Click on File and select Upload notebook
- Add the Meraki_TFLite_Demo.ipynb file from the GitHub repo and click Open
Follow the instructions in the Colab Notebook.
Follow the instructions in the Colab Notebook as specified in the previous section.
Follow the instructions in the Colab Notebook until you export the TFLite file. After you export it, go to your Google Drive, navigate to the /content/gdrive/MyDrive/TensorFlow/workspace/custom_cv/{MODEL_NAME}/tflite_models folder, and download the file. Also download the label_map.pbtxt file from your annotations folder.
You can visualize the resulting neural network of your TFLite model using the open source Netron app.
- Open a browser and navigate to https://netron.app
- Drag and drop your .tflite file into it
- Click on the very first node in the graph, and on the right you should see information on the inputs and outputs of your model
- Your input should be of type float32[1,x,y,3], where the first number is the number of images processed at a time (1), x and y are the height and width of the model input (don't worry if these don't match your input source; your model will automatically resize frames to fit), and the 4th number is the number of color channels (3 for RGB)
- Your outputs should be exactly 4, all of type float32, and their location values MUST be numbered sequentially (it doesn't matter which 4 numbers, but if the first number is 333, the subsequent numbers must be 334, 335 and 336 in that exact order). If this is not the case, your model will not work with the MV camera, as it expects the outputs in that specific order; you can also verify this programmatically with the sketch after this list. Right now there seems to be an issue with TensorFlow versions greater than 2.5 when exporting to TFLite, which is why the Notebook prompts you to downgrade to TF 2.5 BEFORE exporting your TFLite model
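As a complement to Netron, you can inspect the same tensor details with the TFLite interpreter. This is a minimal sketch, assuming you have the tensorflow package installed locally and the exported file saved as model.tflite:

```python
# Inspect the input and output tensors of the exported TFLite model
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

print("Inputs:")
for detail in interpreter.get_input_details():
    print(f"  index={detail['index']}, shape={detail['shape']}, dtype={detail['dtype']}")

print("Outputs:")
for detail in interpreter.get_output_details():
    # 'index' corresponds to the tensor location shown in Netron;
    # for the MV camera the four output indices should be sequential
    print(f"  index={detail['index']}, shape={detail['shape']}, dtype={detail['dtype']}")
```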
After verifying your model has the appropriate inputs and outputs on Netron:
- Rename the file to model.tflite
- Zip the file into model.tflite.zip. This is very easy to do from the command line with zip model.tflite.zip model.tflite, and it can be preferable to a GUI-based ZIP utility, since those sometimes add extra files to the archive that cause issues when uploading to the Meraki platform (see the sketch after this list for a quick way to verify the archive contents)
- Log into your Meraki Dashboard account, navigate to the camera you wish to test with, and go to Settings - Sense
- Click Enabled for the Sense API and Custom CV, and then select Add or delete custom models
- Upload your model.tflite.zip file and give it a meaningful name
- After it finishes uploading and it says Ready, close the popup and assign the model to your camera. Use a low threshold or even 0; we will use our app to control which detections are plotted and which aren't
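If you want to confirm the archive contains only model.tflite before uploading, a quick check like this sketch does the trick:

```python
# List the archive contents and make sure nothing extra snuck in
import zipfile

with zipfile.ZipFile("model.tflite.zip") as zf:
    names = zf.namelist()

print(names)
assert names == ["model.tflite"], "Archive should contain only model.tflite"
```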
For demoing, we will be using the mvCustomCvDashboard.py script.
- Make sure you have a working Docker installation on your computer and that you have IP connectivity to your camera
- Run docker pull eclipse-mosquitto:1.6 from your terminal; this fetches the image for the Eclipse Mosquitto broker we will use to receive the messages from our camera
- Run docker run -p 1883:1883 eclipse-mosquitto:1.6 from your terminal; this starts the Mosquitto broker on your computer, and you should see some console messages in your terminal
- Go into your Meraki Dashboard and back to the camera you were working on
- Add a new MQTT broker from the Settings - Sense tab with the IP address of your computer, and leave the security settings blank. NOTE: In a production environment, it is highly recommended that you use TLS and a username/password to authenticate your clients, but this is out of the scope of this document.
- Select the MQTT broker you just added as your camera's broker and click Save (see the subscriber sketch after this list if you want to verify that messages are arriving at your broker)
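Before launching the demo app, you can check that the camera is actually publishing to your local broker with a few lines of paho-mqtt (1.x callback style). The /merakimv/# topic filter below is an assumption based on the usual MV Sense topic prefix; adjust it if your camera publishes elsewhere.

```python
# Minimal MQTT subscriber to confirm the camera is publishing to your local broker
import paho.mqtt.client as mqtt

BROKER = "127.0.0.1"   # the machine running the eclipse-mosquitto container
TOPIC = "/merakimv/#"  # assumed topic prefix for MV Sense messages

def on_connect(client, userdata, flags, rc):
    print("Connected to broker with result code", rc)
    client.subscribe(TOPIC)

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER, 1883, 60)
client.loop_forever()
```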
Go back to the config.py file:
- Set max_classes to the maximum number of classes you wish your dashboard to track. The app works well tracking up to 8 classes; beyond that you will run out of space in the Tkinter GUI and may need to adjust the height and width values.
- Set mqtt_server to 127.0.0.1 (this assumes you're running the script on the same machine as the MQTT broker)
- Set det_threshold to your preferred value. 0.8 worked well for me; essentially, the app will only consider objects your model is at least 0.8 (80%) certain of
Add the label_map.pbtxt file you downloaded earlier to your working directory (same level as the scripts), and run python mvCustomCvDashboard.py. This will:
- Connect your script as a client to your broker, which should be receiving the messages from your camera
- Show a small Tkinter-based GUI with the counts of every object your camera recognizes
- The GUI will only show the first max_classes labels your camera identifies; if your model can recognize more labels than this, they won't be displayed in the GUI (a sketch of how label_map.pbtxt maps IDs to these labels follows this list)
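For reference, label_map.pbtxt is the standard TensorFlow Object Detection label map, made up of repeated item { id: ... name: '...' } blocks. Here is a simple sketch of how class IDs can be mapped back to names without the object_detection package (an illustration, not necessarily how mvCustomCvDashboard.py does it):

```python
# Parse label_map.pbtxt into a {class_id: class_name} dictionary
import re

def load_label_map(path: str) -> dict:
    with open(path) as f:
        text = f.read()
    labels = {}
    # Each item { ... } block carries an id and a name (their order can vary)
    for block in re.findall(r"item\s*\{([^}]*)\}", text):
        id_match = re.search(r"id:\s*(\d+)", block)
        name_match = re.search(r"name:\s*['\"]([^'\"]+)['\"]", block)
        if id_match and name_match:
            labels[int(id_match.group(1))] = name_match.group(2)
    return labels

print(load_label_map("label_map.pbtxt"))
```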
- Kapkar, N. (2021, September 29). TensorFlow 2 object detection API with Google Colab. Medium. Retrieved April 27, 2022, from https://medium.com/swlh/tensorflow-2-object-detection-api-with-google-colab-b2af171e81cc
- Renotte, N. (2021, April 9). Tensorflow object detection in 5 hours with Python | full course with 3 projects. YouTube. Retrieved April 27, 2022, from https://www.youtube.com/watch?v=yqkISICHH-U
- Cisco Meraki. (2022, March 29). MV Sense Custom Computer Vision. Cisco Meraki. Retrieved April 27, 2022, from https://documentation.meraki.com/MV/Video_Analytics/MV_Sense_Custom_Computer_Vision