Using an Object Detection Model with a Classifier

Lila Mullany
6 min read · Mar 3, 2020


[Image: the finished app in action] The program detects three faces, and the classified age for each face appears in the text box on the left. As you can see from Face 3, the model will still detect faces even when they are partially occluded! We’ll let you try the app for yourself to see how accurate the classifications are.

Why use an object detection model with a classification model?

There are many situations where it is helpful to add a classification layer to an application that uses object detection. For instance, if you already have an app that detects people, you could add a model that classifies the gender of each detected individual. Or, as I will show in this tutorial, you could classify each detected individual by age range. Both applications would be useful in any situation where you want to track demographics.

In this tutorial I will demonstrate how, with just a few lines of code, you can add a classification model to a starter app from alwaysAI that already uses a detection model. All of the finished code from this tutorial is available on GitHub.

Before we get started, you’ll need:

  1. An alwaysAI account (it’s free!)
  2. alwaysAI set up on your machine
  3. A text editor such as Sublime Text or an IDE such as PyCharm, both of which offer free versions, or whatever else you prefer to code in

Please see the alwaysAI blog for more background on computer vision, developing models, how to change models, and more.

Let’s get started!

After you have your free account and have set up your developer environment, you need to download the starter apps; do so using this link before proceeding with the rest of the tutorial.

With the starter applications downloaded, you can begin modifying an existing starter app to use an object detection model together with a classification model. The app modified for this tutorial was the ‘Object Detector’ app, so cd into the starter apps folder and then into the ‘realtime_object_detector’ folder:

cd ./alwaysai-starter-apps/realtime_object_detector

The object detection model used by default in the ‘realtime_object_detector’ starter app is ‘alwaysai/mobilenet_ssd’, but we are going to change this to the detection model ‘alwaysai/res10_300x300_ssd_iter_140000’, which detects human faces; since we will be classifying faces by age, this is a well-suited detection model. The classification model, ‘alwaysai/agenet’, will classify each face detected by ‘alwaysai/res10_300x300_ssd_iter_140000’ into one of the following age ranges: 0–2, 4–6, 8–12, 15–20, 25–32, 38–43, 48–53, and 60–100 years old.
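Conceptually, the finished app chains the two models together: the detector finds face bounding boxes in each frame, each face is cropped out of the frame, and the crop is handed to the classifier. Here is a minimal sketch of that flow, using only calls that appear later in this tutorial (the full code is developed step by step below):

# detect faces in the current video frame
results = facial_detector.detect_objects(frame, confidence_level=.5)
for prediction in results.predictions:
    # crop the detected face out of the frame...
    face_image = edgeiq.cutout_image(frame, prediction.box)
    # ...and classify just that crop into an age range
    age_results = classifier.classify_image(face_image)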

Since the app now uses two new models, we will need to add both of them to our app environment. Make sure you are in the folder for the app being developed and type into the command line:

aai app models add alwaysai/agenet

and subsequently,

aai app models add alwaysai/res10_300x300_ssd_iter_140000

Now, since we are no longer using the default object detection model that originally came with ‘realtime_object_detector’, we are going to remove ‘alwaysai/mobilenet_ssd’ from the app environment to reduce the overall app size. Do this by typing the following command into the command line:

aai app models remove alwaysai/mobilenet_ssd

NOTE: You can easily change any model by following the ‘Changing the Computer Vision Model’ documentation.

Now that you’ve set up your app environment, you can begin to modify the starter app code. We’ll do this in the following steps:

  1. First, alter the code that sets up the ‘obj_detect’ variable on line #16 of the original code.
  • 1a. We’ll replace the name with something more appropriate for our new app: ‘facial_detector’. Do this for all instances of ‘obj_detect’ (in PyCharm you can highlight ‘obj_detect’ and click ‘Refactor’ → ‘Rename’ in the toolbar). Check that the print statements and other instances were properly updated.
  • 1b. Change “alwaysai/mobilenet_ssd” to “alwaysai/res10_300x300_ssd_iter_140000” in line #18. The code should now contain the following lines:
facial_detector = edgeiq.ObjectDetection(
        "alwaysai/res10_300x300_ssd_iter_140000")
facial_detector.load(engine=edgeiq.Engine.DNN)

2. Now, we’re going to add a classifier object and associated print statements, using the same format as we did for the ‘facial_detector’. Instead of ‘ObjectDetection’, we will instantiate a ‘Classification’ object (this can also be seen in the ‘image_classifier’ starter app; you can always check out the starter apps for inspiration!).

  • 2a. Add the following lines of code after the ‘facial_detector’ setup but before the print statements:
classifier = edgeiq.Classification("alwaysai/agenet")
classifier.load(engine=edgeiq.Engine.DNN)
  • 2b. Add print statements to display the engine, accelerator, and model of the classifier in the terminal. You can copy the print statements for ‘facial_detector’ and change ‘facial_detector’ to ‘classifier’. At this stage, the first part of main, up until the ‘try’ statement, should look like this:
def main():
    # first make a detector to detect facial objects
    facial_detector = edgeiq.ObjectDetection(
            "alwaysai/res10_300x300_ssd_iter_140000")
    facial_detector.load(engine=edgeiq.Engine.DNN)

    # then make a classifier to classify the age of the image
    classifier = edgeiq.Classification("alwaysai/agenet")
    classifier.load(engine=edgeiq.Engine.DNN)

    # descriptions printed to console
    print("Engine: {}".format(facial_detector.engine))
    print("Accelerator: {}\n".format(facial_detector.accelerator))
    print("Model:\n{}\n".format(facial_detector.model_id))
    print("Engine: {}".format(classifier.engine))
    print("Accelerator: {}\n".format(classifier.accelerator))
    print("Model:\n{}\n".format(classifier.model_id))

    fps = edgeiq.FPS()

3. We’re going to add a counting variable to track the faces we detect. In the final code, I also removed some labels to make the markup less cluttered; see the code comments for these changes. Inside the while loop, before anything else, add the following:

  • 3a. Create a counting variable (I called mine ‘count’) and initialize it to 1.
  • 3b. Create a loop that alters the labels for each prediction, using the ‘count’ variable to track which face is which:
for p in results.predictions:
    p.label = "Face " + str(count)
    count = count + 1
  • 3c. Alter the original frame markup code so that it shows the renamed labels but not the confidences:
frame = edgeiq.markup_image(
        frame, results.predictions, show_labels=True, show_confidences=False)
  • 3d. Modify the description of what is appended to the text field. Change the line that appends ‘Objects:’ to the ‘text’ variable to instead append ‘Faces:’:
text.append("Faces:")

4. Now, we just need to get the results from the classifier.

  • 4a. Within the ‘while’ loop but just before the ‘for’ loop, create a variable to track the faces, much like we did with ‘count’, and set it to 1. I named mine ‘age_label’.
  • 4b. Now, inside the for loop, underneath ‘for prediction in results.predictions’, append a label along with some text identifying each face, and then increment the variable:
text.append("Face {} ".format(
age_label))

age_label = age_label + 1
  • 4c. We need to trim each face out of the frame so that the classifier will work properly (a sketch of what this cropping does under the hood appears after this list). The code to do this is:
face_image = edgeiq.cutout_image(frame, prediction.box)
  • 4d. Create a variable ‘age_results’ and store the classification results for ‘face_image’ in it, using the following code:
age_results = classifier.classify_image(face_image)
  • 4e. Check whether the classification model actually returned results. Use an if/else statement, appending the results to the output text sent to the streamer if so. Add the following ‘if’ branch underneath your ‘age_results’ initialization:
if age_results.predictions:
    text.append("Label: {}, {:.2f}".format(
            age_results.predictions[0].label,
            age_results.predictions[0].confidence))
  • If there are no results, we will display this fact to the output stream. Finish the ‘else’ part of the conditional with:
else:
    text.append("No age prediction")

The final while loop code will look like this:

# loop detection
while True:

    # Step 3a: track how many faces are detected in a frame
    count = 1

    # read in the video stream
    frame = video_stream.read()

    # detect human faces
    results = facial_detector.detect_objects(
            frame, confidence_level=.5)

    # Step 3b: alter the labels to show which face was detected
    for p in results.predictions:
        p.label = "Face " + str(count)
        count = count + 1

    # Step 3c: alter the original frame markup to just show labels
    frame = edgeiq.markup_image(
            frame, results.predictions, show_labels=True,
            show_confidences=False)

    # generate labels to display the face detections on the streamer
    text = ["Model: {}".format(facial_detector.model_id)]
    text.append(
            "Inference time: {:1.3f} s".format(results.duration))

    # Step 3d:
    text.append("Faces:")

    # Step 4a: add a counter for the face detection label
    age_label = 1

    # append each prediction to the text output
    for prediction in results.predictions:

        # Step 4b: append labels for face detection & classification
        text.append("Face {} ".format(age_label))
        age_label = age_label + 1

        ## to show confidence, use the following instead of the above:
        # text.append("Face {}: detected with {:2.2f}% confidence,".format(
        #         count, prediction.confidence * 100))

        # Step 4c: cut out the face and use it for the classification
        face_image = edgeiq.cutout_image(frame, prediction.box)

        # Step 4d: attempt to classify the image in terms of age
        age_results = classifier.classify_image(face_image)

        # Step 4e: if there are predictions for age classification,
        # generate these labels for the output stream
        if age_results.predictions:
            text.append("is {}".format(
                    age_results.predictions[0].label))
        else:
            text.append("No age prediction")

        ## to append classification confidence, use the following
        ## instead of the above if/else:
        # if age_results.predictions:
        #     text.append("age: {}, confidence: {:.2f}\n".format(
        #             age_results.predictions[0].label,
        #             age_results.predictions[0].confidence))
        # else:
        #     text.append("No age prediction")

    # send the image frame and the predictions to the output stream
    streamer.send_data(frame, text)

    fps.update()

    if streamer.check_exit():
        break
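For orientation, this loop sits inside scaffolding that already exists in the starter app: a ‘try’/‘finally’ wrapping a ‘with’ block that opens the camera and the streamer (this is the ‘try’ statement mentioned back in step 2b). Roughly, as a sketch of the usual starter-app structure rather than code you need to write yourself:

try:
    # open the webcam and the browser streamer together
    with edgeiq.WebcamVideoStream(cam=0) as video_stream, \
            edgeiq.Streamer() as streamer:
        # allow the camera to warm up, then start timing
        time.sleep(2.0)
        fps.start()

        # loop detection (the while loop shown above goes here)
        ...
finally:
    # report timing statistics on exit
    fps.stop()
    print("elapsed time: {:.2f}".format(fps.get_elapsed_seconds()))
    print("approx. FPS: {:.2f}".format(fps.compute_fps()))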

That’s it! Now you can build and start your app to see it in action. You may want to configure the app first, especially if you changed edge devices or created a new folder from scratch. Do this with the following command and enter the desired configuration input when prompted:

aai app configure

Now, to see your app in action, first build the app by typing into the command line:

aai app deploy

And once it is done building, type the following command to start the app:

aai app start

Now open any browser to ‘localhost:5000’ and you should see the output illustrated at the beginning of the article!


Lila Mullany

Background in biomedical informatics, software engineering, and a newfound interest in computer vision.