How to Recognize Human Activity and Pose Estimation Using alwaysAI

By Andres Ulloa • Feb 14, 2020

The ability to recognize human activity with computer vision allows us to create applications that can interact with and respond to a user in real time. For instance, we can make an application that gives a user instant feedback as they learn to recreate the perfect golf swing, that immediately sends an alert for help when someone has fallen, or that generates an immersive augmented reality experience based on the user's position.

In computer vision, we use a technique called pose estimation to achieve these goals. Pose estimation maps out a person’s physical frame in an image by assigning sets of coordinates, known as key-points, to specific body parts. (Some pose estimation models return vector maps rather than key-points; this guide focuses on a model that returns key-points.) Once we have this map of key-points, we can begin to determine a person’s activity from how the key-points are positioned across the frames of a video stream.

alwaysAI provides a set of open source starter models in the Model Catalog. The following example combines one of these starter models with a simple rule-based algorithm to recognize specific poses.

Doing the YMCA

In this example we'll be using the alwaysai/human-pose model, along with a set of checks to determine if someone is performing a “Y,” “M,” “C,” or “A” pose. Then we’ll overlay the corresponding letter on the screen when the person in the image is in one of these poses.

The final code for this project can be found at this GitHub repo.


I’ll be prototyping this application on my laptop running Ubuntu and then deploying it to an edge device. If you use Windows or macOS, you can instead develop this app with remote development on your edge device.

To get started, we’ll use the real-time pose estimator app from the set of basic starter apps provided by alwaysAI.

First, I install alwaysAI, then download all of the starter apps by running the command:

aai get-starter-apps

Now let’s try running the pose estimator app to see what it produces. After navigating into the realtime_pose_estimator directory, I configure the project by running "aai app configure", choosing “Your local computer” as the deployment option and the defaults for everything else. Next, I install the model and its dependencies by running "aai app install". Finally, I start the application by running "aai app start". Once the app has started, I click the http://localhost:5000 link that appears in my terminal to open the Streamer window:

cd alwaysai-starter-apps/realtime_pose_estimator
aai app configure
aai app install
aai app start


Looking at the code in the realtime_pose_estimator app, I see that it loads the model, camera, and Streamer, then, for each frame, writes the key-point values to the Streamer as text and displays the image overlaid with the connected key-points. Each key-point is an [x, y] coordinate pair that marks one body part. A key-point of [-1, -1] means that part isn't visible in the image, so the model couldn't find it.

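import edgeiq

pose_estimator = edgeiq.PoseEstimation("alwaysai/human-pose")
pose_estimator.load(engine=edgeiq.Engine.DNN)
with edgeiq.WebcamVideoStream(cam=0) as video_stream, \
        edgeiq.Streamer() as streamer:
    while True:
        frame = video_stream.read()  # grab the latest webcam frame
        results = pose_estimator.estimate(frame)
        text = []
        for pose in results.poses:
            for key_point in pose.key_points:  # each is an [x, y] pair
                text.append(str(key_point))
        streamer.send_data(results.draw_poses(frame), text)
        if streamer.check_exit():
            break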

But how do we know which key-point corresponds to which body part? The documentation provides a handy map from body part to key-point index, which I add to the starter app as a dictionary:

BODY_PART_MAP = {"Nose": 0,
                 "Neck": 1,
                 "Right Shoulder": 2,
                 "Right Elbow": 3,
                 "Right Wrist": 4,
                 "Left Shoulder": 5,
                 "Left Elbow": 6,
                 "Left Wrist": 7,
                 "Right Hip": 8,
                 "Right Knee": 9,
                 "Right Ankle": 10,
                 "Left Hip": 11,
                 "Left Knee": 12,
                 "Left Ankle": 13,
                 "Right Eye": 14,
                 "Left Eye": 15,
                 "Right Ear": 16,
                 "Left Ear": 17}

def main():
    pose_estimator = edgeiq.PoseEstimation("alwaysai/human-pose")

The origin (0, 0) of the image is the top-left corner: x increases to the right and y increases downward. So the higher up something is in the image, the smaller its y value, and the farther left it is, the smaller its x value.
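To make the dictionary and the top-left origin concrete, here is a small sketch. It assumes a pose's key-points arrive as a list of [x, y] pairs indexed as in BODY_PART_MAP, with [-1, -1] for parts the model didn't find; the helpers get_point and is_above are hypothetical names for illustration, not part of edgeiq. The map is repeated so the snippet runs on its own.

```python
from typing import List, Optional, Tuple

# Index of each body part in the key-point list (same as BODY_PART_MAP above).
BODY_PART_MAP = {"Nose": 0, "Neck": 1, "Right Shoulder": 2, "Right Elbow": 3,
                 "Right Wrist": 4, "Left Shoulder": 5, "Left Elbow": 6,
                 "Left Wrist": 7, "Right Hip": 8, "Right Knee": 9,
                 "Right Ankle": 10, "Left Hip": 11, "Left Knee": 12,
                 "Left Ankle": 13, "Right Eye": 14, "Left Eye": 15,
                 "Right Ear": 16, "Left Ear": 17}

Point = Tuple[int, int]


def get_point(key_points: List[List[int]], part: str) -> Optional[Point]:
    """Hypothetical helper: look up a body part, returning None if it wasn't found."""
    x, y = key_points[BODY_PART_MAP[part]]
    if (x, y) == (-1, -1):  # the model couldn't locate this part
        return None
    return (x, y)


def is_above(a: Point, b: Point) -> bool:
    """With the origin at the top-left, 'higher in the image' means a smaller y."""
    return a[1] < b[1]


# Synthetic example: 18 key-points, all missing except the nose and right wrist.
key_points = [[-1, -1] for _ in range(18)]
key_points[BODY_PART_MAP["Nose"]] = [320, 120]
key_points[BODY_PART_MAP["Right Wrist"]] = [250, 60]

nose = get_point(key_points, "Nose")
wrist = get_point(key_points, "Right Wrist")
print(is_above(wrist, nose))                # True: y=60 is above y=120
print(get_point(key_points, "Left Ankle"))  # None
```

Returning None for a missing part forces the caller to decide what an undetected body part means, instead of silently comparing against the -1 sentinel.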

Let's think about the problem we are trying to solve. To determine whether someone is doing the YMCA poses, we need to know where the person's arms are in relation to each other and to their head, and then check whether those arm and head key-points match a given pose. Let's start by defining a “Y” pose: the person extends both arms straight above their head, angled outward from their body. So we can define a “Y” pose as:

all([arms_overhead(pose), arms_outward(pose), arms_straight(pose)])

We can define the arms_overhead function with the key-points given to us, for example:

def arms_overhead(pose):
    return all(y < pose.nose_y for y in
               (pose.r_elbow_y, pose.r_wrist_y, pose.l_elbow_y, pose.l_wrist_y))

Here we’re saying that when a person has their arms over their head, all wrist and elbow key-points will be above the nose (that is, they will have smaller y values than the nose). Let's take a crack at the arms_outward function:

def arms_outward(pose):
    return all([x < pose.nose_x for x in (pose.r_elbow_x, pose.r_wrist_x)] +
               [x > pose.nose_x for x in (pose.l_elbow_x, pose.l_wrist_x)])

Here we’re saying that the right wrist and right elbow must have smaller x values than the nose (they appear to its left in the image), and the left wrist and left elbow must have larger x values. Note that “right” means the person’s right, not the right side of the image: when the person faces the camera, their right arm appears on the left of the image.
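The checks above read attributes such as pose.nose_y and pose.r_wrist_x, and main() later wraps each detected pose in a YMCAPose object, but that class isn't shown in this walkthrough. Below is a hypothetical minimal version, assuming pose.key_points is a list of [x, y] pairs indexed as in BODY_PART_MAP. It repeats arms_overhead and arms_outward so it runs on its own, and uses SimpleNamespace as a stand-in for an edgeiq pose object.

```python
from types import SimpleNamespace

# Key-point indices taken from BODY_PART_MAP.
NOSE, R_ELBOW, R_WRIST, L_ELBOW, L_WRIST = 0, 3, 4, 6, 7


class YMCAPose:
    """Hypothetical wrapper exposing key-points as named x/y attributes.

    Assumes `pose.key_points` is a list of [x, y] pairs indexed like BODY_PART_MAP.
    """

    def __init__(self, pose):
        kp = pose.key_points
        self.nose_x, self.nose_y = kp[NOSE]
        self.r_elbow_x, self.r_elbow_y = kp[R_ELBOW]
        self.r_wrist_x, self.r_wrist_y = kp[R_WRIST]
        self.l_elbow_x, self.l_elbow_y = kp[L_ELBOW]
        self.l_wrist_x, self.l_wrist_y = kp[L_WRIST]


def arms_overhead(pose):
    return all(y < pose.nose_y for y in
               (pose.r_elbow_y, pose.r_wrist_y, pose.l_elbow_y, pose.l_wrist_y))


def arms_outward(pose):
    return all([x < pose.nose_x for x in (pose.r_elbow_x, pose.r_wrist_x)] +
               [x > pose.nose_x for x in (pose.l_elbow_x, pose.l_wrist_x)])


# A synthetic "Y": both arms raised and spread. The person's right arm
# appears on the image's left (smaller x).
points = [[-1, -1] for _ in range(18)]
points[NOSE] = [320, 200]
points[R_ELBOW], points[R_WRIST] = [250, 150], [200, 80]
points[L_ELBOW], points[L_WRIST] = [390, 150], [440, 80]

y_pose = YMCAPose(SimpleNamespace(key_points=points))
print(arms_overhead(y_pose), arms_outward(y_pose))  # True True
```

One caveat this sketch makes visible: a key-point the model couldn't find comes back as [-1, -1], so its y of -1 counts as "above" the nose and its x of -1 as "left of" it. A more defensive version might check for missing key-points before running the comparisons.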

Now let's try arms_straight. This one is tougher:

def arms_straight(pose):
    return all([pose.r_wrist_y < pose.r_elbow_y,
                pose.l_wrist_y < pose.l_elbow_y,
                pose.r_wrist_x < pose.r_elbow_x,
                pose.l_wrist_x > pose.l_elbow_x])

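We’re saying that if each wrist is higher up than its elbow and farther out from the body’s center line than its elbow (farther left in the image for the right arm, farther right for the left arm), then the arm is basically straight, or at least straight enough for our purposes here.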
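The comparison above is a heuristic: a wrist can sit above and outside its elbow while the arm is still noticeably bent. If that turns out to be too loose, one alternative (a different technique from the one used in this app, shown only as a sketch) is to measure the angle at the elbow formed by the shoulder, elbow, and wrist key-points (Right Shoulder and Left Shoulder are indices 2 and 5 in BODY_PART_MAP) and treat the arm as straight when that angle is close to 180 degrees. The function names and the 160-degree threshold are assumptions you would tune on real footage.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0  # degenerate: two of the points coincide
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def arm_is_straight(shoulder, elbow, wrist, min_angle=160.0):
    """Hypothetical check: the arm counts as straight if the elbow angle is near 180 degrees."""
    return joint_angle(shoulder, elbow, wrist) >= min_angle


# Straight arm: shoulder, elbow, and wrist lie on one line.
print(arm_is_straight((300, 200), (250, 150), (200, 100)))  # True
# Bent arm: a 90-degree angle at the elbow.
print(arm_is_straight((300, 200), (250, 200), (250, 100)))  # False
```

Unlike the x/y comparisons, the angle check doesn't care which direction the arm points, so the same function works for both arms and for poses other than the “Y”.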

So, there you have it! We’ve created a function that determines if a pose is “Y.” We just need to repeat this process to finish building our simple algorithmic classifier.

I applied the same logic to all 4 categories:

def is_y(pose):
    """Determines if the pose is a Y."""
    return all([arms_overhead(pose), arms_outward(pose), arms_straight(pose)])

def is_a(pose):
    """Determines if the pose is an A."""
    return all([arms_overhead(pose), arms_outward(pose), arms_bent_in(pose), wrists_high(pose)])

def is_m(pose):
    """Determines if the pose is an M."""
    return all([wrists_overhead(pose), arms_outward(pose), arms_bent_in(pose), wrists_low(pose)])

def is_c(pose):
    """Determines if the pose is a C."""
    return all([wrists_left(pose), right_wrist_overhead(pose)])
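The A, M, and C checks rely on helpers such as arms_bent_in, wrists_low, wrists_left, and right_wrist_overhead, whose definitions aren't shown here. As an illustration only, here is one plausible reading of the two C helpers, assuming “left” means the left side of the image and that poses expose the same *_x/*_y attributes as before. These helper bodies are guesses, not the project's actual code, and SimpleNamespace stands in for YMCAPose.

```python
from types import SimpleNamespace


def wrists_left(pose):
    """Hypothetical: both wrists appear left of the nose in the image (smaller x)."""
    return pose.r_wrist_x < pose.nose_x and pose.l_wrist_x < pose.nose_x


def right_wrist_overhead(pose):
    """Hypothetical: the person's right wrist is above the nose (smaller y)."""
    return pose.r_wrist_y < pose.nose_y


def is_c(pose):
    """Determines if the pose is a C."""
    return all([wrists_left(pose), right_wrist_overhead(pose)])


# Synthetic poses built directly with the attribute names the checks read.
c_pose = SimpleNamespace(nose_x=320, nose_y=200,
                         r_wrist_x=220, r_wrist_y=120,   # raised, off to the image's left
                         l_wrist_x=240, l_wrist_y=300)   # low, reaching across the body
arms_down = SimpleNamespace(nose_x=320, nose_y=200,
                            r_wrist_x=260, r_wrist_y=400,
                            l_wrist_x=380, l_wrist_y=400)
print(is_c(c_pose), is_c(arms_down))  # True False
```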

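I then created the associated body position functions (arms_bent_in, wrists_high, and so on; the full set is in the project's final code) and updated main() to check each detected pose and overlay the matching letter: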

import time

import cv2
import edgeiq


def main():
    pose_estimator = edgeiq.PoseEstimation("alwaysai/human-pose")
    pose_estimator.load(engine=edgeiq.Engine.DNN_OPENVINO,
                        accelerator=edgeiq.Accelerator.MYRIAD)

    # Letter images to overlay when a pose is recognized
    y_letter = cv2.imread('y_letter.png')
    m_letter = cv2.imread('m_letter.jpg')
    c_letter = cv2.imread('c_letter.jpeg')
    a_letter = cv2.imread('a_letter.jpg')

    with edgeiq.WebcamVideoStream(cam=0) as video_stream, \
            edgeiq.Streamer() as streamer:
        # Allow Webcam to warm up
        time.sleep(2.0)

        # loop detection
        while True:
            frame = video_stream.read()
            results = pose_estimator.estimate(frame)
            # Generate text to display on streamer
            text = [""]
            for pose in results.poses:
                # YMCAPose and the is_* checks are defined elsewhere in the app
                app_pose = YMCAPose(pose)

                # Each match resizes its letter image to the frame size and
                # blends it in place: 40% camera frame, 60% letter image
                if is_a(app_pose):
                    overlay = edgeiq.resize(a_letter, frame.shape[1], frame.shape[0], False)
                    cv2.addWeighted(frame, 0.4, overlay, 0.6, 0, frame)
                if is_m(app_pose):
                    overlay = edgeiq.resize(m_letter, frame.shape[1], frame.shape[0], False)
                    cv2.addWeighted(frame, 0.4, overlay, 0.6, 0, frame)
                if is_y(app_pose):
                    overlay = edgeiq.resize(y_letter, frame.shape[1], frame.shape[0], False)
                    cv2.addWeighted(frame, 0.4, overlay, 0.6, 0, frame)
                if is_c(app_pose):
                    overlay = edgeiq.resize(c_letter, frame.shape[1], frame.shape[0], False)
                    cv2.addWeighted(frame, 0.4, overlay, 0.6, 0, frame)

            streamer.send_data(results.draw_poses(frame), text)

            if streamer.check_exit():
                break

    print("Program Ending")


if __name__ == "__main__":
    main()
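The four if blocks in main() each blend their letter independently, so if two checks ever passed on the same frame, both letters would be stacked onto the image. A table-driven sketch makes the check order explicit and picks at most one letter per pose. The classify helper is a hypothetical name, not part of edgeiq, and the demo uses toy checks on a dict in place of the real is_* functions and YMCAPose.

```python
from typing import Callable, Iterable, Optional, Tuple

# A (letter, check) pair; checks are tried in order and the first match wins.
Check = Tuple[str, Callable[[object], bool]]


def classify(pose, checks: Iterable[Check]) -> Optional[str]:
    """Return the letter of the first check that passes, or None if none do."""
    for letter, check in checks:
        if check(pose):
            return letter
    return None


# Toy demo with stand-in checks; in the app these would be
# is_y, is_m, is_c, and is_a applied to a YMCAPose.
toy_checks = [("Y", lambda p: p["arms_up"] and p["straight"]),
              ("A", lambda p: p["arms_up"])]
print(classify({"arms_up": True, "straight": False}, toy_checks))   # A
print(classify({"arms_up": False, "straight": False}, toy_checks))  # None
```

In main(), the four if blocks could then collapse into one call such as classify(app_pose, [("Y", is_y), ("M", is_m), ("C", is_c), ("A", is_a)]), followed by a single resize-and-blend using the letter image looked up from the returned letter.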

Deploying the App

Now that I have my app working, I want to run and test it on my edge device (I’ll be using a Pi 4 with a Pi camera attached). To do this, I run “aai app configure” and set up a target configuration so that my code can be deployed to and run on the device:

aai app configure
✔ Found
✔ What is the destination? › Remote device
✔ Found Dockerfile
✔ Please enter the hostname (with optional user name) to connect to your device via ssh (e.g. "pi@"): … alwaysai@nano
✔ Connect by SSH
✔ Check docker executable
✔ Check docker permissions
✔ Would you like to use the default installation directory "alwaysai/ymca_app"? … yes
✔ Create target directory
✔ Write

Now, whenever we run “aai app install”, it deploys our updated application to the device, and “aai app start” runs the application there.

aai app install
aai app start


We’ve walked through a simple example of modifying the pose estimation app so that it can detect when a person is doing the YMCA. This same process can be expanded to many use cases that involve assessing a person’s activity and actions.

The alwaysAI platform makes it easy to build, test, and deploy computer vision applications such as this YMCA detector. Sign up for a free account here. We can’t wait to see what you'll build! 
