The robot is a 4 DOF arm. The complete forward and inverse kinematics have been solved and are provided in the documentation.
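As a rough illustration of the idea (the full derivation is in the documentation), here is a minimal forward-kinematics sketch that assumes a common 4 DOF layout, a base yaw joint followed by three pitch joints, and placeholder link lengths rather than the actual geometry of this arm:

```python
import numpy as np

# Hypothetical link lengths in mm; the real values are in the project documentation.
L1, L2, L3 = 120.0, 120.0, 60.0  # shoulder-to-elbow, elbow-to-wrist, wrist-to-camera

def forward_kinematics(q1, q2, q3, q4):
    """Camera-mount position for a base-yaw + three-pitch arm (angles in radians)."""
    # Reach and height within the arm's vertical plane
    r = L1 * np.cos(q2) + L2 * np.cos(q2 + q3) + L3 * np.cos(q2 + q3 + q4)
    z = L1 * np.sin(q2) + L2 * np.sin(q2 + q3) + L3 * np.sin(q2 + q3 + q4)
    # Rotate the planar reach around the base yaw axis
    return np.array([r * np.cos(q1), r * np.sin(q1), z])
```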
All of the components are 3D printed in PLA. Because the joints are driven directly by the servo motors, no bearings are needed, except at the base of the robot, where two radial ball bearings provide smooth motion.
The motors are strong enough to handle the weight of the camera at the end of the arm. It is important to drive them at 7.5 V; otherwise, Joints 2 and 3 tend to fail when the arm is extended.
At the base of the robot, nylon M3 threaded standoffs provide separation between the ESP32 and the servo board, and also serve to mount the motor of the base joint. All screws are M3, except those that come with the servo motors, which are M2.
At the end of the arm, we attach a USB camera and an LED ring light. A 3D printed diffuser, printed in clear PETG, is installed in front of the LED ring.
Each joint of the 4 DOF arm is directly driven by a Feetech SCS 225 serial bus servo.
These servos are easy to control, and because they are serial bus servos, they can be daisy-chained together, which makes for a very clean setup with few wires.
The motors connect to the Feetech URT board, which is powered with a 7.5 V AC-to-DC power adapter.
The URT board is then connected to an ESP32 through the TX and RX pins. We are using an ESP32 development board to make the connections easier.
For this project we are using an Anker 2K camera. It provides excellent resolution and autofocus. It is a bit pricey; however, changing to a different camera will not require big changes. A different mounting solution will need to be 3D printed, and some tuning might be required to account for the different camera settings. All these details are provided in the documentation.
We are using the OpenCV library to access the camera video feed. For the detection itself, we use MediaPipe Hands, a machine learning (ML) solution that infers 3D landmarks of a hand.
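As a rough sketch of how the capture and landmark loop fits together (the camera index, model options, and exit key below are assumptions, not the project's actual settings):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # camera index may differ on your system
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            landmarks = results.multi_hand_landmarks[0].landmark  # 21 normalized (x, y, z) points
            # ... gesture and offset logic goes here ...
        cv2.imshow("hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```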
We apply some logic to these landmarks to detect specific gestures. The detected gestures define the actions to be taken by the robot. Additionally, we compute the offset of the hand with respect to the center of the image to determine how to move the robot.
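The snippet below shows one possible way to compute the hand's offset from the image center and a simple open-palm check from the MediaPipe landmarks; the helper names and the specific gesture rule are hypothetical, not the project's exact logic.

```python
def hand_offset(landmarks):
    """Offset of the wrist landmark from the image center, in normalized units.

    MediaPipe landmarks are normalized to [0, 1], so the image center is (0.5, 0.5).
    """
    wrist = landmarks[0]
    return wrist.x - 0.5, wrist.y - 0.5

def is_open_palm(landmarks):
    """Rough open-hand check: each fingertip sits above its PIP joint (smaller y)."""
    finger_tips = [8, 12, 16, 20]
    finger_pips = [6, 10, 14, 18]
    return all(landmarks[t].y < landmarks[p].y for t, p in zip(finger_tips, finger_pips))
```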
The rest of the Python code is essentially a state machine driven by the outputs of the vision program.
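A minimal sketch of what such a state machine could look like; the state names and transition rules here are hypothetical, not the ones used in the project:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()      # no hand in view
    TRACKING = auto()  # hand detected, keep it centered
    ACTION = auto()    # a gesture triggered a robot action

def next_state(state, hand_detected, gesture):
    # Placeholder transitions driven by the vision outputs
    if not hand_detected:
        return State.IDLE
    if gesture is not None:
        return State.ACTION
    return State.TRACKING
```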
Additionally, we implement a basic PID controller to drive the robot and center the detected hand in the field of view of the camera.
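A minimal PID sketch along those lines; the class name and gains are placeholders, and the tuned values live in the project code:

```python
class PID:
    """Minimal PID controller acting on the hand's offset from the image center."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per axis, for example pan (base joint) and tilt (a wrist joint)
pan_pid = PID(kp=20.0, ki=0.0, kd=2.0)
tilt_pid = PID(kp=20.0, ki=0.0, kd=2.0)
```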
Finally, we send commands over serial to the ESP32 to drive the motors and set the LED ring behavior (colors and intensity).
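A sketch of that serial link using pyserial; the port name, baud rate, and message format shown here are assumptions for illustration, and the real protocol is defined in the project code.

```python
import serial

# Port name and baud rate are assumptions; adjust to your setup.
esp32 = serial.Serial("/dev/ttyUSB0", 115200, timeout=0.1)

def send_command(angles, led_color, led_brightness):
    """Send one newline-terminated, comma-separated command line to the ESP32."""
    fields = [f"{a:.1f}" for a in angles] + [led_color, str(led_brightness)]
    esp32.write((",".join(fields) + "\n").encode("ascii"))

# Example: four joint angles in degrees, white LEDs at half brightness
send_command([90.0, 45.0, 120.0, 90.0], "FFFFFF", 128)
```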
The ESP32 receives the commands (servo angles and LED settings) from the Python program. The only logic performed by the ESP32 is a final check of the servo commands to ensure the angles are within the limits of each motor.
Other actions can easily be implemented, such as retrieving servo information: temperature, position, current, and torque.
You can find a demo of the robot in the YouTube video above.
The design files and documentation for this project can be found on the Shop page.