
Hand Detection System

A real-time hand-inside-person detection demo using OpenCV, MediaPipe & MobileNetSSD in a Tkinter GUI.

Author: Paul (SUT ZAW AUNG)
Project: Computer Vision
Date: Nov 2025

Project Overview

This project uses:

  • MediaPipe Hands to extract 21 hand landmarks per hand (x, y coordinates normalized 0–1).
  • MobileNet-SSD (Caffe) to detect people (class ID 15).

The logic then checks whether any hand landmark lies inside a detected person bounding box. On detection, the app:

  • Draws a red bounding box around the person.
  • Displays “Hand Detected”.
  • Plays a beep sound.
  • Saves a screenshot automatically in the screenshots/ folder.
  • Runs in a simple Tkinter GUI for live feedback.
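
The "hand inside person" check described above boils down to a point-in-rectangle test. A minimal sketch (the helper names `point_in_box` and `hand_inside_person` are illustrative, not from the project code):

```python
def point_in_box(point, box):
    """Return True if pixel (x, y) lies inside the (x1, y1, x2, y2) rectangle."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def hand_inside_person(hand_points, person_box):
    """True if any of the hand landmarks falls inside the person box."""
    return any(point_in_box(p, person_box) for p in hand_points)
```

A single landmark inside the box is enough to trigger the alert, which keeps the check cheap: at most 21 comparisons per hand per person.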

[Screenshot: "Hand Detected" state, hand inside the person bounding box]

[Screenshot: "No Hand Detected" state, no hand inside the person bounding box]

Step-by-Step Code Explanation

1) Imports & Setup

```python
import cv2
import mediapipe as mp
import threading
import tkinter as tk
from tkinter import Label, Button
from PIL import Image, ImageTk
import time
import os
import winsound
```

Explanation: OpenCV handles video input & DNN person detection; MediaPipe extracts hand landmarks; threading keeps Tkinter responsive; PIL converts frames to Tkinter images; winsound plays alert sounds.

2) Screenshot folder & model initialization

```python
os.makedirs("screenshots", exist_ok=True)

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils
hands = mp_hands.Hands(min_detection_confidence=0.7, min_tracking_confidence=0.7)

net = cv2.dnn.readNetFromCaffe(
    "MobileNetSSD_deploy.prototxt",
    "MobileNetSSD_deploy.caffemodel"
)
PERSON_CLASS_ID = 15
```

Explanation:

  • Detection and tracking confidences of 0.7 make MediaPipe report only reliable hand landmarks.
  • In MobileNet-SSD's class list, ID 15 corresponds to "person".
  • The screenshots/ folder is created up front so saving a screenshot never fails on a missing directory.

3) Tkinter GUI & App class

```python
class App:
    def __init__(self, window):
        self.window = window
        self.window.title("Hand Detection System")

        self.label = Label(window)
        self.label.pack()

        self.btn_start = Button(window, text="Start Detection", command=self.start_detection)
        self.btn_start.pack()
        self.btn_stop = Button(window, text="Stop Detection", command=self.stop_detection)
        self.btn_stop.pack()

        self.cap = None
        self.running = False
        self.screenshot_count = 0
        self.last_alert_time = 0
        self.alert_interval = 2  # seconds
```

Explanation: The Tkinter buttons start and stop detection. The 2-second alert interval prevents repeated beeps and screenshots while the same hand stays in view.
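
The cooldown is a simple time-based debounce. Isolated from the GUI, the pattern looks like this (a sketch; the `Debouncer` class is illustrative, not the project's exact code):

```python
import time

class Debouncer:
    """Allows an action to fire at most once per `interval` seconds."""

    def __init__(self, interval=2.0):
        self.interval = interval
        self.last_fired = 0.0

    def ready(self, now=None):
        """Return True (and record the time) if the cooldown has elapsed."""
        now = time.time() if now is None else now
        if now - self.last_fired > self.interval:
            self.last_fired = now
            return True
        return False
```

In the detection loop, the alert branch would run only when `ready()` returns True, which is exactly what the `time.time() - self.last_alert_time > self.alert_interval` check does inline.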

4) Detection Loop & hand landmarks

```python
while self.running and self.cap.isOpened():
    ret, frame = self.cap.read()
    if not ret:
        break

    frame = cv2.flip(frame, 1)  # mirror image for natural webcam view
    h, w = frame.shape[:2]

    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = hands.process(rgb)

    hand_points = []
    if result.multi_hand_landmarks:
        for hand_landmarks in result.multi_hand_landmarks:
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            for lm in hand_landmarks.landmark:
                x, y = int(lm.x * w), int(lm.y * h)
                hand_points.append((x, y))
```

Explanation:

  • The frame is flipped for a mirror effect.
  • MediaPipe returns normalized 0–1 coordinates; multiply by the frame width and height to get pixel positions.
  • Landmarks are drawn on the frame for visual feedback.
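
The normalized-to-pixel conversion in isolation (the helper name `to_pixel` is illustrative):

```python
def to_pixel(lm_x, lm_y, width, height):
    """Convert a MediaPipe normalized [0, 1] landmark to pixel coordinates."""
    return int(lm_x * width), int(lm_y * height)
```

For a 640×480 frame, a landmark at (0.5, 0.5) maps to the frame center.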

5) Person detection & hand-inside check

```python
blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    class_id = int(detections[0, 0, i, 1])
    if confidence > 0.5 and class_id == PERSON_CLASS_ID:
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        x1, y1, x2, y2 = box.astype(int)

        hand_inside = any(x1 <= x <= x2 and y1 <= y <= y2 for (x, y) in hand_points)
        color = (0, 0, 255) if hand_inside else (0, 255, 0)
        label = "Hand Detected" if hand_inside else "No Hand"

        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)

        if hand_inside and time.time() - self.last_alert_time > self.alert_interval:
            winsound.Beep(1000, 300)
            path = f"screenshots/screenshot_{self.screenshot_count:03}.jpg"
            cv2.imwrite(path, frame)
            print(f"[Saved] {path}")
            self.screenshot_count += 1
            self.last_alert_time = time.time()
```

Explanation:

  • The blob scale 0.007843 ≈ 1/127.5 normalizes pixel values to roughly [-1, 1], which MobileNet-SSD expects.
  • Resizing to 300×300 matches the model's input size.
  • A confidence threshold of 0.5 balances false positives and false negatives.
  • The hand-inside check is a simple point-in-box overlap test.
  • An alert triggers a beep and saves a screenshot named with a three-digit incrementing counter.
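
The preprocessing can be checked numerically: `blobFromImage` applies `scale * (pixel - mean)` per pixel, so with scale 0.007843 and mean 127.5 the [0, 255] range lands in roughly [-1, 1]. A quick sanity check:

```python
SCALE = 0.007843  # ≈ 1 / 127.5
MEAN = 127.5

def normalize(pixel):
    """Replicates blobFromImage's per-pixel transform: scale * (pixel - mean)."""
    return SCALE * (pixel - MEAN)
```

Here `normalize(0)` ≈ -1, `normalize(255)` ≈ +1, and `normalize(127.5)` is exactly 0, matching the model's expected input distribution.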

6) Display in Tkinter GUI

```python
img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
img = Image.fromarray(img)
imgtk = ImageTk.PhotoImage(image=img)
self.label.imgtk = imgtk  # keep a reference so the image is not garbage-collected
self.label.configure(image=imgtk)
```

Explanation: Convert the frame from BGR to RGB, wrap it in a PIL Image, and hand it to Tkinter as a PhotoImage to show the live feed. Storing the PhotoImage on the label (`self.label.imgtk`) keeps a reference alive; without it, Python's garbage collector can discard the image and the feed goes blank.

7) Exit clean-up

```python
if self.cap:
    self.cap.release()
```

Explanation: Release the webcam when stopping detection or closing the app.
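
A slightly fuller clean-up sketch, assuming the same attribute names as the constructor in step 3 (the `on_close` hook and its wiring are illustrative, not the project's exact code):

```python
class App:
    # ... constructor and detection loop as in the steps above ...

    def stop_detection(self):
        """Stop the loop and release the webcam so other apps can use it."""
        self.running = False
        if self.cap:
            self.cap.release()
            self.cap = None

    def on_close(self):
        """Intended for window.protocol("WM_DELETE_WINDOW", app.on_close)."""
        self.stop_detection()
        self.window.destroy()
```

Binding `on_close` to the window's close button ensures the camera is released even when the user closes the window instead of pressing Stop.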

Source Code & Repository

The full Python code is available on GitHub. An extended version with heart-shape hand-gesture detection can also be found there.

View Source Code on GitHub

Summary & Insights

Key Features:

  • Real-time person detection using MobileNet-SSD
  • Accurate hand-landmark detection via MediaPipe
  • Overlap check triggers alert when hand is inside person bounding box
  • Screenshot auto-saving with incremental filenames
  • Threaded Tkinter GUI for smooth live feed
Limitations:
  • The overlap check does not guarantee the hand belongs to the detected person.
  • Lighting, occlusion, and camera angle can reduce accuracy.
  • winsound is Windows-only; replace for cross-platform alerts.
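
For the last point, one possible cross-platform replacement is to fall back to the terminal bell when `winsound` is unavailable (a sketch; the `play_alert` helper is illustrative, and the frequency/duration match the values used above):

```python
import sys

def play_alert(frequency=1000, duration_ms=300):
    """Beep via winsound on Windows; emit the ASCII bell elsewhere."""
    if sys.platform == "win32":
        import winsound
        winsound.Beep(frequency, duration_ms)
    else:
        # '\a' is the ASCII bell; most terminals map it to a system sound.
        sys.stdout.write("\a")
        sys.stdout.flush()
```

Dropping this in for the direct `winsound.Beep(1000, 300)` call would keep the rest of the detection loop unchanged on Linux and macOS.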