Step-by-Step Code Explanation
1) Imports & Setup
import cv2
import mediapipe as mp
import threading
import tkinter as tk
from tkinter import Label, Button
from PIL import Image, ImageTk
import time
import os
import winsound
Explanation:
- OpenCV handles video input and DNN-based person detection.
- MediaPipe extracts hand landmarks.
- threading keeps the Tkinter GUI responsive while the detection loop runs.
- PIL converts OpenCV frames into Tkinter-compatible images.
- winsound plays alert beeps (Windows-only; the script will not run as-is on macOS/Linux).
2) Screenshot folder & model initialization
os.makedirs("screenshots", exist_ok=True)
mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils
hands = mp_hands.Hands(min_detection_confidence=0.7,
                       min_tracking_confidence=0.7)
net = cv2.dnn.readNetFromCaffe(
    "MobileNetSSD_deploy.prototxt",
    "MobileNetSSD_deploy.caffemodel"
)
PERSON_CLASS_ID = 15
Explanation:
- The 0.7 detection and tracking confidence thresholds make MediaPipe report only high-confidence hand landmarks, reducing false positives.
- In the MobileNet-SSD class list, ID 15 is "person".
- The screenshots folder is created automatically; exist_ok=True avoids an error if it already exists.
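One practical note: cv2.dnn.readNetFromCaffe raises a fairly opaque cv2.error if either model file is missing from the working directory. A small guard (the helper name missing_model_files is mine, not part of the original script) fails with a clearer message:

```python
import os

MODEL_FILES = ("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")

def missing_model_files(files=MODEL_FILES):
    # Return the required files that are not present in the working directory.
    return [f for f in files if not os.path.isfile(f)]

missing = missing_model_files()
if missing:
    print("Download these model files first:", missing)
```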
3) Tkinter GUI & App class
class App:
    def __init__(self, window):
        self.window = window
        self.window.title("Hand Detection System")
        self.label = Label(window)
        self.label.pack()
        self.btn_start = Button(window, text="Start Detection", command=self.start_detection)
        self.btn_start.pack()
        self.btn_stop = Button(window, text="Stop Detection", command=self.stop_detection)
        self.btn_stop.pack()
        self.cap = None
        self.running = False
        self.screenshot_count = 0
        self.last_alert_time = 0
        self.alert_interval = 2  # seconds
Explanation:
- The Start/Stop buttons control the detection loop through the self.running flag.
- The 2-second alert interval prevents repeated beeps and screenshots while the same hand stays in view.
4) Detection Loop & hand landmarks
while self.running and self.cap.isOpened():
    ret, frame = self.cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)  # mirror image for natural webcam view
    h, w = frame.shape[:2]
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = hands.process(rgb)
    hand_points = []
    if result.multi_hand_landmarks:
        for hand_landmarks in result.multi_hand_landmarks:
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            for lm in hand_landmarks.landmark:
                x, y = int(lm.x * w), int(lm.y * h)
                hand_points.append((x, y))
Explanation:
- The frame is flipped horizontally so the preview behaves like a mirror.
- MediaPipe returns coordinates normalized to 0–1; multiplying by the frame width and height converts them to pixel positions.
- Landmarks and their connections are drawn on the frame for visual feedback.
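The normalized-to-pixel conversion is plain arithmetic and easy to sanity-check. A tiny example with an assumed 640×480 frame (the helper name to_pixels is mine):

```python
def to_pixels(lm_x, lm_y, w, h):
    # MediaPipe landmark coordinates are normalized to [0, 1]
    # relative to the frame; scale by width/height to get pixels.
    return int(lm_x * w), int(lm_y * h)

print(to_pixels(0.5, 0.25, 640, 480))  # → (320, 120)
```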
5) Person detection & hand-inside check
blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    class_id = int(detections[0, 0, i, 1])
    if confidence > 0.5 and class_id == PERSON_CLASS_ID:
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        x1, y1, x2, y2 = box.astype(int)
        hand_inside = any(x1 <= x <= x2 and y1 <= y <= y2 for (x, y) in hand_points)
        color = (0, 0, 255) if hand_inside else (0, 255, 0)
        label = "Hand Detected" if hand_inside else "No Hand"
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
        if hand_inside and time.time() - self.last_alert_time > self.alert_interval:
            winsound.Beep(1000, 300)
            path = f"screenshots/screenshot_{self.screenshot_count:03}.jpg"
            cv2.imwrite(path, frame)
            print(f"[Saved] {path}")
            self.screenshot_count += 1
            self.last_alert_time = time.time()
Explanation:
- The blob scale factor 0.007843 ≈ 1/127.5; combined with the mean of 127.5, each pixel is mapped to roughly [-1, 1], which is what MobileNet-SSD expects.
- The 300×300 resize matches the model's input size.
- A detection confidence threshold of 0.5 balances false positives against false negatives.
- The hand-inside check tests whether any hand landmark falls within the person's bounding box.
- On a hit (at most once per alert interval), the app beeps and saves a screenshot named with a 3-digit incrementing counter (not a timestamp).
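Two of the points above can be checked in isolation. The helper below is a hypothetical extraction of the in-box test, run on made-up coordinates, followed by the blob-normalization arithmetic:

```python
def any_point_inside(points, box):
    # True if any (x, y) point lies inside box = (x1, y1, x2, y2).
    x1, y1, x2, y2 = box
    return any(x1 <= x <= x2 and y1 <= y <= y2 for (x, y) in points)

print(any_point_inside([(50, 50), (400, 300)], (0, 0, 100, 100)))  # → True
print(any_point_inside([(400, 300)], (0, 0, 100, 100)))            # → False

# Blob normalization: (pixel - mean) * scale maps [0, 255] to roughly [-1, 1].
scale, mean = 0.007843, 127.5
print(round((0 - mean) * scale, 2), round((255 - mean) * scale, 2))  # → -1.0 1.0
```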
6) Display in Tkinter GUI
img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
img = Image.fromarray(img)
imgtk = ImageTk.PhotoImage(image=img)
self.label.imgtk = imgtk
self.label.configure(image=imgtk)
Explanation: OpenCV stores frames as BGR while PIL expects RGB, so the frame is converted before being wrapped in an ImageTk.PhotoImage. Assigning the image to self.label.imgtk keeps a reference alive, which prevents Tkinter from garbage-collecting it mid-display.
7) Exit clean-up
if self.cap:
    self.cap.release()
Explanation: Release the webcam when detection stops or the app closes, so the device is available to other programs.
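The same clean-up should also run when the user closes the window. A sketch using Tkinter's WM_DELETE_WINDOW protocol (the handler name on_close and the simplified constructor are assumptions, not part of the original script):

```python
class App:
    def __init__(self, window=None):
        self.window = window
        self.cap = None
        self.running = False
        if window is not None:
            # Run on_close when the user clicks the window's X button.
            window.protocol("WM_DELETE_WINDOW", self.on_close)

    def on_close(self):
        self.running = False        # let the detection loop exit first
        if self.cap:
            self.cap.release()      # free the webcam for other programs
        if self.window is not None:
            self.window.destroy()
```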