In this article we propose a method for detecting hand and face gestures in a peer-to-peer video
call. First, frames are captured from a webcam using OpenCV, and hand and facial
landmarks in each frame are detected using the MediaPipe Holistic model. Then the extracted
keypoint values are processed and fed to a trained LSTM model, which classifies
the gestures into specific text labels. A distinguishing feature is that the output text is selected based on
the highest predicted probability. The result is then streamed in a peer-to-peer video call using WebRTC, which
uses signalling over HTTP with the Session Description Protocol (SDP) to establish a connection among peers.
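
The selection of text by highest probability can be illustrated with a minimal sketch. It assumes the classifier emits one probability per gesture class for each input sequence. The label names and the function `pick_label` are hypothetical and serve only as an illustration, since the gesture classes are not listed here.

```python
from typing import Sequence, Tuple

# Hypothetical gesture labels, chosen only for illustration.
LABELS = ["hello", "thanks", "yes"]


def pick_label(probs: Sequence[float],
               labels: Sequence[str] = LABELS) -> Tuple[str, float]:
    """Return the label with the highest probability and that probability.

    `probs` stands in for the per-class output of the classifier
    (for example, the softmax output for one keypoint sequence).
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("probs must not be empty")
    # Index of the largest probability (argmax); ties resolve to the first.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]


if __name__ == "__main__":
    label, p = pick_label([0.1, 0.7, 0.2])
    print(label, p)
```

In a pipeline of the kind described, a function like this would be called once per classified keypoint sequence, and the returned text would be overlaid on the frame or sent alongside the video stream.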