This paper presents a real-time approach for capturing and recognizing sign language
gestures using efficient computer vision techniques and deep neural networks. The key data
constraint, the limited availability of comprehensive sign language corpora, is addressed
by configurable collection of hand image samples from continuous video. Configurable
interfaces enabled the collection of diverse sign samples spanning the alphabet as well as
additional signs such as “Good”, “Bad”, “Nice”, “Little” and “Stop”. The customizable
interface can trigger scheduled collection protocols, and restricting bounding-box
extraction to the active signing area reduces storage requirements. Novel touch-free
tracking based on custom computer vision algorithms also promotes inclusion. The collected
samples are augmented with generative and projective transformations to increase
variability and reduce bias. Models trained on the augmented data achieve state-of-the-art
performance on internal benchmarks, surpassing previous academic work in the domain.
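The abstract does not say how the projective augmentation is implemented. One common form, shown here only as an illustration, jitters the four image corners at random and warps the image with the homography (a 3×3 projective matrix) that maps the jittered corners back onto the originals. The sketch assumes a grayscale image stored as a list of rows and uses only the Python standard library; the function names, the `max_shift` parameter and the nearest-neighbour sampling are hypothetical choices, not the authors' code, and the generative augmentation is not covered.

```python
# Hypothetical sketch of corner-jitter projective augmentation (not the authors' code).
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 projective matrix mapping four src points onto four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project_point(h: Matrix, p: Point) -> Point:
    """Apply a homography to one point (divide by the homogeneous coordinate w)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def random_perspective(image: List[List[int]], max_shift: float = 0.1,
                       rng: Optional[random.Random] = None) -> List[List[int]]:
    """Warp a grayscale image (list of rows) by randomly jittering its four corners."""
    if not 0.0 <= max_shift < 0.25:
        raise ValueError("max_shift must be in [0, 0.25) to keep the quad convex")
    rows, cols = len(image), len(image[0])
    if rows < 2 or cols < 2:
        raise ValueError("image must be at least 2x2")
    if rng is None:
        rng = random.Random()
    corners = [(0.0, 0.0), (cols - 1.0, 0.0), (cols - 1.0, rows - 1.0), (0.0, rows - 1.0)]
    dx, dy = max_shift * (cols - 1), max_shift * (rows - 1)
    moved = [(x + rng.uniform(-dx, dx), y + rng.uniform(-dy, dy)) for x, y in corners]
    # Inverse mapping: for every output pixel, look up where it came from in the
    # source image, so the result has no holes. Nearest-neighbour sampling, 0 fill.
    back = homography(moved, corners)
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            try:
                sx, sy = project_point(back, (float(x), float(y)))
            except ZeroDivisionError:  # maps to infinity; leave the fill value
                continue
            ix, iy = round(sx), round(sy)
            if 0 <= ix < cols and 0 <= iy < rows:
                out[y][x] = image[iy][ix]
    return out
```

Sampling by inverse mapping fills every output pixel; bilinear interpolation would give smoother results at the cost of a few more lines. Passing a seeded `random.Random` makes the augmentation reproducible, which helps when a dataset has to be regenerated.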
Qualitative assessments by independent native interpreters provide encouraging indications
of real-world viability. The architecture, extensible through parameterized logging
protocols and customizable assembly of training data, shows promise for moving sign
recognition from controlled settings to unconstrained environments. Because the pipeline
is easy to replicate, it can also be extended quickly with new vocabulary and concepts.
Future work includes converting recognized gestures into both text and speech, providing
multi-format accessibility for diverse demographic groups. Overall, this work presents an
end-to-end ecosystem for sign language gesture recognition that uses bespoke computer
vision and adaptive machine learning techniques to support accessibility and inclusion.
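The storage saving attributed above to focusing bounding-box extraction on the active signing area can be illustrated with a minimal sketch. It assumes that some hand tracker (hypothetical here; the abstract does not describe its interface) already returns keypoint coordinates in pixels for each frame. The box around those keypoints is padded by a hypothetical `margin`, made square, clamped to the frame, and only the crop is kept; frames with no detected keypoints are skipped.

```python
# Hypothetical sketch: the keypoint source, names and margin are assumptions.
import math
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def hand_box(points: Sequence[Tuple[float, float]], frame_w: int, frame_h: int,
             margin: float = 0.2) -> Box:
    """Padded box around hand keypoints (pixel coords), square before clamping to the frame."""
    if not points:
        raise ValueError("no hand detected in this frame; skip it")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Pad the longer side of the keypoint extent by `margin` on each side.
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + 2 * margin)
    half = max(side / 2, 1.0)  # never produce an empty box
    x0 = max(0, math.floor(cx - half))
    y0 = max(0, math.floor(cy - half))
    x1 = min(frame_w, math.ceil(cx + half))
    y1 = min(frame_h, math.ceil(cy + half))
    return x0, y0, x1, y1


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Keep only the pixels inside the box; in this sketch only the crop would be stored."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]
```

For keypoints spanning (10, 10) to (20, 30) in a 100 × 100 frame with `margin=0.0`, `hand_box` returns (5, 10, 25, 30), so the stored crop holds 400 of the frame's 10,000 pixels.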