Modeling a large-vocabulary isolated sign language recognition system using a visual attention model and deep learning methods
Abstract
Automatic sign language recognition (SLR), the task of automatically recognizing signs from video, is an active and challenging research area in computer vision. Recent advances in hardware and software have made real-time automatic SLR systems feasible. However, developing systems that are practical for everyday use requires sign language datasets recorded in more realistic environments. In this thesis, the Ankara University Turkish Sign Language (AUTSL) dataset, a large-scale isolated sign dataset designed for user-independent recognition, was created and made publicly available. Whereas large-scale isolated sign language datasets for other languages are usually recorded in laboratory environments against a plain background, AUTSL contains a wide variety of backgrounds, both static and dynamic. For the isolated SLR problem, various architectures based on 2D-CNNs and LSTMs with attention mechanisms are first proposed. Second, an RGB-MHI model is proposed, in which a single RGB Motion History Image (RGB-MHI) summarizing the motion history is created for each video. Finally, two novel approaches that combine the RGB-MHI model with 3D-CNNs are proposed. In the first, a motion-history-based spatial attention mechanism that requires no explicit segmentation is built from the RGB-MHI model and integrated into the 3D-CNN. In the second, 3D-CNN and RGB-MHI features are combined using late fusion. Although these architectures use only RGB data, they achieve results competitive with multi-modal models in the literature.
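To make the idea of summarizing a video's motion in a single image more concrete, the following is a minimal sketch of a classic single-channel motion history image computed by frame differencing. It is a generic illustration only: the abstract does not describe how the RGB-MHI is actually constructed, so the function name `motion_history_image`, the `threshold` and `tau` parameters, and the linear decay rule are all assumptions and may differ from the thesis method.

```python
import numpy as np


def motion_history_image(frames, threshold=30.0, tau=None):
    """Summarize a grayscale clip of shape (T, H, W) as one motion history image.

    Hypothetical helper for illustration; not the thesis implementation.
    Pixels that moved recently are bright, older motion fades linearly.
    """
    frames = np.asarray(frames, dtype=np.float32)
    num_frames = frames.shape[0]
    if tau is None:
        # Assumption: let the motion history span the whole clip.
        tau = max(num_frames - 1, 1)

    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for t in range(1, num_frames):
        # A pixel counts as "moving" if its intensity changed by more than threshold.
        moving = np.abs(frames[t] - frames[t - 1]) > threshold
        # Moving pixels are reset to tau; all other pixels decay by one step.
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))

    # Normalize to [0, 1] so the result can be stored or displayed as an image.
    return mhi / float(tau)


# Tiny synthetic clip: one pixel changes early, another changes in the last frame.
clip = np.zeros((3, 4, 4), dtype=np.uint8)
clip[1:, 0, 0] = 255  # changes between frame 0 and frame 1, then stays constant
clip[2, 1, 1] = 255   # changes between frame 1 and frame 2
mhi = motion_history_image(clip)
```

In this toy clip, the pixel that moved last receives the highest value and the pixel that moved earlier a lower one, so a single image encodes both where and how recently motion occurred. A map of this kind is the sort of signal that could, in principle, be used to weight spatial regions without explicit segmentation.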