In this study, we present the Discriminative Temporal Shift Module (D-TSM), an enhancement of the Temporal Shift Module (TSM) for action recognition. TSM is limited in capturing intricate temporal dynamics because of its simplistic feature shifting. D-TSM addresses this by introducing a subtraction operation before the shift, which extracts discriminative features between adjacent frames and thereby enables effective recognition of actions where subtle motions serve as crucial cues. D-TSM preserves TSM's foundational philosophy of minimal computational overhead and zero additional parameters. Our experiments demonstrate that D-TSM significantly improves the performance of TSM and outperforms other leading 2D CNN-based methods.
2023
D-TSM: Discriminative Temporal Shift Module for Action Recognition
Sangyun Lee and Sungjun Hong
In 2023 20th International Conference on Ubiquitous Robots (UR), Jun 2023
Action recognition is one of the representative perception tasks for robot applications, but it remains challenging due to complex temporal dynamics. Although the temporal shift module (TSM) is considered one of the best 2D CNN-based architectures for temporal modeling, its inherent structural simplicity limits performance and leaves room for improvement. To mitigate this issue while following TSM's philosophy, this paper presents a variant of TSM, termed Discriminative TSM (D-TSM), with a focus on capturing discriminative features of motion patterns. Going beyond the naive shift operation in TSM, our D-TSM explicitly transforms shifted features by applying element-wise subtraction. This simple approach effectively creates discriminative features between adjacent frames at a small extra computational cost and with zero additional parameters. Experiments on the Something-Something and Jester datasets demonstrate that our D-TSM outperforms TSM and achieves competitive performance with low FLOPs against other methods.
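The shift-and-subtract idea can be illustrated with a minimal NumPy sketch. This is an illustrative reading of the abstract, not the paper's implementation: real TSM operates on (N, T, C, H, W) tensors inside a 2D CNN, the `fold_div` name and the exact ordering of shift and subtraction here are assumptions.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Plain TSM-style shift on a (T, C) array of per-frame features:
    move 1/fold_div of the channels one step forward in time and
    another 1/fold_div one step backward, zero-padding the ends."""
    T, C = x.shape
    fold = C // fold_div
    out = x.copy()
    out[1:, :fold] = x[:-1, :fold]                  # from the past frame
    out[0, :fold] = 0.0                             # zero-pad first frame
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]  # from the future frame
    out[-1, fold:2 * fold] = 0.0                    # zero-pad last frame
    return out

def discriminative_temporal_shift(x, fold_div=8):
    """Hypothetical D-TSM sketch: in the shifted channels, keep the
    difference between the neighboring frame and the current one, so
    those channels carry frame-to-frame motion cues instead of raw
    copies. The untouched channels pass through unchanged."""
    T, C = x.shape
    fold = C // fold_div
    shifted = temporal_shift(x, fold_div)
    out = x.copy()
    out[:, :2 * fold] = shifted[:, :2 * fold] - x[:, :2 * fold]
    return out
```

Because subtraction is element-wise over channels already produced by the backbone, the module adds no learnable parameters, in keeping with TSM's design.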
2022
Extended Siamese Convolutional Neural Networks for Discriminative Feature Learning
Sangyun Lee and Sungjun Hong
International Journal of Fuzzy Logic and Intelligent Systems, 2022
Siamese convolutional neural networks (SCNNs) have been considered among the best deep learning architectures for visual object verification. However, these models have the drawback that each branch extracts features independently without considering the other branch, which sometimes leads to unsatisfactory performance. In this study, we propose a new architecture called the extended SCNN (ESCNN) that addresses this limitation by learning both independent and relative features for a pair of images. ESCNNs also have a feature augmentation architecture that exploits the multi-level features of the underlying SCNN. Feature visualization showed that the proposed ESCNN can encode relative and discriminative information for the two input images at multi-level scales. Finally, we applied an ESCNN model to a person verification problem, and the experimental results indicate that the ESCNN achieved an accuracy of 97.7%, outperforming an SCNN model with 91.4% accuracy. Ablation studies also showed that a small version of the ESCNN performed 5.6% better than an SCNN model.
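The "independent plus relative features" idea can be sketched as a simple fusion step. This is a hedged illustration: the absolute difference used as the relative operator, the function names, and the multi-level concatenation scheme are assumptions, not the paper's exact architecture.

```python
import numpy as np

def escnn_pair_features(f1, f2):
    """Fuse a pair of branch embeddings ESCNN-style: keep each branch's
    independent embedding and append a relative feature. The absolute
    difference is an illustrative choice of relative operator."""
    relative = np.abs(f1 - f2)
    return np.concatenate([f1, f2, relative])

def multi_level_fusion(levels1, levels2):
    """Apply the pairwise fusion at every feature level of the shared
    backbone and concatenate the results (a sketch of the multi-level
    feature augmentation)."""
    return np.concatenate(
        [escnn_pair_features(a, b) for a, b in zip(levels1, levels2)]
    )
```

A downstream verification head would then score the fused vector, so the decision can draw on both what each image looks like and how the two differ.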
2019
Multiple Object Tracking via Feature Pyramid Siamese Networks
When multiple object tracking (MOT) based on the tracking-by-detection paradigm is implemented, the similarity metric between the current detections and existing tracks plays an essential role. Most deep-neural-network-based MOT schemes learn the similarity metric using a Siamese architecture, but the plain Siamese architecture might not be sufficient owing to its structural simplicity and lack of motion information. This paper proposes a new MOT scheme to overcome these problems in conventional MOT. A feature pyramid Siamese network (FPSN) is proposed to address the structural simplicity: inspired by the feature pyramid network (FPN), it extends the plain Siamese architecture with FPN and develops a new multi-level discriminative feature. A spatiotemporal motion feature is added to the FPSN to compensate for the lack of motion information and to further enhance MOT performance. Thus, FPSN-MOT considers not only appearance features but also motion information. Finally, FPSN-MOT is applied to the public MOT challenge benchmarks and its performance is compared with that of other state-of-the-art MOT methods.
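The combination of an appearance metric with a motion cue can be sketched generically. Everything here is a stand-in: cosine similarity replaces the learned FPSN metric, IoU replaces the paper's spatiotemporal motion feature, and the blending weight `alpha` is hypothetical.

```python
import numpy as np

def appearance_similarity(f_det, f_track):
    """Cosine similarity between embedding vectors (a generic stand-in
    for a learned Siamese similarity head)."""
    denom = np.linalg.norm(f_det) * np.linalg.norm(f_track)
    return float(f_det @ f_track / denom)

def motion_similarity(box_det, box_pred):
    """Intersection-over-union between a detection box and the track's
    predicted box (x1, y1, x2, y2) as a simple motion cue."""
    x1 = max(box_det[0], box_pred[0]); y1 = max(box_det[1], box_pred[1])
    x2 = min(box_det[2], box_pred[2]); y2 = min(box_det[3], box_pred[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_d = (box_det[2] - box_det[0]) * (box_det[3] - box_det[1])
    area_p = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    return inter / (area_d + area_p - inter)

def fused_score(f_det, f_track, box_det, box_pred, alpha=0.5):
    """Blend appearance and motion terms; alpha is a hypothetical weight."""
    return (alpha * appearance_similarity(f_det, f_track)
            + (1.0 - alpha) * motion_similarity(box_det, box_pred))
```

In a tracking-by-detection pipeline, scores like this fill the detection-to-track cost matrix that the data-association step (e.g., Hungarian matching) consumes.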
2017
Robust adaptive synchronization of a class of chaotic systems via fuzzy bilinear observer using projection operator
This study focuses on the problem of robust adaptive synchronization of uncertain bilinear chaotic systems. A Takagi–Sugeno fuzzy bilinear system (TSFBS) is employed to describe a bilinear chaotic system, and a robust adaptive observer that estimates the states of the TSFBS is developed. Advanced adaptive laws using a projection operator are designed to achieve both robustness against external disturbances and adaptation of unknown system parameters. A comparison with an existing observer shows that the proposed observer achieves faster parameter adaptation and robust synchronization for an uncertain TSFBS with disturbances when the adaptive laws are utilized. The asymptotic stability and robust performance of the error dynamics are guaranteed under certain assumptions via Lyapunov stability theory. We verify the effectiveness of the proposed scheme on examples of the generalized Lorenz system from various aspects.
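For context, a Takagi–Sugeno fuzzy bilinear system and a fuzzy observer with output injection are commonly written in the literature in the following generic form; this is a standard formulation sketched from the literature and may differ from the paper's exact notation (the gains \(L_i\) stand in for whatever observer and adaptive-law structure the paper actually uses).

```latex
% Generic TSFBS with r rules and normalized memberships h_i(z(t))
\dot{x}(t) = \sum_{i=1}^{r} h_i\bigl(z(t)\bigr)
  \bigl[ A_i x(t) + B_i u(t) + N_i x(t)\,u(t) \bigr],
\qquad y(t) = C\,x(t)

% Luenberger-type fuzzy observer with output-injection gains L_i
\dot{\hat{x}}(t) = \sum_{i=1}^{r} h_i\bigl(z(t)\bigr)
  \bigl[ A_i \hat{x}(t) + B_i u(t) + N_i \hat{x}(t)\,u(t)
       + L_i \bigl( y(t) - \hat{y}(t) \bigr) \bigr],
\qquad \hat{y}(t) = C\,\hat{x}(t)
```

The bilinear terms \(N_i x u\) are what distinguish a TSFBS from an ordinary T–S fuzzy model; the adaptive laws with the projection operator then update the unknown parameter estimates while keeping them inside a prescribed bounded set.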