This study introduces a novel framework, "Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-identification (CORE-ReID)", to address Unsupervised Domain Adaptation (UDA) for person re-identification (ReID).
The framework uses CycleGAN in the pre-training stage to generate diverse data that harmonize differences in image characteristics across camera sources. In the fine-tuning stage, built on a teacher–student network pair, the framework integrates multi-view features for multi-level clustering to derive diverse pseudo-labels. A learnable Ensemble Fusion component, which focuses on fine-grained local information within global features, is introduced to enhance learning comprehensiveness and avoid the ambiguity associated with multiple pseudo-labels. Experimental results on three common UDA benchmarks for person ReID demonstrate significant performance gains over state-of-the-art approaches. Additional enhancements, such as the Efficient Channel Attention Block (ECAB) and Bidirectional Mean Feature Normalization (BMFN), mitigate deviation effects, and the adaptive fusion of global and local features in the ResNet-based model further strengthens the framework.
The proposed framework ensures clarity in the fused features, avoids ambiguity, and achieves high accuracy in terms of mean Average Precision (mAP) and Top-1, Top-5, and Top-10 accuracy, positioning it as an advanced and effective solution for UDA in person ReID. Our code and models are available at https://github.com/TrinhQuocNguyen/CORE-ReID.
Figure 1. Two stages in our process adopting the clustering-based approach. First, we train the model on a customized source-domain dataset; the parameters of this pre-trained model are then transferred to both the student and teacher networks as an initialization step for the next stage. During fine-tuning, we train the student model and then update the teacher model using momentum updates. To optimize computational resources, only the teacher model is used for inference.
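The teacher update in Figure 1 is a momentum (exponential moving average) step over the student's weights. A minimal PyTorch sketch is shown below; the function name and the coefficient value are illustrative, not the repository's actual API:

```python
import torch

@torch.no_grad()
def update_teacher(teacher: torch.nn.Module, student: torch.nn.Module, alpha: float = 0.999) -> None:
    """Momentum update: teacher <- alpha * teacher + (1 - alpha) * student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```

Because only the teacher is used for inference, its weights act as a temporal ensemble of student checkpoints, which is typically more stable than any single student snapshot.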
Figure 2. Our method follows a specific pipeline to create the comprehensive training set for the source domain. Initially, we combine both the training set (depicted in green boxes) and the test set (represented by dark green boxes) within the source dataset, forming the total training set consisting of real images. This combined set is then employed to train the camera-aware style transfer model. For each real image, the trained transfer model is applied to generate images (indicated by blue boxes for the training set and dark blue boxes for the test set) that align with the stylistic characteristics of the target cameras. Subsequently, the real images (green and dark green boxes) and the style-transferred images (blue and dark blue boxes) are amalgamated to produce the final training set within the source domain.
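The generation step in Figure 2 can be sketched as follows. This is a minimal, hypothetical illustration: the `generators` dictionary (one trained CycleGAN generator per source-to-target camera pair), the image size, and the output naming scheme are assumptions, not the repository's actual interface:

```python
import os
from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms

def build_source_training_set(real_images, generators, out_dir, device="cuda"):
    """For every real image, synthesize one copy in each other camera's style.

    real_images: list of (path, camera_id) pairs covering train + test sets.
    generators:  dict mapping (source_cam, target_cam) to a trained generator,
                 assumed here to output images in [0, 1].
    The final source-domain training set is the union of the real images
    and all style-transferred copies.
    """
    to_tensor = transforms.Compose([transforms.Resize((256, 128)), transforms.ToTensor()])
    to_image = transforms.ToPILImage()
    cameras = sorted({cam for _, cam in real_images})

    for path, src_cam in real_images:
        x = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        for dst_cam in cameras:
            if dst_cam == src_cam:
                continue  # the real image already carries its own camera style
            with torch.no_grad():
                fake = generators[(src_cam, dst_cam)](x).squeeze(0).clamp(0, 1).cpu()
            name = f"{Path(path).stem}_cam{src_cam}_to_cam{dst_cam}.jpg"
            to_image(fake).save(os.path.join(out_dir, name))
```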
Figure 3. Some style-transferred samples in Market-1501 [39]. Each image, originally taken by a specific camera, is transformed to align with the styles of the other five cameras, both within the training and test data. The real images are shown on the left, while their corresponding style-transferred counterparts are shown on the right.
Figure 4. The overall training process in the fully supervised pre-training stage. We employ ResNet101 as the backbone.
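A minimal sketch of the pre-training model, assuming a plain torchvision ResNet101 with an identity-classification head; the actual losses and schedule follow the paper, and the hyperparameters below are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_pretraining_model(num_identities: int) -> nn.Module:
    """ResNet101 backbone with an identity-classification head for
    fully supervised pre-training on the augmented source domain."""
    backbone = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_identities)
    return backbone

model = build_pretraining_model(num_identities=751)  # Market-1501 has 751 training identities
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
```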
Figure 5. Overview of our CORE-ReID framework. BMFN denotes Bidirectional Mean Feature Normalization. We combine local and global features using Ensemble Fusion, in which the ECAB enhances the features. Using BMFN, the framework merges the features of the original image x_{T,i} and its paired flipped image x'_{T,i} to produce the fused feature φ_l, l ∈ {top, bot}. The student network is optimized using pseudo-labels in a supervised manner, while the teacher network is updated by computing the temporal average of the student network via the update momentum.
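One plausible reading of the BMFN step is sketched below: the two view features are averaged and L2-normalized so that the original and flipped views contribute equally. The exact formulation follows the paper; this function is illustrative:

```python
import torch
import torch.nn.functional as F

def bmfn(feat_orig: torch.Tensor, feat_flip: torch.Tensor) -> torch.Tensor:
    """Fuse the features of an image and its horizontal flip by averaging,
    then L2-normalize the result along the channel dimension."""
    fused = 0.5 * (feat_orig + feat_flip)
    return F.normalize(fused, p=2, dim=1)

# usage: feat_orig = encoder(x); feat_flip = encoder(torch.flip(x, dims=[3]))
```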
Figure 6. The Ensemble Fusion component. The ς_top and ς_bot features are passed through a component named the Efficient Channel Attention Block (ECAB), which produces channel attention maps by exploiting the inter-channel relationships of the features, thereby enhancing them.
Figure 7. The structure of the Efficient Channel Attention Block. The Shared Multilayer Perceptron has an odd number h of hidden layers, where the first (h−1)/2 layers reduce the feature size by a reduction rate r and the last (h−1)/2 layers expand it again at the same rate r.
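A simplified PyTorch sketch of such a block is given below, using the smallest instance of the reduce-then-expand structure (one reduction layer, one expansion layer) and a CBAM-style combination of average- and max-pooled channel descriptors. Treat it as an illustration of the structure described above, not the exact module from the repository:

```python
import torch
import torch.nn as nn

class ECAB(nn.Module):
    """Channel attention via a shared squeeze-and-expand MLP (illustrative)."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),  # reduction layer (rate r)
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),  # expansion layer (rate r)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled channel descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled channel descriptor
        attn = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * attn                      # re-weight the input channels
```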
Figure 8. Examples of feature maps visualized using Grad-CAM [63]. With the CORE-ReID framework, we consider both the original image and its flipped counterpart. (a), (b), (c), and (d) illustrate the feature maps of these pairs on Market→CUHK, CUHK→Market, Market→MSMT, and CUHK→MSMT, respectively.
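Grad-CAM itself needs no special machinery; a minimal hook-based implementation for a model with a classification output is sketched below (for a pure feature extractor, one would back-propagate a similarity score instead of the top logit):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer):
    """Weight the target layer's activations by the spatially averaged
    gradients of the top-scoring output, then ReLU and normalize."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = model(x).max(dim=1).values.sum()  # top score per image
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)            # GAP of gradients
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))   # weighted activations
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return cam / cam.amax(dim=(2, 3), keepdim=True).clamp(min=1e-8)
```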
Figure 9. Impact of the clustering parameter M_{T,j}. Results on (a) Market→CUHK, (b) CUHK→Market, (c) Market→MSMT, and (d) CUHK→MSMT.
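If M_{T,j} is read as the number of clusters used when pseudo-labeling the target domain at level j, multi-level pseudo-label generation can be sketched as below. KMeans is a stand-in for whichever clustering algorithm the framework actually uses, and the cluster counts are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels(features: np.ndarray, cluster_counts):
    """One pseudo-labeling of the target features per cluster count M_{T,j}."""
    return [
        KMeans(n_clusters=m, n_init=10, random_state=0).fit_predict(features)
        for m in cluster_counts
    ]

# e.g., labels_per_level = pseudo_labels(target_feats, cluster_counts=[500, 1000])
```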
Figure 10. Effect of using ECAB and BMFN in our proposed method. Results on (a) Market→CUHK, (b) CUHK→Market, (c) Market→MSMT, and (d) CUHK→MSMT. The evaluation metrics are mAP (%) and Rank-k accuracy (%).
Figure 11. Impact of the backbone configurations. Results on (a) Market→CUHK and (b) CUHK→Market show that the ResNet101 backbone gives the best overall results. The evaluation metrics are mAP (%) and Rank-k accuracy (%).
Table 1. Experimental results of the proposed CORE-ReID framework and SOTA methods (Acc %) on the Market-1501 and CUHK03 datasets. Bold denotes the best and underline the second-best results. a indicates that the method uses multiple source datasets.
Table 2. Experimental results of the proposed CORE-ReID framework and SOTA methods (Acc %) from the Market-1501 and CUHK03 source datasets to the target-domain MSMT17 dataset. Bold denotes the best and underline the second-best results. a indicates that the method uses multiple source datasets; b denotes that the implementation is based on the author's code.
```bibtex
@article{nguyen2024corereid,
  author  = {Nguyen, Trinh Quoc and Prima, Oky Dicky Ardiansyah and Hotta, Katsuyoshi},
  title   = {CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification},
  journal = {Software},
  year    = {2024},
  volume  = {3},
  number  = {2},
  pages   = {227--249},
  doi     = {10.3390/software3020012}
}
```
Nguyen, T.Q.; Prima, O.D.A.; Hotta, K.
CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification.
Software 2024, 3, 227-249. https://doi.org/10.3390/software3020012
Nguyen TQ, Prima ODA, Hotta K.
CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification.
Software. 2024; 3(2):227-249. https://doi.org/10.3390/software3020012
Nguyen, Trinh Quoc, Oky Dicky Ardiansyah Prima, and Katsuyoshi Hotta. 2024.
"CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification" Software 3, no. 2: 227-249.
https://doi.org/10.3390/software3020012