This study introduces a novel framework, "Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-identification (CORE-ReID)", to address Unsupervised Domain Adaptation (UDA) for person re-identification (ReID).
The framework uses CycleGAN in the pre-training stage to generate diverse data that harmonize differences in image characteristics across camera sources. In the fine-tuning stage, built on a teacher–student network pair, the framework integrates multi-view features for multi-level clustering to derive diverse pseudo-labels. A learnable Ensemble Fusion component, which focuses on fine-grained local information within global features, is introduced to enhance learning comprehensiveness and avoid the ambiguity associated with multiple pseudo-labels. Experimental results on three common UDA benchmarks for person ReID demonstrate significant performance gains over state-of-the-art approaches. Additional enhancements, such as the Efficient Channel Attention Block (ECAB) and Bidirectional Mean Feature Normalization (BMFN), mitigate deviation effects, and the adaptive fusion of global and local features in the ResNet-based model further strengthens the framework.
The proposed framework ensures clarity in the fused features, avoids ambiguity, and achieves high accuracy in terms of mean Average Precision (mAP) and Top-1, Top-5, and Top-10 accuracy, positioning it as an advanced and effective solution for UDA in person ReID. Our code and models are available at https://github.com/TrinhQuocNguyen/CORE-ReID.
Figure 1. Two stages in our process adopting the clustering-based approach. First, we train the model on a customized source-domain dataset; the parameters of this pre-trained model are then transferred to both the student and teacher networks as an initialization step for the next stage. During fine-tuning, we train the student model and then update the teacher model using momentum updates. To optimize computational resources, only the teacher model is used for inference.
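The teacher update in Figure 1 is a momentum (exponential moving average) step over the student's weights. A minimal PyTorch sketch is shown below; the function name and the coefficient value are illustrative, not the repository's actual API:

```python
import torch

@torch.no_grad()
def update_teacher(teacher: torch.nn.Module, student: torch.nn.Module, alpha: float = 0.999) -> None:
    """Momentum update: teacher <- alpha * teacher + (1 - alpha) * student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```

Because only the teacher is used for inference, its weights act as a temporal ensemble of student checkpoints, which is typically more stable than any single student snapshot.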
Figure 2. Our method follows a specific pipeline to create the comprehensive training set for the source domain. Initially, we combine both the training set (depicted in green boxes) and the test set (represented by dark green boxes) within the source dataset, forming the total training set consisting of real images. This combined set is then employed to train the camera-aware style transfer model. For each real image, the trained transfer model is applied to generate images (indicated by blue boxes for the training set and dark blue boxes for the test set) that align with the stylistic characteristics of the target cameras. Subsequently, the real images (green and dark green boxes) and the style-transferred images (blue and dark blue boxes) are amalgamated to produce the final training set within the source domain.
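The generation step in Figure 2 can be sketched as follows. This is a minimal, hypothetical illustration: the `generators` dictionary (one trained CycleGAN generator per source-to-target camera pair), the image size, and the output naming scheme are assumptions, not the repository's actual interface:

```python
import os
from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms

def build_source_training_set(real_images, generators, out_dir, device="cuda"):
    """For every real image, synthesize one copy in each other camera's style.

    real_images: list of (path, camera_id) pairs covering train + test sets.
    generators:  dict mapping (source_cam, target_cam) to a trained generator,
                 assumed here to output images in [0, 1].
    The final source-domain training set is the union of the real images
    and all style-transferred copies.
    """
    to_tensor = transforms.Compose([transforms.Resize((256, 128)), transforms.ToTensor()])
    to_image = transforms.ToPILImage()
    cameras = sorted({cam for _, cam in real_images})

    for path, src_cam in real_images:
        x = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        for dst_cam in cameras:
            if dst_cam == src_cam:
                continue  # the real image already carries its own camera style
            with torch.no_grad():
                fake = generators[(src_cam, dst_cam)](x).squeeze(0).clamp(0, 1).cpu()
            name = f"{Path(path).stem}_cam{src_cam}_to_cam{dst_cam}.jpg"
            to_image(fake).save(os.path.join(out_dir, name))
```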
Figure 3. Some style-transferred samples in Market-1501 [39]. Each image, originally taken by a specific camera, is transformed to align with the styles of the other five cameras, both within the training and test data. The real images are shown on the left, while their corresponding style-transferred counterparts are shown on the right.
Figure 4. The overall training process in the fully supervised pre-training stage. We employ ResNet101 as the backbone.
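A minimal sketch of the pre-training model, assuming a plain torchvision ResNet101 with an identity-classification head; the actual losses and schedule follow the paper, and the hyperparameters below are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_pretraining_model(num_identities: int) -> nn.Module:
    """ResNet101 backbone with an identity-classification head for
    fully supervised pre-training on the augmented source domain."""
    backbone = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_identities)
    return backbone

model = build_pretraining_model(num_identities=751)  # Market-1501 has 751 training identities
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
```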
Figure 5. Overview of our CORE-ReID framework. BMFN denotes Bidirectional Mean Feature Normalization. We combine local and global features using Ensemble Fusion, in which the ECAB enhances the features. Using BMFN, the framework merges the features of the original image x_{T,i} and its paired flipped image x'_{T,i} to produce the fused feature φ_l, l ∈ {top, bot}. The student network is optimized using pseudo-labels in a supervised manner, while the teacher network is updated by computing the temporal average of the student network via the update momentum.
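One plausible reading of the BMFN step is sketched below: the two view features are averaged and L2-normalized so that the original and flipped views contribute equally. The exact formulation follows the paper; this function is illustrative:

```python
import torch
import torch.nn.functional as F

def bmfn(feat_orig: torch.Tensor, feat_flip: torch.Tensor) -> torch.Tensor:
    """Fuse the features of an image and its horizontal flip by averaging,
    then L2-normalize the result along the channel dimension."""
    fused = 0.5 * (feat_orig + feat_flip)
    return F.normalize(fused, p=2, dim=1)

# usage: feat_orig = encoder(x); feat_flip = encoder(torch.flip(x, dims=[3]))
```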
Figure 6. The Ensemble Fusion component. The ς_top and ς_bot features are passed through a component named the Efficient Channel Attention Block (ECAB), which produces channel attention maps by exploiting the inter-channel relationships of the features, thereby enhancing them.
Figure 7. The structure of the Efficient Channel Attention Block. The Shared Multilayer Perceptron has an odd number h of hidden layers, where the first (h−1)/2 layers reduce the feature size by a reduction rate r and the last (h−1)/2 layers expand it again at the same rate r.
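A simplified PyTorch sketch of such a block is given below, using the smallest instance of the reduce-then-expand structure (one reduction layer, one expansion layer) and a CBAM-style combination of average- and max-pooled channel descriptors. Treat it as an illustration of the structure described above, not the exact module from the repository:

```python
import torch
import torch.nn as nn

class ECAB(nn.Module):
    """Channel attention via a shared squeeze-and-expand MLP (illustrative)."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),  # reduction layer (rate r)
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),  # expansion layer (rate r)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled channel descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled channel descriptor
        attn = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * attn                      # re-weight the input channels
```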
Figure 8. Examples of feature maps visualized using Grad-CAM [63]. With the CORE-ReID framework, we consider both the original image and its flipped counterpart. (a), (b), (c), and (d) illustrate the feature maps of these pairs on Market→CUHK, CUHK→Market, Market→MSMT, and CUHK→MSMT, respectively.
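Grad-CAM itself needs no special machinery; a minimal hook-based implementation for a model with a classification output is sketched below (for a pure feature extractor, one would back-propagate a similarity score instead of the top logit):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer):
    """Weight the target layer's activations by the spatially averaged
    gradients of the top-scoring output, then ReLU and normalize."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = model(x).max(dim=1).values.sum()  # top score per image
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)            # GAP of gradients
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))   # weighted activations
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return cam / cam.amax(dim=(2, 3), keepdim=True).clamp(min=1e-8)
```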
Figure 9. Impact of the clustering parameter M_{T,j}. Results on (a) Market→CUHK, (b) CUHK→Market, (c) Market→MSMT, and (d) CUHK→MSMT.
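If M_{T,j} is read as the number of clusters used when pseudo-labeling the target domain at level j, multi-level pseudo-label generation can be sketched as below. KMeans is a stand-in for whichever clustering algorithm the framework actually uses, and the cluster counts are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels(features: np.ndarray, cluster_counts):
    """One pseudo-labeling of the target features per cluster count M_{T,j}."""
    return [
        KMeans(n_clusters=m, n_init=10, random_state=0).fit_predict(features)
        for m in cluster_counts
    ]

# e.g., labels_per_level = pseudo_labels(target_feats, cluster_counts=[500, 1000])
```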
Figure 10. Effect of using ECAB and BMFN in our proposed method. Results on (a) Market→CUHK, (b) CUHK→Market, (c) Market→MSMT, and (d) CUHK→MSMT. The evaluation metrics are mAP (%) and Rank-k accuracy (%).
Figure 11. Impact of the backbone configurations. Results on (a) Market→CUHK and (b) CUHK→Market show that the ResNet101 backbone gives the best overall results. The evaluation metrics are mAP (%) and Rank-k accuracy (%).
Table 1. Experimental results of the proposed CORE-ReID framework and SOTA methods (Acc %) on the Market-1501 and CUHK03 datasets. Bold denotes the best and underline the second-best results. a indicates that the method uses multiple source datasets.
Table 2. Experimental results of the proposed CORE-ReID framework and SOTA methods (Acc %) from the Market-1501 and CUHK03 source datasets to the target-domain MSMT17 dataset. Bold denotes the best and underline the second-best results. a indicates that the method uses multiple source datasets; b denotes that the implementation is based on the author's code.
```bibtex
@article{nguyen2024corereid,
  author  = {Nguyen, Trinh Quoc and Prima, Oky Dicky Ardiansyah and Hotta, Katsuyoshi},
  title   = {CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification},
  journal = {Software},
  year    = {2024},
  volume  = {3},
  number  = {2},
  pages   = {227--249},
  doi     = {10.3390/software3020012}
}
```
Nguyen, T.Q.; Prima, O.D.A.; Hotta, K.
CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification.
Software 2024, 3, 227-249. https://doi.org/10.3390/software3020012
Nguyen TQ, Prima ODA, Hotta K.
CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification.
Software. 2024; 3(2):227-249. https://doi.org/10.3390/software3020012
Nguyen, Trinh Quoc, Oky Dicky Ardiansyah Prima, and Katsuyoshi Hotta. 2024.
"CORE-ReID: Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-Identification" Software 3, no. 2: 227-249.
https://doi.org/10.3390/software3020012