Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pre-trained on large-scale datasets have demonstrated strong zero-shot generalization. However, directly applying them to a specific task can suffer from domain shift. We introduce a novel end-to-end learning paradigm that combines knowledge guidance with domain refinement to enhance performance. It comprises two key components: the Feature Alignment Module (FAM) and the Feature Modulation Module (FMM). FAM aligns features from a CNN-based backbone with those of the pre-trained VTM's encoder via channel transformation and spatial interpolation, and transfers knowledge through a KL-divergence loss and an L2-normalization constraint. FMM then adapts the transferred knowledge to the target domain to address domain shift. We also introduce a fine-grained grass segmentation dataset and show, through experiments on two datasets, that our method improves over the strongest baseline by 2.57 mIoU on the grass dataset and 3.73 mIoU on the cloud dataset. The results highlight the potential of combining knowledge transfer and domain adaptation to overcome domain-related challenges and data limitations. Code and checkpoints are available on GitHub.
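To make FAM concrete, below is a minimal PyTorch sketch of the alignment step and the two knowledge-transfer constraints. The 1×1 convolution for channel transformation, bilinear interpolation, temperature `tau`, and the per-channel spatial softmax inside the KL term are our assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAlignmentModule(nn.Module):
    """Aligns CNN backbone features to the shape of the frozen VTM encoder
    features via a 1x1 channel projection and bilinear spatial interpolation."""

    def __init__(self, cnn_channels: int, vtm_channels: int):
        super().__init__()
        # Channel transformation: project CNN channels onto the VTM channel dim.
        self.channel_proj = nn.Conv2d(cnn_channels, vtm_channels, kernel_size=1)

    def forward(self, cnn_feat: torch.Tensor, vtm_feat: torch.Tensor) -> torch.Tensor:
        x = self.channel_proj(cnn_feat)
        # Spatial interpolation: match the VTM feature map resolution.
        x = F.interpolate(x, size=vtm_feat.shape[-2:], mode="bilinear", align_corners=False)
        return x


def knowledge_transfer_losses(aligned: torch.Tensor, vtm_feat: torch.Tensor, tau: float = 1.0):
    """KL divergence between softened per-channel spatial distributions, plus an
    L2 (MSE) constraint on the channel-normalized features."""
    s = (aligned.flatten(2) / tau).log_softmax(dim=-1)   # student log-probs, (B, C, H*W)
    t = (vtm_feat.flatten(2) / tau).softmax(dim=-1)      # teacher probs
    l_kl = F.kl_div(s, t, reduction="batchmean") * tau ** 2
    l_mse = F.mse_loss(F.normalize(aligned, dim=1), F.normalize(vtm_feat, dim=1))
    return l_kl, l_mse


if __name__ == "__main__":
    fam = FeatureAlignmentModule(cnn_channels=512, vtm_channels=768)
    cnn_feat = torch.randn(2, 512, 64, 64)   # CNN backbone features
    vtm_feat = torch.randn(2, 768, 32, 32)   # frozen VTM encoder features
    aligned = fam(cnn_feat, vtm_feat)
    l_kl, l_mse = knowledge_transfer_losses(aligned, vtm_feat.detach())
    print(l_kl.item(), l_mse.item())
```

Detaching the VTM features in the loss keeps the pre-trained encoder acting as a fixed teacher, so gradients only update the CNN branch and the alignment module.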
Framework: Overview of the proposed framework, which integrates knowledge transfer and domain adaptation for fine-grained remote sensing image segmentation. Knowledge transfer is achieved through the feature alignment module and two loss constraints, the KL-divergence loss \( \mathcal{L}_{\text{kl}} \) and the L2-normalization loss \( \mathcal{L}_{\text{mse}} \). Domain adaptation is facilitated by the feature modulation module and supervised with the cross-entropy loss \( \mathcal{L}_{\text{ce}} \) and the auxiliary loss \( \mathcal{L}_{\text{aux}} \).
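Reading the caption's four terms together, the overall training objective is plausibly a weighted sum of the supervised and knowledge-transfer losses; the weights \( \lambda \) below are hypothetical placeholders rather than values reported here:

\( \mathcal{L} = \mathcal{L}_{\text{ce}} + \lambda_{\text{aux}} \mathcal{L}_{\text{aux}} + \lambda_{\text{kl}} \mathcal{L}_{\text{kl}} + \lambda_{\text{mse}} \mathcal{L}_{\text{mse}} \)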
Grass Dataset: Comparison with general segmentation methods on the proposed fine-grained grass segmentation dataset.

| Method | mIoU ↑ | OA ↑ | F1 ↑ |
|---|---|---|---|
| FCN | 47.47 | 67.85 | 61.99 |
| PSPNet | 47.95 | 69.12 | 62.55 |
| DeepLabV3+ | 47.95 | 68.97 | 62.50 |
| UNet | 48.17 | 69.77 | 62.34 |
| SegFormer | 48.29 | 68.93 | 62.82 |
| Mask2Former | 44.93 | 65.90 | 58.91 |
| DINOv2 | 47.57 | 71.54 | 61.70 |
| KTDA (Ours) | 50.86 | 74.26 | 65.01 |
Cloud Dataset: Comparison with cloud detection methods on the cloud segmentation dataset.

| Method | mIoU ↑ | OA ↑ | F1 ↑ |
|---|---|---|---|
| MCDNet | 33.85 | 69.75 | 42.76 |
| SCNN | 32.38 | 71.22 | 52.41 |
| CDNetv1 | 34.58 | 68.16 | 45.80 |
| KappaMask | 42.12 | 76.63 | 68.47 |
| UNetMobv2 | 47.76 | 82.00 | 56.91 |
| CDNetv2 | 43.63 | 78.56 | 70.33 |
| HRCloudNet | 43.51 | 77.04 | 71.36 |
| KTDA (Ours) | 51.49 | 83.55 | 60.08 |
@misc{ktda,
      title={Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation},
      author={Shun Zhang and Xuechao Zou and Kai Li and Congyan Lang and Shiying Wang and Pin Tao and Tengfei Cao},
      year={2024},
      eprint={2412.06664},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.06664},
}
This work was partially supported by the Natural Science Foundation of Qinghai Province under Grant No. 2024-ZJ-708 and the National Natural Science Foundation of China under Grant No. 62072027.