Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pre-trained on large-scale datasets have demonstrated strong zero-shot generalization. However, directly applying them to a specific task can suffer from domain shift. We introduce a novel end-to-end learning paradigm that combines knowledge guidance with domain refinement to enhance performance. It comprises two key components: the Feature Alignment Module (FAM) and the Feature Modulation Module (FMM). FAM aligns features from a CNN-based backbone with those of the pre-trained VTM's encoder via channel transformation and spatial interpolation, and transfers knowledge through a KL-divergence loss and an L2-normalization constraint. FMM then adapts the transferred knowledge to the target domain to address domain shift. We also introduce a fine-grained grass segmentation dataset and show, through experiments on two datasets, that our method improves over the strongest baseline by 2.57 mIoU on the grass dataset and 3.73 mIoU on the cloud dataset. The results highlight the potential of combining knowledge transfer and domain adaptation to overcome domain-related challenges and data limitations. Code and checkpoints are available on GitHub.
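To make FAM concrete, below is a minimal PyTorch sketch of the alignment step and the two knowledge-transfer constraints. The 1×1 convolution for channel transformation, bilinear interpolation, temperature `tau`, and the per-channel spatial softmax inside the KL term are our assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAlignmentModule(nn.Module):
    """Aligns CNN backbone features to the shape of the frozen VTM encoder
    features via a 1x1 channel projection and bilinear spatial interpolation."""

    def __init__(self, cnn_channels: int, vtm_channels: int):
        super().__init__()
        # Channel transformation: project CNN channels onto the VTM channel dim.
        self.channel_proj = nn.Conv2d(cnn_channels, vtm_channels, kernel_size=1)

    def forward(self, cnn_feat: torch.Tensor, vtm_feat: torch.Tensor) -> torch.Tensor:
        x = self.channel_proj(cnn_feat)
        # Spatial interpolation: match the VTM feature map resolution.
        x = F.interpolate(x, size=vtm_feat.shape[-2:], mode="bilinear", align_corners=False)
        return x


def knowledge_transfer_losses(aligned: torch.Tensor, vtm_feat: torch.Tensor, tau: float = 1.0):
    """KL divergence between softened per-channel spatial distributions, plus an
    L2 (MSE) constraint on the channel-normalized features."""
    s = (aligned.flatten(2) / tau).log_softmax(dim=-1)   # student log-probs, (B, C, H*W)
    t = (vtm_feat.flatten(2) / tau).softmax(dim=-1)      # teacher probs
    l_kl = F.kl_div(s, t, reduction="batchmean") * tau ** 2
    l_mse = F.mse_loss(F.normalize(aligned, dim=1), F.normalize(vtm_feat, dim=1))
    return l_kl, l_mse


if __name__ == "__main__":
    fam = FeatureAlignmentModule(cnn_channels=512, vtm_channels=768)
    cnn_feat = torch.randn(2, 512, 64, 64)   # CNN backbone features
    vtm_feat = torch.randn(2, 768, 32, 32)   # frozen VTM encoder features
    aligned = fam(cnn_feat, vtm_feat)
    l_kl, l_mse = knowledge_transfer_losses(aligned, vtm_feat.detach())
    print(l_kl.item(), l_mse.item())
```

Detaching the VTM features in the loss keeps the pre-trained encoder acting as a fixed teacher, so gradients only update the CNN branch and the alignment module.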
Framework: Overview of the proposed framework, which integrates knowledge transfer and domain adaptation for fine-grained remote sensing image segmentation. Knowledge transfer is achieved through the feature alignment module and two loss constraints, the KL-divergence loss \( \mathcal{L}_{\text{kl}} \) and the L2-normalization loss \( \mathcal{L}_{\text{mse}} \). Domain adaptation is facilitated by the feature modulation module and supervised with the cross-entropy loss \( \mathcal{L}_{\text{ce}} \) and the auxiliary loss \( \mathcal{L}_{\text{aux}} \).
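Reading the caption's four terms together, the overall training objective is plausibly a weighted sum of the supervised and knowledge-transfer losses; the weights \( \lambda \) below are hypothetical placeholders rather than values reported here:

\( \mathcal{L} = \mathcal{L}_{\text{ce}} + \lambda_{\text{aux}} \mathcal{L}_{\text{aux}} + \lambda_{\text{kl}} \mathcal{L}_{\text{kl}} + \lambda_{\text{mse}} \mathcal{L}_{\text{mse}} \)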
Grass Dataset: Comparison with general segmentation methods on the proposed fine-grained grass segmentation dataset.

| Method | mIoU ↑ | OA ↑ | F1 ↑ |
|---|---|---|---|
| FCN | 47.47 | 67.85 | 61.99 |
| PSPNet | 47.95 | 69.12 | 62.55 |
| DeepLabV3+ | 47.95 | 68.97 | 62.50 |
| UNet | 48.17 | 69.77 | 62.34 |
| SegFormer | 48.29 | 68.93 | 62.82 |
| Mask2Former | 44.93 | 65.90 | 58.91 |
| DINOv2 | 47.57 | 71.54 | 61.70 |
| KTDA (Ours) | 50.86 | 74.26 | 65.01 |
Cloud Dataset: Comparison with cloud detection methods on the cloud segmentation dataset.

| Method | mIoU ↑ | OA ↑ | F1 ↑ |
|---|---|---|---|
| MCDNet | 33.85 | 69.75 | 42.76 |
| SCNN | 32.38 | 71.22 | 52.41 |
| CDNetv1 | 34.58 | 68.16 | 45.80 |
| KappaMask | 42.12 | 76.63 | 68.47 |
| UNetMobv2 | 47.76 | 82.00 | 56.91 |
| CDNetv2 | 43.63 | 78.56 | 70.33 |
| HRCloudNet | 43.51 | 77.04 | 71.36 |
| KTDA (Ours) | 51.49 | 83.55 | 60.08 |
@misc{ktda,
      title={Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation},
      author={Shun Zhang and Xuechao Zou and Kai Li and Congyan Lang and Shiying Wang and Pin Tao and Tengfei Cao},
      year={2024},
      eprint={2412.06664},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.06664},
}
This work was partially supported by the Natural Science Foundation of Qinghai Province under Grant No. 2024-ZJ-708 and the National Natural Science Foundation of China under Grant No. 62072027.