
Journal of Software (Ruan Jian Xue Bao), ISSN 1000-9825, CODEN RUXUEW, E-mail: jos@iscas.ac.cn
Journal of Software, [doi: 10.13328/j.cnki.jos.006620] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
Cross-Modal Self-Distillation for Zero-Shot Sketch-Based Image Retrieval∗
TIAN Jia-Lin¹, XU Xing¹, SHEN Fu-Min¹, SHEN Heng-Tao¹
¹(School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China)
Corresponding author: SHEN Heng-Tao, E-mail: shenhengtao@hotmail.com
Abstract: Zero-shot sketch-based image retrieval uses sketches of unseen classes as queries to retrieve images of those unseen classes. The task therefore faces two challenges at once: the modal gap between sketches and images, and the semantic inconsistency between seen and unseen classes. Previous approaches eliminate the modal gap by projecting sketches and images into a common space, and bridge the semantic inconsistency between seen and unseen classes with semantic embeddings such as word vectors and word similarity. This paper proposes a cross-modal self-distillation approach that learns generalizable features from the perspective of knowledge distillation, without semantic embeddings participating in training. Specifically, the knowledge of a pre-trained image recognition network is first transferred to the student network through traditional knowledge distillation. Then, exploiting the cross-modal correlation between sketches and images, cross-modal self-distillation indirectly transfers this knowledge to sketch recognition, improving the discriminability and generalizability of sketch features. To further promote the integration and propagation of knowledge within the sketch modality, sketch self-distillation is also proposed. By learning discriminative and generalizable features for the data, the student network eliminates the modal gap and the semantic inconsistency. Extensive experiments on three benchmark datasets, namely Sketchy, TU-Berlin, and QuickDraw, demonstrate the superiority of the proposed cross-modal self-distillation approach over the state-of-the-art.
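The following minimal formulation illustrates the three distillation objectives summarized above; the notation (teacher network $f_t$, student network $f_s$, softmax $\sigma$, temperature $\tau$, and a seen-class image $x^{img}$ paired with sketches $x^{skt}$, $\tilde{x}^{skt}$ of the same class) is introduced here purely for illustration and is not taken from the paper's own equations.

(1) Traditional knowledge distillation, transferring the pre-trained teacher's knowledge to the student on images:
$\mathcal{L}_{KD} = \mathrm{KL}\big(\sigma(f_t(x^{img})/\tau) \,\|\, \sigma(f_s(x^{img})/\tau)\big)$

(2) Cross-modal self-distillation, where the teacher's predictions on an image supervise the student's predictions on a sketch of the same class:
$\mathcal{L}_{CM} = \mathrm{KL}\big(\sigma(f_t(x^{img})/\tau) \,\|\, \sigma(f_s(x^{skt})/\tau)\big)$

(3) Sketch self-distillation, propagating knowledge within the sketch modality between two sketches of the same class:
$\mathcal{L}_{SS} = \mathrm{KL}\big(\sigma(f_s(\tilde{x}^{skt})/\tau) \,\|\, \sigma(f_s(x^{skt})/\tau)\big)$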
Key words: zero-shot sketch-based image retrieval; zero-shot learning; cross-modal retrieval; knowledge distillation
CLC number: TP311
Citation: Tian JL, Xu X, Shen FM, Shen HT. Cross-Modal Self-Distillation for Zero-Shot Sketch-Based Image Retrieval. Ruan Jian Xue Bao/Journal of Software, 2022 (in Chinese). http://www.jos.org.cn/1000-9825/6620.htm
∗ Funding: National Natural Science Foundation of China (61976049, 62072080, 61632007)
Received 2021-06-27; revised 2021-08-15; accepted 2022-01-14; published online (JOS) 2022-02-22