Attentive Fire and Smoke Detection Model (a-FSDM) is proposed. Retaining the advantages of generic detection algorithms in feature extraction and fusion, the fire-related target detection head network is redesigned to meet the distinctive needs of FSD. 2) A new Attentive Transparency Detection Head (ATDH) is presented, which is tailor-made to address the challenge of detecting transparent fire and smoke targets by enhancing the distinct features of transparent targets while suppressing non-target feature maps. 3) Burning Intensity (BI) is introduced as an indicator to measure the severity of combustion, and it serves as a key feature for subsequent downstream assessment of fire damage.
2 RELATED WORK
Fire and Smoke Detection (FSD) technology can be categorized into two types: smoke detection and flame (fire) detection. Smoke detection, which utilizes the characteristics of smoke, typically allows for earlier detection, thereby minimizing resource and economic losses. However, it may face challenges in environments where smoke disperses slowly or is not visible. Conversely, flame detection relies on the distinct features of flames, offering more easily detectable signals but often occurring at a later stage, when the fire is already more developed [45].
In terms of FSD task classification, the model described in [38] trains pre-trained VGG16 [40] and ResNet50 [14] models on a custom FSD dataset [38] to enhance FSD performance. Afterwards, FireNet [17] proposes a novel lightweight classification network designed for fire incidents. Subsequently, FireNet-v2, an improved version of FireNet, achieves a precision of 94.95% [39]. Moreover, a recent study [8] introduces a deep learning network that combines CNN and RNN [47] for forest fire classification, and a novel smoke recognition method based on dark-channel-assisted mixed attention is proposed in [9]. However, classification-based FSD is limited to determining the existence of fire and cannot provide more valuable information, such as fire location, for firefighters.
In recent years, various detection algorithms have been incorporated with the aim of enhancing FSD systems. For example, Faster RCNN [37] is introduced in [48]. To reduce false and missed detections, additional algorithms such as SSD [27], R-FCN [3], and YOLOv3 [35] are integrated by researchers [23], achieving high accuracy on their respective datasets. Li et al. [25] demonstrate the successful application of the widely used DETR [1] to FSD tasks. Meanwhile, the FSD network GLCT [44], which merges CNN and Transformers [43], achieves an mAP of 80.71 in the detection of early fire in surveillance video images. However, despite their effectiveness, Transformers have higher computational resource requirements and slower inference times than traditional CNNs. Subsequently, Venâncio et al. [4] introduce YOLOv5 [19] to FSD tasks, reporting improved accuracy with a smoke detection accuracy of 85.88 and a flame detection accuracy of 72.32 on their surveillance video database. It is noteworthy that traditional object detection-based FSD models are constructed upon general object detection algorithms rather than being specifically designed for the FSD task.
Deep learning techniques have also been applied to image segmentation [22] and scene understanding [28] in FSD tasks. Guan et al. [10] develop MaskSU RCNN, a forest fire instance segmentation approach based on the MS RCNN model. Meanwhile, Perrolas et al. [31] propose a quad-tree search-based method for localizing and segmenting fires at different scales. It is noteworthy that fire and smoke in early-stage FSD present considerable difficulties due to their transparency and lack of specificity. The complex and variable nature of fire and smoke results in suboptimal performance of traditional semantic segmentation techniques [18]. Moreover, compared with semantic segmentation models, which require substantial computational resources for training and inference, detection algorithms are more appropriate for FSD tasks.
Despite advancements in FSD tasks using object detection methods, computer vision-based FSD algorithms still lag behind generic computer vision algorithms. For example, the detection of transparent foregrounds has rarely been addressed in previous FSD studies. Moreover, generic object detection does not adequately address subsequent fire-related concerns, including BI, which is crucial for evaluating the cost of fire damage and determining the human resources required to respond to a fire.
To address the issue of transparent foregrounds, a novel FSD method, namely the Attentive Fire and Smoke Detection Model (a-FSDM), is presented. The proposed method preserves the strengths of traditional detection algorithms in feature extraction and fusion while redesigning the target detection head network specifically for FSD, termed the Attentive Transparency Detection Head (ATDH). Furthermore, it is assumed that there is a positive correlation between the severity of fire and smoke disasters and Burning Intensity (BI); that is, higher BI levels correspond to greater impacts caused by these disasters. To this end, a novel representation of BI is developed with the aim of facilitating evaluation in subsequent downstream tasks related to FSD. Additionally, the proposed algorithm is compared with multiple multi-scale object detection baselines on various FSD datasets.
3 METHOD
The proposed method contains two parts: the Attentive Fire and Smoke Detection Model (a-FSDM) and the Representation of Burning Intensity (BI). The a-FSDM, presented in Fig. 1, consists of three main components: Feature Extraction, Feature Fusion Grouping, and the Attentive Transparency Detection Head (ATDH) for the FSD task. Feature Fusion Grouping fuses semantic and spatial information from different convolutional layers to prevent information loss. The ATDH is responsible for classification, regression, and centerness; a minimal sketch of this three-branch layout is given below.
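Because the head predicts classification, regression, and centerness at every feature location, its layout resembles an anchor-free detection head in the style of FCOS. The following PyTorch listing is only a minimal sketch of such a three-branch head under that assumption; the class name FusedLevelHead, the 256-channel width, and num_classes=2 (fire and smoke) are illustrative choices, not the authors' implementation.

import torch
import torch.nn as nn

class FusedLevelHead(nn.Module):
    """Sketch of a per-level detection head with three output branches."""

    def __init__(self, in_channels: int = 256, num_classes: int = 2):
        super().__init__()
        # Small convolutional tower shared by the three branches.
        self.tower = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.GroupNorm(32, in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.GroupNorm(32, in_channels),
            nn.ReLU(inplace=True),
        )
        # Per-location class scores (e.g. fire / smoke).
        self.cls_logits = nn.Conv2d(in_channels, num_classes, 3, padding=1)
        # Per-location box regression (distances to the four box sides).
        self.bbox_reg = nn.Conv2d(in_channels, 4, 3, padding=1)
        # Per-location centerness score to down-weight off-center predictions.
        self.centerness = nn.Conv2d(in_channels, 1, 3, padding=1)

    def forward(self, x: torch.Tensor):
        feats = self.tower(x)
        return self.cls_logits(feats), self.bbox_reg(feats), self.centerness(feats)

if __name__ == "__main__":
    # One fused feature map: batch 1, 256 channels, 64x64 locations.
    level = torch.randn(1, 256, 64, 64)
    cls, reg, ctr = FusedLevelHead()(level)
    print(cls.shape, reg.shape, ctr.shape)  # (1,2,64,64) (1,4,64,64) (1,1,64,64)

In this sketch the three branches share one feature tower; the ATDH described next additionally reweights the shared features before these predictions are made.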
3.1 Attentive Transparency Detection Head
To enhance the model’s comprehension and detection of transparent flame or smoke targets, a new Attentive Transparency Detection Head (ATDH) is proposed. The feature map output from the Neck undergoes four convolutional layers, followed by Global Average Pooling (GAP) and Max Pooling (MP). GAP ensures that higher values contribute more to the overall training, while MP assigns importance to the maximum value [49]. To avoid downplaying the