
22
22.5
23
23.5
24
24.5
25
25.5
26
26.5
27
27.5
360 365 370 375 380 385 390 395 400
Value
Time
Observation
Truth IMR
EWMA
SCREEN
ARX
Truth
IMR
SCREEN
ARX
EWMA
Observation
Figure 1: An example segment of sensor readings
In short, the anomaly detection method does n ot expect
the minimized mistakes in practice, whereas the constraint-
based repairing is not effective in addressing the temporal
nature of errors.
1.3 Intuition of Our Proposal
Since completely automatic data repairing might not work
well in repairing t ime series data (such as SCREEN [22] ob-
served in our ex periments in Figure 9), enlightened by the
idea of utilizing master data (a single repository of high-
quality data) in data repairing [9], we employ some labeled
truth of dirty observations to advance the repair. The truth
can be obtained either by manual labeling or automatically
by more reliable sources. For instance, accurate locations
are manually marked in the map by user check-in activ-
ities (and utilized to repair th e imprecise GPS readings).
Moreover, periodical automatic labeling may take place in
certain scenarios, e.g., precise equipments report accurate
air quality data (as labeled values) in a relatively long sens-
ing period, while crowd and participatory sensing generates
unreliable observations in a constant manner [28].
Being aware of both the error nature in anomaly detec-
tion and the minimum change principle in data repairing,
we propose iterative minimum repairing (IMR). The phi-
losophy beh ind the proposed iterative minimum repairing
is that the h igh confi dence repairs in the former iterations
could help the latter repairing. Specifically, IMR minimally
changes one point a time to obtain the most confident repair
only, referring to t he minimum change principle in data re-
pairing that human or systems always try to minimize their
mistakes. The high confidence repairs, together with the
labeled truth of error points, are utilized to learn an d en-
hance the temporal nature of errors in anomaly detection,
and thus generate more accurate repair candidates in the
latter iterations.
Example 1. Figure 1 presents an example segment of sen-
sor readings, denoted by black line. Suppose that sensor er-
rors occur in the period from time point 370 to 385, where
the observations are shifted from the truth, e.g., owing to
granularity mismatch or unit error. To repair the errors,
the truth of several observations are labeled, including time
points {370, 371, 372, 379, 387}.
Existing speed constraint-based cleaning (SCREEN) [22]
could not effectively repair such continuous errors in a pe-
riod (which is indeed also observed in Example 1 in [22]).
The reason is that speed constraints, restricting the amount
of value changing relative to time difference, can detect sharp
deviations such as from time point 369 to 370, but not con-
tinuous errors, e.g., in 383 and 384. The exponentially
weighted moving average (EWMA) [13] algorithm also hard
to find a proper way to clean the trace. These two methods
have similar repair trace.
By considering the predication values in anomaly detection
as repairs (see m ore details in Section 2), the result of AR X
based repairing is also reported. ARX, considering the errors
between truth and observations, shows better repair results
than EWMA and SCREEN methods.
Finally, our proposed IMR approach, with both error pred-
ication and minimum change considerations, obtains repairs
closest to the truth.
The iterative minimum repairing leads to new challenges:
(1) whether the repairing process converges; and (2) how to
efficiently/incrementally update the parameter of the tem-
poral model over the repaired d ata after each iteration. Both
issues in anomaly repairing are not considered in the anomaly
detection studies.
Contributions. Our major contributions in this paper are
summarized as follows.
(1) We formalize the anomaly repairing problem, given a
time series with some points having labeled truth, in Section
2. The adaption of existing anomaly detection techniques
(such as AR and ARX) is introduced for anomaly repairing.
(2) We devise an iterative minimum-change-aware repair-
ing algorithm IMR, in Section 3. Remarkably, we illustrate
that the ARX-based approach (in Section 2) is indeed a spe-
cial case of IMR with static parameter (Prop osition 2).
(3) We study the convergence of IMR in various scenarios,
in Section 4. In particular, the convergence is explicitly ana-
lyzed for the special case of I MR(1) with order p = 1, which
is sufficient to achieve high repair accuracy in p ractice (as
shown in the experiments in Section 6). We prove that under
certain inputs, the converged repair result could be directly
calculated without iterative computing (Proposition 8).
(4) We design efficient pruning and incremental computa-
tion for parameter estimation in each repair iteration, in Sec-
tion 5. Rather than performing parameter estimation over
all the n points, matrices for parameter estimation could be
pruned by simply removing rows with value 0 (Proposition
9). It is also remarkable that the incremental computation
among different repair iterations (Proposition 10) could fur-
ther reduce the complexity of parameter estimation from
O(n) to O(1).
(5) Experiments on real datasets with both real and syn-
thetic errors, in Section 6, demonstrate that ou r IMR method
shows significantly better repair performance than the state-
of-the-art approaches, includ ing the aforesaid anomaly de-
tection and constraint-based repairing.
Table 1 lists the notations frequently used in this pa-
per. All the proofs can be found in the full technical report
(http://ise.thss.tsinghua.edu.cn/sxsong/doc/anomaly.pdf).
2. PRELIMINARIES
This section first introduces the problem of anomaly re-
pairing. We then adapt the existing anomaly detection mod-
els for anomaly repairing, i.e., AR without considering la-
beled data and ARX supporting labeled data.
1047
文档被以下合辑收录
评论