-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathchapter2.tex
339 lines (277 loc) · 17.9 KB
/
chapter2.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
\chapter{Detector Combination} \label{chapter2}
\section{Boolean Combination of Detectors in the ROC Space}
\label{sec:boolean-combination}
This section provides information about ROC analysis and summarizes the Boolean combination in the ROC space \cite{Khreich2010-ICPR}.
A crisp detector outputs a decision or a class label (e.g., normal or anomaly) while a soft detector such as HMM assigns scores to the input samples, which can be converted to a crisp detector by setting a decision threshold on the scores.
Given the responses of a crisp detector on a validation set, the true positive rate ($tpr$) is the proportion of positives correctly classified over the total number of positive samples.
The false positive rate ($fpr$) is the proportion of negatives incorrectly classified over the total number of negative samples.
The positive (or target) class is typically the class of interest, which is the anomalous class for an ADS.
A ROC curve is a plot of $tpr$ against $fpr$ \cite{Fawcett2006}.
A crisp detector produces a single data point in the ROC space, while a soft detector produces a ROC curve by varying the decision thresholds.
In practice, an empirical ROC plot is obtained by connecting the observed ($tpr,fpr$) pairs of a soft detector at each decision threshold.
A point $a$ is \textit{superior} to another point $b$ in the ROC space, if $fpr(a) \leq fpr(b)$ and $tpr(a) \geq tpr(b)$.
The ROC convex hull (ROCCH) is therefore the outer envelope connecting superior points in the ROC space.
A ROC curve allows to visualize the performance of detectors and to select optimal operational points, without committing to a single decision threshold or to fixed error costs.
The area under the ROC curve (AUC), or under the ROCCH, provides a general measure for evaluation and selection of detectors \cite{Fawcett2006}.
The IBC is a general decision-level combination technique that attempts to select the decision thresholds (from each input detector) and the Boolean functions that maximize the overall ROCCH of the combined ensemble \cite{Khreich2010-ICPR}.
The core of IBC (only for the first iteration) is described in Algorithm~\ref{IBC1}.
%%%%%% Algorithms
\input{algos/IBC1.tex}
\input{algos/PBC.tex}
The IBC applies each Boolean function to combine the responses corresponding to each decision threshold from the first detector to those from the second detector.
Fused responses are then mapped to vertices in the ROC space, and their ROC convex hull (ROCCH) is computed.
Vertices that are superior to the ROCCH of original detectors are then selected.
The set ($\mathcal{S}$) of decision thresholds from each detector and Boolean functions corresponding to these vertices is stored, and the ROCCH is updated to include emerging vertices.
The responses corresponding to each decision threshold from the third detector are then combined with the responses of each emerging vertex, and so on, until the last detector in the pool is combined.
The BC technique yields a final ROCCH for visualization and selection of operating points, and the set of selected thresholds and Boolean functions, $\mathcal{S}$, for each vertex on the composite ROCCH to be applied during operations.
Although not shown in Algorithm~\ref{IBC1}, the original IBC algorithm can iterate by re-combining the resulting combination (of all detectors) on the ROCCH with each of the original detectors (sequentially) until the ROCCH stops improving \cite{Khreich2010-ICPR}.
However, as stated previously, combining all available detectors without pruning may be inefficient due to the redundancy in their outputs.
The sequential combinations of soft detectors, illustrated in the loop in line 16 of Algorithm~\ref{IBC1}, produces a sequence of combination rules that grows linearly with $K$ (and the number of iterations), which becomes difficult to track, analyse and understand when the value of $K$ becomes large.
In addition, the IBC algorithm is sensitive to the order in which the detectors are input for combinations, which increases the effort required to find best subset for operations.
% or to control the number of combined detectors.
\section{Synthetic Data Set}
\label{sec:synthetic}
To better explain the reason for choosing the proper combination of detectors among millions of trained detectors, in this section, examples of training, pruning and combining detectors on a synthetic data set are discussed.
For this purpose, three known synthetic data sets are chosen~\cite{Duin2000}.
\begin{enumerate}
\item Lithuanian
\item Banana (Moon)
\item Circle
\end{enumerate}
Unlike the n-dimensional real data sets, it is possible to plot synthetic data sets and to illustrate the trained models generated by them. This would assist an individual to observe what happens when unnecessary trained models are pruned and only those that can increase the detection accuracy remain.
Figure ~\ref{Figure::lithuanian}, ~\ref{Figure::banana} and ~\ref{Figure::circle} shows the Lithuanian, Banana and Circle data sets.
Figure ~\ref{Figure::lithuanian_all}, ~\ref{Figure::banana_all} and ~\ref{Figure::circle_all} show the related trained models on these data sets.
\begin{figure}[]
\centering
\includegraphics[scale=0.6]{figs/dataset_Lithuanian}
\caption{Lithuanian Data Set}
\label{Figure::lithuanian}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[scale=0.6]{figs/dataset_Banana}
\caption{Banana (Moon) Data Set}
\label{Figure::banana}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[scale=0.6]{figs/dataset_Circle}
\caption{Circle Data Set}
\label{Figure::circle}
\end{figure}
Another purpose of this example is to show that we can reach the same level of detection by combining very simple and naïve model detectors, instead of spending time to find the proper model and to train that model on the data set, which is usually a time consuming task. In this example, hundreds of linear detectors are trained the data set. In order to have diverse learner detectors, we changed the ratio of data and their label during training to obtain as many diverse detectors as possible.
Another purpose of this example is to show that we can reach the same level of detection by combining simple and naïve model detectors, instead of spending time to find the proper model and to train that model on the data set. In this example, hundreds of linear detectors are trained on data set. In order to have diverse learner detectors, the ratio of data and their label during the training process are changed to obtain as many diverse detectors as possible.
Figure~\ref{Figure::lithuanian_all}, ~\ref{Figure::banana_all} and ~\ref{Figure::circle_all}show different trained detectors on these synthetic data sets respectively.
None of them, however, can detect all the classes. Yet by combining these simple detectors, a better data set classification is achieved as it is shown later.
\begin{figure}[H]
\centering
\includegraphics[scale=0.6]{figs/AllClassifiersLithuanian}
\caption{Linear detectors trained on the Lithuanian Data Set}
\label{Figure::lithuanian_all}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[scale=0.6]{figs/AllClassifiersBanana}
\caption{Linear detectors trained on the Banana (Moon) Data Set}
\label{Figure::banana_all}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[scale=0.6]{figs/AllClassifiersCircle}
\caption{Linear detectors trained on the Circle Data Set}
\label{Figure::circle_all}
\end{figure}
By selecting the right models and combining them the detection rate ma increase, as shown in Figures~\ref{fig:ROC_comparison_banana}, ~\ref{fig:ROC_comparison_lithuanian} and ~\ref{fig:ROC_comparison_circle}. The figures on the left show the original detectors and their ROC before combination. After selecting the proper detectors and combining them together (i.e. after just one iteration) the accuracy portrayed on the right figure is achievable, which show the combined detectors (in yellow) and their improved ROC curve. The next step is to show whether combination provides a better result or not, and how to obtain the same result just by combining parts of the trained models.
\begin{figure}[H]
\centering
\begin{adjustbox}{minipage=\linewidth,scale=1.1}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/ROCHLeft-Lithuanian}
\caption{ROC before combination}
\label{fig:ROC_left_lithuanian}
\end{subfigure}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/ROCHRight-Lithuanian}
\caption{ROC after combination}
\label{fig:ROC_right_lithuanian}
\end{subfigure}
\caption{ROC comparison on Lithuanian data set before and after first round of combination}
\label{fig:ROC_comparison_lithuanian}
\end{adjustbox}
\end{figure}
\begin{figure}[H]
\centering
\begin{adjustbox}{minipage=\linewidth,scale=1.1}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/ROCHLeft-Banana}
\caption{ROC before combination}
\label{fig:ROC_left_banana}
\end{subfigure}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/ROCHRight-Banana}
\caption{ROC after combination}
\label{fig:ROC_right_banana}
\end{subfigure}
\caption{ROC comparison on Banana data set before and after first round of combination}
\label{fig:ROC_comparison_banana}
\end{adjustbox}
\end{figure}
\begin{figure}[H]
\centering
\begin{adjustbox}{minipage=\linewidth,scale=1.1}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/ROCHLeft-Circle}
\caption{ROC before combination}
\label{fig:ROC_left_circle}
\end{subfigure}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/ROCHRight-Circle}
\caption{ROC after combination}
\label{fig:ROC_right_circle}
\end{subfigure}
\caption{ROC comparison on Circle data set before and after first round of combination}
\label{fig:ROC_comparison_circle}
\end{adjustbox}
\end{figure}
For each data set, it is concluded there is no single detector that can classify the data set properly.
\begin{figure}[t!] % "[t!]" placement specifier just for this example
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Lithuanian/20All-Classifiers}
\caption{} \label{fig:Lithuanian_all_single_a}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Lithuanian/43All-Classifiers}
\caption{} \label{fig:Lithuanian_all_single_b}
\end{subfigure}
\medskip
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Lithuanian/44All-Classifiers}
\caption{} \label{fig:Lithuanian_all_single_c}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Lithuanian/65All-Classifiers}
\caption{} \label{fig:Lithuanian_all_single_d}
\end{subfigure}
\medskip
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Lithuanian/118All-Classifiers}
\caption{} \label{fig:Lithuanian_all_single_e}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Lithuanian/146All-Classifiers}
\caption{} \label{fig:Lithuanian_all_single_f}
\end{subfigure}
\caption{Some of best LDAs ( Linear Discriminant Analysis) that can be trained on the Lithuanian data set} \label{fig:Lithuanian_all_single}
\end{figure}
As it is shown in Figure~\ref{fig:Lithuanian_all_single}, with just one Linear Discriminant Analysis (LDA), it is not possible to classify the entire classes with 100\% accuracy. To obtain a more accurate result, however, two of these detectors may be chosen and combined. As shown in Figure~\ref{fig:Dataset_ROC_Lithuanian}, a detector from Figure~\ref{fig:Lithuanian_all_single_a} and detectors from ~\ref{fig:Lithuanian_all_single_d} are selected and combined.
\begin{figure}[H]
\centering
\begin{adjustbox}{minipage=\linewidth,scale=1.1}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/Lithuanian/20Dataset-ROC}
\caption{First selected LDA}
\label{fig:Dataset_ROC_Lithuanian_a}
\end{subfigure}
\begin{subfigure}[b]{0.5\columnwidth}
\centering
\includegraphics[width=\linewidth]{figs/Lithuanian/65Dataset-ROC}
\caption{Second selected LDA}
\label{fig:Dataset_ROC_Lithuanian_b}
\end{subfigure}
\caption{Selected LDA and their respective point compare to current ROC}
\label{fig:Dataset_ROC_Lithuanian}
\end{adjustbox}
\end{figure}
If these two detectors with an $AND$ operation are combined, as shown in Figure~\ref{fig::combined_lithuanian_roc}, , a more reasonable result is obtained. The red point is the mapping of the combined detectors. If new ROC based on this new generated point will be recalculated, a greater AUC is achieved.
\begin{figure}[H]
\centering
\includegraphics[scale=0.6]{figs/Lithuanian/9999999-combined_All-Classifiers}
\caption{Selected detectors and the respective combined point on the ROC curve}
\label{fig::combined_lithuanian_roc}
\end{figure}
A similar approach can be applied to the Banana data set. As it is shown in Figure~\ref{fig:Banana_all_single}, like the Lithuania data set, it is not possible to classify the data set completely with one detector. In this specific example, at least three detectors are needed to precisely classify the data in an accurate way.
\begin{figure}[t!] % "[t!]" placement specifier just for this example
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Banana/2All-Classifiers}
\caption{} \label{fig:Banana_all_single_a}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Banana/12All-Classifiers}
\caption{} \label{fig:Banana_all_single_b}
\end{subfigure}
\medskip
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Banana/22All-Classifiers}
\caption{} \label{fig:Banana_all_single_c}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Banana/54All-Classifiers}
\caption{} \label{fig:Banana_all_single_d}
\end{subfigure}
\medskip
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Banana/147All-Classifiers}
\caption{} \label{fig:Banana_all_single_e}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Banana/160All-Classifiers}
\caption{} \label{fig:Banana_all_single_f}
\end{subfigure}
\caption{Some of best LDA ( Linear Discriminant Analysis) that can be trained on Banana data set} \label{fig:Banana_all_single}
\end{figure}
Given the fact that three detectors are needed for data set classification, combining the selected detectors shown in Figure ~\ref{fig:Banana_all_single_a}, Figure~\ref{fig:Banana_all_single_c} and Figure\ref{fig:Banana_all_single_f} may result in achieving the highest accuracy among all other detectors.
Figure~\ref{fig::combined_banana_roc} portraits the whole mentioned detectors in one plot
\begin{figure}[H]
\centering
\includegraphics[scale=0.6]{figs/Banana/9999999-combined_selected_before_combination}
\caption{Best potential selected detectors on Banana data set}
\label{fig::combined_banana_roc}
\end{figure}
These examples show that it is not always necessary to train complex models, which takes time and resources, over the data set to obtain the best result possible. The same result can be achieved by combining two or more simple models, which can be trained much faster. Also by increasing the number these simple models, most of the cases (including the outliers) in the data set may be covered.
Moreover, due to the imperfection nature of likely every model, training the complex models would not guarantee the best achievable result. Therefore, a combination is needed to enhance the outcome.
Since it is not possible to train and to combine thousands of models, our approach on how to select a set of meaningful models to improve the expected result will be shown in the next chapter.
The Circle data set is a good example where combining in an iterative way improves the accuracy shown in Figure~\ref{fig:Circle_all_single}.
W. Since it is not possible to combine any two detectors which could classify the whole data set, the entire detectors are selected and combined two by two iteratively.
The result of classifying iteratively the whole data set with the help of a few simple detectors is shown in Figure ~\ref{fig::combined_circle_roc}.
\begin{figure}[H] % "[t!]" placement specifier just for this example
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Circle/11-two-circle}
\caption{} \label{fig:Circle_all_single_a}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Circle/22-two-circle}
\caption{} \label{fig:Circle_all_single_b}
\end{subfigure}
\medskip
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Circle/33-two-circle}
\caption{} \label{fig:Circle_all_single_c}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Circle/44-two-circle}
\caption{} \label{fig:Circle_all_single_d}
\end{subfigure}
\medskip
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Circle/55-two-circle}
\caption{} \label{fig:Circle_all_single_e}
\end{subfigure}\hspace*{\fill}
\begin{subfigure}{0.48\textwidth}
\includegraphics[width=\linewidth]{figs/Circle/66-two-circle}
\caption{} \label{fig:Circle_all_single_f}
\end{subfigure}
\caption{Series of 2 selected LDA ( Linear Discriminant Analysis) that can be trained and combined on the Circle data set} \label{fig:Circle_all_single}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[scale=0.6]{figs/Circle/9999999-combined_selected_before_combination}
\caption{Best potential selected detectors on the Circle data set}
\label{fig::combined_circle_roc}
\end{figure}