Archives of Mining Sciences 51, Issue 4 (2006) 475–492

MAREK SIKORA*, BEATA SIKORA**
APPLICATION OF MACHINE LEARNING FOR PREDICTION
OF METHANE CONCENTRATION IN A COAL MINE
Applications of machine learning methods for creating a rule-based data model used in the prediction of methane concentration in an excavation are described in the paper. The second chapter presents data coming from a methane concentration monitoring system and the methodology of their transformation into a form acceptable to the analytic algorithms that have been used. The next chapter describes the rule induction algorithm used for prediction. Results of the analysis performed on data coming from a coal mine are presented at the end of the paper.
Keywords: prediction of methane concentration, prediction by means of rules, knowledge discovery in databases
The paper presents an idea of applying intelligent computer techniques to the exploratory analysis of data coming from a system monitoring hazards connected with methane emission in hard-coal mines. The aim set for the applied analytic methods is the prediction of the methane concentration measured by a selected methane detector ten minutes and one hour in advance.
From among various methodologies of generating systems that enable prediction (among others fuzzy systems, artificial neural networks, statistical methods), an algorithm of rule induction with conclusions in the form of linear functions was chosen. The presented algorithm is characterized by one of the shortest analysis times and by good prediction results obtained on publicly available benchmark data.
An important feature of the applied algorithm is also that the analysis results, i.e. the synthetic description of the analyzed data set, are relatively easy for a user to interpret. From the point of view of the field known as knowledge discovery in databases, this is a very important property.
* INSTITUTE OF COMPUTER SCIENCES, SILESIAN UNIVERSITY OF TECHNOLOGY, AKADEMICKA 16, 44-100 GLIWICE, POLAND; e-mail: [email protected]; EMAG CENTRE, LEOPOLDA 31, 40-282 KATOWICE, POLAND
** INSTITUTE OF MATHEMATICS, SILESIAN UNIVERSITY OF TECHNOLOGY, KASZUBSKA 2, 44-100 GLIWICE, POLAND; e-mail: [email protected]
The data put to the analysis came from an excavation located in a mine free from rock-burst hazards. Figure 1 presents a scheme of the region in which the considered excavation is located; the placement of the sensors is also visible there.
Graphical analysis of the time series reflecting the readings of the methane detectors and anemometers (Fig. 2) showed that the highest dynamics of methane concentration is observed at the outlet of the longwall. In the research, an attempt at predicting the readings of the methane detector M32 was therefore undertaken.
The measurement data were collected at ten-second intervals. For the research purposes the data were aggregated, creating two data sets in which successive records contained: maximum values of the measured quantities in one-minute periods (the data set for the ten-minute prediction), and maximum values of the measured quantities in ten-minute periods (the data set for the one-hour prediction).
In order to enable the application of analytic methods using the machine learning paradigm, the available data set had to be modified. Data taken from the monitoring system are represented by a set of records between which a temporal relation exists, whereas the algorithm applied in the paper analyzes tables in which each row is independent. Thus, information about the state of a given process at a given moment (including the dynamics of changes of the parameters describing this process) must be contained in a single row. The second chapter presents the way in which it is possible to pass from the representation of data obtained directly from the monitoring system (Tab. 1) to the representation accepted by the applied analytic algorithm (Tab. 2).
The second chapter also specifies the set of independent variables: AN31 – readings of the anemometer AN31; AN32 – readings of the anemometer AN32; MM32 – readings of the methane detector MM32; Production; DAN31 – sum of AN31 readings over the last ten minutes; DAN32 – sum of AN32 readings over the last ten minutes; DMM32 – sum of MM32 readings over the last ten minutes. The dependent variable was named MM32_Pred.
The third chapter describes in detail the applied analytic algorithm, which enables the generation of rules with linear conclusions (1). The algorithm builds a rule in such a way that the conditional part of the rule describes as many objects from the training set (2) as possible, while simultaneously limiting the variance of the dependent variable. A multidimensional linear model, which allows the value of the dependent variable to be determined for a given rule, is placed in its conclusion. The algorithm is heuristic and uses expression (3) as the optimality criterion during rule construction. The third chapter also discusses methods of optimization (including simplification) of the obtained rule-based data model.
The fourth chapter contains the results of the performed analyses. The analysis was conducted on separated data sets, and the effectiveness of the determined models was verified on independent test sets. The objective measure of effectiveness was the RMS error (4) made by the determined models; as the subjective measure, the complexity (interpretability by a user) of the determined model was taken.
The method proposed in the paper was compared with statistical methods (multivariate regression, ARIMA) and with a stochastic method (neural networks). The results of the experiments for the ten-minute prediction are given in Table 3, and the results for the one-hour prediction in Table 4. The real time series of the methane concentration registered by the methane detector M32 and the series predicted by the model are shown in Figures 3 and 4.
The performed research showed that the applied method allowed the lowest prediction error to be obtained while preserving the transparency of the determined model. The method was also characterized by the shortest analysis time.
1. Introduction
Rule induction based on a set of examples is one of the most popular techniques of machine learning. Discovering nontrivial, true, and previously unknown patterns in data (represented by the discovered rules) and creating a classification or prediction system based on these patterns is the main task of rule induction algorithms. Therefore, systems that automatically (or semi-automatically) realize rule induction and classification or prediction tasks can be treated as members of the wider group of intelligent computer systems.
Among industrial applications, intelligent computer systems are mainly used for solving problems connected with control (e.g. fuzzy and neural controllers) (Czogała & Łęski, 2000; Tadeusiewicz, 2003; Yager & Filev, 1994). These tasks are prediction rather than classification tasks.
A group of computer systems not directly connected with control problems is the group of monitoring systems, which monitor a production process and the hazards connected with it. The functionality of these systems ensures complete monitoring and visualization of any industrial process (production, operation of machines and equipment, hazard monitoring). In the majority of applications, the data gathered by these systems are exploited only for current visualization and reporting.
At present, producers and users of monitoring systems more and more frequently point to the necessity of analyzing the information collected by these systems, especially in the context of discovering knowledge about the monitored process.
Information collected in the databases of monitoring systems is usually numerical; the data are gathered with high frequency and are usually burdened with uncertainty (resulting, among others, from transmission breaks or sensor distortions).
This paper presents an idea of applying intelligent computer techniques to the analysis of data coming from a system monitoring hazards connected with methane emission in coal mines. The applied analytic methods are required to predict the methane concentration measured by a selected methane detector ten minutes and one hour in advance.
A rule induction algorithm with conclusions in the form of linear functions (Quinlan, 1992b, 1993) has been chosen from among various methodologies of generating systems that enable prediction, among others: fuzzy systems (Czogała & Łęski, 2000; Yager & Filev, 1994), artificial neural networks (Tadeusiewicz, 2003), and statistical methods (Box & Jenkins, 1994; Dixon, 1992). The presented algorithm is characterized by one of the shortest analysis times and good prediction results obtained on freely available benchmark data.
The algorithm is also characterized by the fact that the analysis results, that is the synthetic description of the analyzed data set, are relatively easy for a user to interpret. This is a very important advantage from the point of view of knowledge discovery in databases (Michalski & Kaufman, 1998).
2. Data acquisition and preparation
The data that have been analyzed came from an excavation located in a mine free from rock-burst hazards. For the selected data set we had at our disposal measurements not only from methane detectors but also from anemometers. We also had approximate information about production in each shift.
A location plan of the region (Bojko, 2004), in which the considered excavation is
placed, is presented in Fig. 1. Sensors location can also be seen in the figure.
The analysis of the methane concentration time series registered by particular sensors showed that the highest rate of change of the concentration can be observed at the face end. In our research we attempted to predict the readings of a sensor placed at the face end (the methane detector M32). Readings of the anemometers AN31 and AN32 were selected as auxiliary variables; data about atmospheric pressure were inaccessible (although it is known that in the considered period the pressure changes were not big).

[Fig. 1 shows the methane detectors M31–M35 and the anemometers AN31 and AN32 located in the region of the excavation, together with the dates 1.05.2002, 1.06.2002, and 1.07.2002.]

Fig. 1. Scheme of the excavation from which the analyzed data came
Results of the initial analysis gathered from research conducted by Dr. Bojko (Bojko, 2004) showed weekly periodicity in the registered methane concentrations. Because we had information about production during successive shifts for the week from 10th to 16th of June 2002, exactly this period was put to the analysis.
Measurement data were gathered at ten-second time intervals. For the purpose of our research, the data were aggregated, creating two data sets whose successive records included:
• maximum values of the measured quantities in one-minute periods (the set for the ten-minute prediction),
• maximum values of the measured quantities in ten-minute periods (the set for the one-hour prediction).
In the preliminary analysis, both the original data (with noise) and the data after filtration (smoothing) by means of a first-order low-pass recursive filter (the Brown filter (Box & Jenkins, 1994)) were examined. The smoothing was also applied to the data coming from the anemometers. Comparing the smoothed measurements registered by the anemometers and the methane detectors, dependences between the readings of the anemometers and the methane detectors can be noticed.
For the purpose of prediction, operating on smoothed data does not seem to be a good approach, since it lowers the real (registered) methane content in the atmosphere. Obviously, the filter that we exploited decomposes a time series into a nonstationary component (the smoothed series) and a stationary component (the noise). Therefore, the series can be considered as a sum of the two components, and each component can be put to separate (e.g. statistical) analysis. However, in our research we used raw data for the purpose of predicting the methane concentration.
Before starting the prediction by the methods considered in this paper, it is necessary to convert the data sets to an appropriate form, because the data received from a monitoring system usually form a set of records between which a temporal relation exists. Meanwhile, the algorithm we have applied analyzes tables in which each row is independent; thus, information about the temporal relations between the analyzed variables and their rates of change needs to be contained in one row.
An example of a data set gathered from a monitoring system in which data acquisition is done each second is presented in Table 1.
If the prediction of future values of a selected variable is the aim of the analysis, then it is necessary to move the values of this variable forward by a number of rows such that the demanded time passage is reflected (in this manner we obtain the dependent variable). The prediction horizon is established arbitrarily by a user.
Therefore, values of the independent variables are delayed with respect to the value of the dependent variable by the prediction horizon. In some situations it turns out that the strongest influence on the value of the dependent variable is exerted by values of the independent variables at delays larger than the prediction horizon established by the user (Box & Jenkins, 1994; Sikora & Kozielski, 2005). Correlation coefficients (Sobczyk, 1997) as well as autocorrelation and partial autocorrelation coefficients (Box & Jenkins, 1994) are usually exploited for determining the delays of the particular independent variables that influence the values of the dependent variable most strongly.

[Fig. 2 consists of three weekly panels (Monday to Sunday): air flow velocity [m/s] registered by the anemometer AN31, air flow velocity [m/s] at the face end (anemometer AN32), and methane concentration [% CH4] at the face end (methane detector M32).]

Fig. 2. Weekly course of air flow velocity and methane concentration in the longwall face within the period from 10th to 16th of June 2002
TABLE 1
Raw data taken from a monitoring system

Time [s]    Sensor x    Sensor y    Sensor z
   1           1           123         0.4
   2           1           145         0.42
   :           :            :           :
   n          2.3          146         0.47
In order to limit the number of considered records, an aggregation of the variable values is carried out, which consists in replacing every m records with one record in which the aggregated values of the variables appear (the average, sum, minimum, and maximum are usually used as aggregating functions).
The aggregating function is selected depending on the purpose of the analysis (e.g. if a system has to warn against the dependent variable exceeding some critical value, then it is necessary to exploit the maximum function as the aggregating function).
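The aggregation of every m records into one can be sketched as below; the maximum is used because a warning system cares about exceedances, and the readings are illustrative:

```python
def aggregate(values, m, fn=max):
    """Replace every m consecutive records with one aggregated record.

    fn may be max, min, sum, or a mean; an incomplete final group is kept.
    """
    return [fn(values[i:i + m]) for i in range(0, len(values), m)]

# Ten-second readings aggregated to one-minute maxima (m = 6 readings per minute).
readings = [0.40, 0.41, 0.45, 0.42, 0.44, 0.43,
            0.50, 0.56, 0.52, 0.51, 0.49, 0.48]
print(aggregate(readings, m=6))  # [0.45, 0.56]
```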
Irrespective of the method of data preparation, the transformations that have been discussed preserve the rate of change of the independent variables only if a column specifying the successive time moments still appears in the data set.
It is necessary to introduce new attributes (new variables) that express the rate of change of the independent variables and, simultaneously, to get rid of the column representing the passage of time. Information about the rate of change is determined by computing so-called absolute increments, relative increments, or dynamics indicators (Box & Jenkins, 1994; StatSoft, 2001a). The new attributes may then include exact values of differences or sums of the particular (original) attributes, or express these changes in a linguistic form (Tab. 2).
TABLE 2
The transformed data set

Sensor x dynamics   Sensor y   Sensor y dynamics   Sensor z   Sensor z dynamics   Sensor z + 2 (dependent variable)
–                     123         –                  0.4      –                    –
–                     145         5                  0.42     –                    –
without changes       156         2                  0.56     1                    0.4
without changes       146        –1                  0.47     increases quickly    0.42
decreases             144         0                  0.43     decreases            0.47
The order of the separate rows in a data set prepared in this way is not important, since all information about the process dynamics at a given moment is included in a single row of the data set. As shown in Tab. 2, it is necessary to remove from such a data set some initial rows in which no values occur.
The considered data set came from a monitoring system in which data acquisition was done every ten seconds. As mentioned, for the analysis purposes the data were aggregated by choosing maximal values of readings from each minute (ten-minute prediction) and from each ten minutes (one-hour prediction). On the basis of the analysis of cross-correlation, autocorrelation, and partial autocorrelation, a single record in the analyzed data set had the following form:
• AN31 – readings of the anemometer AN31,
• AN32 – readings of the anemometer AN32,
• MM32 – readings of the methane detector MM32,
• Production (values of the variable are entered by a user – related to a whole shift),
• DAN31 – sum of readings of AN31 for the last ten minutes,
• DAN32 – sum of readings of AN32 for the last ten minutes,
• DMM32 – sum of readings of MM32 for the last ten minutes,
• MM32_Pred – readings of MM32 that will occur in ten minutes (or in one hour – for the one-hour prediction).
The task of the DAN31, DAN32, and DMM32 fields is to reflect the rate of changes of the monitored values. Three indicators giving information about the rate of changes (denoted by yd) have been tested: the difference (yd = yt − yt−10), the sum (yd = yt + yt−1 + … + yt−9), and the quotient (yd = yt−10/yt). The sum was chosen because that indicator gave the best results (the lowest error) in three test analyses.
Each of the prepared data sets (let us recall that the set for the ten-minute prediction is different from the one for the one-hour prediction) was divided into two subsets containing 66 and 34 percent of all records, respectively. The larger sets were put to further analysis (the training sets); the smaller ones were exploited for the verification of the obtained results (the test sets).
3. A machine learning algorithm – an induction of rules with linear conclusions
The rule induction algorithm with linear conclusions (called the m5 algorithm) was applied for the realization of the prediction task. The algorithm was proposed by R. Quinlan (Quinlan, 1992b; Quinlan, 1993). It belongs to the family of algorithms that learn from the examples they are provided with (the so-called training set). The task of the algorithm is to generalize the knowledge given to it and to write down, in a synthetic way, the dependences appearing in the data in the form of some knowledge description language. In the case of the m5 algorithm, a rule with a linear conclusion is a single formula of such a language.
In a more formal notation, we write a training set as a table DT = (U, A∪{d}), in which U is a set of examples (rows in the analyzed set) and A is a set of independent variables (attributes) that describe the examples (here the attributes AN31, AN32, MM32, Production, DAN31, DAN32, DMM32). The feature d is the dependent feature, the values of which we want to predict. Each independent feature a ∈ A may be treated as a function a: U → Da that assigns to each example from U a value of a. The set Da is called the domain of the attribute a.
The m5 algorithm enables the creation of rules of the form (1):

IF ai1 ∈ Vai1 and … and aik ∈ Vaik THEN d = b + b1·aj1 + … + bm·ajm   (1)

where {ai1, …, aik} ⊆ A; Vai1 ⊆ Dai1, …, Vaik ⊆ Daik; {aj1, …, ajm} ⊆ A; b, b1, …, bm ∈ R (R denotes the set of real numbers).
A single component a ∈ Va is called a conditional descriptor. It can easily be observed that independent variables different from the ones occurring in the conclusion of a rule may occur in its premise. A single conditional descriptor can take one of the following forms:
• a ∈ [v1, v2], where v1, v2 ∈ Da;
• a > v, where v ∈ Da;
• a < v, where v ∈ Da.
The task of the m5 algorithm is to determine which features describing the analyzed data set are placed in the premise of a rule, what the ranges of the descriptors forming the rule premise are, and which features and coefficients create the conclusion of the rule.
In its standard version, the m5 algorithm in fact forms a decision tree (Quinlan, 1992a), in whose leaves multidimensional linear models are placed. Then, for the needs of a user, the tree is transformed into a set of rules of the form (1). The idea of the algorithm is similar to the idea of creating so-called regression trees (Breiman et al., 1994).
Now, the procedure according to which the m5 algorithm works will be briefly described. Rules of the form (1) are built iteratively; at first a rule contains no conditional descriptors. The algorithm looks over all variables that belong to A and for each a ∈ A looks for an optimal form of the descriptor a ∈ Va. The descriptor that has been created is added to the conditional part of the rule. After adding each new descriptor or modifying an already existing one, a stop criterion is checked. If the stop criterion is satisfied, then a multidimensional regression model (Breiman et al., 1994; StatSoft, 2001a) is determined in the rule conclusion. If the stop criterion is not satisfied, then the algorithm adds a new conditional descriptor or limits the range Va of one of the already existing descriptors. The process of rule creation is finished when the whole training set is covered by the rules that have been created. The described algorithm can be presented in the following way:
Begin
  RUL := ∅; P := U; G := U; create a rule r without premises and without a conclusion
  Repeat until G = ∅
    For each conditional attribute a
      In the set P find the best descriptor (a, Va)
    Limit the set P
    Add the best descriptor (a, Va) to the conditional part of the rule r
    If the rule r satisfies the stop criterion then
      Determine the multiple regression parameters for the objects match(r)
      Place the determined model in the conclusion of the rule
      RUL := RUL ∪ {r}; G := G − match(r); extend the set P (P := U − G)
      Create a new rule r without premises and without a conclusion
  End // Repeat
End
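The covering loop above can be illustrated with a minimal executable sketch. It is not Quinlan's implementation: each rule here gets a single numeric descriptor, the mean of the covered targets stands in for the multiple-regression conclusion, and the stop criterion is reduced to a minimum-cover threshold; all data are illustrative:

```python
def variance(ys):
    """Population variance of a list of values."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_descriptor(rows, targets, attr):
    """Boundary point q maximizing the variance reduction of criterion (3)."""
    best = None
    for q in sorted(set(r[attr] for r in rows))[:-1]:
        left = [t for r, t in zip(rows, targets) if r[attr] <= q]
        right = [t for r, t in zip(rows, targets) if r[attr] > q]
        if not left or not right:
            continue
        err = variance(targets) - (len(left) / len(targets) * variance(left)
                                   + len(right) / len(targets) * variance(right))
        if best is None or err > best[0]:
            best = (err, q)
    return best  # (variance reduction, boundary point) or None

def induce_rules(rows, targets, attrs, min_cover=2):
    """Cover the training set with rules 'IF attr <=/> q THEN predict mean'."""
    rules, remaining = [], list(zip(rows, targets))
    while len(remaining) >= min_cover:
        rs = [r for r, _ in remaining]
        ts = [t for _, t in remaining]
        scored = [(best_descriptor(rs, ts, a), a) for a in attrs]
        scored = [(s, a) for s, a in scored if s is not None]
        if not scored:
            break
        (_, q), attr = max(scored)
        left = [(r, t) for r, t in remaining if r[attr] <= q]
        right = [(r, t) for r, t in remaining if r[attr] > q]
        # "Limit the set P": continue with the side of smaller target variance.
        if variance([t for _, t in left]) <= variance([t for _, t in right]):
            covered, op = left, "<="
        else:
            covered, op = right, ">"
        pred = sum(t for _, t in covered) / len(covered)  # stand-in for regression
        rules.append((attr, op, q, round(pred, 3)))
        remaining = [p for p in remaining if p not in covered]
    return rules

# Tiny illustrative data set; the attribute names follow the paper's variables.
rows = [{"MM32": 0.4, "DMM32": 4.0}, {"MM32": 0.5, "DMM32": 5.0},
        {"MM32": 1.1, "DMM32": 11.0}, {"MM32": 1.3, "DMM32": 12.0}]
targets = [0.45, 0.50, 1.20, 1.40]
for rule in induce_rules(rows, targets, ["MM32", "DMM32"]):
    print(rule)
```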
G is the set of objects which are not yet covered by the determined rules. P is the set of objects considered during the creation of a specific rule. The set match(r) is the set of objects from U for which the conditional part of the rule holds. For a rule r of the form (1) it is defined in the following way:

match(r) = {u ∈ U : ai1(u) ∈ Vai1 and … and aik(u) ∈ Vaik}   (2)
The best descriptor is found by setting, for a given variable a, the boundary point q ∈ Da in such a manner that the partition of the set P into the subsets P<q and P>q minimizes the expected variance of the dependent variable in the subsets. In other words, the feature a and the boundary point q recognized as optimal at a given stage of rule creation are those which maximize the variance reduction:

err = V(P) − ( (|P<q| / |P|)·V(P<q) + (|P>q| / |P|)·V(P>q) )   (3)

where V(P) denotes the variance of the dependent variable on the set P.
After the partition of the training set into two parts, in the further stage of rule generation we choose as P that subset of P>q, P<q for which the smaller variance of the dependent variable is obtained (this corresponds to the line "Limit the set P" in the algorithm description).
A rule satisfies the stop criterion if one of the following conditions holds: the variance of the dependent variable in the rule conclusion established by a user is attained; the created rule covers a number of examples smaller than a critical value accepted by a user (even if the value of the variance is not yet acceptable); none of the considered boundary points returns a positive value of the criterion (3).
The second of the above-mentioned conditions protects against generating rules that cover a very small number of examples. The third one does not allow the unnecessary addition of conditional descriptors which do not improve the estimation of the values (ranges) of the dependent variable.
In practical applications of the algorithm the number of created rules is usually quite big. Some of the determined rules are matched too well to the training data, and the so-called overfitting phenomenon can be observed. Rules fitted too closely to the data describe disturbances and noise of various types which may occur in the data. A weaker generalization ability of the determined rule set is the consequence of such a situation. It means that the obtained rule model may wrongly predict the value of the dependent variable for cases that do not appear in the training set.
In order to limit the number of parameters occurring in rule premises and rule conclusions, m5 exploits an exhaustive approach considering all possible subsets of the set of features appearing in a given rule. When features are removed, whole descriptors are removed from rule premises; when the number of parameters in the rule conclusion is limited, a new form of the regression function is determined. In order to evaluate the error made by the simplified rule, the average absolute error made by the rule on the training data is exploited. Additionally, the error is multiplied by the expression (n + v)/(n − v), where n is the number of objects in the training set and v is the number of parameters appearing in the conclusion of the rule.
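The penalized error used when comparing simplified rules is a one-line computation; the error values below are illustrative:

```python
def penalized_error(avg_abs_error, n, v):
    """Average absolute training error inflated by the (n + v) / (n - v) factor.

    n: number of objects in the training set, v: number of parameters in the
    rule conclusion. The factor grows with v relative to n, so a conclusion
    with many parameters must earn a clearly lower raw error to win.
    """
    return avg_abs_error * (n + v) / (n - v)

# A conclusion with 5 parameters on 50 objects vs. a simplified one with 2:
print(penalized_error(0.040, n=50, v=5))  # 0.040 * 55/45
print(penalized_error(0.043, n=50, v=2))  # 0.043 * 52/48
```

Here the simplified rule, despite a slightly larger raw error, gets the smaller penalized error.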
Much research carried out by means of the m5 algorithm on benchmark data and on data describing real problems (also connected with coal mining (Sikora & Krzykawski, 2005)) shows that it is one of the best algorithms of this kind, additionally characterized by an unusually short working time.
4. Data analysis
An analysis of correlations, autocorrelations, and partial autocorrelations was the first step of our research. The aim of the analysis was to verify which delays of the independent variables have the strongest influence on the value of the dependent variable. The result of this work was the choice of the method of representing the rate of change of the independent variables described in the second chapter (reflected in the sums DAN31, DAN32, DMM32).
In further works, a multiple regression model for calculating the value of the dependent variable was determined by means of the Statistica program. The model has the following form:

MM32_Pred = 0.455 − 0.047·AN31 + 0.018·AN32 + 0.21·MM32 + 0.000015·Production − 0.01·DAN31 + 0.008·DAN32 + 0.05·DMM32

MM32_Pred = 0.22 − 0.03·AN31 + 0.026·AN32 + 0.18·MM32 + 0.000056·Production − 0.0016·DAN31 + 0.0035·DAN32 + 0.05·DMM32

The former equation is used to predict the variable MM32_Pred when the variable MM32 takes values less than 1; for greater values of MM32, the latter equation is used.
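The two equations form a piecewise model switched on the current value of MM32 and can be transcribed directly (the sample readings are illustrative):

```python
def mm32_pred(an31, an32, mm32, production, dan31, dan32, dmm32):
    """Piecewise multiple regression model from the Statistica analysis."""
    if mm32 < 1:
        return (0.455 - 0.047 * an31 + 0.018 * an32 + 0.21 * mm32
                + 0.000015 * production - 0.01 * dan31
                + 0.008 * dan32 + 0.05 * dmm32)
    return (0.22 - 0.03 * an31 + 0.026 * an32 + 0.18 * mm32
            + 0.000056 * production - 0.0016 * dan31
            + 0.0035 * dan32 + 0.05 * dmm32)

# Illustrative readings (not measurements from the paper):
print(mm32_pred(an31=1.2, an32=1.0, mm32=0.6, production=1000,
                dan31=12.0, dan32=10.0, dmm32=6.0))
```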
It follows that, according to the model, the strongest influence on the values of MM32_Pred is exerted by the earlier and accumulated (summed) values of MM32 and DMM32. Values of AN31 and DAN31 influence the dependent variable negatively, while values of AN32 and DAN32 influence it positively.
The Statistica Neural Networks package was exploited to apply a mechanism that enables describing nonlinear dependences between the independent variables and the dependent one. Twenty-eight different architectures of neural networks were tested. The best results were obtained by means of a three-layer network with sigmoidal activation functions (Tadeusiewicz, 2003; StatSoft, 2001b). Interestingly, the network has only three input neurons (MM32, DMM32, Production), which means that only these three variables are used for the prediction of the dependent variable.
For evaluating the efficiency of the obtained regression model, as well as of the neural network, the RMS error defined by the formula (4) was applied.

RMS(T) = sqrt( (1/|T|) · Σ_{i=1..|T|} (d(xi) − p(xi))² )   (4)

In the formula (4): T is a testing set of objects, d(xi) is the real value of the dependent variable for the testing object xi, and p(xi) is the value of the dependent variable predicted by the obtained model.
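Formula (4) can be checked with a few lines; the series below are illustrative:

```python
import math

def rms(real, predicted):
    """Root-mean-square error between real and predicted values, formula (4)."""
    return math.sqrt(sum((d - p) ** 2
                         for d, p in zip(real, predicted)) / len(real))

real = [0.45, 0.50, 1.20, 1.70]       # illustrative registered values d(x_i)
predicted = [0.47, 0.52, 1.15, 1.16]  # illustrative model outputs p(x_i)
print(round(rms(real, predicted), 3))
```

Note how the single large miss (1.70 vs. 1.16) dominates the error, which matches the squared-difference form of (4).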
TABLE 3
RMS error on the testing data set obtained by various analytic methods

Method                                                   RMS error   Remarks
Statistical analysis (multivariate linear regression)      0.081     Break point MM32_Pred = 0.824; 94% explained variance
Statistica Neural Network                                  0.081     MLP three-layer network; activation functions in consecutive layers: linear, sigmoidal, linear with saturation
M5 algorithm                                               0.071     32 rules
M5 algorithm                                               0.076     The number of rules was limited to 4
The analysis of the gathered data was carried out by means of the m5 algorithm in two ways. The former consisted in starting the algorithm without any parameters limiting its work; in effect, thirty-two rules describing the dependences between the independent variables and the dependent variable were obtained. Such a solution, although it allowed the smallest prediction error to be obtained, was unsatisfactory for us, because the simultaneous analysis of thirty-two rules does not allow the knowledge included in the rules to be interpreted easily and unequivocally. In the latter way, the m5 algorithm was activated with parameters set so as to prevent the induction of rules in which the variance of the dependent variable was less than 20% of its general variance in the training set. The necessity of covering at least twenty percent of the examples from the analyzed set by a determined rule was the other constraint. In this manner, four rules with good prediction abilities were obtained (Table 3). The rules that have been determined are presented below; the range of the dependent variable covered by each rule is given next to it:
If Production = 0 and DMM32 <= 10 then
MM32_Pred = 0.04 + 0.063 DMM32 + 0.24 MM32
[0.4, 1.3] StdErr = 0.03
If Production > 0 and DMM32 <= 10 then
MM32_Pred = 0.29 + 0.061 DMM32 + 0.26 MM32 – 0.013 DAN31
[0.3, 1.3] StdErr = 0.07
If Production = 0 and DMM32 > 10 then
MM32_Pred = 0.11 + 0.038 DMM32 + 0.41 MM32 + 0.029 DAN31
– 0.3 AN31
[0.6, 1.5] StdErr = 0.06
If Production > 0 and DMM32 > 10 then
MM32_Pred = 0.19 + 0.054 DMM32 + 0.000135 Production
– 0.016 DAN31 + 0.12 MM32 + 0.01 DAN32
[0.8, 2] StdErr = 0.1
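Transcribed directly, the four rules form a small predictor ("Output" in the fourth rule is taken to be the Production variable); only the conditions and conclusions come from the paper, the sample readings are illustrative:

```python
def mm32_pred_rules(an31, mm32, production, dan31, dan32, dmm32):
    """The four induced rules; each branch is one rule's linear conclusion."""
    if production == 0 and dmm32 <= 10:
        return 0.04 + 0.063 * dmm32 + 0.24 * mm32
    if production > 0 and dmm32 <= 10:
        return 0.29 + 0.061 * dmm32 + 0.26 * mm32 - 0.013 * dan31
    if production == 0 and dmm32 > 10:
        return 0.11 + 0.038 * dmm32 + 0.41 * mm32 + 0.029 * dan31 - 0.3 * an31
    return (0.19 + 0.054 * dmm32 + 0.000135 * production
            - 0.016 * dan31 + 0.12 * mm32 + 0.01 * dan32)

# No production, low accumulated methane -> the first rule fires:
print(mm32_pred_rules(an31=1.0, mm32=0.5, production=0,
                      dan31=10.0, dan32=10.0, dmm32=5.0))  # ≈ 0.475
```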
Analyzing the rules determined by our algorithm, it is clearer (more than, for example, in the case of the multiple regression) that:
• in the case of a low methane content in the atmosphere that stays low for some time interval (in our case, for ten minutes) and a lack of production during a shift, future values of the methane concentration are also rather low, and both production and ventilation have no influence on the methane content in the atmosphere (the first rule);
• in the case of a low methane content in the atmosphere that stays low for some time interval (in our case, for ten minutes) and conducted production, the predicted values of the methane concentration are average; ventilation (especially readings of the anemometer AN31) influences the future values of the methane concentration negatively (the second rule);
• for the remaining average and high methane concentration values in some time interval (in our case, for ten minutes), future values of the methane concentration will also be high or average (the linear model in the conclusion of the rule decides about this fact); from the third and fourth rules it can be seen that there exists a positive influence of production (if it is conducted), a positive influence of ventilation through the top road in which the anemometer AN32 is installed, and a negative influence of ventilation through the bottom road in which the anemometer AN31 is installed.
The above conclusions were drawn only on the basis of the forms of the rules that have been determined; a domain expert would probably be able to interpret the obtained rule set better. A rule set limited in this way can be easily interpreted and, as is visible in Table 3, still allows a prediction error smaller than that of the statistical or neural network methods to be obtained.
The average error made by the rule model was 0.04, with an error variance of 0.003. The biggest errors were made for high methane values (the biggest mistake: 0.54; the model predicted 1.16 while the real value was 1.7). An analysis of Fig. 4 shows that the obtained model underestimates values at the local maxima of the methane time series. Information about the period of the series (here the period is clear, Fig. 4) could be used to improve accuracy, but we did not undertake such attempts.
The methane concentration around the place where the model made the biggest error (in fact, the three biggest errors) is presented in Fig. 3.
[Plot: methane concentration, vertical axis in % CH4 (0.8–1.8); series: M32, M32 Predicted]
Fig. 3. Time series of methane concentration in the place in which the model makes the biggest error
Rys. 3. Przebieg stężenia metanu w miejscu, w którym model popełnia największy błąd
Looking through the records of the analyzed data set that correspond to the fragment presented in Fig. 3, it can be noticed that there are no changes in the values of the independent variables (including those describing rates of change) that could account for such a sudden change in the value of the dependent variable (methane concentration). Therefore, probably no model relying on earlier measured values would be able to accurately predict a violent increase in methane concentration lasting no longer than five minutes.
The second analyzed set was the one enabling one-hour prediction. In this paper we quote the results of research conducted on data smoothed by means of the so-called F4253H filter (Statistica package; smoothing by several passes of a moving median (StatSoft, 2001a)); the F4253H filter reflects the characteristics of the original time series better than a recursive filter.
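The idea behind such compound median smoothing can be sketched as follows. This is a simplified version (odd-span median passes of 5 and 3 followed by Hanning weighting); the actual F4253H filter in Statistica also includes even-span median passes and special endpoint rules, which are omitted here:

```python
import statistics

def running_median(x, span):
    """Odd-span running median; endpoints are left unchanged."""
    h = span // 2
    out = list(x)
    for i in range(h, len(x) - h):
        out[i] = statistics.median(x[i - h:i + h + 1])
    return out

def hanning(x):
    """Hanning smoothing with weights (0.25, 0.5, 0.25)."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return out

def smooth_median_hanning(x):
    # Simplified stand-in for F4253H: medians of span 5 and 3, then Hanning.
    return hanning(running_median(running_median(x, 5), 3))
```

Median passes remove isolated spikes (such as short sensor glitches) without shifting the level of the series, which is why this kind of filter follows the original time series more faithfully than a recursive filter.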
The original data, the smoothed data, and the data obtained after prediction are presented in Fig. 4.
[Plot: methane concentration, vertical axis in % CH4 (0.5–2); series: M32, M32 Smoothed, M32 Predicted]
Fig. 4. One-hour methane time series – original, smoothed and predicted data
Rys. 4. Godzinowy przebieg stężenia metanu – dane oryginalne, wygładzone i przewidywane
As in the previous case, the m5 algorithm without constraints was applied to the available data, giving eleven rules; the algorithm with constraints identical to those used for the ten-minute prediction was also applied, giving four rules. Based on the analysis of correlation and autocorrelation, an ARIMA model (Box & Jenkins, 1994) describing the smoothed data was also determined for comparison purposes. Results are presented in Table 4.
TABLE 4
RMS error on the testing data set obtained by various analytic methods
TABLICA 4
Błąd RMS na zbiorze danych testowych uzyskiwany przez różne metody analityczne

Method          RMS error   Remarks
M5 algorithm    0.036       11 rules
M5 algorithm    0.057       The number of rules was limited to 4
ARIMA           0.086       One autoregression parameter and one moving-average parameter
The prediction results of the model determined for the smoothed data were compared with the raw data; in this manner the RMS error value was obtained. The RMS error was 0.10, the average error was 0.07 (variance 0.005), and the biggest error was 0.41.
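The RMS error quoted throughout this comparison (predictions of the smoothed-data model checked against raw measurements) is the usual root-mean-square error; a minimal sketch:

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))
```

Comparing the smoothed-data model against raw (unsmoothed) measurements, as done above, naturally yields a larger RMS error than the testing-set figures in Table 4, since the raw series still contains the high-frequency variation removed by the filter.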
5. Conclusions
The paper has presented an application of machine learning methods, consisting in the induction of rules with linear conclusions, to the problem of methane prediction in a coal-mine excavation. The m5 rule-induction algorithm was presented, together with the necessary transformations of the input data set into a form acceptable by the algorithms.
Two data sets, reflecting the tasks of ten-minute and one-hour prediction, were analyzed. The m5 algorithm achieved the smallest prediction error of the dependent variable on the testing data set. It was compared with statistical methods (multivariate regression, ARIMA) and with stochastic methods such as artificial neural network training. Apart from achieving the smallest prediction error, the m5 algorithm is also characterized by the shortest running time, which may be significant in the analysis of big data sets. The rule data model obtained by the m5 algorithm allowed a relatively simple explanation of the dependences between the values of the independent features (earlier methane concentration values, the method and intensity of air ventilation, conducted production) and the values of the dependent variable (future methane concentration).
Analysis of the cases in which the determined data model makes the biggest errors shows that the model underestimates high methane concentrations. If the methane concentrations registered by the detector MM32 are treated as a time series, then Fourier spectrum analysis of the series (or autocorrelation analysis) is possible. It clearly follows from this analysis that both analyzed series (ten-minute prediction, one-hour prediction) are periodic, with a period of about twelve hours (indicated by the highest values of the periodogram). This fact can be used to modify the values predicted by the rule model; for example, if the model predicts a methane concentration greater than one percent, the predicted concentration can be scaled up proportionally to this excess.
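The periodicity check described above can be sketched with a simple FFT periodogram; the sampling step and data layout below are assumptions made for illustration:

```python
import numpy as np

def dominant_period(x, dt):
    """Return the period (in units of dt) of the strongest periodogram peak."""
    x = np.asarray(x, dtype=float) - np.mean(x)   # remove the mean (DC component)
    spectrum = np.abs(np.fft.rfft(x)) ** 2        # periodogram values
    freqs = np.fft.rfftfreq(len(x), d=dt)
    k = np.argmax(spectrum[1:]) + 1               # skip the zero-frequency bin
    return 1.0 / freqs[k]
```

For a series sampled every ten minutes (dt = 10), a dominant period of about 720 minutes would confirm the twelve-hour periodicity mentioned above.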
To recapitulate, machine learning methods, so far only sporadically used in coal-mining data analysis, may be an alternative to the analytic methods exploited up to now.
Applications of machine learning methods to coal-mining equipment diagnostics (Sikora & Widera, 2004) and to monitoring the environment in dewatering pump stations (Sikora & Krzykawski, 2005) show that these methods can be used in many other fields of the mining industry.
An important advantage of machine learning methods is also the fact that the determined data model can be easily interpreted by a user (especially by a domain expert).
REFERENCES
Bojko, B., 2004. Dynamika stężenia metanu w wyrobiskach górniczych. Ph.D. Thesis (supervisor: S. Wasilewski), Instytut Mechaniki Górotworu PAN, Kraków.
Box, G.E.P., Jenkins, G.M., 1994. Time Series Analysis: Forecasting and Control. 3rd edition. Prentice Hall, New Jersey.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1994. Classification and Regression Trees. Wadsworth, Belmont, CA.
Czogała, E., Łęski, J., 2000. Fuzzy and Neuro-Fuzzy Intelligent Systems. Studies in Fuzziness and Soft Computing, vol. 47. Springer-Verlag.
Dixon, W.D., 1992. A statistical analysis of monitored data for methane prediction. Ph.D. Thesis, University of Nottingham, Dept. of Mining Engineering, May 1992.
Michalski, R.S., Kaufmann, K., 1998. Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach. In: Kubat, M., Bratko, I., Michalski, R.S. (eds.), Machine Learning and Data Mining: Methods and Applications. John Wiley and Sons.
Quinlan, J.R., 1992a. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, California.
Quinlan, J.R., 1992b. Learning with continuous classes. Proc. of the International Conference on Artificial Intelligence (AI'92), Singapore, World Scientific.
Quinlan, J.R., 1993. Combining instance-based and model-based learning. Proc. of the Tenth International Conference on Machine Learning (ML-93).
Sikora, M., Widera, D., 2004. Identification of diagnostic states for dewater pumps working in abyssal mining pump stations. Proceedings of the XV International Conference on System Sciences, Wrocław, Poland, pp. 394-402.
Sikora, M., Kozielski, M., 2005. Application of hybrid data exploration methods to prediction tasks. Materiały konferencji Technologie Przetwarzania Danych, Politechnika Poznańska, Poznań, pp. 195-215.
Sikora, M., Krzykawski, D., 2005. Zastosowanie metod eksploracji danych do analizy wydzielania się dwutlenku węgla w pomieszczeniach stacji odwadniania kopalń węgla kamiennego. Mechanizacja i Automatyzacja Górnictwa, 6/413, Katowice, pp. 29-40.
Sobczyk, M., 1997. Statystyka. Wydawnictwo Naukowe PWN.
StatSoft Polska, 2001a. Statistica 5.0 – podręcznik użytkownika, tom II-IV. StatSoft, Kraków.
StatSoft Polska, 2001b. Statistica Neural Networks – podręcznik użytkownika. StatSoft, Kraków.
Tadeusiewicz, R., 2003. Sieci neuronowe. Akademicka Oficyna Wydawnicza RM, Warszawa.
Yager, R.R., Filev, D.P., 1994. Essentials of Fuzzy Modelling and Control. John Wiley & Sons, Inc.
REVIEW BY: PROF. DR HAB. INŻ. WACŁAW TRUTWIN, KRAKÓW
Received: 08 August 2006