Archives of Mining Sciences 51, Issue 4 (2006) 475–492
MAREK SIKORA*, BEATA SIKORA**

APPLICATION OF MACHINE LEARNING FOR THE PREDICTION OF METHANE CONCENTRATION IN A COAL MINE

The paper describes the application of machine learning methods to the creation of a rule-based data model used for the prediction of methane concentration in a mining excavation. Data coming from a methane concentration monitoring system, and the methodology of their transformation into a form acceptable to the analytic algorithms that have been used, are presented in the second chapter. The next chapter describes the rule induction algorithm used for the prediction. Results of the analysis performed on data coming from a coal mine are presented at the end of the paper.

Keywords: prediction of methane concentration, prediction by means of rules, knowledge discovery

* INSTITUTE OF COMPUTER SCIENCE, SILESIAN UNIVERSITY OF TECHNOLOGY, AKADEMICKA 16, 44-100 GLIWICE, POLAND; e-mail: [email protected]; EMAG CENTRE, LEOPOLDA 31, 40-282 KATOWICE, POLAND
** INSTITUTE OF MATHEMATICS, SILESIAN UNIVERSITY OF TECHNOLOGY, KASZUBSKA 2, 44-100 GLIWICE, POLAND; e-mail: [email protected]

The paper presents the idea of applying intelligent computer techniques to the exploratory analysis of data coming from a system monitoring hazards connected with methane emission in hard coal mines. The aim set for the applied analytic methods is the prediction of the methane concentration measured by a selected methane detector ten minutes and one hour in advance. From among the various methodologies of generating systems that enable prediction (fuzzy systems, artificial neural networks, and statistical methods, among others), an algorithm inducing rules with conclusions in the form of linear functions was chosen. The presented algorithm is characterized by one of the shortest analysis times and by good prediction results obtained on freely available benchmark data. An important feature of the applied algorithm is also that the results of the analysis, i.e. the synthetic description of the analyzed data set, are relatively easy for a user to interpret. From the point of view of the field known as knowledge discovery in databases, this is a very important property.

The analyzed data came from an excavation located in a mine free from rock bump hazards. Figure 1 presents a scheme of the region in which the considered excavation is located; the placement of the sensors is also shown there. A graphical analysis of the time series reflecting the readings of the methane detectors and anemometers (Fig. 2) showed that the highest dynamics of methane concentration is observed at the outlet of the longwall. In the research we therefore attempted to predict the readings of the methane detector M32. The measurement data were collected at a ten-second time interval. For the purposes of the research, the data were aggregated into two data sets whose successive records contained: the maximum values of the measured quantities in one-minute periods (the data set for the ten-minute prediction), and the maximum values of the measured quantities in ten-minute periods (the data set for the one-hour prediction). To enable the application of analytic methods based on the machine learning paradigm, the available data set had to be modified.
Data taken from the monitoring system are represented by a set of records between which a temporal relation exists, whereas the algorithm applied in the paper analyzes tables in which each row is independent. Information about the state of the process at a given moment (including the dynamics of changes of the parameters describing this process) must therefore be contained in a single row. The second chapter presents the way in which it is possible to pass from the representation of data obtained directly from the monitoring system (Tab. 1) to the representation accepted by the analytic algorithm that has been used (Tab. 2). The second chapter also specifies the set of independent variables: AN31 – readings of the anemometer AN31; AN32 – readings of the anemometer AN32; MM32 – readings of the methane detector MM32; Production; DAN31 – the sum of AN31 readings over the last ten minutes; DAN32 – the sum of AN32 readings over the last ten minutes; DMM32 – the sum of MM32 readings over the last ten minutes. The dependent variable was named MM32_Pred. The third chapter describes in detail the applied analytic algorithm, which generates rules with linear conclusions (1). The algorithm builds a rule so that its conditional part describes as many objects from the training set (2) as possible while simultaneously limiting the variance of the dependent variable. The multidimensional linear model that determines, for a given rule, the value of the dependent variable is placed in the rule's conclusion. The algorithm is heuristic and uses expression (3) as the optimality criterion during rule construction. The third chapter also discusses methods of optimization (including simplification) of the obtained rule-based data model. The fourth chapter contains the results of the analyses. The analysis was carried out on separated data sets, and the efficiency of the determined models was verified on independent test sets. The RMS error (4) made by the determined models was the objective measure of efficiency; the complexity of the determined model (its interpretability by a user) was adopted as the subjective measure. The method proposed in the paper was compared with statistical methods (multivariate regression, ARIMA) and with a stochastic method (neural networks). The experimental results for the ten-minute prediction are given in Table 3, and the results for the one-hour prediction in Table 4. The real time series of the methane concentration registered by the methane detector M32 and the series predicted by the model are shown in Figs. 3 and 4. The research showed that the applied method yielded the smallest prediction error while keeping the determined model transparent. The method was also characterized by the shortest analysis time.

Keywords: prediction of methane concentration, prediction by means of rules, knowledge discovery in databases

1. Introduction

Induction of rules from a set of examples is one of the most popular machine learning techniques. The main task of rule induction algorithms is to discover nontrivial, true, and previously unknown patterns in data (represented by the discovered rules) and to create a classification or prediction system based on these patterns. Systems that automatically (or semi-automatically) perform rule induction and classification or prediction tasks can therefore be treated as members of a wider group of intelligent computer systems. Among industrial applications, intelligent computer systems are mainly used in solving problems connected with control
(e.g. fuzzy and neural controllers) (Czogała & Łęski, 2000; Tadeusiewicz, 2003; Yager & Filev, 1994). These are prediction rather than classification tasks. A group of computer systems not directly connected with control problems are the monitoring systems, which monitor a production process and the hazards connected with it. It can be recognized that the functionality of these systems provides complete monitoring and visualization of any industrial process (production, the work of machines and equipment, hazard monitoring). In the majority of applications, however, the data gathered by these systems are used only for current visualization and reporting. At present, producers and users of monitoring systems more and more frequently point to the necessity of analyzing the information collected by these systems, especially in the context of discovering knowledge about the monitored process. The information collected in the databases of monitoring systems is usually numerical; the data are gathered with high frequency and are usually burdened with uncertainty (resulting, among other things, from transmission breaks or sensor distortions).

The paper presents the idea of applying intelligent computer techniques to the analysis of data coming from a system monitoring hazards connected with methane emission in coal mines. The applied analytic methods are required to predict the methane concentration measured by a selected methane detector ten minutes and one hour in advance. A rule induction algorithm with conclusions in the form of linear functions (Quinlan, 1992b, 1993) has been chosen from among various methodologies of generating systems that enable prediction, among others: fuzzy systems (Czogała & Łęski, 2000; Yager & Filev, 1994), artificial neural networks (Tadeusiewicz, 2003), and statistical methods (Box & Jenkins, 1994; Dixon, 1992). The presented algorithm is characterized by one of the shortest analysis times and by good prediction results obtained on freely available benchmark data. A further characteristic of the algorithm is that the results of the analysis, i.e. the synthetic description of the analyzed data set, are relatively easy for a user to interpret. This is a very important advantage from the point of view of knowledge discovery in databases (Michalski & Kaufman, 1998).

2. Data acquisition and preparation

The analyzed data came from an excavation located in a mine free from rock bump hazards. For the selected data set we had at our disposal measurements not only from methane detectors but also from anemometers. We also had approximate information about the production in each shift. A location plan of the region (Bojko, 2004) in which the considered excavation is placed is presented in Fig. 1; the locations of the sensors can also be seen in the figure. The analysis of the methane concentration time series registered by particular sensors showed that the highest rate of change of the concentration can be observed at the face end.

Fig. 1. Scheme of the excavation from which the analyzed data came (sensor labels: M31, M32, M33, M34, M35, AN31, AN32; dates marked in the figure: 1.05.2002, 1.06.2002, 1.07.2002)

In our research we attempted to predict the readings of a sensor placed at the face end (the methane detector M32). Readings of the anemometers AN31 and AN32 were selected as auxiliary variables; data about atmospheric pressure were inaccessible (although it is known that pressure changes in the considered period were not big).
The results of an initial analysis, drawing on research conducted by Bojko (Bojko, 2004), showed a weekly periodicity in the registered methane concentrations. Since we had information about the production during successive shifts for the week from 10th to 16th of June 2002, exactly this period was subjected to the analysis. The measurement data were gathered at a ten-second time interval. For the purpose of our research, the data were aggregated into two data sets whose successive records contained:
• the maximum values of the measured quantities in one-minute periods,
• the maximum values of the measured quantities in ten-minute periods.

Both the original data (with noise) and data after filtration (smoothing) by means of a first-order low-pass recursive filter (the Brown filter (Box & Jenkins, 1994)) were subjected to the preliminary analysis. The smoothing was also applied to the data coming from the anemometers. Combining the analyses of the smoothed measurements registered by the anemometers and the methane detectors, dependences between the readings of the anemometers and the methane detectors can be noticed. For the purpose of prediction, however, operating on smoothed data does not seem to be a good approach, since it lowers the real (registered) methane content in the atmosphere. Obviously, the filter we used decomposes a time series into a nonstationary component (the smoothed series) and a stationary component (the noise); the series can therefore be treated as a sum of the two components and each component can be subjected to separate (e.g. statistical) analysis. In our research, however, we used the raw data for the prediction of methane concentration.

Before starting the prediction by the methods considered in this paper, it is necessary to convert the data sets into an appropriate form: the data received from a monitoring system usually form a set of records between which a temporal relation exists, whereas the algorithm we applied analyzes tables in which each row is independent. Information about the temporal relations between the analyzed variables and their rates of change must therefore be contained in a single row. An example of a data set gathered from a monitoring system in which data acquisition is done every second is presented in Table 1. If the prediction of future values of a selected variable (the dependent variable) is the aim of the analysis, then it is necessary to move the values of this variable forward by a number of rows reflecting the demanded time passage (in this manner we obtain the dependent variable). The prediction horizon is established arbitrarily by the user. The values of the independent variables are therefore delayed with respect to the value of the dependent variable by the prediction horizon.

Fig. 2. Weekly course of the air flow velocity [m/s] (anemometer AN31; anemometer AN32 at the face end) and of the methane concentration [% CH4] (methane detector M32 at the face end) in the longwall within the period from 10th to 16th of June 2002

In some situations it occurs that the strongest influence on the value of the dependent variable comes from values of the independent variables at delays bigger than the prediction horizon established by the user (Box & Jenkins, 1994; Sikora & Kozielski, 2005).
Correlation coefficients (Sobczyk, 1997) as well as autocorrelation and partial autocorrelation coefficients (Box & Jenkins, 1994) are usually used to determine the delays at which particular independent variables influence the values of the dependent variable most strongly.

TABLE 1. Raw data taken from a monitoring system

Time [s] | Sensor x | Sensor y | Sensor z
1        | 1        | 123      | 0.4
2        | 1        | 145      | 0.42
...      | ...      | ...      | ...
n        | 2.3      | 146      | 0.47

To limit the number of considered records, an aggregation of the variables' values is carried out: every m records are replaced with one record containing the aggregated values of the variables (the average, sum, minimum, and maximum are usually used as aggregating functions). The aggregating function is selected depending on the purpose of the analysis (e.g. if a system has to warn against the dependent variable exceeding some critical value, then it is necessary to use the maximum function).

Irrespective of the method of data preparation, the transformations discussed above preserve the rate of change of the independent variables only if a column specifying the successive time points still appears in the data set. To express the rate of change of the independent variables and, at the same time, get rid of the column representing the passage of time, new attributes (new variables) have to be introduced. Information about the rate of change is determined by computing so-called absolute increments, relative increments, or dynamics indicators (Box & Jenkins, 1994; StatSoft, 2001a). The new attributes may contain the exact values of differences or sums of particular (original) attributes, or express these changes in a linguistic form (Tab. 2).

TABLE 2. The transformed data set

Sensor x          | Sensor x dynamics | Sensor y | Sensor y dynamics | Sensor z | Sensor z dynamics | Sensor z + 2 (dependent variable)
–                 | –                 | 123      | –                 | 0.4      | –                 | –
–                 | –                 | 145      | 5                 | 0.42     | –                 | –
–                 | –                 | 156      | 2                 | 0.56     | 1                 | 0.4
–                 | without changes   | 146      | –1                | 0.47     | without changes   | 0.42
increases quickly | decreases         | 144      | 0                 | 0.43     | decreases         | 0.47

The order of the rows in a data set prepared in this manner is not important, since all information about the process dynamics at a given moment is included in a single row. As Tab. 2 shows, it is necessary to remove from such a data set the initial rows in which some values are missing.

The data set considered in this paper came from a monitoring system in which data acquisition was done every ten seconds. As mentioned above, for the analysis the data were aggregated by choosing the maximum values of the readings from each minute (the ten-minute prediction) and from each ten minutes (the one-hour prediction). On the basis of the analysis of cross-correlations, autocorrelations, and partial autocorrelations, a single record in the analyzed data set had the following form:
• AN31 – readings of the anemometer AN31,
• AN32 – readings of the anemometer AN32,
• MM32 – readings of the methane detector MM32,
• Production (the values of this variable are entered by a user and relate to a whole shift),
• DAN31 – the sum of the readings of AN31 for the last ten minutes,
• DAN32 – the sum of the readings of AN32 for the last ten minutes,
• DMM32 – the sum of the readings of MM32 for the last ten minutes,
• MM32_Pred – the reading of MM32 that will occur in ten minutes (or in one hour, for the one-hour prediction).

The task of the DAN31, DAN32, and DMM32 fields is to reflect the rate of change of the monitored quantities; an illustrative preprocessing sketch is given below.
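As an illustration only (this sketch is not part of the original study), the preprocessing described above, i.e. aggregation of the ten-second samples, the ten-minute rolling sums, and the shifting of the target by the prediction horizon, could be written as follows. The sketch assumes the pandas library, a timestamp-indexed table of raw records, and column names mirroring the variables listed above.

import pandas as pd

def prepare_dataset(raw, horizon_steps):
    """Turn raw monitoring records into independent rows (cf. Tab. 1 -> Tab. 2).

    raw - DataFrame indexed by timestamp with columns AN31, AN32, MM32, Production.
    horizon_steps - prediction horizon in aggregated steps (10 for the
                    ten-minute prediction on one-minute aggregates).
    """
    # Aggregate with the maximum, as done in the paper for warning purposes.
    agg = raw.resample("1min").max()
    # Rolling ten-minute sums reflect the rate of change (DAN31, DAN32, DMM32).
    for col, dyn in [("AN31", "DAN31"), ("AN32", "DAN32"), ("MM32", "DMM32")]:
        agg[dyn] = agg[col].rolling(window=10).sum()
    # The dependent variable: the MM32 reading horizon_steps ahead.
    agg["MM32_Pred"] = agg["MM32"].shift(-horizon_steps)
    # Drop initial rows without rolling sums and final rows without a target.
    return agg.dropna()

# Usage, matching the paper's 66/34 split into training and test sets:
# data = prepare_dataset(raw_records, horizon_steps=10)
# split = int(0.66 * len(data))
# train, test = data.iloc[:split], data.iloc[split:]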
Three indicators giving information about the rate of change (denoted by $y_d$) have been tested: the difference ($y_d = y_t - y_{t-10}$), the sum ($y_d = \sum_{i=0}^{9} y_{t-i}$), and the quotient ($y_d = y_{t-10}/y_t$). The sum was chosen because for this indicator the best results (the lowest error) were obtained in three test analyses. Each of the prepared data sets (let us recall that the set for the ten-minute prediction is different from the one for the one-hour prediction) was divided into two subsets containing 66 and 34 percent of all records, respectively. The larger sets were subjected to further analysis (the training sets); the smaller ones were used for the verification of the obtained results (the test sets).

3. A machine learning algorithm – induction of rules with linear conclusions

The algorithm inducing rules with linear conclusions (called the m5 algorithm) was applied to the prediction task. The algorithm was proposed by R. Quinlan (Quinlan, 1992b, 1993). It belongs to the family of algorithms that learn from the examples they are provided with (the so-called training set). The task of the algorithm is to generalize the knowledge given to it and to express, in a synthetic way, the dependences appearing in the data in some knowledge description language. In the case of the m5 algorithm, a rule with a linear conclusion is a single formula of such a language.

In a more formal notation, we write a training set as a table $DT = (U, A \cup \{d\})$, in which $U$ is the set of examples (rows of the analyzed set) and $A$ is the vector of independent variables describing the examples (here the attributes AN31, AN32, MM32, Production, DAN31, DAN32, DMM32). The feature $d$ is the dependent feature, whose values we want to predict. Each independent feature $a \in A$ may be treated as a function $a: U \to D_a$ that assigns a value of $a$ to each example from $U$. The set $D_a$ is called the domain of the attribute $a$. The m5 algorithm creates rules of the form (1):

IF $a_{i_1} \in V_{a_{i_1}}$ and $\ldots$ and $a_{i_k} \in V_{a_{i_k}}$ THEN $d = b + b_1 a_{j_1} + \ldots + b_m a_{j_m}$    (1)

where $\{a_{i_1}, \ldots, a_{i_k}\} \subseteq A$; $V_{a_{i_1}} \subseteq D_{a_{i_1}}, \ldots, V_{a_{i_k}} \subseteq D_{a_{i_k}}$; $\{a_{j_1}, \ldots, a_{j_m}\} \subseteq A$; and $b, b_1, \ldots, b_m \in \mathbb{R}$ ($\mathbb{R}$ denotes the set of real numbers). A single component $a \in V_a$ is called a conditional descriptor. It can easily be observed that a premise of a rule may contain independent variables different from those occurring in its conclusion. A single conditional descriptor can take one of the following forms:
• $a \in [v_1, v_2]$, where $v_1, v_2 \in D_a$;
• $a > v$, where $v \in D_a$;
• $a < v$, where $v \in D_a$.

The task of the m5 algorithm is to determine which features describing the analyzed data set are placed in the premise of a rule, what the ranges of the descriptors forming the rule premise are, and which features and coefficients create the conclusion of the rule. In its standard version, the m5 algorithm in fact builds a decision tree (Quinlan, 1992a) in whose leaves multidimensional linear models are placed; then, for the needs of a user, the tree is transformed into a set of rules of the form (1). The idea of the algorithm is similar to the idea of creating so-called regression trees (Breiman et al., 1994).

The procedure according to which the m5 algorithm works will now be briefly described. Rules of the form (1) are built iteratively; at first a rule contains no conditional descriptors. The algorithm looks over all variables belonging to $A$ and for each $a \in A$ looks for an optimal form of the descriptor $a \in V_a$. The descriptor that has been created is added to the conditional part of the rule.
After each new descriptor is added, or an existing one is modified, a stop criterion is checked. If the stop criterion is satisfied, then a multidimensional regression model (Breiman et al., 1994; StatSoft, 2001a) is determined and placed in the rule's conclusion. If the stop criterion is not satisfied, then the algorithm adds a new conditional descriptor or narrows the range $V_a$ of one of the existing descriptors. The process of rule creation finishes when the whole training set is covered by the created rules. The algorithm can be presented in the following way:

Begin
  RUL := ∅; P := U; G := U;
  Create a rule r without premises and without a conclusion
  While G ≠ ∅ repeat
    For each conditional attribute a
      find in the set P the best descriptor (a, V_a)
    Limit the set P
    Add the best descriptor (a, V_a) to the conditional part of the rule r
    If the rule r satisfies the stop criterion then
      Determine multiple regression parameters for the objects match(r)
      Place the determined model in the conclusion of the rule;
      RUL := RUL ∪ {r}; G := G − match(r);
      Extend the set P (P := U − G);
      Create a new rule r without premises and without a conclusion
  End // While
End

Here G is the set of objects not yet covered by the determined rules, and P is the set of objects considered during the creation of a specific rule. The set match(r) is the set of objects from U for which the left part of the rule fires. For a rule r of the form (1) it is defined in the following way:

$$\mathrm{match}(r) = \{\, u \in U : \forall\, l \in \{1, \ldots, \mathrm{card}(A)\}\;\; a_l(u) \in V_{a_l} \,\} \qquad (2)$$

(with the convention that $V_{a_l} = D_{a_l}$ for attributes that do not occur in the premise of r). The best descriptor is found by setting, for a given variable $a$, a boundary point $q \in D_a$ in such a manner that the partition of the set $P$ into the subsets $P_{<q}$ and $P_{>q}$ minimizes the expected variance of the dependent variable in the subsets. In other words, recognized as optimal at a given stage of rule creation are the feature $a$ and the boundary point $q$ that maximize the value of the expression:

$$err = V(P) - \left( \frac{|P_{<q}|}{|P|}\, V(P_{<q}) + \frac{|P_{>q}|}{|P|}\, V(P_{>q}) \right) \qquad (3)$$

where $V(P)$ denotes the variance of the dependent variable in the set $P$. After the partition of the training set into two parts, in the further stage of rule generation we choose as $P$ the subset from $P_{>q}$, $P_{<q}$ for which the smaller variance of the dependent variable is obtained (this corresponds to the line "Limit the set P" in the algorithm description).

A rule satisfies the stop criterion if one of the following conditions holds:
• the variance of the dependent variable established by the user is attained in the set covered by the rule;
• the created rule covers a number of examples smaller than a critical value accepted by the user (even if the value of the variance is still not acceptable);
• none of the considered boundary points returns a positive value of the criterion (3).

The second of the above conditions secures against generating rules that cover a very small number of examples. The third one prevents the unnecessary addition of conditional descriptors that do not improve the estimation of the values (ranges) of the dependent variable.

In practical applications of the algorithm the number of created rules is usually quite big. Some of the determined rules are matched too well to the training data; the so-called overfitting phenomenon can then be observed. Rules fitted too tightly to the data describe disturbances and noise of various types that may occur in the data. A weaker generalization ability of the determined rule set is the consequence of such a situation: the obtained rule model may wrongly predict the value of the dependent variable for cases that do not appear in the training set.
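As an illustration (not the authors' implementation), the search for the best boundary point defined by criterion (3) can be sketched in Python for a single numeric attribute; the sketch assumes NumPy and takes candidate boundaries midway between consecutive sorted values.

import numpy as np

def best_boundary(x, y):
    """Find the boundary point q maximizing criterion (3) for one attribute.

    x - values of one independent variable; y - values of the dependent variable.
    Returns (q, err), or (None, 0.0) when no boundary yields a positive criterion.
    """
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    total_var = np.var(y)                      # V(P)
    best_q, best_err = None, 0.0
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                           # no boundary between equal values
        q = (x[i] + x[i - 1]) / 2.0
        left, right = y[:i], y[i:]             # P_<q and P_>q
        expected = (len(left) * np.var(left) + len(right) * np.var(right)) / len(y)
        err = total_var - expected             # criterion (3): variance reduction
        if err > best_err:
            best_q, best_err = q, err
    return best_q, best_err

The attribute and boundary point with the largest err over all attributes would then form the next conditional descriptor.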
To limit the number of parameters occurring in rule premises and rule conclusions, m5 uses an exhaustive approach, considering all possible subsets of the set of features appearing in a given rule. During feature removal, whole descriptors are removed from rule premises; during the limitation of the number of parameters in a rule conclusion, a new form of the regression function is determined. To evaluate the error made by a simplified rule, the average absolute error made by the rule on the training data is used; additionally, the error is multiplied by the expression (n + v)/(n − v), where n is the number of objects in the training set and v is the number of parameters appearing in the conclusion of the rule.

Numerous experiments carried out with the m5 algorithm on benchmark data and on data describing real problems (including problems connected with coal mining (Sikora & Krzykawski, 2005)) show that it is one of the best algorithms of this kind, additionally characterized by an unusually short working time.

4. Data analysis

The first step of our research was an analysis of correlations, autocorrelations, and partial autocorrelations. Its aim was to verify which delays of the independent variables have the strongest influence on the value of the dependent variable. The result of this work was the choice of the representation of the independent variables' rates of change described in the second chapter (reflected in the sums DAN31, DAN32, and DMM32).

In further work, a multiple regression model for calculating the value of the dependent variable was determined by means of the Statistica program. The model has the following form:

MM32_Pred = 0.455 - 0.047·AN31 + 0.018·AN32 + 0.21·MM32 + 0.000015·Production - 0.01·DAN31 + 0.008·DAN32 + 0.05·DMM32

MM32_Pred = 0.22 - 0.03·AN31 + 0.026·AN32 + 0.18·MM32 + 0.000056·Production - 0.0016·DAN31 + 0.0035·DAN32 + 0.05·DMM32

The former equation is used to predict the variable MM32_Pred in situations when the variable MM32 takes values less than 1; for greater values of MM32 the latter equation is used. It follows that, according to the model, the strongest influence on the values of MM32_Pred comes from the earlier and accumulated (summed) values of MM32 and DMM32. The values of AN31 and DAN31 influence the dependent variable negatively, while the values of AN32 and DAN32 influence it positively.

The Statistica Neural Network package was used to apply a mechanism capable of describing nonlinear dependences between the independent variables and the dependent one. Twenty-eight different neural network architectures were tested. The best results were obtained by a three-layer network with sigmoidal activation functions (Tadeusiewicz, 2003; StatSoft, 2001b). Interestingly, the network has only three input neurons (MM32, DMM32, Production), which means that only these three variables are used for the prediction of the dependent variable.

To evaluate the efficiency of the obtained regression model, as well as of the neural network, the RMS error defined by formula (4) was applied:

$$RMS(T) = \sqrt{\frac{1}{|T|} \sum_{i=1}^{|T|} \big(d(x_i) - p(x_i)\big)^2} \qquad (4)$$

In formula (4), $T$ is a testing set of objects, $d(x_i)$ is the real value of the dependent variable for the testing object $x_i$, and $p(x_i)$ is the value of the dependent variable predicted by the obtained model.
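Formula (4) can be computed directly; a minimal sketch, again assuming NumPy arrays of real and predicted values:

import numpy as np

def rms_error(d, p):
    """RMS error of formula (4): d - real values, p - predicted values."""
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((d - p) ** 2)))

# e.g. rms_error([1.0, 1.2], [0.9, 1.3]) returns 0.1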
TABLE 3. RMS error on the testing data set obtained by various analytic methods

Method                                                 | RMS error | Remarks
Statistical analysis (multivariate linear regression) | 0.081     | Break point MM32_Pred = 0.824; 94% explained variance
Statistica Neural Network                              | 0.081     | Three-layer MLP network; activation functions in consecutive layers: linear, sigmoidal, linear with saturation
M5 algorithm                                           | 0.071     | 32 rules
M5 algorithm                                           | 0.076     | The number of rules was limited to 4

The analysis of the gathered data was carried out with the m5 algorithm in two ways. In the first, the algorithm was started without any parameters limiting its work; in effect, thirty-two rules describing the dependences between the independent variables and the dependent variable were obtained. Although this solution gave the smallest prediction error, it was unsatisfactory for us, because the simultaneous analysis of thirty-two rules does not allow the knowledge included in the rules to be interpreted easily and unequivocally. In the second way, the m5 algorithm was run with parameters preventing the induction of rules for which the variance of the dependent variable fell below 20% of its overall variance in the training set. The other constraint was that a determined rule had to cover at least twenty percent of the examples from the analyzed set. In this manner four rules with good prediction abilities were obtained (Table 3). The determined rules are presented below; the range of the dependent variable covered by each rule is given next to it:

If Production = 0 and DMM32 <= 10 then
  MM32_Pred = 0.04 + 0.063·DMM32 + 0.24·MM32    [0.4, 1.3]  StdErr = 0.03
If Production > 0 and DMM32 <= 10 then
  MM32_Pred = 0.29 + 0.061·DMM32 + 0.26·MM32 - 0.013·DAN31    [0.3, 1.3]  StdErr = 0.07
If Production = 0 and DMM32 > 10 then
  MM32_Pred = 0.11 + 0.038·DMM32 + 0.41·MM32 + 0.029·DAN31 - 0.3·AN31    [0.6, 1.5]  StdErr = 0.06
If Production > 0 and DMM32 > 10 then
  MM32_Pred = 0.19 + 0.054·DMM32 + 0.000135·Production - 0.016·DAN31 + 0.12·MM32 + 0.01·DAN32    [0.8, 2]  StdErr = 0.1

From the analysis of the rules determined by the algorithm it is clearer (more so than, for example, in the case of multiple regression) that:
• in the case of a low methane content in the atmosphere that stays low for some time interval (in our case, ten minutes) and a lack of production during a shift, future values of the methane concentration are also rather low, and neither production nor ventilation has any influence on the methane content in the atmosphere (the first rule);
• in the case of a low methane content in the atmosphere that stays low for some time interval (in our case, ten minutes) and production being conducted, the predicted values of the methane concentration are average; ventilation (especially the readings of the anemometer AN31) influences future values of the methane concentration negatively (the second rule);
• for the remaining average and high methane concentration values over some time interval (in our case, ten minutes), future values of the methane concentration will also be high or average (the linear models in the conclusions of the rules decide about this); for the third and fourth rules it can be seen that there is a positive influence of production (if it is conducted), a positive influence of ventilation through the top road in which the anemometer AN32 is installed, and a negative influence of ventilation through the bottom road in which the anemometer AN31 is installed.
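Because the four rules quoted above partition the input space by Production and DMM32, they can be read directly as a small predictor. The following sketch merely transcribes the rules (the variable names, thresholds, and coefficients are copied from the rules themselves; the function is an illustration, not the authors' implementation):

def mm32_pred(AN31, MM32, Production, DAN31, DAN32, DMM32):
    """Ten-minute-ahead prediction of MM32 encoded from the four m5 rules."""
    if Production == 0 and DMM32 <= 10:
        return 0.04 + 0.063 * DMM32 + 0.24 * MM32
    if Production > 0 and DMM32 <= 10:
        return 0.29 + 0.061 * DMM32 + 0.26 * MM32 - 0.013 * DAN31
    if Production == 0 and DMM32 > 10:
        return 0.11 + 0.038 * DMM32 + 0.41 * MM32 + 0.029 * DAN31 - 0.3 * AN31
    # remaining case: Production > 0 and DMM32 > 10
    return (0.19 + 0.054 * DMM32 + 0.000135 * Production
            - 0.016 * DAN31 + 0.12 * MM32 + 0.01 * DAN32)

Note that the raw AN32 reading does not occur in any rule, which mirrors the interpretation given above: only the accumulated DAN32 value carries the influence of the top-road ventilation.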
The above interpretation was drawn only on the basis of the form of the determined rules; a domain expert would probably be able to interpret the obtained rule set better. A rule set limited in this way can be easily interpreted and, as Table 3 shows, still gives a prediction error smaller than the statistical and neural network methods. The average error made by the rule model was 0.04, and the error variance was 0.003. The biggest errors were made for high methane values (the biggest mistake: 0.54; the model predicted 1.16, while the real value was 1.7). An analysis of Fig. 4 shows that the obtained model underestimates the expected values in the local maxima of the methane time series. Information about the period of the methane time series (here the period is clear, Fig. 4) could be used to improve accuracy, but we did not undertake such attempts. The course of the methane concentration in the place where the biggest error (in fact, the three biggest errors) was made is presented in Fig. 3.

Fig. 3. Time series of methane concentration [% CH4] in the place in which the model makes the biggest error (series M32 and M32 predicted)

Looking over the records of the analyzed data set which correspond to the fragment presented in Fig. 3, it can be noticed that there are no changes of the independent variables' values (including those describing the rate of change) that could account for such a sudden change of the dependent variable's value (the methane concentration). Therefore, probably no model referring to earlier measured values would be able to accurately predict a violent increase of the methane concentration lasting no longer than five minutes.

The second analyzed set was the one enabling the one-hour prediction. Here we quote the results of research conducted on data smoothed by means of the so-called F4253H filter of the Statistica package (smoothing by repeated moving medians (StatSoft, 2001a); the F4253H filter reflects the characteristics of the original time series better than the recursive filter). The original data, the smoothed data, and the data obtained from the prediction are presented in Fig. 4.

Fig. 4. One-hour methane time series [% CH4] – original, smoothed, and predicted data

Similarly to the previous case, the m5 algorithm without constraints was applied to the available data, which gave eleven rules, and the algorithm with constraints identical to those used for the ten-minute prediction was applied, which gave four rules. On the basis of the correlation and autocorrelation analysis, an ARIMA model (Box & Jenkins, 1994) describing the smoothed data was also determined for comparison purposes. The results are presented in Table 4.

TABLE 4. RMS error on the testing data set obtained by various analytic methods

Method       | RMS error | Remarks
M5 algorithm | 0.036     | 11 rules
M5 algorithm | 0.057     | The number of rules was limited to 4
ARIMA        | 0.086     | One autoregression parameter and one moving-average parameter

The prediction results obtained by the model determined for the smoothed data were compared with the raw data; in this manner the RMS error value was obtained.
The error was equal to 0.10, the average error value was 0.07 (variance 0.005), and the biggest error value was 0.41.

5. Conclusions

The paper has presented the application of a machine learning method, consisting in the induction of rules with linear conclusions, to the problem of methane concentration prediction in a coal-mine excavation. The m5 rule induction algorithm has been presented, and the necessary modifications of the input data set into a form acceptable to the algorithm have been shown. Two data sets, reflecting the tasks of ten-minute and one-hour prediction, were subjected to the analysis. The results obtained by the m5 algorithm gave the smallest prediction error of the dependent variable on the testing data sets. The m5 algorithm was compared with statistical methods (multivariate regression, ARIMA) and with a stochastic method (trained artificial neural networks). Besides achieving the smallest prediction error, the m5 algorithm is also characterized by the shortest working time, which may be significant in the analysis of big data sets. The rule data model obtained by means of the m5 algorithm allowed a relatively simple explanation of the dependences between the values of the independent features (earlier values of the methane concentration, the method and intensity of ventilation, conducted production) and the values of the dependent variable (the future methane concentration).

The analysis of the cases in which the determined data model makes the biggest errors shows that the model underestimates the methane concentration when the concentration is high. If the methane concentrations registered by the detector MM32 are treated as a time series, then a Fourier spectrum analysis of the series is possible (or an autocorrelation analysis). It clearly follows from this analysis that both analyzed series (ten-minute prediction, one-hour prediction) are periodic, with a period of about twelve hours (the highest values of the periodogram indicate the period). This fact could be used to modify the values predicted by the rule model; for example, if the model predicts methane concentrations greater than one percent, the predicted concentration could be adjusted upward proportionally to this exceedance.

To recapitulate, machine learning methods, so far only sporadically used in coal-mining data analysis, may be an alternative to the analytic methods exploited up to now. Applications of machine learning methods to coal-mining equipment diagnostics (Sikora & Widera, 2004) and to monitoring the environment in dewatering pump stations (Sikora & Krzykawski, 2005) show that these methods can be used in many other fields of the mining industry. An important advantage of machine learning methods is also the fact that the determined data model can be easily interpreted by a user (especially by a domain expert).

REFERENCES

Bojko, B., 2004. Dynamika stężenia metanu w wyrobiskach górniczych. Instytut Mechaniki Górotworu PAN, Kraków (supervisor: dr hab. inż. Stanisław Wasilewski).
Box, G.E.P., Jenkins, G.M., 1994. Time series analysis: forecasting and control. 3rd edition, Prentice Hall, New Jersey.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1994. Classification and Regression Trees. Wadsworth, Belmont, CA.
Czogała, E., Łęski, J., 2000. Fuzzy and Neuro-Fuzzy Intelligent Systems. Studies in Fuzziness and Soft Computing, vol. 47, Springer-Verlag.
Dixon, W.D., 1992.
A statistical analysis of monitored data for methane prediction. Ph.D. Thesis, University of Nottingham, Dept. of Mining Engineering, May 1992.
Michalski, R.S., Kaufman, K., 1998. Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach. In: Kubat, M., Bratko, I., Michalski, R.S. (eds.), Machine Learning and Data Mining: Methods and Applications. John Wiley and Sons.
Quinlan, J.R., 1992a. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, California.
Quinlan, J.R., 1992b. Learning with continuous classes. Proc. of the International Conference on Artificial Intelligence (AI'92), Singapore, World Scientific.
Quinlan, J.R., 1993. Combining instance-based learning and model-based learning. Proc. of the Tenth International Conference on Machine Learning (ML-93).
Sikora, M., Widera, D., 2004. Identyfication of diagnostics states for dewater pumps working in abyssal mining pump stations. Proceedings of the XV International Conference on System Sciences, Wrocław, Poland, pp. 394-402.
Sikora, M., Kozielski, M., 2005. Application of hybrid data exploration methods to prediction tasks. Materiały konferencji Technologie Przetwarzania Danych, Politechnika Poznańska, Poznań, pp. 195-215.
Sikora, M., Krzykawski, D., 2005. Zastosowanie metod eksploracji danych do analizy wydzielania się dwutlenku węgla w pomieszczeniach stacji odwadniania kopalń węgla kamiennego. Mechanizacja i Automatyzacja Górnictwa, 6/413, Katowice, pp. 29-40.
Sobczyk, M., 1997. Statystyka. Wydawnictwo Naukowe PWN.
StatSoft Polska, 2001a. Statistica 5.0 – podręcznik użytkownika, tom II-IV. StatSoft, Kraków.
StatSoft Polska, 2001b. Statistica Neural Network – podręcznik użytkownika. StatSoft, Kraków.
Tadeusiewicz, R., 2003. Sieci neuronowe. Akademicka Oficyna Wydawnicza RM, Warszawa.
Yager, R.R., Filev, D.P., 1994. Essentials of Fuzzy Modeling and Control. John Wiley & Sons, Inc.

REVIEW BY: PROF. DR HAB. INŻ. WACŁAW TRUTWIN, KRAKÓW

Received: 08 August 2006