
Ranking objective interestingness measures with sensitivity values


Abstract

- In this paper, we propose a new approach to evaluate the behavior of objective interestingness measures on association rules.
- The objective interestingness measures are ranked according to the most significant interestingness interval calculated from an inversely cumulative distribution.
- The results will help the user (a data analyst) gain insight into the behavior of objective interestingness measures and, ultimately, select the hidden knowledge in a rule set or a set of rule sets, represented in the form of the most interesting rules.
- The enormous number of rules discovered in the mining task requires not only an efficient post-processing task but also results adapted to the user's preferences [2-7].
- One of the most interesting and difficult approaches to reducing the number of rules is to construct interestingness measures [8,7].
- Based on the data distribution, the objective interestingness measures can evaluate a rule via its statistical factors.
- Depending on the user's point of view, each objective interestingness measure reflects his/her own interests in the data.
- As is known, it is difficult to obtain a common ranking on a set of association rules for all the objective interestingness measures.
- In this paper we propose a new approach for ranking objective interestingness measures using observations on the intervals of the distribution of interestingness values and the number of association rules having the highest interestingness values.
- We focus on the most significant interval in the inversely cumulative distribution calculated for each objective interestingness measure.
- The sensitivity evaluation is experimented on a rule set and on a set of rule sets to rank the objective interestingness measures.
- The objective interestingness measures with the highest ranks will be chosen to find the most interesting rules from a rule set.
- Section 2 introduces the post-processing stage in a KDD process with interestingness measures.
- Section 3 gives some evaluations based on the cardinalities of the rules as well as rule’s interestingness distributions.
- Section 4 presents a new approach with sensitivity values calculated from the most interesting bins (a bin is considered as an interestingness interval) of an interestingness distribution in comparison with the number of best rules.
- Finally, Section 6 gives a summary of the paper.
- This work leads to the validation of the discovered patterns in order to find the interesting patterns or hidden knowledge among the large number of discovered patterns.
- Therefore, a post-processing task is necessary to help the user select a reduced number of interesting patterns [1].
- Association rules. An association rule [2,4], which plays an important role in KDD, is one of the patterns issued from the mining task to represent the discovered knowledge.
- Both parts of an association rule (i.e., the antecedent and the consequent) are composed of items (i.e., a set of items, or itemset), as sketched below.
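- As an illustration, the sketch below shows one possible representation of an association rule as a pair of itemsets; the class name AssociationRule and the use of java.util.Set are illustrative choices of ours, not part of the original ARQAT implementation.

```java
import java.util.Set;

// Minimal, hypothetical representation of an association rule X -> Y,
// where both sides are itemsets (sets of items).
public class AssociationRule {
    private final Set<String> antecedent;  // X: itemset on the left-hand side
    private final Set<String> consequent;  // Y: itemset on the right-hand side

    public AssociationRule(Set<String> antecedent, Set<String> consequent) {
        this.antecedent = antecedent;
        this.consequent = consequent;
    }

    public Set<String> getAntecedent() { return antecedent; }
    public Set<String> getConsequent() { return consequent; }
}
```

- For example, the rule {bread, butter} -> {milk} would be built as new AssociationRule(Set.of("bread", "butter"), Set.of("milk")).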
- Post-processing with interestingness measures. The notion of interestingness is introduced to evaluate the patterns discovered by the mining task.
- The patterns are mapped to numerical values by interestingness measures.
- The patterns may have different ranks because their ranks depend strongly on the choice of interestingness measures.
- The interestingness measures are classified into two categories [7]: subjective measures and objective measures.
- In particular, most of the interestingness measures proposed in the literature can be used for association rules.
- To restrict the research area in this paper, we work on objective interestingness measures only.
- Thus we use the terms objective interestingness measures, objective measures, and interestingness measures interchangeably (see the Appendix for a complete list of the 40 objective interestingness measures).
- Each rule set is stored with its list of four cardinalities, as in the sketch below.
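- As a sketch, assuming the four cardinalities of a rule A -> B are n (total number of transactions), n_A, n_B and n_AB (other equivalent parameterizations, e.g. using the number of counterexamples, are possible), a few classical objective measures such as support, confidence and the Lift measure used later can be derived from them; the class RuleCardinalities below is hypothetical.

```java
// Hypothetical holder for the four cardinalities of a rule A -> B:
// n (transactions), nA (matching A), nB (matching B), nAB (matching both).
public class RuleCardinalities {
    final long n, nA, nB, nAB;

    public RuleCardinalities(long n, long nA, long nB, long nAB) {
        this.n = n; this.nA = nA; this.nB = nB; this.nAB = nAB;
    }

    // Support of the rule: P(A and B).
    public double support()    { return (double) nAB / n; }

    // Confidence: P(B | A).
    public double confidence() { return (double) nAB / nA; }

    // Lift: P(A and B) / (P(A) * P(B)) = n * nAB / (nA * nB).
    public double lift()       { return (double) n * nAB / ((double) nA * nB); }
}
```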
- Each element of the order set is an order mapping f: 1 → 1 (a one-to-one mapping), one for each element in the corresponding interestingness set.
- The value set contains the list of interestingness values corresponding to the positions of the elements in the rank set.
- From this information the user can have a quick evaluation of the rule set.
- The shape information of the last two arguments is also determined.
- The images are drawn with the support of the JFreeChart package [26].
- We have added to this package the visualization of the inversely cumulative histogram.
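- A minimal sketch of how such a chart could be produced with JFreeChart's bar charts is given below; it uses the classic JFreeChart 1.0.x API (ChartUtilities, renamed ChartUtils in 1.5), and the method savePng and its parameters are illustrative, not the actual ARQAT code.

```java
import java.io.File;
import java.io.IOException;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;

public class HistogramChart {

    // Renders bin counts (e.g., the inversely cumulative counts ic_i) as a
    // bar chart and saves the result to a PNG file.
    public static void savePng(int[] binCounts, String measureName, File out)
            throws IOException {
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        for (int i = 0; i < binCounts.length; i++) {
            dataset.addValue(binCounts[i], measureName, "bin " + (i + 1));
        }
        JFreeChart chart = ChartFactory.createBarChart(
                "Inversely cumulative histogram of " + measureName,
                "interestingness interval", "number of rules",
                dataset, PlotOrientation.VERTICAL, false, false, false);
        ChartUtilities.saveChartAsPNG(out, chart, 640, 480);
    }
}
```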
- Frequency histogram of the Lift measure from a rule set.
- Inversely cumulative histogram of the Lift measure from a rule set.
- Assume that R is the set of p association rules, called a rule set.
- An interestingness histogram is a histogram [27] in which the size of a category (i.e., a bin) is the number of rules having the same interval of interestingness values.
- Suppose that the number of rules that fall into an interestingness interval i is h_i, the total number of bins is k, and the total number of rules is p.
- An interestingness cumulative histogram is a cumulative histogram [27] in which the size of a bin is the cumulative number of rules from the smaller bins up to the specified bin.
- The cumulative number of rules c_i in a bin i is determined as c_i = h_1 + h_2 + ... + h_i.
- For our purpose, we use the inversely cumulative distribution representation in order to show the number of rules ranked higher than a given minimum threshold.
- Intuitively, the user can see exactly how many rules he/she will have to deal with when choosing a particular value for the minimum threshold.
- The inversely cumulative number of rules ic_i can be computed as ic_i = h_i + h_(i+1) + ... + h_k = p - c_(i-1).
- The number of bins k depends directly on the rule set size p.
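- The sketch below shows how the frequency, cumulative and inversely cumulative counts could be computed from the interestingness values of a rule set with k equal-width bins; the class and method names are illustrative, and NaN values are assumed to have been filtered out beforehand.

```java
import java.util.Arrays;

public class InterestingnessHistogram {

    // Frequency counts h_i: number of rules whose interestingness value falls
    // into each of the k equal-width bins spanning [min, max].
    public static int[] frequencies(double[] values, int k) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(1.0);
        double width = (max - min) / k;
        int[] h = new int[k];
        for (double v : values) {
            int bin = width == 0 ? 0 : (int) ((v - min) / width);
            if (bin == k) bin = k - 1;  // the maximum value goes into the last bin
            h[bin]++;
        }
        return h;
    }

    // Cumulative counts c_i = h_1 + ... + h_i.
    public static int[] cumulative(int[] h) {
        int[] c = new int[h.length];
        int sum = 0;
        for (int i = 0; i < h.length; i++) { sum += h[i]; c[i] = sum; }
        return c;
    }

    // Inversely cumulative counts ic_i = h_i + ... + h_k
    // (rules ranked at least as high as bin i).
    public static int[] inverselyCumulative(int[] h) {
        int[] ic = new int[h.length];
        int sum = 0;
        for (int i = h.length - 1; i >= 0; i--) { sum += h[i]; ic[i] = sum; }
        return ic;
    }
}
```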
- Rule set characteristics.
- Before evaluating the sensitivity of the interestingness measures observed from the interestingness distribution, we propose some arguments on the rule set to give the user a quick overview of its characteristics.
- For instance, Table 3 gives the 16 characteristic types used in our study, in which the first line gives the number of "logical" rules.
- The percentage of each characteristic type in the rule set is also computed.
- Each rule in the rule set is then examined by its cardinalities to match the characteristic types.
- The complexity of the algorithm is linear in p (see the sketch below).
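- A sketch of such a single-pass computation is given below, reusing the hypothetical RuleCardinalities class from above; the predicates encoding the characteristic types are left to the user and only one example is shown in the comment.

```java
import java.util.List;
import java.util.function.Predicate;

public class RuleSetCharacteristics {

    // One linear pass over the p rules: each rule's cardinalities are tested
    // against every characteristic type, so the complexity is O(p) for a
    // fixed number of types. Example of a type: r -> r.nAB == r.nA
    // (rules without counterexamples).
    public static int[] count(List<RuleCardinalities> rules,
                              List<Predicate<RuleCardinalities>> types) {
        int[] counts = new int[types.size()];
        for (RuleCardinalities r : rules) {
            for (int t = 0; t < types.size(); t++) {
                if (types.get(t).test(r)) counts[t]++;
            }
        }
        return counts;
    }
}
```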
- Sensitivity. The sensitivity of an interestingness measure refers to the number of best rules (i.e., rules with the highest interestingness values) that an interested user would have to analyze, and to whether these rules are still well distributed (have different assigned ranks) or all have ranks equal to the maximum assigned value for the specified data set.
- Average. Because the number of bins is not the same when we have many rule sets on which to evaluate the sensitivity, the number of rules returned in the last interval does not have the same significance either.
- Assuming that the total number of measures to rank is fixed, the average rank is used.
- The latter is calculated from the rank each measure obtains on each rule set.
- A weight, given by the user, can be assigned to each rule set to reflect its level of importance.
- We use the average ranks to rank the measures over a set of rule sets based on the computed sensitivity values, as in the sketch below.
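- A sketch of the weighted average rank computation is given below (uniform weights give the plain average rank); the array layout ranks[m][s], holding the rank of measure m on rule set s, is an assumption made for illustration.

```java
public class AverageRank {

    // ranks[m][s] = rank of measure m on rule set s;
    // weights[s]  = user-given importance of rule set s.
    public static double[] weightedAverageRanks(int[][] ranks, double[] weights) {
        double totalWeight = 0;
        for (double w : weights) totalWeight += w;
        double[] avg = new double[ranks.length];
        for (int m = 0; m < ranks.length; m++) {
            double sum = 0;
            for (int s = 0; s < weights.length; s++) {
                sum += weights[s] * ranks[m][s];
            }
            avg[m] = sum / totalWeight;
        }
        return avg;
    }
}
```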
- Average structure to evaluate sensitivity on a set of rule sets (columns: rule set 1, rule set 2, ...).
- The first two columns represent the current rank of the measure.
- Data set 2 has the typical characteristics of the AGRAWAL data set T5I2D10k (T5: the average transaction size is 5; I2: the average size of the maximal potentially large itemsets is 2; D10k: the number of transactions is 10,000).
- Information on the data sets (columns: data set, number of items, ...).
- From the data sets discussed above, the corresponding rule sets (i.e., the sets of association rules) are generated with the rule mining techniques [2].
- Rule sets (columns: rule set, number of rules).
- Evaluation on a rule set. The sensitivity evaluation compares the number of rules falling in each interval in order to rank the measures.
- For a measure on a rule set, the most significant interval is the last bin (i.e., interval) of the inversely cumulative distribution.
- To have an approximate view of the sensitivity value, the number of rules having the maximum value is also retained.
- Note that the number of rules in the first interval is not always the same for all the measures because of the effect of NaN (not a number) values.
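- One plausible reading of this comparison is sketched below: measures are ordered so that the measure leaving the fewest rules in the most significant (last) interval comes first, with the number of best rules as a secondary criterion; the MeasureSummary class is hypothetical and the exact tie-breaking used in the paper may differ.

```java
import java.util.Comparator;
import java.util.List;

public class SensitivityRanking {

    // Hypothetical summary of one measure on one rule set.
    public static class MeasureSummary {
        final String name;
        final int rulesInLastInterval; // rules in the most significant (last) bin
        final int rulesWithMaxValue;   // "best" rules reaching the maximum value

        public MeasureSummary(String name, int rulesInLastInterval, int rulesWithMaxValue) {
            this.name = name;
            this.rulesInLastInterval = rulesInLastInterval;
            this.rulesWithMaxValue = rulesWithMaxValue;
        }
    }

    // The fewer rules a measure leaves in the last interval, the more
    // sensitive it is considered (the user has fewer rules to inspect).
    public static void rank(List<MeasureSummary> measures) {
        measures.sort(Comparator
                .comparingInt((MeasureSummary m) -> m.rulesInLastInterval)
                .thenComparingInt(m -> m.rulesWithMaxValue));
    }
}
```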
- Sensitivity rank on the first rule set.
- The meaning of this ranking is that the measure Implication Index is more sensitive than the measure Rule Interest on the first rule set, even though the number of most interesting rules returned with the maximum value is greater for the measure Rule Interest (3 > 2).
- The differences counted for each pair of intervals, starting from the last interval, are quite important because the user will find it easier to look at 11 rules in the last interval of the measure Implication Index than at 64 rules in the same interval of the measure Rule Interest.
- Comparison of sensitivity values for a pair of measures on the first rule set.
- From Fig. 7 (a) and (b), we can see that the measure Implication Index moves strongly from 13th place on the first rule set to 9th place over the whole set of four rule sets, while the measure Rule Interest moves only slightly from 14th to 13th place.
- Conclusion. Based on the sensitivity approach, we have ranked the 40 objective interestingness measures in order to find the most interesting rules in a rule set.
- By comparing the number of rules falling in the most significant interestingness interval (i.e., the last bin in the inversely cumulative histogram) with the number of best rules (i.e., the number of rules having the highest interestingness values), the sensitivity values have been determined.
- We have also proposed the sensitivity structure and the average structure to hold the sensitivity values on a single rule set as well as on a set of rule sets.
- The results obtained from the ARQAT tool [9] will provide some important insights into the behaviors of the objective interestingness measures, as a supplementary view.
- These future results will provide a deeper insight into the behaviors of interestingness measures on knowledge represented in the form of association rules.
- INTERESTINGNESS MEASURES.
- Vanthienen, "Post-processing of association rules," SWP/KDD'00, Proceedings of the Special Workshop on Post-processing, in conjunction with ACM KDD.
- Agrawal, "Mining the most interesting rules," KDD'99, Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Matheus, "The interestingness of deviations," AAAI'94, Knowledge Discovery in Databases Workshop (1994) 25.
- Briand, "ARQAT: an exploratory analysis tool for interestingness measures," ASMDA'05, Proceedings of the 11th International Symposium on Applied Stochastic Models and Data Analysis.
- Zupan, "Rule evaluation measures: a unifying view," ILP'99, Proceedings of the 9th International Workshop on Inductive Logic Programming, LNAI.
- Hamilton, "Interestingness measures for data mining: a survey," ACM Computing Surveys.
- Hilderman, "The Lorenz dominance order as a measure of interestingness in KDD," PAKDD'02, Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, LNCS.
- Lallich, "On selecting interestingness measures for association rules: user oriented description and multiple criteria decision aid," European Journal of Operational Research.
- Briand, "Using information-theoretic measures to assess association rule interestingness," ICDM'05, Proceedings of the 5th IEEE International Conference on Data Mining (2005) 66.
- Zighed, "Implication strength of classification rules," ISMIS'06, Proceedings of the 16th International Symposium on Methodologies for Intelligent Systems, LNAI.
- Fu, "Mining patterns that respond to actions," ICDM'05, Proceedings of the 5th IEEE International Conference on Data Mining.
- Dalton, "The measurement of the inequality of incomes," Economic Journal.
- Marichal, "An axiomatic approach of the discrete Sugeno integral as a tool to aggregate interacting criteria in a qualitative framework," IEEE Transactions on Fuzzy Systems.
- Kojadinovic, "Unsupervised aggregation of commensurate correlated attributes by means of the Choquet integral and entropy functionals," International Journal of Intelligent Systems (2007, in press).