
Topic: computer data handbook


There are 40+ documents under the topic "computer data handbook"

Data Mining and Knowledge Discovery Handbook, 2 Edition part 20

tailieu.vn

Almuallim H., An Efficient Algorithm for Optimal Pruning of Decision Trees. ... and Singh V., CLOUDS: A Decision Tree Classifier for Large Datasets, Conference on Knowledge Discovery and Data Mining (KDD-98), August 1998. Baker E., and Jain A., In Proceedings of the Third International Joint Conference on Pattern Recognition, pages 45-49, San Diego, CA, 1976. Bratko I., and Bohanec M., Trading...

Data Mining and Knowledge Discovery Handbook, 2 Edition part 21

tailieu.vn

Polytrees. When the topology of a Bayesian network is restricted to a polytree structure, a directed acyclic graph with only one path linking any two nodes in the graph, we can exploit the fact that every node in the network divides the polytree into two disjoint sub-trees. The source of complexity of these algorithms is the identification of...
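
The separation property mentioned in this excerpt can be illustrated with a short sketch. This is not code from the handbook; the function name, the edge-list representation, and the example graph are assumptions made only for illustration. It splits a polytree around a node into the part reached through its parents and the part reached through its children, which are disjoint.

```python
from collections import defaultdict, deque

def split_polytree(edges, node):
    """Split a polytree (list of directed (parent, child) edges) into the part
    reached through `node`'s parents and the part reached through its children.
    In a polytree these two sets are disjoint, which propagation exploits."""
    adj = defaultdict(set)          # undirected adjacency
    parents, children = set(), set()
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        if v == node:
            parents.add(u)
        if u == node:
            children.add(v)

    def reach(starts):
        seen, queue = set(), deque(starts)
        while queue:
            x = queue.popleft()
            if x in seen or x == node:
                continue
            seen.add(x)
            queue.extend(adj[x] - seen - {node})
        return seen

    return reach(parents), reach(children)

# Example polytree: A -> C, B -> C, C -> D, D -> E
up, down = split_polytree([("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")], "C")
print(up, down)   # {'A', 'B'} {'D', 'E'}
```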

Data Mining and Knowledge Discovery Handbook, 2 Edition part 22

tailieu.vn

By repeating this procedure for each case in the database, we compute fitted values for each variable Y_i, and then define the blanket residuals as the difference between the observed and fitted values, r_ik. Lack of significant patterns in the residuals r_ik and approximate symmetry about 0 will provide evidence in favor of a good fit for the variable Y_i, while anomalies in...
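
A minimal sketch of the diagnostic implied by this excerpt, assuming the observed and fitted values for one variable are available as NumPy arrays; the function names and the skewness check are illustrative choices, not the chapter's own procedure.

```python
import numpy as np

def blanket_residuals(y_obs, y_fit):
    """Residuals for one variable Y_i across all cases k: observed minus fitted."""
    return np.asarray(y_obs, float) - np.asarray(y_fit, float)

def symmetry_diagnostics(residuals):
    """Simple checks for approximate symmetry about 0."""
    r = np.asarray(residuals, float)
    return {
        "mean": float(r.mean()),          # should be close to 0
        "median": float(np.median(r)),    # should be close to 0
        "skewness": float(((r - r.mean()) ** 3).mean() / r.std() ** 3),
    }

rng = np.random.default_rng(0)
y_fit = rng.normal(size=200)
y_obs = y_fit + rng.normal(scale=0.3, size=200)   # a well-fitted variable
print(symmetry_diagnostics(blanket_residuals(y_obs, y_fit)))
```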

Data Mining and Knowledge Discovery Handbook, 2 Edition part 23

tailieu.vn

Description of the variables used in the analysis. Hoh denotes the Head of the Household; numbers of adult males, females, and children refer to the household. ... of the household increases. The dependency of the gender of the household head on the ethnic group shows that Blacks have the smallest probability of having a male head of the household (64%), while...

Data Mining and Knowledge Discovery Handbook, 2 Edition part 24

tailieu.vn

11.2 Some Definitions. The fitted value ŷ_0 at x_0 can be written as a weighted sum of the observed responses, ŷ_0 = Σ_j S_0j y_j. In practice, the weights decline with distance from x_0, sometimes abruptly (as in a step function), so that many of the values in S_0j are often zero. The function linking the response variable y to the predictor x can...
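
A minimal sketch of such a linear smoother. The triangular kernel, bandwidth, and function name are illustrative assumptions; the point is only that the weight vector plays the role of the S_0j row, with weights shrinking (here, to exactly zero outside the bandwidth) as distance from x_0 grows.

```python
import numpy as np

def local_average_fit(x, y, x0, bandwidth=1.0):
    """Fitted value at x0 as a weighted sum of observed responses.
    Weights decline with distance from x0 and are zero outside the bandwidth,
    so most entries of the weight vector (the S_0j row) are zero."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dist = np.abs(x - x0)
    w = np.where(dist <= bandwidth, 1.0 - dist / bandwidth, 0.0)  # triangular kernel
    w = w / w.sum()
    return float(w @ y), w

x = np.linspace(0, 10, 21)
y = np.sin(x)
yhat0, weights = local_average_fit(x, y, x0=3.0, bandwidth=1.5)
print(round(yhat0, 3), int((weights == 0).sum()), "zero weights")
```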

Data Mining and Knowledge Discovery Handbook, 2 Edition part 25

tailieu.vn

Consider now an application of the generalized additive model. For the data described earlier, Figure 11.3 shows the relationship between the number of homicides and the number of executions a year earlier, with state and year held constant. Indicator variables are included for each state to adjust for average differences over time in the number of homicides in each state. For example,...

Data Mining and Knowledge Discovery Handbook, 2 Edition part 26

tailieu.vn

Dasu, T., and T. (2000) Support Vector Machines. Fan, J., and I. Friedman, J., Hastie, T., and R. Freund, Y., and R. (1996) "Experiments with a New Boosting Algorithm," Machine Learning: Proceedings of the Thirteenth International Conference: 148-156. Hand, D., Mannila, H., and P. Smyth (2001) Principles of Data Mining. LeBlanc, M., and R. Tibshirani (1996) "Combining Estimates on...

Data Mining and Knowledge Discovery Handbook, 2 Edition part 27

tailieu.vn

ε (12.24). The regularization constant C > 0 determines a trade-off between model complexity and points lying outside of the tube. The support vectors and the support values of the solution define the following regression function: f(x) = Σ_i (α_i − α_i*) K(x_i, x) + b (12.25). There are degrees of freedom for constructing SVR, such as how to penalize or regularize different parts of the vector,...
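
A minimal sketch of ε-SVR using scikit-learn, an assumption about tooling rather than anything the excerpt prescribes; C and epsilon correspond to the regularization constant and the half-width of the tube discussed above, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# C trades off model complexity against points falling outside the epsilon-tube;
# epsilon is the half-width of the tube within which errors are not penalized.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

print("support vectors:", model.support_vectors_.shape[0])
print("prediction at x=1.0:", float(model.predict([[1.0]])[0]))
```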

Data Mining and Knowledge Discovery Handbook, 2 Edition part 28

tailieu.vn

A very simple example of such a table is presented as Table 13.1, in which the attributes are Temperature, Headache, Weakness, and Nausea, and the decision is Flu. The set of all cases labeled by the same decision value is called a concept. For Table 13.1, one case set is the concept of all cases affected by flu (for each case from...
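
A minimal sketch of the idea of a concept, assuming a handful of made-up cases with the attribute names given in the excerpt; the entries of Table 13.1 itself are not reproduced, so the values below are illustrative only.

```python
# Each case: identifier -> (attribute values, decision value for Flu).
cases = {
    1: ({"Temperature": "high",   "Headache": "yes", "Weakness": "yes", "Nausea": "no"},  "yes"),
    2: ({"Temperature": "normal", "Headache": "no",  "Weakness": "no",  "Nausea": "no"},  "no"),
    3: ({"Temperature": "high",   "Headache": "no",  "Weakness": "yes", "Nausea": "yes"}, "yes"),
    4: ({"Temperature": "normal", "Headache": "yes", "Weakness": "no",  "Nausea": "no"},  "no"),
}

def concept(cases, decision_value):
    """The concept: the set of all cases labeled by the same decision value."""
    return {cid for cid, (_, flu) in cases.items() if flu == decision_value}

print(concept(cases, "yes"))   # {1, 3}: the concept of cases affected by flu
```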

Data Mining and Knowledge Discovery Handbook, 2 Edition part 29

tailieu.vn

Another rule induction algorithm was developed by R. S. Michalski. Many versions of the algorithm have been developed, under different names (Michalski et al., 1986A). Let us start by quoting some definitions from (Michalski et al., 1986A). A seed is a member of the concept, i.e., a positive case. A selector is an...

Data Mining and Knowledge Discovery Handbook, 2 Edition part 30

tailieu.vn

forming categories of entities and assigning individuals to the proper groups within it. 14.2 Distance Measures. Since clustering is the grouping of similar instances/objects, some sort of measure that can determine whether two objects are similar or dissimilar is required. It is useful to denote the distance between two instances x_i and x_j as d(x_i, x_j)...
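
A minimal sketch of common choices for d(x_i, x_j) on numeric attributes; the excerpt does not fix a particular metric, so the Minkowski family below (Euclidean for p=2, Manhattan for p=1) is illustrative.

```python
import numpy as np

def minkowski_distance(x_i, x_j, p=2):
    """Minkowski distance between two instances; p=2 gives Euclidean distance,
    p=1 gives Manhattan distance."""
    diff = np.abs(np.asarray(x_i, float) - np.asarray(x_j, float))
    return float((diff ** p).sum() ** (1.0 / p))

x_i, x_j = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(minkowski_distance(x_i, x_j, p=2))  # 5.0 (Euclidean)
print(minkowski_distance(x_i, x_j, p=1))  # 7.0 (Manhattan)
```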

Data Mining and Knowledge Discovery Handbook, 2 Edition part 31

tailieu.vn

Such methods typically require that the number of clusters be pre-set by the user. Because exhaustive enumeration of all possible partitions is not feasible, certain greedy heuristics are used in the form of iterative optimization. The most well-known criterion is the Sum of Squared Error (SSE), which measures the total squared Euclidean distance of instances to their representative values. The latter option is the...
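
A minimal sketch of the SSE criterion, assuming instances and cluster centroids are NumPy arrays and `labels` assigns each instance to its representative centroid; the data below are illustrative.

```python
import numpy as np

def sum_of_squared_error(X, centroids, labels):
    """SSE: total squared Euclidean distance of instances to their
    representative (centroid) values."""
    X = np.asarray(X, float)
    centroids = np.asarray(centroids, float)
    diffs = X - centroids[labels]
    return float((diffs ** 2).sum())

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
centroids = np.array([[0.0, 0.5], [5.5, 5.0]])
labels = np.array([0, 0, 1, 1])
print(sum_of_squared_error(X, centroids, labels))  # 1.0
```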

Data Mining and Knowledge Discovery Handbook, 2 Edition part 32

tailieu.vn

This method identifies candidate cluster centroids by using repeated random samples of the original data. Because of the use of random sampling, the time complexity is O(n) for a pattern set of n elements. This algorithm has a time complexity linear in the number of instances. All algorithms presented till this point assume that the entire...
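
A minimal sketch of the general idea only, not the specific algorithm the excerpt refers to: several small random samples are clustered cheaply and their centroids collected as candidates, so the cost per sample does not grow with the full data size. The sample sizes, use of scikit-learn's KMeans, and data are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_centroids(X, k, n_samples=5, sample_size=50, seed=0):
    """Collect candidate centroids by clustering several small random samples;
    each sample is processed in time independent of the full data size."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    candidates = []
    for _ in range(n_samples):
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        km = KMeans(n_clusters=k, n_init=5, random_state=0).fit(X[idx])
        candidates.append(km.cluster_centers_)
    return np.vstack(candidates)

X = np.vstack([np.random.default_rng(1).normal(loc=c, size=(200, 2)) for c in (0.0, 5.0)])
print(candidate_centroids(X, k=2).round(2))
```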

Data Mining and Knowledge Discovery Handbook, 2 Edition part 33

tailieu.vn

15.1.1 Formal Problem Definition. ... can be reformulated as an itemset T by a_i ∈ T ⇔ t_i = 1. We want to use the motivation of the introductory example to define an association explicitly. If the probability of having sausages (S) or mustard (M) in the shopping carts of our customers is 10% and 4%,...
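
A minimal sketch following the excerpt's example: if sausages and mustard were bought independently, the expected joint probability would be the product of the two percentages, and comparing the observed joint support with that product (their ratio is the usual lift) is one way to make "association" explicit. Only the two quoted percentages come from the excerpt; the observed support is illustrative.

```python
p_sausages = 0.10          # P(S), from the excerpt
p_mustard = 0.04           # P(M), from the excerpt
p_joint_observed = 0.015   # illustrative observed support of {S, M}

# Under independence we would expect P(S) * P(M) in the carts.
p_joint_expected = p_sausages * p_mustard      # 0.004
lift = p_joint_observed / p_joint_expected     # > 1 suggests a positive association

print(f"expected under independence: {p_joint_expected:.4f}")
print(f"observed: {p_joint_observed:.4f}, lift: {lift:.1f}")
```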

Data Mining and Knowledge Discovery Handbook, 2 Edition part 34

tailieu.vn

P(Y=y) · j(X | Y=y), which is the J-value of the rule and is bounded by 0.53 bit. Other measures are conviction (a "directed", asymmetric lift) (Brin et al., 1997B), certainty factors from MYCIN (Berzal et al., 2001), correlation coefficients from statistics (Tan and Kumar, 2002), Laplace or Gini from rule induction (Clark and Boswell, 1991) or decision tree induction (Breiman, 1996)...
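
A minimal sketch of two of the measures named above, using their standard definitions: conviction as a directed, asymmetric variant of lift, and the J-value in the P(Y=y) · j(X | Y=y) form quoted in the excerpt, where the inner term is a two-point relative entropy in bits. The probabilities below are illustrative.

```python
from math import log2

def conviction(conf_xy, p_y):
    """Conviction of a rule X -> Y:
    conv = P(X)P(not Y) / P(X, not Y) = (1 - P(Y)) / (1 - conf(X -> Y))."""
    return (1.0 - p_y) / (1.0 - conf_xy)

def j_value(p_y, p_x, p_x_given_y):
    """J-value of a rule, P(Y=y) times a two-point relative entropy (in bits)."""
    j = (p_x_given_y * log2(p_x_given_y / p_x)
         + (1 - p_x_given_y) * log2((1 - p_x_given_y) / (1 - p_x)))
    return p_y * j

# Illustrative probabilities only.
print(round(conviction(conf_xy=0.8, p_y=0.4), 2))                # 3.0
print(round(j_value(p_y=0.4, p_x=0.3, p_x_given_y=0.7), 3))
```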

Data Mining and Knowledge Discovery Handbook, 2 Edition part 35

tailieu.vn

Frequent sets play an essential role in many Data Mining tasks that try to find interesting patterns from databases, such as association rules, correlations, sequences, episodes, classifiers, clusters, and many more, of which the mining of association rules, as explained in Chapter 14.7.3 in this volume, is one of the most popular problems. The identification of sets of...

Data Mining and Knowledge Discovery Handbook, 2 Edition part 36

tailieu.vn

since a set that is frequent in the complete database must be relatively frequent in one of the parts. Also, the algorithm is highly dependent on the heterogeneity of the database and can generate too many local frequent sets, resulting in a significant decrease in performance. The presented Sampling algorithm picks a random sample from the database, then finds...
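
A minimal sketch of the property the excerpt leans on: an itemset that meets the minimum relative support over the whole database must meet it in at least one of the parts, otherwise its total count would be too small. The transactions, partitioning, and threshold below are illustrative.

```python
def relative_support(itemset, transactions):
    """Fraction of transactions containing all items of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

# Illustrative database split into two parts.
part1 = [{"a", "b"}, {"a", "b", "c"}, {"b"}, {"a"}]
part2 = [{"a", "b"}, {"c"}, {"a", "b", "c"}, {"b", "c"}]
full = part1 + part2
min_sup = 0.5

candidate = {"a", "b"}
print(relative_support(candidate, full) >= min_sup)                            # frequent overall
print(any(relative_support(candidate, p) >= min_sup for p in (part1, part2)))  # True in some part
```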

Data Mining and Knowledge Discovery Handbook, 2 Edition part 37

tailieu.vn

Some others define syntactical restrictions (e.g., the "length" of the pattern is below a threshold), and checking them does not need any access to the data. We emphasized, however, that the model is quite general: besides itemsets or sequences, L can denote, e.g., the language of partitions over a collection of objects or the language of decision trees on...
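
A minimal sketch of a syntactic constraint in the sense of the excerpt: a predicate on the pattern itself (here, its length), evaluated without touching the database. The names and patterns are illustrative.

```python
def length_constraint(max_len):
    """Syntactic constraint: the 'length' of the pattern is below a threshold.
    Checking it needs only the pattern, never the data."""
    return lambda pattern: len(pattern) <= max_len

patterns = [("beer",), ("beer", "chips"), ("beer", "chips", "salsa", "lime")]
ok = length_constraint(3)
print([p for p in patterns if ok(p)])   # drops the 4-item pattern
```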

Data Mining and Knowledge Discovery Handbook, 2 Edition part 38

tailieu.vn

describe a mining algorithm but rather a pruning technique for non-anti-monotonic and non-monotonic constraints. Considering a sub-lattice A of 2^I, the problem is to decide whether this sub-lattice can be pruned. A sub-lattice is characterized by its maximal element M and its minimal element m, i.e., the sub-lattice is the collection of all itemsets S...
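
A minimal sketch that spells out the characterization in the excerpt: the sub-lattice with minimal element m and maximal element M is the family of itemsets S with m ⊆ S ⊆ M, so it can be enumerated by adding subsets of M \ m to m. The function name and example sets are illustrative.

```python
from itertools import combinations

def interval_sublattice(m, M):
    """All itemsets S with m <= S <= M (the sub-lattice characterized by its
    minimal element m and maximal element M)."""
    m, M = frozenset(m), frozenset(M)
    free = sorted(M - m)
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            yield m | set(extra)

for S in interval_sublattice({"a"}, {"a", "b", "c"}):
    print(sorted(S))
# ['a'], ['a', 'b'], ['a', 'c'], ['a', 'b', 'c']
```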

Data Mining and Knowledge Discovery Handbook, 2 Edition part 39

tailieu.vn

The following paragraphs present two algorithms for incorporating link information into search engines: PageRank (Page et al., 1998) and Kleinberg's Hubs and Authorities (Kleinberg, 1999). The PageRank algorithm takes a set of interconnected pages and calculates a score for each. Similarly, a page that is pointed to by numerous other marginally important pages is probably itself important. A more formal...
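
A minimal sketch of the PageRank idea as power iteration on a tiny link graph. The damping factor of 0.85, the handling of dangling pages, and the example graph are illustrative assumptions, not taken from the excerpt.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively distribute each page's score to the pages it links to;
    a page gains importance from being pointed to by many or important pages."""
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outgoing in links.items():
            if not outgoing:               # dangling page: spread its score evenly
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:
                for q in outgoing:
                    new[q] += damping * rank[p] / len(outgoing)
        rank = new
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print({p: round(s, 3) for p, s in pagerank(links).items()})
```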