
Decision tree
By:
Wiki Pedia
In decision theory (for example, risk management), a decision tree is a graph
of decisions and their possible consequences (including risks and resource
costs). Decision trees are used to build a plan for reaching a desired goal and
to support the decision-making process. A decision tree is a special form of
tree structure.

General introduction


In machine learning, a decision tree is a predictive model, that is, a mapping
from observations about an object or phenomenon to conclusions about its target
value. Each internal node corresponds to a variable; the edge between it and a
child node represents a specific value of that variable. Each leaf node
represents the predicted value of the target variable, given the values of the
variables along the path from the root to that leaf. The machine learning
technique that builds such trees is called decision tree learning, or simply
decision trees for short.
Decision tree learning is also a common method in data mining. There, a
decision tree describes a tree structure in which the leaves represent
classifications and the branches represent conjunctions of attributes that lead
to those classifications [1]. A decision tree can be learned by splitting the
source set into subsets based on an attribute value test [1]. This process is
repeated recursively on each derived subset. The recursion is complete when no
further split is possible, or when a single classification applies to every
element of the derived subset. A random forest classifier uses a number of
decision trees to improve the classification rate.
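
The recursive splitting described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `build_tree`, the record layout (a dict of attributes plus a label) and the toy rows are all hypothetical, and attributes are tried in the order given rather than chosen by a criterion such as Gini impurity or entropy, as a real learner would do.

```python
from collections import Counter

def build_tree(rows, attributes):
    """Recursively split rows, a list of (features_dict, label) pairs."""
    labels = [label for _, label in rows]
    # Stop when one classification applies to every element of the subset,
    # or when there is no attribute left to split on (majority label wins).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr, rest = attributes[0], attributes[1:]
    # Attribute value test: one derived subset per observed value of attr.
    subsets = {}
    for features, label in rows:
        subsets.setdefault(features[attr], []).append((features, label))
    # Repeat the process on every derived subset.
    return {attr: {value: build_tree(subset, rest)
                   for value, subset in subsets.items()}}

# Made-up toy records, for illustration only.
toy_rows = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "play"),
    ({"outlook": "rain", "windy": False}, "play"),
    ({"outlook": "rain", "windy": True}, "stay home"),
]
tree = build_tree(toy_rows, ["outlook", "windy"])
```

In this sketch a leaf is a plain label and an internal node is a nested dict keyed by the attribute name, so the "sunny" subset becomes a leaf immediately while the "rain" subset is split once more on `windy`.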
Decision trees are also a descriptive means for calculating conditional
probabilities.
A decision tree can be described as a combination of mathematical and
computational techniques that help describe, classify and generalise a given
data set.
Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)


The dependent variable y is the variable we are trying to understand, classify
or generalise. x1, x2, x3, ... are the variables that help us carry out that task.
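
As a small illustration of this record layout, the sketch below stores each record as a pair of an input vector x and a dependent variable y, then separates the two. The `Record` type and the sample values are hypothetical and not taken from the article.

```python
from typing import NamedTuple, Tuple

class Record(NamedTuple):
    x: Tuple[float, ...]  # independent variables x1, ..., xk
    y: str                # dependent variable to be explained

# Made-up values, for illustration only (k = 3 here).
records = [
    Record(x=(1.0, 0.0, 3.5), y="yes"),
    Record(x=(0.0, 1.0, 2.0), y="no"),
]

# Separate the inputs from the target, as most learners expect.
X = [r.x for r in records]
y = [r.y for r in records]
```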

Types of decision trees
Decision trees are also known by two other names:
Regression tree: estimates real-valued functions rather than being used for
classification tasks (for example, estimating the price of a house or the length
of a patient's stay in hospital).
Classification tree: used when y is a categorical variable such as gender (male
or female) or the result of a match (win or lose).
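
The difference between the two kinds of tree shows up most clearly in what a leaf predicts. The sketch below assumes the common convention that a regression leaf returns the mean of the target values that reach it and a classification leaf returns the most frequent class. The function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean

def regression_leaf(targets):
    """Predict a real number: the mean of the target values in the leaf."""
    return mean(targets)

def classification_leaf(targets):
    """Predict a category: the most frequent class in the leaf."""
    return Counter(targets).most_common(1)[0][0]

# Made-up values, for illustration only.
stay_in_days = regression_leaf([2.0, 4.0, 6.0])
match_result = classification_leaf(["win", "lose", "win"])
```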

Practical example
We use an example to explain decision trees:
David is the manager of a famous golf club. He is having trouble with member
attendance. On some days everyone wants to play golf and the club does not have
enough staff to serve them. On other days, for no apparent reason, nobody turns
up and the club has too many staff.
David's goal is to optimise the number of staff on duty each day by using the
weather forecast to predict when people will come to play golf. To do that, he
needs to understand why customers decide to play and whether there is any
explanation for their behaviour.
So for two weeks he collected information on:
Outlook (sunny, clouded or raining). Temperature in degrees Fahrenheit.
Humidity. Whether it was windy or not.
And, of course, the number of people who came to play golf that day. David ended
up with a data set of 14 rows and 5 columns.


Then, to solve David's problem, a decision tree model was proposed.


A decision tree is a data model that encodes the distribution of the class label
(again, y) in terms of the predictor attributes. It is a directed acyclic graph
in the form of a tree. The root node (the node at the top) represents all of the
data. The classification tree algorithm finds that the best way to explain the
dependent variable, play, is to use the variable Outlook. Splitting on the values
of Outlook gives three groups: people who play golf when it is sunny, people who
play when it is clouded, and people who play when it is raining.
First conclusion: if it is clouded, people always play golf. And some people are
keen enough to play even when it is raining.
Next, we split the sunny group into two subgroups. We see that customers do not
want to play golf when humidity rises above 70%.
Finally, we split the rainy group in two and see that customers will not play
golf if it is windy.
And that is the short solution to the problem, as described by the
classification tree. David gives most of the staff the day off on days that are
sunny and humid, or rainy and windy, because hardly anyone will play golf on
those days. On other days, when many people will come to play, he can hire extra
temporary staff to help out.
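
The tree described in this example can be written out as nested conditions. The sketch below is one reading of the text: the function name `expects_players` and the string values for the outlook are assumptions, and the 70% humidity threshold and the wind rule are taken from the paragraphs above.

```python
def expects_players(outlook: str, humidity: float, windy: bool) -> bool:
    """Return True when the example's tree predicts that people will play."""
    if outlook == "clouded":
        return True               # clouded: people always play
    if outlook == "sunny":
        return humidity <= 70     # sunny: no play when humidity is above 70%
    if outlook == "raining":
        return not windy          # raining: no play when it is windy
    raise ValueError(f"unknown outlook: {outlook!r}")

# David could schedule fewer staff whenever the prediction is False.
busy_day = expects_players("sunny", humidity=65, windy=False)
quiet_day = expects_players("raining", humidity=80, windy=True)
```

Each `if` corresponds to one split in the tree, and each `return` corresponds to a leaf.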


The conclusion is that a decision tree helps us turn a complex representation of
the data into a much simpler structure.

Formulas
Gini impurity
Used by the CART (Classification and Regression Trees) algorithm. It is based on
squaring the membership probabilities of each target category in the node. Its
value reaches its minimum (zero) when all cases in the node fall into a single
target category.
Suppose y takes values in {1, 2, ..., m} and let f(i,j) be the frequency of value
j in node i. That is, f(i,j) is the proportion of records with y=j that are
assigned to group i. Then:

I_G(i) = 1 - \sum_{j=1}^{m} f(i,j)^2
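
As a minimal sketch of this measure, the function below computes the Gini impurity of one node as one minus the sum of the squared class proportions in that node. The function name and the sample labels are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node, given the list of y values that fall in it."""
    n = len(labels)
    if n == 0:
        raise ValueError("a node must contain at least one record")
    # c / n plays the role of f(i, j): the share of records with class j.
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Made-up labels, for illustration only.
pure_node = gini_impurity(["play", "play", "play"])
mixed_node = gini_impurity(["play", "stay home"])
```

A node whose records all share one class scores zero, and the score grows as the classes become more evenly mixed.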

Entropy
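Used by the ID3, C4.5 and C5.0 tree-generation algorithms. This measure is based
on the concept of entropy from information theory. With f(i,j) the proportion of
records in node i that have y=j:

I_E(i) = -\sum_{j=1}^{m} f(i,j) \log_2 f(i,j)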
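
A minimal sketch of the entropy measure for one node follows. It assumes base-2 logarithms, so the result is in bits. The function name and the sample labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the y values that fall in a node."""
    n = len(labels)
    if n == 0:
        raise ValueError("a node must contain at least one record")
    # Only classes that actually occur are counted, so log2(0) never arises.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

# Made-up labels, for illustration only.
pure_node = entropy(["play", "play"])
mixed_node = entropy(["play", "stay home"])
```

As with Gini impurity, a pure node scores zero. A two-class node split evenly scores one bit, the maximum for two classes.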

Advantages of decision trees
Compared with other data mining methods, decision trees have several advantages:

Decision trees are easy to understand. People can understand a decision tree
model after a brief explanation.

Data preparation for a decision tree is basic or unnecessary. Other techniques
often require data normalisation, the creation of dummy variables and the
removal of blank values.

Decision trees can handle both numerical and categorical data. Other techniques
usually specialise in analysing data sets that contain only one type of
variable. For example, relation rules can be used only with nominal variables,
while neural networks can be used only with numerical variables.


A decision tree is a white box model. If a given situation is observable in a
model, the corresponding condition can easily be explained using Boolean logic.
A neural network is an example of a black box model, because the explanation for
its results is too complex to be understood.

A model can be validated using statistical tests. This makes it possible to
trust the model.

Decision trees handle large amounts of data well in a short time. A personal
computer can analyse large amounts of data in a time short enough to let
strategists make decisions based on the decision tree's analysis.

Extending decision trees to decision graphs


In a decision tree, every path from the root node to a leaf node proceeds by
conjunction (AND). In a decision graph, disjunctions (OR) can be used to join
two or more paths together.
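
Using the golf example from earlier in the article, the sketch below shows the distinction: each path is a conjunction of tests, and the two paths that lead to the same outcome are joined with a disjunction. The function and variable names are hypothetical.

```python
def few_players_expected(outlook: str, humidity: float, windy: bool) -> bool:
    """Return True when either of the two 'no play' paths applies."""
    # Each path from root to leaf is a conjunction (AND) of tests.
    sunny_and_humid = outlook == "sunny" and humidity > 70
    rainy_and_windy = outlook == "raining" and windy
    # A decision graph may join paths with the same outcome by disjunction (OR).
    return sunny_and_humid or rainy_and_windy

quiet_day = few_players_expected("sunny", humidity=85, windy=False)
```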
The complement of decision trees is morphological analysis.
