Probability theory
Basics
Probability theory gives us a language for describing randomness. The most basic objects of probability theory are random variables. To define a random variable one needs a distribution function, through which one can then define notions such as the mean and the variance. "Standard deviation" is rendered as "độ lệch chuẩn". Mean and variance are functionals, applied to a distribution function or to a random variable. If a distribution function is absolutely continuous with respect to a reference measure (?) such as the Lebesgue measure, then it can be represented by a density function, by the Radon-Nikodym theorem.
The mathematical foundation of probability theory is measure theory, but the main business of probabilists is to construct and develop as many kinds of probability measures as possible. One cannot talk to a measure theorist without defining a sigma-algebra, whereas a probabilist very often keeps this notion carefully tucked away. The chief tool of the probabilists is the notion of independence, and, more powerfully, conditional independence. Hence mathematicians like to tease that probability theory is nothing but measure theory + independence. So what is the difference between a probability measure and random variables? According to David Aldous, it is the difference between a cake recipe and the cakes themselves. Only by grasping this difference can one make the leap from dry measure theory to full-blooded probability theory.
Independence and convergence
The notion of independence gives us a whole series of fundamental laws of probability theory. They all revolve around the phenomenon of concentration of measure. First come the laws of large numbers (in a strong-law and a weak-law version). The central limit theorem says that the sample mean ("mẫu trung bình") approximately follows the normal (Gaussian) law. …
Statistical models
Basics
A statistical model is also a probabilistic model, built from the ingredients developed for distribution functions and stochastic processes in probability theory. The difference here is that in a statistical model some of the random variables are labeled as data: the random variables whose values we can observe, or collect empirically with technological instruments. The focus of statistical modeling is therefore how to estimate/learn the model from data, how to assess its efficiency or its generalization, and how to pick out a useful model (model selection/model choice).
Parameters
To control the complexity of a model, the chief tool here is to parameterize the model. The parameters are the remaining part of the probabilistic model that we must estimate, or learn. At this point a small question arises: are the parameters unknown but non-random values, or are they themselves random? There are two approaches to this question: the frequentist school assumes the former, the Bayesian school the latter. If the parameters are finite-dimensional, we have a parametric model; if the dimension is infinite, we have a nonparametric model. So "nonparametric" does not mean there are no parameters. If the parameters are both random and infinite-dimensional, the model is called a Bayesian nonparametric model. …
…a sizable branch of statistics, belonging to the frequentist school, concentrates on distribution-free methods, which do not use any specific probabilistic model, although they do assume that there exists a distribution function generating the data samples independently. Note that this does not mean the frequentists are any more objective than the subjective Bayesians, for the independence assumption is in general stronger than conditional independence, or exchangeability. Both the Bayesian and the frequentist views are useful in different contexts, and in many respects neither camp is entirely right. Each view harbors contradictions within itself; the two oppose each other yet also complement each other, like the yin-yang picture of the I Ching. We will keep revisiting this relationship whenever the occasion arises.
A taxonomy of specific models and their parameterizations
Statistical models are like organisms in the natural world: extremely diverse, amenable to classification, and one can watch their complexity grow along with the development of the field. In machine learning some people even call a model a "machine", which sounds more technological, modern, and fresh. To describe a model one must say how it is parameterized, and for that a sizable lexicon of concepts is needed. How to parameterize is the daily rice and fish sauce of the statistics student.
With many random variables at once, one must specify the joint distribution ("phân bố liên hợp"). What should "marginal distribution" be called? "Conditional distribution" is "phân bố điều kiện". "Covariates" are "đồng biến"; in engineering they are usually the inputs. "Features" are really covariates too, but the term comes from machine learning, and will be called "đặc trưng".
In the exponential family there are two ways of parameterizing. "Natural parameterization" is "tham số hóa tự nhiên". Should "canonical parameterization" be "tham số hóa chính tắc"? Then there is the mean parameterization ("tham số hóa trung bình"). These two parameter systems are intimately linked through conjugate duality, a notion from convex analysis. In information geometry the two systems can be understood through the notions of e-flat and m-flat manifolds (?). "Normalizing constant" is "hằng số chuẩn hóa"; in statistical physics this notion is also called the partition function. The models popular in theoretical physics, such as the Ising model and spin glasses (?), are all special cases of the exponential family. A great many distribution functions are special cases of the exponential family. Especially important is the multivariate Gaussian, translated as "Gauss đa biến". "Mean vector" and "covariance matrix" are "vector trung bình" and "ma trận hiệp phương sai".
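A concrete sketch of the two parameterizations for a bivariate Gaussian (the numbers and helper names are made up for illustration): the mean parameters (mu, Sigma) versus the natural/canonical parameters (eta = Sigma^-1 mu, Lambda = Sigma^-1), with the round trip between them as a miniature of the conjugate duality.

```python
def inv2(m):
    # inverse of a 2x2 matrix [[a, b], [c, d]]
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Mean parameterization: mean vector and covariance matrix.
mu = [1.0, -2.0]
sigma = [[2.0, 0.5], [0.5, 1.0]]

# Natural (canonical) parameterization of the same Gaussian:
# precision matrix Lambda = Sigma^-1 and eta = Sigma^-1 mu.
lam = inv2(sigma)
eta = matvec(lam, mu)

# Converting back recovers the mean parameters.
mu_back = matvec(inv2(lam), eta)
print([round(x, 10) for x in mu_back])  # [1.0, -2.0]
```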
The exponential family is in turn a special case of the family of graphical models ("mô hình xác suất đồ thị"). How should we distinguish "graphical" from "graph" and from "graphics"? To define this kind of model one needs potential functions ("hàm tiềm năng"), defined on the cliques (?) of random variables. There are two kinds of graphical models. One is the undirected graphical model, which is synonymous with the Markov random field ("trường ngẫu nhiên Markov") …
…for a die-hard Bayesian, these higher-level parameters must themselves be random, and the parameterization process must be continued all the way back to the Big Bang. This leads to the family of hierarchical models (multi-level models), which is very powerful and very rich. Although it can be regarded as a special case of graphical models, its emphasis and origins are quite different, so the two should not be lumped together. (Note that we cannot actually go back to the Big Bang, so after a few levels of the hierarchy even the Bayesian statisticians grow tired and stop. In practice, by then the parameters at the very top levels no longer play much of a role in governing the behavior of the model anyway.) Specifying how the parameters are themselves parameterized is also called specifying the prior distribution ("phân bố tiền nghiệm") for the random parameters. Applying Bayes' rule then yields the posterior distribution, translated as "phân bố hậu nghiệm". "Conjugate prior" is "phân bố tiền nghiệm liên hợp". Parameterizing the hyperparameters is called specifying a hyperprior ("phân bố tiền nghiệm thượng tầng"). The decision of which prior to choose (prior specification) depends on a tug of war between prior knowledge, empirical data, and computational convenience. Using conjugate priors (quite a mouthful to say!) is one example of such convenience. The tug of war between prior and data is, under the Bayesian outlook, just one manifestation of Occam's razor.
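A tiny illustration of conjugacy and of the prior-data tug of war, using the Beta-Bernoulli pair (the counts here are invented): the posterior is again a Beta, obtained by adding the observed counts to the prior's pseudo-counts.

```python
# Beta(a, b) prior for the bias of a coin; Bernoulli likelihood.
a, b = 2.0, 2.0          # prior pseudo-counts (an arbitrary choice)
heads, tails = 7, 3      # observed data

# Bayes' rule, in closed form thanks to the conjugate prior:
a_post, b_post = a + heads, b + tails

prior_mean = a / (a + b)
posterior_mean = a_post / (a_post + b_post)
mle = heads / (heads + tails)

# The posterior mean sits between prior belief and the empirical
# frequency: the tug of war between prior and data, made visible.
print(prior_mean, posterior_mean, mle)
print(prior_mean < posterior_mean < mle)  # True
```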
The frequentists do not like the notion of a hyperparameter one bit, holding instead that parameters must be non-random. From a modeling standpoint this view is an invisible straitjacket: from the Bayesian perspective, parameters of this kind can still be regarded as random under a Dirac measure (an atomic measure), which is a very tight and unnecessary constraint. Hence, historically, the models of the frequentists have tended not to be as rich as the models of the Bayesians, though it need not necessarily be so.
Occam's razor
As old Goethe said, all theory is grey, and the tree of life is forever green. Replace "theory" with "model" and "tree of life" with "observed data", and we get a variant for statisticians. George Box has a similarly famous saying: all models are wrong, but some are useful. So we must view models as our approximations of the empirical world. Besides the estimation error of the parameters, there is therefore another kind of error, called the approximation error. A model uses statistical language and mathematical structures (such as stochastic processes) as its bricks, but it is estimated, updated, evaluated, and analysed with real data. The more powerful the mathematical machinery, the greater the model complexity, and hence the greater the representational capacity of the model; the approximation error will then be small, but the estimation from data may incur a larger error. This is precisely the tradeoff between approximation error and estimation error. The phenomenon goes by the name of Occam's razor, which forever haunts and cuts through every decision in the design and evaluation of a learned model. The worst fear is a model that overfits the data (is too flexible). An objective assessment of the effectiveness and usefulness …
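The tradeoff can be caricatured in a few lines (everything here, the toy data-generating function included, is invented for illustration): a constant model has a large approximation error but is trivial to estimate, while a model that memorises the training set fits it perfectly yet generalises poorly.

```python
import random

random.seed(1)

def make_data(n):
    # y = x^2 plus noise: the "empirical world" the models must approximate
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, x * x + random.gauss(0, 0.3)) for x in xs]

train, test = make_data(30), make_data(1000)

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

# Simple model: a constant (large approximation error, tiny estimation error).
const = sum(y for _, y in train) / len(train)
simple = lambda x: const

# Complex model: 1-nearest-neighbour memorisation of the training set
# (negligible approximation error, large estimation error).
def memorise(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

print(mse(memorise, train) < mse(simple, train))  # True: overfit wins on training data
print(mse(memorise, test), mse(simple, test))     # but its test error is far from zero
```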
…the worst case (worst-case analysis). These are just two extremes that mark out the contrast; in practice one can combine both approaches when drawing inferences from data. There are a number of more specific inference problems, each with its own lexicon. "Point estimation" is "ước lượng điểm" (a notion of frequentist statistics). "Hypothesis testing" is "kiểm định lý thuyết" ("phép thử lý thuyết"?). "Classification" is the problem of "phân lớp". "Clustering" is the problem of "chia nhóm". The ranking problem of machine learning is "phân hạng". "Supervised learning" is "học có nhãn", learning with guidance. "Unsupervised learning" is "học không nhãn" (learning without guidance, without a teacher). "Sequential analysis" is "phân tích chuỗi"/"phân tích tuần tự" (?), within which the optimal stopping problem is "bài toán dừng tối ưu". "Survival analysis" is "phân tích sự sống sót" (?). The change point detection problem is "bài toán phát hiện điểm thay đổi". Note that all of these specific inference problems can be understood, in general terms, as one of the two inference problems (parameter estimation, or prediction), and each can be approached from either the frequentist or the Bayesian view, possibly with some adjustment in how the inference is evaluated.
Decision theory
The theoretical foundation of statistical inference is the decision theory of Abraham Wald. We need the notion of risk. "Bayes risk" is "rủi ro Bayes". The risk is the expectation of the loss function ("hàm thiệt hại/tổn thất/thiệt/mất"). Economists use a utility function ("hàm tiện ích/thỏa dụng") instead of a loss function. A similar notion is the reward function (?) of reinforcement learning (?) and of Markov decision processes. Decision theory is the common ground of both the Bayesian and the frequentist schools, but the frequentists have more work to worry about. "Estimator" translates as "cách ước lượng" for a parameter, and is a function applied to the data. It is thus just like a statistic, and so a statistic can be regarded as a rudimentary estimator. An "estimate" is a specific estimated value for some parameter. In the classification problem the estimator is also called a learning machine ("máy học"), and the estimate will be the classifier ("hàm số phân loại"). In hypothesis testing, the thing to be estimated is a decision function ("hàm số quyết định"). Under either outlook, one must look for the estimator that meets the minimum risk criterion. But Mr. Bayes's risk is different from Mr. Frequentist's.
The frequentist expectation is the expectation of the loss function with respect to the (imagined) distribution of the data, on the basis of a model with a given parameter. The Bayesian expectation is the expectation of the loss function with respect to the conditional distribution of the parameter, on the basis of the data at hand. In other words: for the frequentist the data are random, for the Bayesian the parameter is random. If we take the expectation of the frequentist risk with respect to the distribution of the parameter, or the expectation of the Bayesian risk with respect to the distribution of the data, we arrive at the same thing: the Bayes risk!
Some commonly used loss functions: the 0-1 loss, for which the Bayes risk is called the Bayes error ("lỗi Bayes"); the squared loss; the exponential loss.
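A sketch of the three losses and of risk as an average loss (the label convention y in {-1, +1} against a real-valued score f, and the little sample, are invented for illustration):

```python
from math import exp

# Three common loss functions as plain functions of (label y, score f).
zero_one = lambda y, f: 0.0 if y * f > 0 else 1.0
square   = lambda y, f: (1 - y * f) ** 2
exp_loss = lambda y, f: exp(-y * f)

# Risk = expected loss; here its empirical counterpart over a tiny sample.
sample = [(+1, 0.8), (+1, -0.2), (-1, -0.5), (-1, 0.1)]

def empirical_risk(loss):
    return sum(loss(y, f) for y, f in sample) / len(sample)

print(empirical_risk(zero_one))  # 0.5: two of the four points are misclassified
print(empirical_risk(square), empirical_risk(exp_loss))
```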
…the most widely used and most versatile estimator in statistics (at least from the frequentist viewpoint). For parametric models this estimator comes with a guarantee of consistency: the model will be estimated exactly as the amount of data tends to infinity. Why should the loss function be the logarithm of the density and not some other function? This is an example of the unexpected magic of mathematics; the answer traces back to the notion of independence, to the concentration of measure in probability, and to convexity in analysis (and geometry). The likelihood principle holds that the likelihood function is a sufficient statistic. This principle goes bankrupt in the nonparametric setting.
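A sketch of maximum likelihood for the mean of a N(mu, 1) model (the grid search and all constants are illustrative): the log turns a product of densities into a sum, and for this model the maximiser is exactly the sample mean.

```python
import math
import random

random.seed(2)
data = [random.gauss(3.0, 1.0) for _ in range(1000)]

def log_likelihood(mu):
    # log of a product of N(x | mu, 1) densities = a sum of log-densities;
    # the log turns an astronomically small product into a manageable sum
    return sum(-0.5 * (x - mu) ** 2 - 0.5 * math.log(2 * math.pi) for x in data)

# Maximum likelihood by a crude grid search; the exact maximiser is the
# sample mean, and more data pins it down more tightly (consistency).
grid = [2.0 + i * 0.005 for i in range(401)]
mu_hat = max(grid, key=log_likelihood)
sample_mean = sum(data) / len(data)
print(abs(mu_hat - sample_mean) < 0.003)  # True: within half a grid step
```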
Regularization/Penalization/Shrinkage. In estimating nonparametric models, relying on the data alone (through the likelihood function or, more generally, the empirical risk) is not enough. One needs an adjustment in the maximization/minimization, through the notion of regularization ("kiểm soát"), also called penalization ("soát phạt"). "Regularized empirical risk" is "rủi ro thực nghiệm được kiểm soát". The notions of regularization and penalization trace back to a surprising discovery of Charles Stein about the shrinkage estimator ("cách ước lượng co"), which is why this family of estimators is often also called shrinkage estimators. To use a finite amount of data to estimate quantities (parameters) that are infinite, or of large dimension (no matter how large the data set, even as it tends to infinity), there must be some control in the estimation: one cannot rely entirely on the empirical data. From the Bayesian point of view, this is precisely the tug of war between data and prior. The shrinkage here is exactly shrinkage toward the prior.
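A minimal sketch of a regularised empirical risk and the shrinkage it induces, for estimating a single mean (the penalty form lam * mu^2 and all numbers are an arbitrary choice; here "0" plays the role of the prior):

```python
import random

random.seed(3)

# Regularised empirical risk for estimating a mean mu:
#   (1/n) * sum (x_i - mu)^2  +  lam * mu^2
data = [random.gauss(2.0, 1.0) for _ in range(20)]
n = len(data)
sample_mean = sum(data) / n

def regularised_estimate(lam):
    # closed-form minimiser of the penalised objective: mean / (1 + lam)
    return sum(data) / (n + lam * n)

print(regularised_estimate(0.0) == sample_mean)  # True: no penalty, plain MLE
print(regularised_estimate(1.0) < sample_mean)   # True: shrinkage toward 0
```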
Posterior analysis / Bayesian learning. Posterior analysis (a posteriori analysis), concretely posterior inference, Bayesian inference, Bayesian learning, all describe one and the same estimation approach of the Bayesian school. Namely, instead of estimating a (non-random) parameter as in the frequentist school, one computes the posterior distribution of the parameter via Bayes' rule. The recipe here is cut and dried; the main work lies in how to specify the prior distribution, and how to compute the posterior (which involves integrals that are computationally very complex). Note that maximum likelihood estimation is nothing but computing the mode of the posterior distribution when the prior is chosen to be the uniform distribution. In Bayesian analysis, especially with parametric models, there is no need to worry about regularization. But if the prior is a stochastic process (as in nonparametric models), one must still worry about controlling the complexity of the prior distribution. One tool is the sensitivity analysis ("phân tích tính nhạy cảm") of the distribution over the parameter.
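A grid-based sketch of posterior computation for a coin's bias (the counts and the grid are invented): with a uniform prior the posterior mode coincides with the maximum likelihood estimate, while a non-uniform prior pulls the mode away from it.

```python
# Posterior over a coin's bias p on a grid, from heads/tails counts.
heads, tails = 6, 4
grid = [i / 1000 for i in range(1, 1000)]

def likelihood(p):
    return p ** heads * (1 - p) ** tails

# Uniform prior: the posterior is proportional to the likelihood, so the
# posterior mode (the MAP estimate) coincides with the MLE.
posterior_mode = max(grid, key=likelihood)
mle = heads / (heads + tails)
print(posterior_mode == mle)  # True: MAP = MLE under a uniform prior

# A non-uniform prior (here proportional to p^4, i.e. Beta(5, 1)) pulls
# the mode toward 1; the exact posterior mode becomes 10/14.
map_beta = max(grid, key=lambda p: likelihood(p) * p ** 4)
print(abs(map_beta - 10 / 14) < 0.001)  # True
```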
Empirical Bayes. This method can be viewed as a frequentist estimation of a hierarchical model. The hierarchical model is an ideal tool for controlling the complexity of the models placed on the parameters.
More specific inference problems
Hypothesis testing. In hypothesis testing there are several important concepts. What should "null hypothesis" be called? And "alternative hypothesis"? There are two types of error: type-1 error ("lỗi loại một") and type-2 error ("lỗi loại hai"), also called the false positive rate ("tỉ lệ lỗi dương tính") and the false negative rate ("lỗi âm tính") when evaluating treatments (?) in medicine. In engineering, the type-1 error is the false alarm error rate (?), and the type-2 error is precisely the misdetection error rate (?). All of these rates are risk functions with respect to the 0-1 loss. The estimator in hypothesis testing is called a decision function, and one uses the decision function to carry out a test of the hypothesis. A test is evaluated through guarantees on bounds for the errors above. The tradeoff between the type-1 and type-2 errors is represented by the ROC curve ("đường cong ROC"). Related concepts include significance (?); how should "confidence interval" be translated? "p-value" is "giá trị p". The power of a test is its "sức mạnh". If there are only two hypotheses to compare, the optimal decision function must be based on the likelihood ratio ("phân số khả năng"); the likelihood ratio test is "phép thử dựa vào phân số khả năng". The tool for evaluating the power of a test is asymptotic statistics ("thống kê giới hạn").
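A simulation sketch of the two error types and the ROC tradeoff for testing N(0,1) against N(1,1) (the sample sizes and thresholds are arbitrary); for this pair the likelihood ratio is monotone in x, so thresholding x itself is the likelihood ratio test.

```python
import random

random.seed(4)

# Two simple hypotheses: H0: N(0,1) versus H1: N(1,1).
null_draws = [random.gauss(0, 1) for _ in range(20000)]
alt_draws  = [random.gauss(1, 1) for _ in range(20000)]

def error_rates(threshold):
    type1 = sum(x > threshold for x in null_draws) / len(null_draws)  # false alarm
    type2 = sum(x <= threshold for x in alt_draws) / len(alt_draws)   # misdetection
    return type1, type2

# Sweeping the threshold traces out the ROC curve: the two errors trade off.
low = error_rates(0.0)
high = error_rates(1.0)
print(low, high)
print(low[0] > high[0] and low[1] < high[1])  # True: fewer false alarms, more misses
```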
Hypothesis testing originated in frequentist statistics, through the work of Neyman and Pearson. The notion is quite counterintuitive, and it took until Wald for this mode of inference to be unified with the other forms of inference in statistics. Approached from the Bayesian outlook, hypothesis testing is rather simple, hardly different from estimating a model: one needs the notion of a prior distribution over the hypotheses. How should "Bayes factor" be translated?
Sequential analysis. In sequential analysis there is a tradeoff between the Bayes error and the delay time of the decision about the hypothesis. A commonly used notion is the sequential likelihood ratio test ("phép thử dựa theo chuỗi phân số khả năng"). The theoretical tools for evaluating the power of such a test are the analyses of stopping times, of the various boundary-crossing times, and so on, from probability theory and the theory of Markov processes.
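A sketch of Wald's sequential probability ratio test for N(0,1) versus N(1,1) (the thresholds a, b and all names are illustrative): log likelihood ratios are accumulated one observation at a time until a boundary is crossed, and the crossing time is the random delay of the decision.

```python
import math
import random

random.seed(5)

def log_lr(x):
    # log [ N(x | 1, 1) / N(x | 0, 1) ] simplifies to x - 1/2
    return x - 0.5

def sprt(draws, a=20.0, b=1 / 20.0):
    # stop as soon as the running log likelihood ratio leaves (log b, log a)
    s, n = 0.0, 0
    for x in draws:
        s += log_lr(x)
        n += 1
        if s >= math.log(a):
            return "H1", n
        if s <= math.log(b):
            return "H0", n
    return "undecided", n

# Data truly from H1: the test usually decides H1 after a random
# stopping time, the delay of the decision.
decision, stop_time = sprt(random.gauss(1, 1) for _ in range(10000))
print(decision, stop_time)
```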
Classification/regression/ranking. In the classification problem, an estimator for classification is called a learning machine, and the "parameter" to be estimated here is called a classifier ("hàm phân loại"). The problem can be approached on the basis of parametric models or of nonparametric models. Learning the machine (the model) usually demands a great deal of computation, not just the simple statistics of classical hypothesis testing; hence the concern with the efficiency of the learning/estimation algorithms. The learning/estimation, viewed computationally, has its own lexicon: training ("việc luyện máy"). The data needed for training are the training data ("dữ liệu huấn luyện"). Trying a classifier out on new data is called testing, and such data are the test data. If there are two classes, the optimal classifier must be based on the likelihood ratio, very much as in hypothesis testing. A basic difference between the classification problem and the hypothesis testing problem lies here: the former must test the hypothesis for each sample, one at a time; the latter …