
Terminology of probability, statistics, and machine learning

By: Cao Xuân Hiếu

Probability theory
Basics
Probability theory gives us a language for describing randomness. The most basic objects of probability theory are random variables. Defining a random variable requires a distribution function, through which concepts such as the mean and the variance can be defined. Standard deviation is rendered as độ lệch chuẩn. Mean and variance are functionals, applied to a distribution function or to a random variable. If a distribution function is absolutely continuous with respect to a reference measure (?) such as Lebesgue measure, then it can be represented by a density, by the Radon-Nikodym theorem.
The mathematical foundation of probability theory is measure theory, but the main business of probabilists (?) is to construct and develop as many kinds of probability measures as possible. Talking to a measure theorist, one cannot avoid defining a sigma-algebra. Talking to a probabilist, this notion is very often kept well hidden. The probabilist's main tool is the notion of independence, and, more powerfully, conditional independence. Hence mathematicians like to tease that probability theory is nothing more than measure theory plus independence. So what is the difference between a probability measure and random variables? According to David Aldous, it is the difference between a cake recipe and the cakes themselves. Only by understanding this difference can one make the leap from dry measure theory to the livelier theory of probability.
Independence and convergence
The notion of independence gives us a series of fundamental laws of probability theory. They all revolve around the phenomenon of concentration of measure. First come the laws of large numbers (in both strong-law and weak-law versions). The central limit theorem reminds us that the sample mean follows a normal (Gaussian) law as the sample size tends to infinity. These laws all rely on notions of convergence from analysis: almost sure convergence, and convergence in distribution or in law. Besides the law of large numbers there is also the law of small numbers (the law of rare events), which tells us when a sum of rare events follows a Poisson law. It is no accident that the Gaussian and the Poisson are the two most fundamental distributions: they are the bricks of the entire edifice of probability.
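A small simulation makes the two limit theorems above concrete. The sketch below (a minimal illustration, assuming NumPy; the exponential base distribution and the sample sizes are arbitrary choices) shows the sample mean settling near the true mean, and the standardized sample mean behaving like a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_std = 1.0, 1.0          # exponential(1) has mean 1 and std 1

for n in [10, 100, 10_000]:
    # Law of large numbers: one sample mean per n, drifting toward true_mean.
    sample_mean = rng.exponential(scale=1.0, size=n).mean()

    # Central limit theorem: standardized sample mean over many replications.
    reps = rng.exponential(scale=1.0, size=(5000, n)).mean(axis=1)
    z = np.sqrt(n) * (reps - true_mean) / true_std
    print(f"n={n:6d}  sample mean={sample_mean:.3f}  "
          f"std of z={z.std():.3f} (CLT predicts ~1)  "
          f"P(|z|<1.96)={np.mean(np.abs(z) < 1.96):.3f} (CLT predicts ~0.95)")
```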
The notions of independence and conditional independence are the glue that binds random variables together, and through them we obtain probability distributions for mathematical objects with more complex structure. One frequently used form of conditional independence is the Markov property. Besides the glue of independence, there is another very useful glue: exchangeability. If independence is the foundation of frequentist inference methods, then exchangeability is the foundation of Bayesian inference methods. Exchangeability is being extended to partial exchangeability, an important notion for developing measures on discrete and complex combinatorial objects.
Stochastic processes
Probability theory has developed many distributions, and not only for simple scalar (?) random variables; people have also invented distributions for more complex, higher-dimensional mathematical structures. We begin to speak of distributions over sets of measurable functions, and distributions over random measures. Distributions over such infinite-dimensional objects are collectively called stochastic processes. Their existence is established via Kolmogorov's extension theorem, which lets us understand distributions on infinite-dimensional spaces through consistency conditions on the measures of cylinder sets. This is how we construct the distributions of Gaussian processes, the Dirichlet process, and so on.
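In practice one never touches the infinite-dimensional object directly; one works with its finite-dimensional marginals, which is exactly what the consistency conditions make legitimate. A minimal sketch (the squared-exponential kernel, the grid, and the jitter term are illustrative assumptions):

```python
import numpy as np

def se_kernel(x, y, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance k(x, y)."""
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)                 # any finite set of index points
K = se_kernel(t, t) + 1e-10 * np.eye(len(t))  # jitter for numerical stability

# A draw from the GP restricted to t is just a draw from the multivariate Gaussian
# with this covariance: the finite-dimensional marginal of the process.
path = rng.multivariate_normal(mean=np.zeros(len(t)), cov=K)
print(path[:5])
```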
An effective way to construct a stochastic process is to return to the notion of independence and push it to its limit. The tool here is to look at the Fourier transform of distributions; in the language of probability this is called the characteristic function (hàm đặc trưng). Pushing independence to the limit leads to the notion of infinitely divisible distributions. The next notion is that of stable distributions. The Gaussian and the Poisson are precisely two stable distributions, which is no accident if we recall the laws of large and small numbers mentioned above. Max-stable distributions are rendered as phân bố cực đại ổn định.
Stochastic processes with independent increments are called Lévy processes. A slight generalization is the class of completely random measures. The Lévy-Khintchine representation theorem tells us exactly what the characteristic function of such a stochastic process is, in terms of its Lévy measure. Choosing a suitable Lévy measure (beta, gamma, etc.) yields the corresponding stochastic process. This theorem shows why the Gaussian and the Poisson become the master bricks of the edifices of probability: by the Lévy-Itô theorem, based on the Lévy-Khintchine representation, every Lévy process can be decomposed into the sum of three independent stochastic processes, namely a Wiener process (a form of Gaussian process), a compound Poisson process, and a martingale.
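To make one of those three building blocks tangible, here is a minimal simulation of a compound Poisson process path, drawing jump times from a Poisson process and jump sizes i.i.d. (the rate, the Gaussian jump distribution, and the horizon are illustrative assumptions, not anything prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(2)
rate, horizon = 3.0, 10.0                    # jump intensity and time horizon

n_jumps = rng.poisson(rate * horizon)        # number of jumps on [0, horizon]
jump_times = np.sort(rng.uniform(0.0, horizon, size=n_jumps))
jump_sizes = rng.normal(loc=0.0, scale=1.0, size=n_jumps)

def X(t):
    """Value of the compound Poisson process at time t: sum of jumps up to t."""
    return jump_sizes[jump_times <= t].sum()

print([round(X(t), 3) for t in np.linspace(0, horizon, 6)])
```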
It is quite hard to picture the measurable subsets of the sigma-algebra of a stochastic process. Rather than visualizing the sigma-algebra (the cake recipe), we can describe the cakes themselves. If the process is indexed by a time parameter, then a cake here can be understood as a sample path. For some stochastic processes one can describe how to draw a sample from the process by drawing from a Pólya urn. Many stochastic processes can be described by a stick-breaking representation; this representation needs atoms and stick-breaking weights. Both the stick-breaking and the drawing of atoms rest on conditional independence, the magic glue that lets us describe complex structures out of simpler ingredients.
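As an illustration of the stick-breaking idea, here is a minimal truncated sketch of the Sethuraman construction of a Dirichlet process draw (the concentration parameter, the Gaussian base measure, and the truncation level are all illustrative assumptions):

```python
import numpy as np

def stick_breaking_dp(alpha=2.0, truncation=100, rng=None):
    """Truncated stick-breaking draw: weights plus atoms from a N(0,1) base measure."""
    rng = rng or np.random.default_rng(3)
    betas = rng.beta(1.0, alpha, size=truncation)          # break proportions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining                             # lengths of the broken sticks
    atoms = rng.normal(size=truncation)                     # atom locations from the base
    return weights, atoms

w, a = stick_breaking_dp()
print("total mass captured by truncation:", round(w.sum(), 4))
print("three largest atoms:", [(round(wi, 3), round(ai, 3))
                               for wi, ai in sorted(zip(w, a), reverse=True)[:3]])
```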
Of primary interest is the behavior of the expectation of a probabilistic object. Related is the notion of conditional expectation, which is itself a random variable. An important tool is the notion of a martingale. A martingale can be described as a stochastic process, tentatively rendered as quá trình đánh bạc (?). One also needs the notion of a filtration (hệ thống lọc). In addition we have submartingales, supermartingales and semimartingales (?). Thanks to these tools we can study useful probabilistic notions such as stopping times, hitting times, and boundary crossing times.
A very commonly used family of stochastic processes is the Markov process, defined via a transition probability kernel and the notion of a filtration. There is also the notion of a subordinator (?), an important kind of Lévy process. Local time is translated as thời gian địa phương. A Markov process in discrete time is also called a Markov chain (chuỗi Markov or xích Markov). Related to Markov chains is ergodic theory (?). Irreducibility is rendered as tính bất khả quy. A question of much interest is the mixing time (thời gian hòa tan) of a Markov chain. For a Markov chain to mix toward a stationary distribution (phân bố dừng) one needs ergodicity; a convenient sufficient condition for stationarity, giving reversible chains, is the detailed balance equations (phương trình cân bằng chi tiết). A Markov chain defined on a discrete space (a lattice, say) becomes a random walk. Calling a lattice a dàn (as in dàn thiên lý, a flowering-vine trellis) is charming, but then how do we tell it apart from a grape trellis? The notion of coupling for Markov chains is rendered as sự cặp đôi. Coupling from the past? Too easy: cặp nhau từ quá khứ! A time-homogeneous Markov process is a quá trình Markov thuần nhất theo thời gian.
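A tiny numerical sketch of these notions (the 3-state transition matrix is an arbitrary illustrative example): the stationary distribution is the left eigenvector of the transition matrix with eigenvalue 1, and the detailed balance equations can be checked directly.

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],      # an arbitrary 3-state transition matrix
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

# Stationary distribution: left eigenvector of P with eigenvalue 1, normalized.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()
print("stationary distribution:", np.round(pi, 4))

# Detailed balance: pi[i] * P[i, j] == pi[j] * P[j, i] for all i, j
# (it holds here because this particular chain is reversible).
flows = pi[:, None] * P
print("detailed balance holds:", np.allclose(flows, flows.T))
```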

When we speak of a process we usually think of time; indeed, stochastic processes are usually understood as collections of consistent distributions indexed by a time parameter. It need not be so. Extending the index from time to an arbitrary space (for example a Euclidean space, a lattice, or a non-Euclidean space) gives more general stochastic processes. Markov random fields are called trường ngẫu nhiên Markov. A Gaussian random field is a trường ngẫu nhiên Gauss. A Poisson point process is called quá trình điểm Poisson (still "process", although strictly speaking it ought to be called a Poisson field!). A spatial process is a quá trình không gian (?). A spatiotemporal process is a quá trình không-thời gian. The notion of a phase transition, so striking for Markov random fields on an infinite lattice, we shall translate as hiện tượng chuyển pha.
A rather interesting kind of stochastic process is the empirical process (quá trình thực nghiệm). It is usually studied in order to understand the efficiency of statistical inference methods, rather than to describe a stochastic process found in nature. More on this in a later section.
Other important notions: percolation, excursion, optional stopping.

Statistical models
Basics
A statistical model is also a probabilistic model, built from the ingredients developed for distributions and stochastic processes in probability theory. The difference is that in a statistical model some of the random variables are labeled as data: random variables whose values we can observe, or collect experimentally with technological instruments. So the focus in building a statistical model is how to estimate/learn the model from data, how to assess the model's efficiency or generalization, and how to choose a useful model (model selection / model choice).
Parameters
The main tool for controlling a model's complexity is the parameterization of the model. The parameters are the remaining part of the probabilistic model that we must estimate, or learn. Here a small issue arises: are the parameters unknown but non-random quantities, or are they themselves random? There are two approaches to this question; the frequentist school assumes the former, the Bayesian school the latter. If the parameters are finite-dimensional we have a parametric model; if they are infinite-dimensional we have a nonparametric model. So "nonparametric" does not mean there are no parameters. If the parameters are random and also infinite-dimensional, the model is called a Bayesian nonparametric model. This does not mean that working with such models makes one a Bayesian, although in practice most people who develop complex models in general, and Bayesian nonparametric models in particular, do hold a Bayesian outlook. But it need not be so.
Sufficiency and information
An important tool in parameterization is the notion of sufficient statistics. To understand it, one must first understand what a statistic is. A statistic is a function applied to the data (any combination of adding, subtracting, multiplying and dividing will do). In computer-science terms, a statistic is the output of an algorithm that takes the data as input. A sufficient statistic for a model is a statistic that contains all the information the data can provide about the model's parameters. That is, if we throw the data away and keep only the sufficient statistics, we lose no information about the model. This is perhaps one of the most beautiful concepts in all of statistics. Once the sufficient statistics have been decided, one can tell that the data must be a sample from a distribution with a certain kind of parameterization, via the Fisher-Neyman factorization theorem. Note also that sufficiency is an information-theoretic notion; it can be stated in terms of conditional independence and entropy.
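A toy illustration of the Fisher-Neyman idea (the Bernoulli model and the two datasets are illustrative assumptions): for i.i.d. Bernoulli(p) data the likelihood depends on the sample only through the number of successes, so two samples with the same sum carry exactly the same information about p.

```python
import numpy as np

def bernoulli_log_likelihood(data, p):
    """Log-likelihood of i.i.d. Bernoulli(p) data; depends on data only via its sum."""
    data = np.asarray(data)
    k, n = data.sum(), data.size
    return k * np.log(p) + (n - k) * np.log(1 - p)

x1 = [1, 0, 1, 1, 0, 0, 1, 0]     # two different samples ...
x2 = [0, 1, 0, 0, 1, 1, 0, 1]     # ... with the same sufficient statistic (sum = 4)
grid = np.linspace(0.05, 0.95, 5)
print(np.allclose(bernoulli_log_likelihood(x1, grid),
                  bernoulli_log_likelihood(x2, grid)))   # True: identical likelihoods
```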
A whole series of beautiful models can be motivated from sufficiency-type notions of this kind. The exponential family is the model that generates data as randomly as possible given that its sufficient statistics are fixed. A probabilistic graphical model is the model characterized by a given set of conditional independence constraints among the random variables, by the Hammersley-Clifford theorem. If the random variables are assumed exchangeable, then they must be described by a mixture model, by de Finetti's famous theorem. If the random variables have a distribution that is invariant under orthonormal transformations, then they must be described by an elliptically contoured distribution, rather like the multivariate Gaussian.
The Bayesian and frequentist views
Statistical models give us the glue that binds data together, and they are the central object of statistics. But historically, and to this day, models have been embraced more warmly by the Bayesian school than by the frequentist school, because dependence on a statistical model makes people think of depending too heavily on prior knowledge, and hence of a lack of objectivity. Within the Bayesian school there are in fact two branches, subjective Bayes and objective Bayes. Subjective Bayesians hold that if we have certain subjective beliefs about the data, then we should use the corresponding probabilistic model, thanks to theorems like those of de Finetti and Hammersley-Clifford mentioned above. A not-so-small part of statistics, belonging to the frequentist school, focuses on distribution-free methods, which do not use any specific probabilistic model, although they do assume that there exists a distribution generating the data samples independently. Note that this does not make frequentists more objective than subjective Bayesians, because the assumption of independence is in general stronger than the assumption of conditional independence, or of exchangeability. Both the Bayesian and the frequentist view are useful in different contexts, and in many respects neither side is entirely right. Each view contains its own internal tensions; the two oppose each other and yet also complement each other, rather like the yin-yang picture in the I Ching. We shall return to this relationship whenever the occasion arises.
A taxonomy of specific models and their parameterizations
Statistical models, like the creatures of the natural world, are extremely diverse; they can be classified, and one can watch their complexity grow as the field develops. In machine learning some people even call a model a machine, which sounds more technological, modern and fresh. To describe a model one must say how it is parameterized, so a lot of vocabulary is needed. How to parameterize is truly the bread and butter of the statistics student.
With many random variables one must specify the joint distribution (phân bố liên hợp). Marginal distribution is rendered as ? Conditional distribution is phân bố điều kiện. Covariates are đồng biến; in engineering they are usually just the inputs. Features are really also covariates, but the term comes from machine learning, and we will call them đặc trưng.
In the exponential family there are two parameterizations. The natural parameterization is cách tham số hóa tự nhiên; canonical parameterization is tham số hóa chính tắc? The other system is the mean parameterization (tham số hóa trung bình). The two systems of parameters are intimately related through conjugate duality, a notion from convex analysis. In information geometry the two systems can be understood via the notions of e-flat and m-flat manifolds (?). The normalizing constant is hằng số chuẩn hóa; in statistical physics this notion is also called the partition function. Models popular in theoretical physics such as the Ising model and spin glasses (?) are special cases of the exponential family. Very many distributions are special cases of the exponential family. Especially important is the multivariate Gaussian, translated as Gauss đa biến. Mean vector and covariance matrix are vector trung bình and ma trận hiệp phương sai.
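A one-dimensional illustration of the duality just described, using the Bernoulli distribution as an (illustrative) member of the exponential family: the natural parameter is the log-odds θ, the log-partition function is A(θ) = log(1 + e^θ), and differentiating A recovers the mean parameter μ = E[x].

```python
import numpy as np

def log_partition(theta):
    """A(theta) = log(1 + exp(theta)) for the Bernoulli exponential family."""
    return np.log1p(np.exp(theta))

def mean_from_natural(theta):
    """Mean parameter mu = A'(theta) = sigmoid(theta)."""
    return 1.0 / (1.0 + np.exp(-theta))

theta = 0.7
mu = mean_from_natural(theta)
# The numerical derivative of A matches the mean parameter (conjugate duality in action).
eps = 1e-6
numeric = (log_partition(theta + eps) - log_partition(theta - eps)) / (2 * eps)
print(round(mu, 6), round(numeric, 6))          # essentially equal
# Going back: theta = logit(mu), the inverse map from mean to natural parameters.
print(round(np.log(mu / (1 - mu)), 6))          # recovers 0.7
```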
The exponential family is in turn a special case of the family of probabilistic graphical models. How should we distinguish graphical from graph and from graphics? To define these models one needs potential functions (hàm tiềm năng), defined on the cliques (?) of the random variables. There are two kinds of graphical models. One is the undirected graphical model, synonymous with the Markov random field. The other is the directed graphical model, also called a Bayesian network, due to Pearl. In a Bayesian network there are parent nodes and child nodes. Moralization is rendered as lấy nhau (marrying). Commonly used special cases of Bayesian networks include tree-structured graphical models, the polytree model (mô hình đa cây, though perhaps cây đa would be more fitting), hidden Markov models, the Kalman filter and Kalman smoothing. Latent/hidden variables are biến ẩn. Naive Bayes we tentatively call Bayes thơ ngây, or Bảy ngây. A Bayesian network for sequential data is also called a dynamic Bayes net (?).
Some other parametric models worth mentioning: linear regression, neural networks, decision trees, ensemble models, logistic regression, generalized linear models, belief nets and deep belief nets, etc. Models of this kind are usually applied to more specific inference problems, especially classification and regression. There are further ways to classify models: in machine learning, models based on the joint distribution are usually called generative models, while models applied to problems involving only the conditional distribution are called discriminative models. The latter are often used for more specialized kinds of inference such as classification, ranking, and so on.
A model that includes both a finite-dimensional parameter and an infinite-dimensional parameter is usually called a semiparametric model. A typical example is the Cox regression model in survival analysis and event history analysis. Time-to-event data is translated as dữ liệu sự kiện. In this model, the finite-dimensional component is attached to the covariates of interest, while the infinite-dimensional component is the baseline hazard intensity. Sometimes the semiparametric models are lumped together with the nonparametric models.
The family of Bayesian nonparametric models is drawn from the stochastic processes mentioned above. Infinite mixture model is mô hình trộn/hỗn hợp vô hạn. There are processes with culinary flavors: the Chinese restaurant process (quá trình nhà hàng Tàu) and the Indian buffet process. What should the coalescent process be called? For frequentists, nonparametric models are often just familiar function classes from functional analysis: for example the Sobolev class, the Besov class, reproducing kernel Hilbert spaces, the class of smoothing splines (?), etc. Bayesians will always talk about distributions (measures) over functions of these kinds.
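A small sketch of the Chinese restaurant process just mentioned (the concentration parameter and the number of customers are illustrative): each new customer joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to alpha, which is the exchangeable partition induced by a Dirichlet process.

```python
import numpy as np

def chinese_restaurant_process(n_customers=20, alpha=1.5, rng=None):
    """Return table assignments and table sizes under a CRP(alpha)."""
    rng = rng or np.random.default_rng(4)
    tables = []                      # tables[k] = number of customers at table k
    seating = []
    for _ in range(n_customers):
        probs = np.array(tables + [alpha], dtype=float)
        probs /= probs.sum()         # existing tables ∝ size, new table ∝ alpha
        choice = rng.choice(len(probs), p=probs)
        if choice == len(tables):
            tables.append(1)         # open a new table
        else:
            tables[choice] += 1
        seating.append(choice)
    return seating, tables

seating, tables = chinese_restaurant_process()
print("table sizes:", tables)
```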
Bayesians have one more job: parameterizing the parameters. In the Bayesian view the parameters are themselves random and must be assumed to follow some other distribution. The parameters of that distribution are the hyperparameters (tham số tầng trên / tham số thượng tầng?). A die-hard Bayesian would insist that these hyperparameters must also be random, and that this parameterization process should continue all the way back to the Big Bang. This leads to a family of hierarchical (multi-level) models, very powerful and very rich. Although they can be regarded as a special case of graphical models, their emphasis and origins are quite different, so we should not lump them together. (Note that we cannot actually go back to the Big Bang, so after a few levels of the hierarchy even Bayesian statisticians get tired and stop. In practice, parameters at very high levels no longer matter much in governing the behavior of the model.) Specifying how the parameters are parameterized is also called specifying the prior distribution (phân bố tiên nghiệm) for the random parameters. Applying Bayes' rule then yields the posterior distribution, translated as phân bố hậu nghiệm. A conjugate prior is a phân bố tiên nghiệm liên hợp. Parameterizing the hyperparameters is also called specifying the hyperprior (phân bố tiên nghiệm thượng tầng). Deciding which prior to choose (prior specification) depends on a tug-of-war between prior knowledge, empirical data, and computational convenience. Using conjugate priors (the Vietnamese term sounds so pleasantly soft!) is an example of convenience. The tug-of-war between prior and data is, in the Bayesian view, just one manifestation of Occam's razor.
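The conjugacy being praised here is easiest to see in the Beta-Bernoulli pair (an illustrative choice of model and prior): with a Beta(a, b) prior on the success probability and k successes out of n observations, Bayes' rule gives a Beta(a + k, b + n - k) posterior in closed form.

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = 2.0, 2.0                       # Beta(2, 2) prior on the success probability
data = rng.binomial(1, 0.7, size=50)  # pretend data from a coin with unknown bias
k, n = data.sum(), data.size

# Conjugate update: the posterior is again a Beta distribution.
a_post, b_post = a + k, b + (n - k)
posterior_mean = a_post / (a_post + b_post)
print(f"posterior: Beta({a_post:.0f}, {b_post:.0f}), mean = {posterior_mean:.3f}")
```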
Frequentists do not like the notion of a hyperparameter at all, holding that parameters must be non-random. From a modeling standpoint this view is a self-imposed restriction: in the Bayesian reading, such parameters can still be regarded as random, only distributed according to a Dirac (atomic) measure, which is an unnecessarily tight constraint. This is why, historically, the models of frequentists have tended to be less rich than the models of Bayesians. Though it does not have to be that way.
Occam's razor
As old Goethe said, all theory is gray, but the tree of life is forever green. Replace "theory" with "model" and "the tree of life" with "observed data", and we have a version for statisticians. George Box has a similarly famous saying: all models are wrong, but some are useful. So we must regard models as our way of approximating the empirical world. Therefore, besides the estimation error of the parameters, there is another kind of error called approximation error. A model uses the language of statistics and mathematical structures (such as stochastic processes) as its bricks, but it is estimated, updated, evaluated and analyzed with real data. The more powerful the mathematical machinery, the greater the model complexity, hence the greater the representational capacity of the model; the approximation error then becomes small, but the estimation error from data may grow. This is precisely the tradeoff between approximation error and estimation error. The phenomenon goes by the name of Occam's razor, forever haunting and cutting through every decision in the design and evaluation of a learned model. Most feared is a model that overfits the data. An objective assessment of a model's effectiveness and usefulness is its predictive ability, and in general the prediction error is governed by the two kinds of error above. Related approximation notions: model misspecification is sự chỉ định mô hình không chuẩn; model identifiability is tính khả nhận diện mô hình; parameter identifiability is tính khả nhận diện của tham số.
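A familiar way to see the approximation/estimation tradeoff numerically (the sinusoidal ground truth, noise level, and polynomial degrees are all illustrative assumptions): low-degree fits suffer mostly approximation error, high-degree fits mostly estimation error, and held-out error is smallest somewhere in between.

```python
import numpy as np

rng = np.random.default_rng(6)
f = lambda x: np.sin(3 * x)                       # "truth" outside every polynomial model
x_train, x_test = rng.uniform(-1, 1, 30), np.linspace(-1, 1, 200)
y_train = f(x_train) + 0.3 * rng.normal(size=x_train.size)

for degree in [1, 3, 9]:
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares fit of that complexity
    test_err = np.mean((np.polyval(coefs, x_test) - f(x_test)) ** 2)
    print(f"degree {degree}: held-out squared error = {test_err:.3f}")
```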
Summary: joint probability, marginal probability, conditional probability, model identifiability, model mis-specification, model choice, model selection, parameter identifiability, consistency, parametric model, nonparametric model, exponential family, curved exponential family, graphical model, hierarchical model, mixture model, hidden Markov model, copula model, latent/hidden variables, nonparametric Bayesian model, density, intensity measure, analysis of variance, functional data, curve data, prior distribution, posterior distribution, a priori, a posteriori, sufficient statistics, order statistics, mean parameterization, canonical parameterization, normalizing constant, log-partition function, mean function, covariance function, covariates, features, conjugate prior, conjugacy

Statistical inference methods
Overview
One must distinguish statistical inference from probabilistic inference. The latter is merely the computation of conditional probabilities on the basis of a probabilistic model. Statistical inference is inference on the basis of a statistical model in the presence of data. There are two main problems: inference about the parameters, also called parameter estimation, and prediction. In the Bayesian view, statistical inference is also called Bayesian inference, and mathematically it is no different from probabilistic inference, since both the parameters and the data are described by random variables. Conceptually, then, it is simple and tidy. In the frequentist view, the approach to statistical inference is conceptually harder and calls for less tidy approaches. In machine learning, the parameter estimation problem is also called learning.
If on the question of specifying a model the Bayesian and frequentist viewpoints complement each other (for example, Bảy, the Bayesian, says to Tần, the frequentist: "I'm tired, let me treat my hyperparameter as non-random", while Tần says to Bảy: "Let me treat your parameter as a hidden variable"), then on the question of inference the two viewpoints clash fiercely and inconclusively. Bảy's position: for parameter estimation, just condition on the available data (conditioning on data), and marginalize out / integrate out (?) the random parameters when making predictions. Tần's position: for parameter estimation one must reason over imaginary data as well, and use a plug-in estimate (?) for prediction. Bảy's criterion is optimistic, caring more about average-case analysis. Tần's criterion is very pessimistic, emphasizing worst-case analysis. These are just two extremes that highlight the difference. In practice both approaches can be combined when drawing inferences from data.
There are a number of more specific inference problems, each with its own lexicon. Point estimation is ước lượng điểm (a frequentist notion). Hypothesis testing is kiểm định giả thuyết (phép thử giả thuyết?). Classification is the phân lớp problem. Clustering is chia nhóm. The ranking problem in machine learning is phân hạng. Supervised learning is học có nhãn, học có hướng dẫn. Unsupervised learning is học không nhãn (học không có hướng dẫn, học không thầy). Sequential analysis is phân tích chuỗi / phân tích tuần tự (?), within which the optimal stopping problem is translated as bài toán dừng tối ưu. Survival analysis is phân tích sự sống sót (?). The change point detection problem is bài toán phát hiện điểm thay đổi. Note that all of these specific inference problems can be understood generically as one of the two inference problems (parameter estimation or prediction), and all can be approached from either the frequentist or the Bayesian view, though the way the inference is evaluated may need slight adjustment.
Decision theory
The theoretical foundation of statistical inference is Abraham Wald's decision theory. There is the notion of risk; Bayes risk is rủi ro Bayes. Risk is the expectation of a loss function (hàm thiệt hại / tổn thất / mất mát). Economists use a utility function (hàm tiện ích / thỏa dụng) instead of a loss. A similar notion is the reward function (?) in reinforcement learning (?) and in Markov decision processes.
Decision theory is common ground for both schools, Bảy and Tần, but the frequentists have more work to worry about. An estimator is a procedure for estimating a parameter, and it is a function applied to the data. In that respect it is just like a statistic, so a statistic can be viewed as a crude estimator. An estimate is a specific estimated value of some parameter. In classification an estimator is also called a learning machine, and the estimate is the classifier. In hypothesis testing the thing to be estimated is a decision function. Whichever view one takes, one seeks estimators satisfying the minimum risk criterion. But Bảy's risk differs from Tần's. The frequentist expectation is the expectation of the loss with respect to the distribution of the (imagined) data, given a model with a fixed parameter. The Bayesian expectation is the expectation of the loss with respect to the conditional distribution of the parameter given the available data. In other words, for Tần the data are random; for Bảy the parameter is random. If we take the expectation of the frequentist risk with respect to the distribution of the parameter, or the expectation of the Bayesian risk with respect to the distribution of the data, we obtain in both cases the Bayes risk!
Some commonly used loss functions: the 0-1 loss, in which case the Bayes risk is called the Bayes error; the square loss; the exponential loss; the logistic loss. Surrogate loss we shall translate as hàm thiệt hại thế chỗ (?). To compare different estimators one can use the Bayesian criterion (by comparing Bayes risks). Frequentists often use the minimax criterion, borrowed from game theory (where the game is between the statistician and Heaven: only Heaven knows the truth, i.e. which model is correct, and each move of Heaven produces a data sample). There are also various desirable properties for estimators, such as unbiasedness (?), admissibility (?), consistency (tính nhất quán), invariance (tính bất biến), efficiency (tính hiệu quả), and superefficiency (siêu hiệu quả). Subjective Bayesians do not care about these criteria, because they already have steadfast faith in their prior, and Bayesian inference by computing the posterior distribution is all there is to it. Nevertheless, subjective Bayesian inference has many good theoretical properties: inference based on the posterior distribution has been shown to be optimal under the Bayes risk criterion. Objective Bayesians are not as self-assured as subjective Bayesians, so they want the prior distribution to have good properties. Posterior consistency (tính nhất quán hậu nghiệm) is one important such property.
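To see "risk" as an actual number, here is a small frequentist risk computation under squared loss (the Gaussian model, the true mean, and the two competing estimators, a sample mean and a shrunken mean, are illustrative choices): the risk is approximated by averaging the loss over many imagined datasets drawn from the model with a fixed parameter.

```python
import numpy as np

rng = np.random.default_rng(7)
true_mu, sigma, n, n_datasets = 1.0, 1.0, 10, 20000

# Many imagined datasets from the model with a fixed parameter (the frequentist picture).
data = rng.normal(true_mu, sigma, size=(n_datasets, n))

est_mean = data.mean(axis=1)            # ordinary sample mean
est_shrunk = 0.8 * est_mean             # a shrinkage estimator pulled toward 0

for name, est in [("sample mean", est_mean), ("shrunken mean", est_shrunk)]:
    risk = np.mean((est - true_mu) ** 2)   # expected squared loss, i.e. frequentist risk
    print(f"{name:13s}: estimated risk = {risk:.4f}")
```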
Methods of estimation / statistical learning
I am just laying a few bricks here; when there is time I will write more, bit by bit. Anyone who feels like contributing a paragraph to the sections below (or to sections not yet listed), please let me know. Estimation, or learning, here is still on the basis of a given family of models. The harder problem is model selection, comparing models, especially models of very different complexity. Hypothesis testing is a very special form of choosing between models, though it can still roughly be understood within the scope of estimation.
Empirical risk minimization. The risk is defined with respect to the model's distribution (the truth that only Heaven knows). We can only approach this model through the empirical process. In other words, the risk must be estimated by the empirical risk. Most estimation methods of the frequentist camp come down to empirical risk minimization (ERM). A synonymous term is M-estimation, where M stands for maximization or minimization. Moment-based estimation (moment matching) can in fact also be motivated by, and related to, empirical risk minimization. A headache for risk-minimization methods is which loss function to choose. Some choices have names of their own: if the loss is the square loss, we get the method of least squares, very widely used in regression.
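A compact example of ERM with the square loss (the linear model and the synthetic data are illustrative): minimizing the empirical risk (1/n) Σ (y_i - x_iᵀw)² over w is exactly ordinary least squares, solved here in closed form.

```python
import numpy as np

rng = np.random.default_rng(8)
n, true_w = 200, np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])       # intercept + one covariate
y = X @ true_w + 0.5 * rng.normal(size=n)

# Empirical risk minimizer under square loss = the least-squares solution.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
empirical_risk = np.mean((y - X @ w_hat) ** 2)
print("estimated w:", np.round(w_hat, 3), " empirical risk:", round(empirical_risk, 3))
```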
Maximum likelihood and the likelihood principle. If the statistical model specifies a density for the data, then we have the notion of the likelihood (khả năng?). It is a function of the parameter, but it is itself random, being defined in terms of the random data. The likelihood is the prime example of an empirical risk: the corresponding loss is the negative logarithm of the density. Maximum likelihood is translated as ước lượng khả năng cực đại (?), a great invention of Ronald Fisher. It is the most common and versatile estimation method in statistics (at least from the frequentist viewpoint). For parametric models this estimator is backed by consistency: the model is estimated exactly as the amount of data tends to infinity. Why should the loss be the logarithm of the density and not some other function? This is an example of the unexpected magic of mathematics; the answer traces back to the notion of independence, to concentration of measure in probability, and to convexity in analysis (and geometry). The likelihood principle holds that the likelihood function is a sufficient statistic. This principle goes bankrupt in the nonparametric setting.
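A minimal numerical illustration of maximum likelihood (the exponential model is an illustrative choice): minimizing the negative log-likelihood over the rate parameter recovers the closed-form MLE, which for exponential data is one over the sample mean.

```python
import numpy as np

rng = np.random.default_rng(9)
data = rng.exponential(scale=1 / 2.5, size=500)   # draws with true rate 2.5

def neg_log_likelihood(rate):
    """Negative log-likelihood of i.i.d. Exponential(rate) data."""
    return -(data.size * np.log(rate) - rate * data.sum())

rates = np.linspace(0.1, 10.0, 5000)              # crude grid search over the rate
mle_grid = rates[np.argmin(neg_log_likelihood(rates))]
print("grid-search MLE:", round(mle_grid, 3),
      " closed form 1/mean:", round(1 / data.mean(), 3))
```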
Regularization / penalization / shrinkage. When estimating nonparametric models, relying on the data alone (through the likelihood or, more generally, the empirical risk) is not enough. The maximization/minimization needs to be adjusted through regularization (kiểm soát), also called penalization (soát phạt). Regularized empirical risk is rủi ro thực nghiệm được kiểm soát. The notions of regularization and penalization originate from a surprising discovery of Charles Stein about shrinkage estimators (cách ước lượng co), which is why regularized estimators are sometimes loosely called shrinkage estimators as well. To use a finite amount of data to estimate an infinite-dimensional (or high-dimensional) quantity, no matter how large the data set, and even as it tends to infinity, the estimation must be regularized; one cannot rely entirely on the empirical data. In the Bayesian view this is exactly the tug-of-war between data and prior: the shrinkage here is shrinkage toward the prior.
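A standard concrete instance of regularized empirical risk (the data and penalty weights are illustrative): ridge regression adds a squared-norm penalty to the least-squares risk, shrinking the coefficients toward zero as the penalty grows.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 30, 10
X = rng.normal(size=(n, p))
y = X[:, 0] * 3.0 + 0.5 * rng.normal(size=n)      # only the first coefficient matters

def ridge(X, y, lam):
    """Minimizer of ||y - Xw||^2 + lam * ||w||^2 (a regularized empirical risk)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in [0.0, 1.0, 100.0]:
    w = ridge(X, y, lam)
    print(f"lambda={lam:6.1f}  ||w|| = {np.linalg.norm(w):.3f}")
```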
Posterior analysis / Bayesian learning. Posterior analysis (a posteriori analysis), more concretely posterior inference, Bayesian inference, or Bayesian learning, all describe the same estimation approach of the Bayesian school: instead of estimating a (non-random) parameter as in the frequentist school, one computes the posterior distribution of the parameter via Bayes' rule. The approach is clean; the main work lies in how to specify the prior distribution and how to compute the posterior (the integrals involved can be computationally very demanding). Note that maximum likelihood estimation is nothing but computing the mode of the posterior distribution when the prior is chosen to be uniform. In Bayesian analysis, especially with parametric models, there is no need to worry about regularization. But if the prior is a stochastic process (as in nonparametric models), one does have to worry about controlling the complexity of the prior distribution. One tool here is sensitivity analysis (phân tích tính nhạy cảm) of the distribution assigned to the parameter.
Empirical Bayes. This method can be viewed as a frequentist way of estimating a hierarchical model. Hierarchical models are an ideal tool for controlling the complexity of the models placed on the parameters.

More specific inference problems
Hypothesis testing. In hypothesis testing there are several important notions. Null hypothesis is ? Alternative hypothesis? There are two kinds of error: type-1 error and type-2 error, also called the false positive rate and the false negative rate when evaluating treatments (?) in medicine. In engineering, the type-1 error is the false alarm rate (?) and the type-2 error is the misdetection rate (?). All of these error rates are risk functions with respect to the 0-1 loss. The estimator in hypothesis testing is called a decision function, and the decision function is used to perform a test of the hypothesis. A test is evaluated through guarantees on bounds for the errors above. The tradeoff between type-1 and type-2 error is captured by the ROC curve (đường cong ROC). Related notions: significance (?). Confidence interval is translated as ? p-value is giá trị p. The power of a test is sức mạnh. If there are only two hypotheses to compare, the optimal decision function must be based on the likelihood ratio (phân số khả năng). The likelihood ratio test is phép thử dựa vào phân số khả năng. The tool for evaluating the power of a test is asymptotic statistics (thống kê giới hạn).
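A compact sketch of the likelihood ratio idea for two simple hypotheses (the two Gaussian means, the threshold, and the sample size are illustrative assumptions): reject the null when the log likelihood ratio exceeds a threshold, and estimate the two error rates by simulation.

```python
import numpy as np

rng = np.random.default_rng(11)
mu0, mu1, n, reps = 0.0, 0.5, 25, 20000

def log_lr(x):
    """Log likelihood ratio of N(mu1, 1) vs N(mu0, 1) for an i.i.d. sample x."""
    return np.sum((mu1 - mu0) * x - (mu1**2 - mu0**2) / 2, axis=-1)

threshold = 2.0                                 # an arbitrary cutoff for illustration
x_h0 = rng.normal(mu0, 1.0, size=(reps, n))     # data generated under the null
x_h1 = rng.normal(mu1, 1.0, size=(reps, n))     # data generated under the alternative

type1 = np.mean(log_lr(x_h0) > threshold)       # false alarm rate
type2 = np.mean(log_lr(x_h1) <= threshold)      # misdetection rate
print(f"type-1 error ~ {type1:.3f}, type-2 error ~ {type2:.3f}")
```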
Hypothesis testing originated in frequentist statistics, through the work of Neyman and Pearson. The concept is rather counter-intuitive, and it took Wald to unify this mode of inference with the other forms of inference in statistics. Approached from the Bayesian view, hypothesis testing is quite simple, hardly different from estimating a model; one needs the notion of a prior distribution over the hypotheses. How should the notion of the Bayes factor be translated?
Sequential analysis. In sequential analysis there is a tradeoff between the Bayes error and the delay time of the decision about the hypothesis. A commonly used notion is the sequential likelihood ratio test (phép thử dựa theo chuỗi phân số khả năng). The theoretical tools for evaluating the power of such tests are analyses of stopping times, boundary crossing times, and so on, from probability theory and the theory of Markov processes.
Classification / regression / ranking. In classification the estimator is called a learning machine, and the parameter to be estimated is a classifier (hàm phân loại). The problem can be approached with parametric or nonparametric models. Learning the machine (the model) usually requires substantial computation, not just the simple statistics of classical hypothesis testing, which leads to concerns about the efficiency of the learning/estimation algorithms. Estimation viewed computationally has its own lexicon: training (việc luyện máy). The data needed for training are the training data; testing a classifier on new data is called testing, and that data is the test data. If there are only two classes, the optimal classifier must be based on the likelihood ratio, much as in hypothesis testing. A fundamental difference between classification and hypothesis testing is this: the former must test the hypothesis for each individual sample, while the latter tests the hypothesis only once for the whole crowd. There are very many classification methods, with parametric and nonparametric models, and a rich variety of learning/estimation algorithms. Classical ones include linear discriminant analysis (phân tích phân biệt tuyến tính) and logistic regression (hồi quy logit). More modern ones include neural networks, radial basis function networks (?), support vector machines (?), ...
The regression problem (regression analysis) is similar to classification, except that one estimates/learns a regression function instead of a classifier. A classifier takes only discrete values, whereas a regression function typically produces continuous values. The ranking problem is close to classification in that the ranking function also takes discrete values (not necessarily binary), but the training data are pairwise comparisons between samples rather than class labels.
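To close the section with something executable, here is a minimal logistic regression classifier trained by plain gradient descent on synthetic two-class data (the data, learning rate, and iteration count are illustrative assumptions, not a prescription from the text):

```python
import numpy as np

rng = np.random.default_rng(12)
n = 200
X = np.vstack([rng.normal(-1.0, 1.0, size=(n // 2, 2)),
               rng.normal(+1.0, 1.0, size=(n // 2, 2))])
y = np.array([0] * (n // 2) + [1] * (n // 2))
X = np.column_stack([np.ones(n), X])            # add an intercept column

w = np.zeros(3)
for _ in range(2000):                           # gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
print("training accuracy:", np.mean(pred == y))
```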
