Professional Documents
Culture Documents
Tam Quan Trong Cua Toan Trong Tin Hoc
Tam Quan Trong Cua Toan Trong Tin Hoc
H T Bo
Vin Cng ngh Thng tin, Vin KH & CN Vit Nam Japan Advanced Institute of Science and Technology
1
Ton hc v Tin hc
Ton hc v tin hc
Vi lnh vc tiu biu ca ton hc trong tin hc Ton hc trong x l ngn ng t nhin Ton hc v Web Ton hc trong sinh hc Ton hc trong hc my (machine learning)
2
Ton hc v Tin hc
Phn mm ng dng
software
Phn mm h thng
Cray Y-MP8E
K ngh phn mm DOS, Windows, C s d liu Mac OS, UNIX, Pascal, Linux C, C++, Thut ton Java, Ton hc cho tin hc
hardware
PC computers
Mng my tnh
Truyn thng
Ton hc v Tin hc
gii quyt cc vn ca tin hc v ng dng tin hc, tm v dng cc l thuyt v cng c ton hc.
Ton hc v Tin hc
Ton hc ri rc i vi tin hc
Logic (lp lun t ng, AI, h thng minh) Set theory (tp m, tp th, tnh ton vi thng tin khng , khng chnh xc, ) Number theory (bo mt thng tin) Combinatorics (hnh hc t hp, ) Graph theory (mng, sinh hc, vt l, ) Digital geometry and digital topology (phn tch nh, etc.)
5
Algorithmics (phng php tnh ton) Computability (gii hn l thuyt v thc t ca thut ton) Probability and Markov chains (x l ting ni, sinh hc, ) Linear algebra (phn tch d liu, ...) Probability (ngu nhin) Proofs (k ngh phn mm, AI) etc.
Ton hc v Tin hc
Ton hc v Tin hc
+
Inference
(logic ton hc, )
Logic mnh , logic tn t, logic khng chun (modal, temporal, non-monotonic, fuzzy, high order, ... logic) Th d: My t ng dng tam on lun (syllosism)
Elephants are bigger than dogs Dogs are bigger than mice Therefore Elephants are bigger than mice
7
Ton hc v Tin hc
U ={ ,
,} , ,
R = {color, shape }
,
}{ ,
,
}}
}}
}, { }, {
}, { }}
Eq. class = {{
X ={
8
}, {
}, {
}, X * = {
}, X * = { ,
Ton hc v Tin hc
23 bi ton ca th k 20
Ti i hi Ton hc Th gii ln th hai (Paris, thng Nm 1900), Hilbert nu ra 23 bi ton, thch thc cc nh ton hc ton th gii gii trong th k 20. 12 bi ton c gii ton b, 8 bi ton c gii tng phn, 3 bi vn cha c li gii.
Ton hc v Tin hc
7 bi ton ca th k 21
4 gi chiu Th t ngy 24 thng 5 nm 2000, Vin Ton hc Clay cng b v thch thc 7 bi ton ca th k 21 (1 triu $ cho mi li gii). Bi ton s 1: P versus NP Su bi ton khc: 1. The Hodge Conjecture 2. The Poincar Conjecture (solved 2006) 3. The Riemann Hypothesis 4. Yang-Mills Existence and Mass Gap 5. Navier-Stokes Existence and Smoothness 6. The Birch and Swinnerton-Dyer Conjecture.
10
Ton hc v Tin hc
Bi ton P versus NP
Nu ai hi rng liu 13.717.421 c l tch ca hai s nh hn khng, bn s cm thy rt kh tr li l ng hay sai. Nu ngi bo bn rng s ny c th l tch ca 3607 v 3803, bn c th kim tra iu ny tht d dng. Xc nh xem vi mt bi ton cho trc, liu c tn ti mt li gii c th kim chng nhanh (bng my tnh chng hn), nhng li cn rt nhiu thi gian gii t u (nu khng bit li gii)? C rt nhiu bi ton nh vy. Cha ai c th chng minh c rng, vi bt k bi ton no nh vy, thc s cn rt nhiu thi gian gii. C th ch n gin l chng ta vn cha tm ra c cch gii chng nhanh chng. Stephen Cook pht biu bi ton P versus NP vo nm 1971.
11
Ton hc v Tin hc
Bi ton P versus NP
Bi ton SAT: cho trc mt mch in t Boolean, liu c cc cch chn inputs sao cho output l True hay khng? Inputs ca cc mch in t Boolean (vi cng AND, OR v NOT) hoc l T (true) hoc l F (false). Mi cng nhn mt s cc inputs, v outputs gi tr logic tng hp c.
12
Ti khng tm ni cho chef mt thut ton hiu qu. u c ti do ny c v m u qu.
Ton hc v Tin hc
Bi ton P versus NP
Ti khng th tm c mt thut ton hiu qu bi v khng th c mt thut ton no nh vy. Ti khng th tm c mt thut ton hiu qu bi v tt c nhng ngi ni ting ny cng khng tm c n.
nu bn chng minh c SAT l intractable (chng minh intractability c th kh nh vic tm li gii hiu qu)
13
Ton hc v Tin hc
n = 20
0.0002 second 0.002 second 0.02 second 3.2 second 1.0 second 58 minutes
n = 30
0.0003 second 0.003 second 0.03 second 24.3 second 17.9 second 6.5 years
n = 40
0.0004 second 0.004 second 0.04 second 1.7 minutes 12.7 days 3855 centuries
n = 50
0.0005 second 0.005 second 0.05 second 5.2 minutes 35.7 years 2x108 centuries
n = 60
0.0006 second 0.006 second 0.06 second 13.0 minutes 336 centuries 1.3x1013 centuries
n n2 n3 n5 2n 3n 15
Ton hc v Tin hc
Scalable algorithms
iu g xy ra khi d liu ln, N>106? Cn thut ton heuristic nhanh, gn ng
Fast CMB analysis by Szapudi et al (2001)
17
Ton hc v Tin hc
Khi nim: mt m (cipher: letter, bit), m (code: word or phrase meaning), kha (key). Rt nhiu k thut:
Symmetric-key cryptography (cng mt key cho nhn/gi) Public-key cryptography 1976 (public/private key) da trn number theory (integer factorization) Cryptanalysis (tm im yu, khng an ton trong mt h dng mt m) Cryptographic primitives Cryptographic protocols
(Kha l mt on tin (piece of information) kim sot hot ng ca mt thut ton to mt m, ch k in t, nh quyn hn (encryption, digital signature, authentication)).
18
Ton hc v Tin hc
Public-key cryptography
integer factorization l qu trnh phn tch mt ch s a hp thnh tch ca cc c s nh hn sao cho khi nhn cc c s ny ta c s ban u. BSI team: Nov. 2005, RSA-640 (193 decimal digits, 640 bits) on 80 AMD Opteron CPUs computer. Kh nht trong cc bi ton ny l trng hp c s l cc s nguyn t c chn ngu nhin vi cng ln. Vn cha c thut ton thi gian a thc phn tch mt s ln b-bit thnh tch ca hai s nguyn t c cng kch thc. Mt trong cc bi ton ln cha gii c trong KHMT v l thuyt s l vic tm mt thut ton nhn t ha cc s trong thi gian a thc.
19
Ton hc v Tin hc
L m hnh ca cc hnh vi to nn bi mt s hu hn cc trng thi (states), cc chuyn i trng thi (transitions), v cc hnh ng. M hnh ton hc: (,S,s0,,F), Vic thc hin mt phn cng my tnh i hi mt cng-t (register) cha cc bin trng thi, mt khi (block) mch logic xc nh s chuyn trng thi, v mt khi th hai cc mch logic xc nh output ca mt FSM.
20
Ton hc v Tin hc
Machine learning Xy cc h my tnh c th hc nh con ngi (mt s kh nng). ICML since 1982 ECML since 1989 ECML/PKDD since 2001
Data mining Tm tri thc mi v hu ch t nhng c s d liu ln. ACM KDD since 1995, PKDD and PAKDD since 1997, IEEE ICDM and SIAM DM since 2000, etc.
21
Ton hc v Tin hc
Machine learning Xy cc h my tnh c th hc nh con ngi (mt s kh nng). ICML since 1982 ECML since 1989
ACM KDD since 1995, PKDD and PAKDD since beats IEEE ICDM 1997: ECML/PKDD since 2001 Deep Blue1997, Kasparov (12.2006: Deep Fritz beets Kramnik) and SIAM DM since 1997: Robot cup 2000, etc.
2005: DARPA Grand Challenge
22
Ton hc v Tin hc
23
Ton hc v Tin hc
gii quyt cc vn ca tin hc v ng dng tin hc, tm v dng cc l thuyt v cng c ton hc.
Ton hc v Tin hc
24
Ton hc v tin hc
Vi lnh vc tiu biu ca ton hc trong tin hc Ton hc trong x l ngn ng t nhin Ton hc v Web Ton hc trong sinh hc Ton hc trong hc my (machine learning)
25
Ton hc v Tin hc
text
Shallow parsing
The woman will give Mary a book POS tagging The/Det woman/NN will/MD give/VB Mary/NNP a/Det book/NN chunking
Syntactic Analysis
Grammatical Relation Finding Named Entity Recognition Word Sense Disambiguation
[The/Det woman/NN]NP [will/MD give/VB]VP [Mary/NNP]NP [a/Det book/NN]NP subject relation finding
Semantic Analysis
Reference Resolution
Discourse Analysis
meaning
26
i-object
object
ng gi i nhanh qu
Ton hc v Tin hc
Xu th trong NLP
Trainable parsers 1990s2000s: Statistical learning algorithms, evaluation, corpora Trainable FSMs 1980s: Standard resources and tasks Penn Treebank, WordNet, MUC 1970s: Kernel (vector) spaces clustering, information retrieval (IR) 1960s: Representation Transformation Finite state machines (FSM) and Augmented transition networks (ATNs) 1960s: Representationbeyond the word level lexical features, tree structures, networks
27
Ton hc v Tin hc
some ML/Stat
no ML/Stat
Ton hc v Tin hc
28
... ...
... ...
... Ot-1 Ot
Ot+1
... Ot-1 Ot
Ot+1
... Ot-1 Ot
... Ot+1
29
1.6 meters
3500 3300 3100 2900 2700 2500 2300 2100 1900 1700 1500 1300 1100 900 700 500 300 100 -100
30 Mb 30MB
JAISTs CRAY XT3
Linux OS, 180 AMD Opteron 2.4GHz CPUs, 8Gb RAM/CPU, total:
1.44Tb RAM
60
70
80
30
90 10 0 11 0 12 0 13 0 14 0 15 0
10
20
30
40
50
Ton hc v tin hc
Vi lnh vc tiu biu ca ton hc trong tin hc Ton hc trong x l ngn ng t nhin Ton hc v Web Ton hc trong sinh hc Ton hc trong hc my (machine learning)
31
Ton hc v Tin hc
32
Ton hc v Tin hc
Web quan trng v lin quan nht trn Web (nh trong Google)
33
Pw = w
y P c tnh da trn link structure ca Internet. Vn c bn l lm sao thit lp xc ng c link structure ca ton b Web, i.e., ma trn P.
34
Ton hc v Tin hc
A C
A A B
1/2 1/2
B
1/2
C 0 1 0
0
1/2
36
Ton hc v Tin hc
Tm xp x cc vc t ring ca P (power method) Vic tnh ton (matrix-vector multiplications) phi thc hin trn cc h my tnh song song ln.
37
Ton hc v Tin hc
PageRank: Example
w1 2 w = 1 2 2 w3 0
1
0
1 2
0 w1 w 1 2 0 w3
a = b = c =
38
1 1 1
1 3/2 1/2
5/4 1 3/4
39
Ton hc v Tin hc
Ton hc v tin hc
Vi lnh vc tiu biu ca ton hc trong tin hc Ton hc trong x l ngn ng t nhin Ton hc v Web Ton hc trong sinh hc Ton hc trong hc my (machine learning)
40
Ton hc v Tin hc
Problems in bioinformatics
1400 Chemicals
10,000 Proteins
30,000 Genes
41
Ton hc v Tin hc
Applications: Applications:
Liu P c trong c s d liu T? Liu P c trong c s d liu Xc nh v tr ca P trong T. Xc nh v tr ca P trong T. Liu c th dng P nh mt nguyn Liu c th dng P nh mt nguyn t ca T? t ca T? P c tng ng vi g trong T? P c tng ng vi g trong T? P c b hng bi T? P c b hng bi T? Liu prefix(P) = suffix(T)? Liu prefix(P) = suffix(T)? Xc nh cc lp sau trc (tandem) Xc nh cc lp sau trc (tandem) ca P trong T. ca P trong T.
43
Ton hc v Tin hc
Bi ton c bn nht ca tin sinh hc Cc dy c sp thng c dng cu trc hoc chc nng Cho nhiu gi nu cu trc v chc nng ca mt trong cc dy c sp thng bit
Output
Cch sp thng dy ti u
ATTGCGC ATTGCGC C
ATTGCGC ATCCGC
44
n = s cc dy
on nhn gene
Gene prediction
L bi ton quan trng ca tin sinh hc v hin c nhiu thut ton cho on nhn gene da trn cc gene bit nh d liu hun luyn. Mt k thut ton nhn gene ph bin l Hidden Markov Models (HMMs).
(given the genomic DNA sequence, can we tell where the genes are?)
46
Ton hc v Tin hc
D on tng tc proteins
DNA RNA protein Sequence Structure Function
DNA Sequence
Gene
Protein Structure
Function
48
Probabilistic Graph Models Probabilistic Models of Evolution Stochastic Grammars and Linguistics Kernel methods
49
Ton hc v Tin hc
Ton hc v tin hc
Vi lnh vc tiu biu ca ton hc trong tin hc Ton hc trong x l ngn ng t nhin Ton hc v Web Ton hc trong sinh hc Ton hc trong hc my (machine learning)
50
Ton hc v Tin hc
:X = R2 H = R3
52
2 ( x1 , x2 ) a ( x1 , x2 , x12 + x2 )
Ton hc v Tin hc
X l tp cc oligonucleotides, S gm ba oligonucleoides. Thng thng, mi oligonucleotide c biu din bi mt dy cc ch ci. Trong kernel methods, S c biu din nh mt ma trn ca o s tng t gia tng cp phn t.
53
Ton hc v Tin hc
Feature space F
inverse map -1 (x)
(x1) ... (x2) (xn) (xn-1)
xn-1 xn
k(xi,xj) = (xi).(xj)
nh x d liu trong X vo mt (high-dimensional) khng gian vect, gi l feature space F, bi mt hm trn tng cp phn t x. Tm cc dng tuyn tnh (linear pattern) trong F vi cc thut ton quan bit (tnh tan trn kernel matrix). Khi dng nh x ngc, cc dng tuyn tnh trong F c th tm c cc dng tng ng phc tp trong X. Vic ny c ngm nh thc hin ch bi ni tch (inner products) trong F (kernel trick) xc nh bi mt hm kernel
54
Ton hc v Tin hc
Feature space F
inverse map -1 (x)
(x1) ... (x2) (xn) (xn-1)
xn-1 xn
k(xi,xj) = (xi).(xj)
Linear algebra, probability/statistics, functional analysis, optimization Mercer theorem: Mi hm xc nh dng u c th vit nh ni tch trong mt khng gian feature no y. Kernel trick: Dng ma trn kernel thay v dng ni tch trong khng gian feature space. Representer theorem:
Every minimizer of min{C ( f , {xi , yi }) + ( f
f H m H
) admits
55
Ton hc v Tin hc
i 0 i, i 1+ yi (wTxi + b) 0
Input space
Feature space
56
Ton hc v Tin hc
Kt lun C mi quan h su sc, rng ln gia ton hc v tin hc. Hu ht cc vn ca khoa hc my tnh u cn n ton hc hin i mc cao hoc rt cao.
57
Ton hc v Tin hc