Machine Learning Course Report
Members:
Trn ng Trnh - 5100324
Nguyn Th M Dung - 51003238
Supervisor:
Dr. Nguyn Thanh Hin
Monday, 09/05/2016
Chapter 1
1.1.2 What is R?
In short, R is software used for statistical analysis and for drawing charts. At heart, however, R is a general-purpose computer language. It can serve many different goals, from simple calculations, recreational mathematics and matrix computations to complex statistical analyses. Because it is a language, people can use R to build specialized software for a particular computational problem.
1.2 Downloading and Installing R
- To use R, the first step is to install it on our computer. To do this, we go online and visit the Comprehensive R Archive Network (CRAN) website:
https://cran.r-project.org/
1.3 R Language Syntax
- The general form of an R statement is a command or function. A function takes parameters (arguments), so the function name is followed by the arguments we supply, written in parentheses.
- To find out which arguments a function takes, we use the command args(x) (args is short for arguments), where x is the function we want to know about.
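As a minimal sketch of the two points above, the snippet below makes one function call with a named argument and then uses args() to inspect a function's parameters. mean() and sum() are simply convenient base-R examples chosen for illustration; they are not functions the report itself relies on.

```r
# A function call: the function name followed by its arguments in parentheses.
x <- c(2, 4, NA, 10)
m <- mean(x, na.rm = TRUE)   # na.rm = TRUE removes the missing value before averaging
print(m)

# args() shows which arguments a function accepts (here: ... and na.rm).
args(sum)
```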
Comparison and logical operators:
x == 5 : x equals 5
x != 5 : x is not equal to 5
y < x : y is less than x
x > y : x is greater than y
z <= 7 : z is less than or equal to 7
p >= 1 : p is greater than or equal to 1
is.na(x) : is x a missing value (NA)?
A & B : A and B (AND)
A | B : A or B (OR)
! : negation (NOT)
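The operators listed above work element by element on vectors. The sketch below uses an arbitrary example vector that contains one missing value. Note that any comparison with NA returns NA.

```r
x <- c(3, 5, NA, 8)

x == 5      # element-wise comparison; the NA element gives NA
x != 5
x >= 5
is.na(x)    # TRUE only for the missing element

# Combining conditions with & (AND), | (OR) and ! (NOT):
# keep values that are not missing AND greater than 4.
big_and_known <- !is.na(x) & x > 4
x[big_and_known]
```

Putting `!is.na(x)` first matters here. `FALSE & NA` evaluates to `FALSE`, so the missing element is dropped instead of producing an NA index.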
- In R, any words or commands after the # symbol have no effect, because # marks comments that the user adds.
- An important point to note is that R is case-sensitive, so My.object is different from my.object.
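A small sketch of both points above: text after # is ignored, and names that differ only in letter case refer to different objects. The object names come from the text; the values are arbitrary.

```r
# Everything after # on a line is a comment and is ignored by R.
my.object <- 10   # lower-case name
My.object <- 20   # a different object: names are case-sensitive

my.object == My.object   # the two objects hold different values
exists("MY.OBJECT")      # no object with this exact spelling was created
```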
1.3.2 Getting Help in R
Besides args(), R also provides the help() command so users can understand the syntax of each function. For example, to see which arguments the lm function takes, we simply type:
> help(lm) or > ?lm
Chapter 2
- Founded in 2010, Kaggle is an online platform for hosting data mining and predictive modeling competitions. A company can work with Kaggle to publish a dataset together with a problem and invite the site's community of data scientists to propose solutions.
- The key point is that "contestants" may revise their solutions again and again. This pushes them, and the community, to keep looking for better solutions right up to the deadline.
- Companies such as MasterCard, Pfizer, Allstate, Facebook and even NASA have run competitions on Kaggle. For example, General Electric sponsored a competition to write software that plans more efficient flight paths for airlines. Practice Fusion, a health-technology company, sponsored another competition to identify patients with type 2 diabetes from their medical records.
- Prizes for winning solutions range from 3,000 to 250,000 USD. One prize, offered by Heritage Provider Network, was worth up to 3 million USD.
- Everyone has a chance. Any contestant, no matter how remote, can measure their talent against the leaders in the same field. In Kaggle's forums, contestants can also exchange ideas and sharpen their skills. A good programmer can climb the rankings quickly by scoring well in two or three competitions.
- Kaggle rankings have become an important measure in the data science community. Companies such as American Express and the New York Times have begun listing a Kaggle ranking as a required credential in their job ads. A ranking is not just a badge but an indicator of ability, and it can matter more than traditional standards of education and expertise. A degree from a prestigious university or a work history at a well-known company such as IBM may count for less than a Kaggle score. In other words, measurable work and your ranking in the market are worth more than where you work. Will the CV (Curriculum Vitae) soon become unnecessary?
- Kaggle creates a new kind of labor market, where skills are separated from unreliable proxies such as degrees and résumés. This is a truly major change.
Chapter 3
- When advertising a job, employers often leave out the salary. For a job seeker this creates a dilemma. They can apply and risk wasting precious time on a low-paying job, or skip the ad and risk missing a great opportunity.
- Adzuna is a UK classified-ads company whose ads are mostly job listings. More than half of these ads do not list a salary. To offer a better service, Adzuna wants to provide a salary estimate whenever the employer does not list one. To this end, Adzuna ran a Kaggle competition aimed at improving salary prediction for job ads.
- A successful model will combine two things: some analysis of the effect of including different keywords or phrases, and good use of structured data fields such as location, contract time or company. Some of the structured data shown are inferred by Adzuna's own processes, based on where an ad came from or on its content. These values may not be correct, but they are representative of real-world data.
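One simple way to turn "the effect of including different keywords or phrases" into model inputs is a 0/1 indicator column per keyword. The sketch below is a hypothetical illustration, not the report's method. The toy ads and the keyword list are made up, and the column name FullDescription is an assumption about the competition's data file.

```r
# Toy job ads; the column name FullDescription is an assumption about the data file.
ads <- data.frame(
  FullDescription = c("Senior Java developer in London",
                      "Part time cleaner needed",
                      "Lead data scientist, Python"),
  stringsAsFactors = FALSE
)

# Hypothetical keywords that might push a salary up or down.
keywords <- c("senior", "lead", "part time")

# One 0/1 indicator column per keyword (case-insensitive match in the text).
for (kw in keywords) {
  col <- paste0("kw_", gsub(" ", "_", kw))
  ads[[col]] <- as.integer(grepl(kw, ads$FullDescription, ignore.case = TRUE))
}
ads[, -1]
```

Columns like these can then be passed to lm() next to the structured fields.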
Chapter 4
4.1 Reading the Data
Kaggle provides all the data as .csv files, so we read them into R with the read.csv function.
From the above, we see:
1. full_time, part_time, contract and permanent are attribute values present in the train data.
2. The numbers show how often each value occurs.
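The reading-and-counting step described above can be sketched as follows. To keep the sketch self-contained, a tiny inline CSV stands in for Kaggle's training file. With the real data the call would be something like `read.csv("<path to the train file>")`. The rows are made up, and the column names ContractType, ContractTime and SalaryNormalized are assumptions modelled on the competition data.

```r
# Made-up rows standing in for the Kaggle training file.
csv_text <- "Id,ContractType,ContractTime,SalaryNormalized
1,permanent,full_time,35000
2,contract,full_time,42000
3,,part_time,15000
4,permanent,full_time,28000"

# read.csv() parses comma-separated text; na.strings = "" turns empty fields into NA.
train <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

# Frequency of each value, like the counts discussed above.
table(train$ContractTime)
table(train$ContractType, useNA = "ifany")
```

`useNA = "ifany"` keeps missing contract types visible in the count rather than silently dropping them.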
CHAPTER 4. JOB SALARY PREDICTION IN R
Assigning frequencies to Top Sources
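The step above is only named, so the sketch below shows one plausible reading, not the report's actual code. It counts how often each job-ad source appears, keeps the most frequent ones as "top sources", groups the rest as "Other", and attaches each row's source frequency as a numeric feature. The source names and the cutoff `top_n` are hypothetical.

```r
# Hypothetical source names, one per job ad.
sources <- c("totaljobs.com", "cv-library.co.uk", "totaljobs.com",
             "jobsite.co.uk", "totaljobs.com", "cv-library.co.uk", "myjobs.com")

# Count each source, most frequent first.
freq <- sort(table(sources), decreasing = TRUE)

# Keep the top_n most frequent sources; group every other source as "Other".
top_n <- 2
top_sources <- names(freq)[seq_len(top_n)]
source_group <- ifelse(sources %in% top_sources, sources, "Other")

# Attach each row's source frequency as a numeric feature.
source_freq <- as.integer(freq[sources])
```

Grouping rare sources keeps the number of factor levels, and therefore the number of lm() coefficients, small.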
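1. Residuals: the differences between the actual and predicted values. We want them to be close to 0, since that means the model is more accurate. In practice they still range from Min to Max.
2. Residual Standard Error: computed as $\sqrt{0.16} \approx 0.4$, the square root of the residual mean square $s^2 = 0.1627$. The number 244664 is the residual degrees of freedom, i.e. the number of observations in the train set minus the number of estimated coefficients.
3. Multiple R-squared: the model explains 32.69% of the variation in the response.
4. Adjusted R-squared: computed from the residual variance $s^2$ and the variance of the response $s_y^2$:
$$\bar{R}^2 = 1 - \frac{s^2}{s_y^2} = 1 - \frac{0.1627}{s_y^2} = 0.3266$$
5. Df (degrees of freedom): the number of degrees of freedom.
6. F-statistic: the regression mean square divided by the residual mean square $s^2$:
$$F = \frac{569.91}{0.1627} \approx 3502.58$$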
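The quantities above come from summary() of an lm fit on the Kaggle data, which is not reproduced here. The sketch below uses simulated data instead, so its numbers differ from the report's. It shows how each summary quantity is computed by hand, and the checks confirm that the hand computations match summary().

```r
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 0.5 * dat$x1 - 0.3 * dat$x2 + rnorm(n, sd = 0.4)

fit <- lm(y ~ x1 + x2, data = dat)
s <- summary(fit)

res <- residuals(fit)          # actual minus fitted values
summary(res)                   # Min ... Max of the residuals

df_res <- df.residual(fit)     # n minus number of coefficients (200 - 3)
rss <- sum(res^2)
s2 <- rss / df_res             # residual mean square s^2
rse <- sqrt(s2)                # Residual standard error

tss <- sum((dat$y - mean(dat$y))^2)
r2 <- 1 - rss / tss            # Multiple R-squared
s2_y <- tss / (n - 1)          # variance of the response
adj_r2 <- 1 - s2 / s2_y        # Adjusted R-squared

p <- length(coef(fit))
msr <- (tss - rss) / (p - 1)   # regression mean square
f_stat <- msr / s2             # F-statistic
```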
Creating the output
Exporting the output
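The creating and exporting steps are only named above. A minimal sketch, assuming a fitted lm model and a test set with an Id column, is to predict with predict() and write the results with write.csv(). The tiny data frames and the output column name SalaryNormalized are hypothetical stand-ins for the real Kaggle files and submission format.

```r
# Hypothetical training and test data standing in for the Kaggle files.
train <- data.frame(x = c(1, 2, 3, 4, 5),
                    salary = c(21000, 25000, 29500, 33000, 37500))
test <- data.frame(Id = c(101, 102), x = c(6, 7))

fit <- lm(salary ~ x, data = train)

# Creating the output: predicted salaries for the test rows.
pred <- predict(fit, newdata = test)

# Exporting the output: one row per Id, written as CSV without row names.
submission <- data.frame(Id = test$Id, SalaryNormalized = pred)
out_file <- file.path(tempdir(), "submission.csv")
write.csv(submission, out_file, row.names = FALSE)
```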
THE END.