[Deep Learning] Sonar Mineral Data: Avoiding Overfitting

주의 🐱 2021. 3. 24. 15:08

sonar.csv

import pandas as pd
df = pd.read_csv('../dataset/sonar.csv', header=None)
print(df.info())

 

RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
0     208 non-null float64
1     208 non-null float64
...
59    208 non-null float64
60    208 non-null object
dtypes: float64(60), object(1)
memory usage: 99.2+ KB

 

The index has 208 entries, so there are 208 samples in total. There are 61 columns, so the data consists of 60 attributes and 1 class.
Every column is floating-point (float64) except the very last one, which is object type,
so the last column is the class, and it needs a data type conversion.
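The type conversion mentioned above amounts to mapping each class string to an integer. Here is a minimal pure-Python sketch of that idea, assuming two made-up labels, "R" and "M" (the post does not list the actual values). encode_labels is a hypothetical helper, not a library function. scikit-learn's LabelEncoder, which the code further down uses, also numbers the classes in sorted order.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in labels]

# Hypothetical class column; the real values come from column 60 of sonar.csv.
sample_labels = ["R", "M", "R", "R", "M"]
classes, encoded = encode_labels(sample_labels)
print(classes)   # ['M', 'R']
print(encoded)   # [1, 0, 1, 1, 0]
```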

 

from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import LabelEncoder

import pandas as pd
import numpy
import tensorflow as tf

# Set the seed values
numpy.random.seed(3)
tf.random.set_seed(3)

# Load the data
df = pd.read_csv('../dataset/sonar.csv', header=None)
'''
# Overview of the data
print(df.info())

# Preview the first rows
print(df.head())
'''
dataset = df.values
# df.values is an object array because of the label column, so cast the features to float
X = dataset[:,0:60].astype(float)
Y_obj = dataset[:,60]

# Convert the string labels to integers
e = LabelEncoder()
e.fit(Y_obj)
Y = e.transform(Y_obj)

# Define the model
model = Sequential()
model.add(Dense(24,  input_dim=60, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='mean_squared_error',
            optimizer='adam',
            metrics=['accuracy'])

# Train the model
model.fit(X, Y, epochs=200, batch_size=5)

# Print the result (evaluated on the same data used for training)
print("\n Accuracy: %.4f" % (model.evaluate(X, Y)[1]))

Accuracy: 1.0000

 

Did we really build a model with 100% accuracy?

 

Overfitting: the model reaches a reasonably high prediction accuracy inside the training dataset, but fits poorly when applied to new data.

Applied to completely new data, the overfitted line can no longer split the points accurately into the two groups.

 

The green line represents the overfitted model, and the black line represents a general model.

Overfitting can happen when there are too many layers or the variables are too complex, and it also shows up when the test set and the training set overlap.
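As a toy illustration of that definition, here is a sketch of the most extreme overfitted "model" possible: one that simply memorizes its training samples. Memorizer, the accuracy helper, and the data are all made up for this example and are not part of the sonar code.

```python
class Memorizer:
    """A deliberately overfitted 'model': it only remembers its training samples."""

    def fit(self, samples, labels):
        self.table = {tuple(s): y for s, y in zip(samples, labels)}
        # Fall back to the most common training label for unseen samples.
        self.default = max(set(labels), key=labels.count)
        return self

    def predict(self, sample):
        return self.table.get(tuple(sample), self.default)

def accuracy(model, samples, labels):
    hits = sum(model.predict(s) == y for s, y in zip(samples, labels))
    return hits / len(labels)

# Made-up data: the label is 1 when the two features sum to more than 1.0.
train_x = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.3), (0.7, 0.9), (0.2, 0.1)]
train_y = [0, 1, 0, 1, 0]
test_x = [(0.8, 0.7), (0.3, 0.2), (0.6, 0.9), (0.9, 0.9)]
test_y = [1, 0, 1, 1]

model = Memorizer().fit(train_x, train_y)
train_acc = accuracy(model, train_x, train_y)
test_acc = accuracy(model, test_x, test_y)
print(train_acc, test_acc)  # 1.0 0.25
```

The training accuracy is perfect because every training sample is looked up directly, while the unseen samples get the fallback label and mostly miss.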

 


How do we prevent overfitting?

 

1. Completely separate the dataset used for training from the dataset used for testing, then run testing alongside training.

 

 

For example, if the dataset consists of 100 samples in total, split it into two sets as follows.

70 samples go to the training set

30 samples go to the test set
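The 70/30 split above can be sketched in a few lines of plain Python. holdout_split is a hypothetical helper written for this note; it only shuffles index numbers and cuts the list, which is the core idea behind a hold-out split.

```python
import random

def holdout_split(n_samples, test_ratio, seed):
    """Shuffle sample indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(n_samples=100, test_ratio=0.3, seed=0)
print(len(train_idx), len(test_idx))      # 70 30
print(set(train_idx) & set(test_idx))     # set() -> no sample is in both sets
```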

 

Build a neural network, train it on the 70 samples, then save the result of that training: the saved result is the "model".

When the model is applied to another set, it behaves exactly as it was imprinted during training. So running it on the remaining 30 samples and checking the accuracy shows how well the training went.

Tune an algorithm such as deep learning until the best model is produced, then put that model to work on real-world data: that is the development sequence of machine learning.

 

 

So far, however, we have been training without creating a test set. Even so, we were able to compute an accuracy every time. How was that possible? The accuracy we measured so far with the training data is the result of using every sample in the data, as-is, for testing.

Strictly speaking, a sample used for training should not also be used for testing: the samples to be tested should be set aside before training, and the accuracy computed from their results. Scoring the model on its own training data, as model.evaluate(X, Y) does above, skips that step. It is a quick way to gauge and adjust a model, but nothing is held out.

 

However, the ultimate goal of machine learning is to predict new data based on past data. That makes it very important to create a test set and evaluate accurately alongside training. When evaluating with the training set only, adding layers or raising the epoch value to run more iterations can keep pushing the accuracy up. But the prediction success rate measured on the training set alone does not carry over unchanged to the test set. In other words, if deeper training raises the success rate inside the training set but brings no improvement on the test set, overfitting is taking place.

Plotted as a graph: as training continues, accuracy on the training set keeps climbing, but on the test set overfitting sets in!

Training should stop at the point where the test result no longer improves even as training goes on. The amount of training at that point can be considered the most appropriate.
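The stopping rule above can be sketched as a small function over per-epoch test accuracies. The history values and the patience parameter are made up for illustration. Keras ships an EarlyStopping callback built on a similar idea, but this sketch is plain Python and not that callback.

```python
def best_stopping_epoch(test_accuracies, patience=3):
    """Return the 1-based epoch with the best test accuracy, scanning until
    the accuracy has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_acc, waited = 0, float("-inf"), 0
    for epoch, acc in enumerate(test_accuracies, start=1):
        if acc > best_acc:
            best_epoch, best_acc, waited = epoch, acc, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_acc

# Made-up per-epoch test accuracies: they rise, plateau, then slip.
history = [0.60, 0.68, 0.74, 0.79, 0.81, 0.80, 0.81, 0.79, 0.78, 0.77]
print(best_stopping_epoch(history))  # (5, 0.81)
```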

 

Let's build an example that splits the given data into a training set and a test set. The train_test_split() function in the sklearn library takes the loaded X data and Y data, divides each by a given ratio (%), and assigns one group to training and the other to testing. The training and test sets can therefore be created as follows. This example uses 70% for the training set and 30% for the test set.

from sklearn.model_selection import train_test_split

# Split into a training set and a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=seed)

Then, in the part that runs the model, train on the training set and test on the test set created above, like this.

model.fit(X_train, Y_train, epochs=130, batch_size=5)

# Apply the model to the test set
print("\n Test Accuracy: %.4f" % (model.evaluate(X_test, Y_test)[1]))

 

Once training is finished and the test result is satisfactory, the model can be saved and used on new data.

To save the result of the training above as a model, run the following.

from keras.models import load_model
model.save('my_model.h5')

Loading it back:

model = load_model('my_model.h5')

 

Full code

from keras.models import Sequential, load_model
from keras.layers import Dense
from sklearn.preprocessing import LabelEncoder

import pandas as pd
import numpy
import tensorflow as tf

# Set the seed values
seed = 0
numpy.random.seed(seed)
tf.random.set_seed(3)

df = pd.read_csv('../dataset/sonar.csv', header=None)

dataset = df.values
# df.values is an object array because of the label column, so cast the features to float
X = dataset[:,0:60].astype(float)
Y_obj = dataset[:,60]

e = LabelEncoder()
e.fit(Y_obj)
Y = e.transform(Y_obj)

# Split into a training set and a test set
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=seed)

model = Sequential()
model.add(Dense(24, input_dim=60, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=130, batch_size=5)
model.save('my_model.h5')   # save the model to disk

del model  # delete the in-memory model for the test
model = load_model('my_model.h5') # load the model again from disk

print("\n Test Accuracy: %.4f" % (model.evaluate(X_test, Y_test)[1]))  # run the test with the loaded model

Test Accuracy: 0.8095


k-fold cross-validation

 

One of the perennial difficulties in deep learning or machine learning work is that, however thoroughly the algorithm is tested, good results are hard to get when there is not enough data. Earlier, about 70% of the available data had to go to the training set, so the test set was only 30% of the whole. With that little testing it is hard to be confident about how well the model really works.

 

The method devised to make up for this weakness is k-fold cross-validation.

k-fold cross-validation splits the dataset into several parts, uses them one at a time as the test set, and combines all the remaining parts into the training set. Done this way, 100% of the available data ends up being used as test data.
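The rotation described above can be sketched without any library. k_fold_indices is a hypothetical helper that works on index numbers only and does no shuffling, so the folds are contiguous blocks.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("test:", test, "train:", train)

tested = sorted(i for _, test in folds for i in test)
print(tested == list(range(10)))  # True: every sample is tested exactly once
```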

For example, 5-fold cross-validation:

 

sklearn's StratifiedKFold(): a splitter that cuts the data into the requested number of pieces so that each piece takes its turn as training set and test set

from sklearn.model_selection import StratifiedKFold
n_fold = 10
skf = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=seed)

 

This is 10-fold cross-validation: the data is cut into 10 folds, and each fold is tested in turn.

Set n_fold to 10, then pass it to StratifiedKFold().
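"Stratified" means that each fold keeps roughly the same class ratio as the whole dataset. The sketch below shows that idea in plain Python with made-up labels. stratified_folds is a hypothetical helper and is a simplification, not scikit-learn's actual algorithm.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed):
    """Assign sample indices to k folds so each fold keeps the class ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class out round-robin, like cards, across the folds.
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds

# Made-up labels: 12 samples of class 0 and 8 of class 1 (a 60/40 ratio).
labels = [0] * 12 + [1] * 8
folds = stratified_folds(labels, k=4, seed=0)
for fold in folds:
    counts = [sum(labels[i] == c for i in fold) for c in (0, 1)]
    print(sorted(fold), counts)   # each fold holds 3 of class 0 and 2 of class 1
```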

Next, wrap the part that builds and runs the model in a for loop so that it repeats n_fold times.

for train, test in skf.split(X, Y):
    model = Sequential()
    model.add(Dense(24, input_dim=60, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    model.fit(X[train], Y[train], epochs=100, batch_size=5)

 

Create an accuracy list so that the accuracy of each fold is stored and everything can be shown at once.

 

accuracy = []

for train, test in skf.split(X, Y):
    # ... (model setup, compile, and fit omitted)
    k_accuracy = "%.4f" % (model.evaluate(X[test], Y[test])[1])
    accuracy.append(k_accuracy)

print("\n %.f fold accuracy:" % n_fold, accuracy)

 

Full code

from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold

import numpy
import pandas as pd
import tensorflow as tf

# Set the seed values
seed = 0
numpy.random.seed(seed)
tf.random.set_seed(seed)

df = pd.read_csv('../dataset/sonar.csv', header=None)

dataset = df.values
# df.values is an object array because of the label column, so cast the features to float
X = dataset[:,0:60].astype(float)
Y_obj = dataset[:,60]

e = LabelEncoder()
e.fit(Y_obj)
Y = e.transform(Y_obj)

# Split into 10 folds
n_fold = 10
skf = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=seed)

# Empty list for the per-fold accuracies
accuracy = []

# Define, compile, and train a fresh model for each fold
for train, test in skf.split(X, Y):
    model = Sequential()
    model.add(Dense(24, input_dim=60, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    model.fit(X[train], Y[train], epochs=100, batch_size=5)
    k_accuracy = "%.4f" % (model.evaluate(X[test], Y[test])[1])
    accuracy.append(k_accuracy)

# Print the results
print("\n %.f fold accuracy:" % n_fold, accuracy)

 

Output

10 fold accuracy: ['0.8182', '0.7143', '0.8095', '0.8095', '0.7619', '0.8095', '0.8571', '0.9500', '0.7500', '0.8000']

The 10 test accuracies are printed, one per fold.
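Ten separate numbers are easier to judge once they are summarized. The sketch below averages the fold accuracies from the output above. Because the loop stored them as formatted strings, they are converted back to floats first. Reporting the mean is a common convention and is an addition here, not something the original code does.

```python
from statistics import mean

# Fold accuracies copied from the run above; they were stored as strings.
accuracy = ['0.8182', '0.7143', '0.8095', '0.8095', '0.7619',
            '0.8095', '0.8571', '0.9500', '0.7500', '0.8000']

scores = [float(a) for a in accuracy]
print("mean accuracy: %.4f" % mean(scores))                    # mean accuracy: 0.8080
print("min / max: %.4f / %.4f" % (min(scores), max(scores)))   # min / max: 0.7143 / 0.9500
```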
