๋”ฅ๋Ÿฌ๋‹/Today I learned :

[Deep Learning] Multi-class classification: classifying Iris species

์ฃผ์˜ ๐Ÿฑ 2021. 3. 23. 17:20

 

iris.csv์˜ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ

์ƒ˜ํ”Œ ์ˆ˜: 150

 

์†์„ฑ ์ˆ˜: 4

- ์ •๋ณด 1: ๊ฝƒ๋ฐ›์นจ ๊ธธ์ด (sepal length, ๋‹จ์œ„: cm)

- ์ •๋ณด 2: ๊ฝƒ๋ฐ›์นจ ๋„ˆ๋น„ (sepal width, ๋‹จ์œ„: cm)

- ์ •๋ณด 3: ๊ฝƒ์žŽ ๊ธธ์ด (petal length, ๋‹จ์œ„: cm)

- ์ •๋ณด 4: ๊ฝƒ์žŽ ๋„ˆ๋น„ (petal width, ๋‹จ์œ„: cm)

 

ํด๋ž˜์Šค: Iris-setosa, Iris-versicolor, Iris-virginica

ํด๋ž˜์Šค๊ฐ€ 3๊ฐœ

-  ์ฐธ(1)๊ณผ ๊ฑฐ์ง“(0) = ์ดํ•ญ ๋ถ„๋ฅ˜(binary classification)์™€ ๋‹ค๋ฆ„ , ์—ฌ๋Ÿฌ ๊ฐœ ์ค‘์— ์–ด๋–ค ๊ฒƒ์ด ๋‹ต์ธ์ง€๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฌธ์ œ

-  ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋‹ต ์ค‘ ํ•˜๋‚˜๋ฅผ ๊ณ ๋ฅด๋Š” ๋ถ„๋ฅ˜ ๋ฌธ์ œ = ๋‹ค์ค‘ ๋ถ„๋ฅ˜(multi classification)

 


์ƒ๊ด€๋„ ๊ทธ๋ž˜ํ”„

import pandas as pd
df = pd.read_csv(’../dataset/iris.csv’, names = [“sepal_length”, “sepal_width”, “petal_length”, “petal_width”, “species”])
print(df.head())

 

pairplot(): draws a plot that shows the whole dataset at a glance

import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df, hue='species')
plt.show()

The plot shows that the length and width of the petals and sepals differ by species.

 


Predicting the species with Keras

One-hot encoding

 

 

When the data contains strings such as Iris-setosa and Iris-virginica, load it with pandas rather than numpy and then split it into X and Y values.

df = pd.read_csv('../dataset/iris.csv', names=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"])

# Split into features (X) and string labels (Y_obj)
dataset = df.values
X = dataset[:, 0:4].astype(float)
Y_obj = dataset[:, 4]

 

The Y values are strings, so the class names need to be converted to numbers.

The LabelEncoder class in the sklearn library does this.

 

from sklearn.preprocessing import LabelEncoder

# Convert the string labels to integers
e = LabelEncoder()
e.fit(Y_obj)
Y = e.transform(Y_obj)

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']) becomes array([0, 1, 2]).
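As a rough illustration of the idea, the mapping can be reproduced in plain Python: the distinct class names are sorted, and each name is replaced by its position in that order, starting from 0. The helper name `encode_labels` is hypothetical, and this is only a sketch of the concept, not sklearn's actual implementation.

```python
def encode_labels(labels):
    """Hypothetical helper: map each distinct label to an integer in sorted order."""
    classes = sorted(set(labels))                      # distinct class names, sorted
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[name] for name in labels]

classes, encoded = encode_labels(
    ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]
)
print(classes)   # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(encoded)   # [2, 0, 1, 0]
```

Keeping the sorted class list around makes it possible to translate an integer prediction back into a species name later.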

 

 

 

To train against the output of the activation function, the Y values must consist only of 0s and 1s.

The tf.keras.utils.to_categorical() function performs this conversion. After applying it, Y takes the following form.

import tensorflow as tf

# One-hot encode the integer labels
Y_encoded = tf.keras.utils.to_categorical(Y)

This turns array([0, 1, 2]) into array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]).

Converting multi-valued Y labels into a form made up only of 0s and 1s in this way is called one-hot encoding.
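To make the transformation concrete, here is a minimal pure-Python sketch of one-hot encoding. The function name `one_hot` is hypothetical and it assumes the labels are integers starting at 0; it is meant to show the shape of the result, not to replace the Keras utility.

```python
def one_hot(indices, num_classes):
    """Hypothetical helper: turn integer labels into rows of 0s with a single 1."""
    rows = []
    for i in indices:
        row = [0.0] * num_classes   # start with all zeros
        row[i] = 1.0                # mark the position of the class
        rows.append(row)
    return rows

encoded_rows = one_hot([0, 1, 2], num_classes=3)
print(encoded_rows)
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```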

 


The softmax activation function

 

 

#๋ชจ๋ธ ์„ค์ •
model = Sequential()
model.add(Dense(16, input_dim=4, activation=‘relu’))
model.add(Dense(3, activation=‘softmax’))

 

First, because the final output must be one of three classes, the Dense layer that serves as the output layer is given 3 nodes.

Softmax rescales the outputs so that they sum to 1.

Because the rescaling is exponential, large values stand out and small values become relatively smaller.

Cross-entropy then measures how far this output is from the one-hot target, for example [1., 0., 0.], and training pushes the output toward that form: one value close to 1 and the rest close to 0.

 

 

The loss function is categorical_crossentropy, which suits multi-class classification, and the optimizer is adam.

Training runs until the full set of samples has been passed through 50 times (epochs=50), with one sample fed in at a time (batch_size=1).
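As a quick check of what these settings imply, the arithmetic below counts the weight updates, assuming one update per batch and the 150 samples described earlier. The variable names are only illustrative.

```python
import math

n_samples = 150    # rows in iris.csv
epochs = 50        # full passes over the data
batch_size = 1     # samples per weight update

updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)   # 150 7500
```

A larger batch_size would reduce the number of updates per epoch and usually shortens training time, at the cost of coarser steps.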

 

Full code

# ๋ฐ์ดํ„ฐ ๋ถ„๋ฅ˜
dataset = df.values
X = dataset[:,0:4].astype(float)
Y_obj = dataset[:,4]

# ๋ฌธ์ž์—ด์„ ์ˆซ์ž๋กœ ๋ณ€ํ™˜
e = LabelEncoder()
e.fit(Y_obj)
Y = e.transform(Y_obj)
Y_encoded = tf.keras.utils.to_categorical(Y)

# ๋ชจ๋ธ์˜ ์„ค์ •
model = Sequential()
model.add(Dense(16,  input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))

# ๋ชจ๋ธ ์ปดํŒŒ์ผ
model.compile(loss='categorical_crossentropy',
            optimizer='adam',
            metrics=['accuracy'])

# ๋ชจ๋ธ ์‹คํ–‰
model.fit(X, Y_encoded, epochs=50, batch_size=1)

# ๊ฒฐ๊ณผ ์ถœ๋ ฅ
print("\n Accuracy: %.4f" % (model.evaluate(X, Y_encoded)[1]))

- 0s 1ms/sample - loss: 0.1042 - accuracy: 0.9867

Accuracy: 0.9867

 

์˜ˆ์ธก ์ •ํ™•๋„: 98.67%

์ด๋Š” 148๊ฐœ์˜ ์ƒ˜ํ”Œ์„ ํ•œ ๋ฒˆ์”ฉ ํ…Œ์ŠคํŠธํ•œ ๊ฒฐ๊ณผ 146๊ฐœ์˜ ํ’ˆ์ข…์„ ์ •ํ™•ํžˆ ๋งžํžˆ๋Š” ํ™•๋ฅ 
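The relationship between the reported accuracy and the number of correct predictions can be verified with a couple of lines. This is only the arithmetic behind the figure above, using the 150 samples in the dataset.

```python
n_samples = 150
reported_accuracy = 0.9867

# Number of correctly classified samples implied by the reported accuracy
correct = round(n_samples * reported_accuracy)

print(correct)                         # 148
print(round(correct / n_samples, 4))   # 0.9867
```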

๋ฐ˜์‘ํ˜•