๋”ฅ๋Ÿฌ๋‹/Today I learned :

[๋”ฅ๋Ÿฌ๋‹] ์„ ํ˜• ํšŒ๊ท€ ์ ์šฉํ•˜๊ธฐ

주의 🐱 2021. 3. 26. 14:02

๋ฐ์ดํ„ฐ ํ™•์ธ

import pandas as pd

# sep=r"\s+" splits on runs of whitespace (replaces the deprecated delim_whitespace=True)
df = pd.read_csv("../dataset/housing.csv", sep=r"\s+", header=None)
print(df.info())

RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
0     506 non-null float64
1     506 non-null float64
...
13    506 non-null float64
dtypes: float64(12), int64(2)
memory usage: 55.4 KB

506 index entries = 506 samples in total

14 columns = 13 attributes and 1 class (the value to predict)

 

         0     1     2  3  ...    12    13
0  0.00632  18.0  2.31  0  ...  4.98  24.0
1  0.02731     0  7.07  0  ...  9.14  21.6
2  0.02729     0  7.07  0  ...  4.03  34.7
3  0.03237     0  2.18  0  ...  2.94  33.4
4  0.06905     0  2.18  0  ...  5.33  36.2

0 CRIM: per-capita crime rate
1 ZN: proportion of residential land zoned for lots over 25,000 square feet
2 INDUS: proportion of area occupied by non-retail business
3 CHAS: Charles River variable (1: near the river / 0: otherwise)
4 NOX: nitric oxide concentration
5 RM: average number of rooms per dwelling
6 AGE: proportion of homes built before 1940
7 DIS: distance to five Boston employment centers
8 RAD: accessibility to radial highways
9 TAX: total property tax rate per $10,000
10 PTRATIO: pupil-teacher ratio by area
11 B: proportion of Black residents by area
12 LSTAT: percentage of the population working in low-paid jobs (%)
13 Price (in units of $1,000)
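To make the column numbers easier to work with, the names from the list above can be attached when the file is read. The sketch below is only an illustration: it reads two rows from an in-memory string instead of `housing.csv`, the field values that are not shown in the table above are filled in for illustration, and the name `MEDV` for column 13 is an assumption (the list above only calls it the price).

```python
import io
import pandas as pd

# Names follow the attribute list above; "MEDV" for column 13 is an assumed label.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Two whitespace-delimited rows in the same 14-field layout (stand-in for housing.csv).
SAMPLE = (
    "0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00\n"
    "0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60\n"
)

# header=None: the file has no header row; names= supplies the labels instead.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", header=None, names=COLUMNS)

X = df.drop(columns="MEDV")  # the 13 attributes
y = df["MEDV"]               # the price to predict

print(df.shape)
print(X.shape, y.tolist())
```

With named columns, a single attribute can be selected as `df["RM"]` rather than by position.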

 


Running linear regression: because the target is a continuous price, there is no need to separate true from false at the end, so the output layer does not need an activation function either.

 

model = Sequential()
model.add(Dense(30, input_dim=13, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(1))
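Why the last layer is left without an activation can be seen with a little arithmetic. The sketch below is not Keras code. It mimics by hand what a single output unit computes from the 6 values of the previous layer, using made-up numbers (`hidden`, `weights`, and `bias` are hypothetical), and compares the raw linear output with what a sigmoid would do to it.

```python
import math

def linear_output(hidden, weights, bias):
    # What Dense(1) with no activation computes: a weighted sum plus a bias
    return sum(h * w for h, w in zip(hidden, weights)) + bias

def sigmoid(z):
    # The squashing function typically used for true/false outputs
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for the 6 hidden units and the output unit's parameters
hidden = [2.0, 0.5, 1.0, 0.0, 3.0, 1.5]
weights = [4.0, 2.0, 1.0, 0.5, 3.0, 2.0]
bias = 1.0

z = linear_output(hidden, weights, bias)
print("linear output:", z)            # can be any real number, such as a price
print("after sigmoid:", sigmoid(z))   # always squeezed into the range (0, 1)
```

A sigmoid output could never reach a price such as 24.0, which is why a regression model keeps the output linear.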

๋ชจ๋ธ์˜ ํ•™์Šต์ด ์–ด๋Š ์ •๋„ ๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ์˜ˆ์ธก ๊ฐ’๊ณผ ์‹ค์ œ ๊ฐ’์„ ๋น„๊ตํ•˜๋Š” ๋ถ€๋ถ„์„ ์ถ”๊ฐ€

Y_prediction = model.predict(X_test).flatten()
for i in range(10):
    label = Y_test[i]
    prediction = Y_prediction[i]
    print("Actual price: {:.3f}, predicted price: {:.3f}".format(label, prediction))

flatten(): converts the data array to one dimension so that it is easier to read.

range(n) produces the values from 0 up to n-1, increasing one at a time, for the loop to iterate over.

(range(10) yields 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
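The effect of flatten() is easiest to see on the array shapes. The following is a small sketch that uses a hand-written NumPy array as a stand-in for the result of `model.predict(X_test)` (the three numbers are taken from the predicted prices printed later in this post), so it runs without training a model.

```python
import numpy as np

# Stand-in for model.predict(X_test): one row per sample, one column per output unit
raw = np.array([[20.133], [24.157], [28.158]])

flat = raw.flatten()  # copy of the data collapsed into one dimension

print(raw.shape)   # two-dimensional: (samples, 1)
print(flat.shape)  # one-dimensional: (samples,)

# range(len(flat)) yields 0, 1, 2, so flat[i] is a plain number for each sample
for i in range(len(flat)):
    print(i, "{:.3f}".format(flat[i]))
```

Without flatten(), each `raw[i]` would be a one-element array rather than a single number, which is less convenient to format and print.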

 

from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
 
import numpy
import pandas as pd
import tensorflow as tf
  
# Set the seed values
seed = 0
numpy.random.seed(seed)
tf.random.set_seed(3)
 
# sep=r"\s+" splits on runs of whitespace (replaces the deprecated delim_whitespace=True)
df = pd.read_csv("../dataset/housing.csv", sep=r"\s+", header=None)
 
dataset = df.values
X = dataset[:,0:13]
Y = dataset[:,13]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=seed)
 
model = Sequential()
model.add(Dense(30, input_dim=13, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, Y_train, epochs=200, batch_size=10)
  
# Compare the predicted values with the actual values
Y_prediction = model.predict(X_test).flatten()
for i in range(10):
    label = Y_test[i]
    prediction = Y_prediction[i]
    print("Actual price: {:.3f}, predicted price: {:.3f}".format(label, prediction))
Output
Actual price: 22.600, predicted price: 20.133
Actual price: 50.000, predicted price: 24.157
Actual price: 23.000, predicted price: 28.158
Actual price: 8.300, predicted price: 13.419
Actual price: 21.200, predicted price: 22.280
Actual price: 19.900, predicted price: 23.254
Actual price: 20.600, predicted price: 20.012
Actual price: 18.700, predicted price: 26.365
Actual price: 16.100, predicted price: 18.521
Actual price: 18.600, predicted price: 11.163

 

 

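The predicted prices broadly follow the actual prices, although individual samples can still be far off (for example, 50.000 actual versus 24.157 predicted).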
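Reading ten pairs by eye only gives a rough impression, so it can help to summarize the gap as a number. The sketch below is one possible way to do that and is not part of the original listing: it copies the ten printed pairs into plain Python lists and computes the mean absolute error, the mean squared error (the same quantity the model was trained to minimize), and its square root.

```python
import math

# The ten (actual, predicted) pairs printed in the output above, in units of $1,000
actual = [22.6, 50.0, 23.0, 8.3, 21.2, 19.9, 20.6, 18.7, 16.1, 18.6]
predicted = [20.133, 24.157, 28.158, 13.419, 22.280,
             23.254, 20.012, 26.365, 18.521, 11.163]

errors = [a - p for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                             # back in the same unit as the price

# Index of the sample with the largest absolute error
worst = max(range(len(errors)), key=lambda i: abs(errors[i]))

print("MAE:  {:.3f}".format(mae))
print("MSE:  {:.3f}".format(mse))
print("RMSE: {:.3f}".format(rmse))
print("Largest miss: sample", worst, "actual", actual[worst], "predicted", predicted[worst])
```

Ten samples are too few to judge the model. On the full test set the same calculation could be applied to `Y_test` and `Y_prediction`.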

 

 

 

 

๋ฐ˜์‘ํ˜•