
[๋จธ์‹ ๋Ÿฌ๋‹4] Logistic Regression ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ pyhton

์ฃผ์˜ ๐Ÿฑ 2022. 11. 30. 14:19

binary classification์€ ์ข…๋ฅ˜๊ฐ€ 2๊ฐœ๋กœ ๋‚˜๋‰˜์–ด์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๊ณ  ์ด๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค. 

์ด ๋ฌธ์ œ๋Š” ์˜ˆ์ธก ๊ฐ’์ด ์—ฐ์†์ ์ธ ๊ฐ’์ด ์•„๋‹Œ 0 ๋˜๋Š” 1์ž…๋‹ˆ๋‹ค.

 

Examples

Email: spam or not spam?

Online transaction: fraudulent (Fraudulent Financial Statement, FFS) or not?

Tumor: malignant (cancer) or benign?

 

์ด๋•Œ๋Š” ์šฐ๋ฆฌ์˜ ์˜ˆ์ธก ๊ฐ’์„ ํ™•๋ฅ  ๊ฐ’์œผ๋กœ ๋งŒ๋“  ๋‹ค์Œ์— ํ™•๋ฅ  ๊ฐ’์ด ์šฐ๋ฆฌ์˜ ๊ธฐ์ค€๋ณด๋‹ค ๋†’์œผ๋ฉด 1, ์•„๋‹ˆ๋ฉด 0์œผ๋กœ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ๋ฐฉ๋ฒ•์„ logistic regression์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
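
As a minimal sketch of this thresholding idea (the probabilities below are made up purely for illustration):

probabilities = [0.10, 0.65, 0.48, 0.92]   # hypothetical predicted probabilities
threshold = 0.5                            # classification cutoff
labels = [1 if p >= threshold else 0 for p in probabilities]
print(labels)   # [0, 1, 0, 1]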

 

When there are three or more classes, the problem is called multi-class classification.

 

 

Logistic regression์„ ์ง„ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ถœ๋ ฅ ๊ฐ’์„ 0๊ณผ 1์˜ ๊ฐ’์œผ๋กœ ๋งž์ถฐ์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด์„œ ์šฐ๋ฆฌ๋Š” logistic function ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. Logistic function์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Logistic regression์„ ์ง„ํ–‰ํ•  ๋•Œ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ๐‘ฅ, ์‹ค์ œ class ๊ฐ’์„ ๐‘ฆ, ์˜ˆ์ธก๋œ ์ถœ๋ ฅ ๊ฐ’์„ ๐‘ฆฬ‚ ๋ผ๊ณ  ํ•˜๋ฉด ๐‘ฅ๋Š” ๋‘๊ฐ€์ง€ ๋ณ€ํ™˜์„ ๊ฑฐ์ณ์„œ ๐‘ฆฬ‚ ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.

์šฐ๋ฆฌ์˜ ๋ชฉํ‘œ๋Š” ๐‘ฆฬ‚ ๊ฐ€ ์‹ค์ œ ๐‘ฆ์™€ ๊ฐ€์žฅ ๊ฐ€๊น๊ฒŒ ๋˜๋„๋ก ํ•˜๋Š” ๐‘ค์™€ ๐‘๋ฅผ ์ฐพ๋Š” ๊ฒƒ ์ž…๋‹ˆ๋‹ค.

Logistic function์„ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด์„œ ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

 

import sympy
import numpy

from matplotlib import pyplot
%matplotlib inline

# define the logistic function sigma(z) = 1 / (1 + exp(-z)) symbolically and plot it
z = sympy.Symbol('z', real=True)

logistic = 1 / (1 + sympy.exp(-z))
sympy.plotting.plot(logistic);

Looking at the graph, the output is 0.5 when ๐‘ง = 0, approaches 1 as ๐‘ง becomes more positive, and approaches 0 as ๐‘ง becomes more negative. So any ๐‘ง value is mapped to a number between 0 and 1.
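
A quick numeric check of those three claims, reusing the symbolic expression above:

sigma = sympy.lambdify(z, logistic)   # turn the symbolic logistic expression into a numeric function
print(sigma(0))     # 0.5
print(sigma(10))    # very close to 1
print(sigma(-10))   # very close to 0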

์ด์ œ ๋ฐ์ดํ„ฐ๋ฅผ ์ง์ ‘ ๋งŒ๋“ค์–ด์„œ ์ง„ํ–‰ํ•ด ๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

 

# synthetic data
x_data = numpy.linspace(-5, 5, 100)
w = 2
b = 1
numpy.random.seed(0)
z_data = w * x_data + b + numpy.random.normal(size=len(x_data))
y_data = 1 / (1+ numpy.exp(-z_data))

pyplot.scatter(x_data, y_data, alpha=0.4);

The data is generated with a little noise added.

Now we need to assign the true class labels. Using numpy's where function, we give class 1 to values greater than or equal to 0.5 and class 0 to the rest.

y_data = numpy.where(y_data >= 0.5, 1, 0)
pyplot.scatter(x_data, y_data, alpha=0.4);

1.2 Logistic loss function

To find the ๐‘ค and ๐‘ that bring ๐‘ฆฬ‚ as close as possible to the true ๐‘ฆ, we need to define a cost function.

When solving linear regression problems we used the mean squared error.

But applying it to logistic regression causes a problem.

๊ธฐ์กด์˜ linear regression์—์„œ์˜ mean square error ์—์„œ๋Š”

์˜ ํ˜•ํƒœ๋ฅผ ์ด๋ฃจ๊ณ  ์žˆ์–ด์„œ convex ํ•œ ํ˜•ํƒœ๋ฅผ ์ด๋ฃจ๊ณ  ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.
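
For comparison, a rough sketch of that convexity: the squared-error loss of a plain linear model y = w*x on three arbitrary points (the same ones used in the non-convex example below), plotted as a function of ๐‘ค, is a parabola with a single minimum.

# squared-error loss of a linear model y = w*x (b fixed to 0) for three points;
# the result is quadratic in w, i.e. a convex curve with one global minimum
w_sym = sympy.Symbol('w', real=True)
linear_loss = (2 - 1*w_sym)**2 + (-1 - 20*w_sym)**2 + (5 - 5*w_sym)**2
sympy.plotting.plot(linear_loss, (w_sym, -1, 1));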

๊ทธ๋Ÿฐ๋ฐ logistic function์„ ํฌํ•จํ•œ logistic regression์—์„œ๋Š”

๐œŽ์ธ logistic function ๋•Œ๋ฌธ์— ๋”์ด์ƒ convex ํ•œ ํ˜•ํƒœ๊ฐ€ ์•„๋‹™๋‹ˆ๋‹ค. ์˜ˆ์‹œ๋ฅผ ํ†ตํ•ด์„œ ์™œ convex๊ฐ€ ์•„๋‹Œ์ง€ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

For a simple example, fix ๐‘ = 0 and look at the loss as a function of ๐‘ค using three data points.

With (๐‘ฅ, ๐‘ฆ) = (1, 2), (20, โˆ’1), (5, 5), the cost function looks like the graph below (in the code, the symbol z stands in for the parameter ๐‘ค):

# sum of squared errors for the three points, as a function of the parameter
# (the symbol z stands in for w here, with b fixed to 0)
badloss = (2 - 1/(1+ sympy.exp(-z)))**2 + \
          (-1 - 1/(1+ sympy.exp(-20*z)))**2  + \
          (5 - 1/(1+ sympy.exp(-5*z)))**2
badloss

sympy.plotting.plot(badloss, xlim=(-1,1));

If we run gradient descent on this cost function, it can stop partway at a point where the gradient is 0: instead of reaching the smallest value we want, it lands in a local minimum.
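
Here is a rough sketch of that failure, using sympy's diff on the badloss expression above: started to the left of the bump, gradient descent tends to settle in the shallow local dip near ๐‘ง = 0 instead of continuing toward the lower values at larger ๐‘ง.

# gradient descent on badloss (a rough sketch); z plays the role of the parameter
dbadloss = sympy.diff(badloss, z)           # symbolic derivative of the loss
grad_fn = sympy.lambdify(z, dbadloss)       # numeric version of that derivative

w_est = -0.5                                # start to the left of the bump
for _ in range(2000):
    w_est = w_est - 0.01 * grad_fn(w_est)   # fixed-step gradient descent

print(w_est)   # tends to stall in the shallow local dip near 0, not at the lowest values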


 

So, to find something other than the mean squared error, let's think again about what a cost function means.

If our prediction is badly wrong, the parameters used to make it should be changed a lot; if it is only slightly wrong, the prediction is already good and the parameters should only be changed a little. How much they change is determined by the size of the gradient. This is the principle behind using the squared error in linear regression, as the short derivation below shows.

์ด ์›๋ฆฌ๋ฅผ logistic regression์—๋„ ์ ์šฉํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

๐‘ง=๐‘ค๐‘ฅ+๐‘ ์ผ ๋•Œ cost function ๐ฟ์„ b์— ๋Œ€ํ•ด์„œ ๋ฏธ๋ถ„์„ ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Chain rule์„ ์‚ฌ์šฉํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

# the desired dL/da, chosen so that dL/db reduces to the prediction error (up to sign)
a, y = sympy.symbols('a y', real=True)
dLda = (y-a)/a/(1-a)
dLda

L = sympy.integrate(dLda, a)
L

sympy.simplify(L)

์—ฌ๊ธฐ์„œ ๐‘Ž=๐œŽ(๐‘ง)์ด๊ธฐ ๋•Œ๋ฌธ์— ๐‘Ž<1์ด ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ log ์•ˆ์˜ ๊ฐ’์ด ์Œ์ˆ˜๊ฐ€ ๋˜๋ฉด ์•ˆ๋˜๊ธฐ ๋•Œ๋ฌธ์— ์‹์„ ๋ณ€ํ™˜ํ•ด ์ค๋‹ˆ๋‹ค.

L = -y*sympy.log(a) + (y-1)*sympy.log(1-a)
L

The cost function we have obtained is

$L = -\bigl(y\,\log(\hat{y}) + (1-y)\,\log(1-\hat{y})\bigr).$

Now let's check that ๐ฟ really does become large when the prediction is far from the true value.
First, if ๐‘ฆ = 1, only ๐ฟ = โˆ’log(๐‘Ž) remains. Shown as a graph:
sympy.plotting.plot(-sympy.log(a), xlim=(0,1));

When the true class is 1, the cost grows as the prediction approaches 0 and shrinks as it approaches 1, which is exactly the behaviour we wanted.

์ด์ œ ๐‘ฆ=0์ด๋ผ๋ฉด ๐ฟ=log(1๐‘Ž) ๋งŒ ๋‚จ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ๋˜ํ•œ ๊ทธ๋ž˜ํ”„๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

sympy.plotting.plot(-sympy.log(1-a), xlim=(0,1));

์ด๋ฒˆ์—๋„ ์˜ˆ์ธก๊ฐ’์ด ์‹ค์ œ ๊ฐ’์ด๋ž‘ ๊ฐ€๊นŒ์›Œ์ง€๋ฉด cost function๊ฐ’์ด ์ž‘์•„์ง€๊ณ  ๋ฉ€์–ด์ง€๋ฉด ์ปค์ง€๊ฒŒ ๋จ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 


1.3 Find the parameters using the autograd function

์ด์ œ logistic regression์˜ ์ „์ฒด์ ์ธ ๊ณผ์ •์„ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด์„œ ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

 

So far we have obtained gradients using sympy's diff. As the expressions get more complicated this becomes slow, so this time we will compute them with autograd.

# import the autograd-wrapped version of numpy
from autograd import numpy

# import the gradient calculator
from autograd import grad 

# note: the namespace numpy is the autograd wrapper to NumPy

def logistic(z):
    '''The logistic function'''
    return 1 / (1 + numpy.exp(-z))
    
def logistic_model(params, x):
    '''A prediction model based on the logistic function composed with wx+b
    Arguments:
       params: array(w,b) of model parameters
       x :  array of x data'''
    w = params[0]
    b = params[1]
    z = w * x + b
    y = logistic(z)
    return y

def log_loss(params, model, x, y):
    '''The logistic loss function
    Arguments:
       params: array(w,b) of model parameters
       model:  the Python function for the logistic model
       x, y:   arrays of input data to the model'''
    y_pred = model(params, x)
    return -numpy.mean(y * numpy.log(y_pred) + (1-y) * numpy.log(1 - y_pred))
    
# get a function to compute the gradient of the logistic loss
gradient = grad(log_loss)

Here the grad function returns one gradient component per parameter; we have two parameters, ๐‘ค and ๐‘. As a first example, let's plug in randomly initialized parameters and compute the gradient.

numpy.random.seed(0)
params = numpy.random.rand(2)
gradient(params, logistic_model, x_data, y_data)

์ด๋ ‡๊ฒŒ 2๊ฐœ์˜ ๋ณ€์ˆ˜์— ๋Œ€ํ•ด์„œ ๊ฐ๊ฐ ๊ธฐ์šธ๊ธฐ ๊ฐ’์„ ๋ฐ˜ํ™˜ํ•ด์ค๋‹ˆ๋‹ค.

์ด๋ฒˆ์— gradient descent ๋ฅผ ์ง„ํ–‰ํ•  ๋•Œ๋Š” ์ƒˆ๋กœ์šด ์กฐ๊ฑด์„ ์ถ”๊ฐ€ํ•ด์„œ ์ง„ํ–‰ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ์ •ํ•œ ๋ฐ˜๋ณต ์ˆ˜ ์™ธ์˜ ๊ธฐ์šธ๊ธฐ ๊ฐ’์ด 0์— ๊ฐ€๊นŒ์›Œ์ง€๋ฉด ๋”์ด์ƒ ๋ฐ˜๋ณต์„ ํ•˜์ง€ ์•Š๋Š” ์กฐ๊ฑด์„ ์ถ”๊ฐ€ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. 0์— ๊ฐ€๊นŒ์šด ๊ฐ’์„ ์„ค์ •ํ•œ ๋’ค ๊ทธ๊ฒƒ๋ณด๋‹ค ์ž‘์•„์ง€๋ฉด while ๋ฌธ์ด ๋ฉˆ์ถ”๋„๋ก ์„ค์ •ํ•˜์—ฌ์„œ gradient descent ๋ฅผ ์ง„ํ–‰ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

max_iter = 5000
i = 0
descent = numpy.ones(len(params))

while numpy.linalg.norm(descent) > 0.001 and i < max_iter:

    descent = gradient(params, logistic_model, x_data, y_data)
    params = params - descent * 0.01
    i += 1


print('Optimized value of w is {} vs. true value: 2'.format(params[0]))
print('Optimized value of b is {} vs. true value: 1'.format(params[1]))
print('Exited after {} iterations'.format(i))


pyplot.scatter(x_data, y_data, alpha=0.4)
pyplot.plot(x_data, logistic_model(params, x_data), '-r');

๋นจ๊ฐ„์ƒ‰ ๊ณก์„ ์ด ์šฐ๋ฆฌ์˜ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

์ด์ œ ๊ธฐ์ค€๊ฐ’์„ ์ •ํ•˜๊ณ  ๊ทธ๊ฒƒ๋ณด๋‹ค ํฌ๋ฉด 1, ์ž‘์œผ๋ฉด 0์œผ๋กœ ๋ถ„๋ฅ˜๋ฅผ ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.

์ด๋ฒˆ์—๋Š” 0.5๋กœ ์„ค์ •ํ•ด์„œ ์ง„ํ–‰ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

def decision_boundary(y):
    return 1 if y >= .5 else 0

๋ชจ๋“  ์ ์„ ํ•จ์ˆ˜์— ๋„ฃ์–ด์•ผ ํ•˜๋Š”๋ฐ ํ•˜๋‚˜์”ฉ ๋„ฃ์œผ๋ฉด ๋ฐ˜๋ณต๋ฌธ์„ ๋Œ์•„์•ผํ•ด์„œ ์˜ค๋ž˜๊ฑธ๋ฆฌ๊ธฐ ๋•Œ๋ฌธ์— numpy์˜ vectorize ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

decision_boundary = numpy.vectorize(decision_boundary)

def classify(predictions):
    '''
    Argument:
    predictions, an array of values between 0 and 1
    
    Returns: 
    classified, an array of 0 and 1 values'''

    return decision_boundary(predictions).flatten()

pyplot.scatter(x_data, y_data, alpha=0.4,
               label='true value')
pyplot.scatter(x_data, classify(logistic_model(params, x_data)), alpha=0.4, 
               label='prediction')

pyplot.legend();

๊ฑฐ์˜ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋“ค์„ ์ •ํ™•ํ•˜๊ฒŒ ์˜ˆ์ธกํ•œ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฐ˜์‘ํ˜•