๋ฐ˜์‘ํ˜•

๋”ฅ๋Ÿฌ๋‹/Today I learned : 50

[๋”ฅ๋Ÿฌ๋‹] collections ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ Counter ํด๋ž˜์Šค

The Counter class of the collections library provides counting (tallying) functionality.
from collections import Counter
items = ['a', 'b', 'c', 'a', 'a', 'c']
Count how many times each element of the list appears and print the result. The result is a dictionary type (key: value); Counter is a subclass of dict.
counter = Counter(items)
print(counter)
Counter({'a': 3, 'c': 2, 'b': 1})
most_common() returns the elements in descending order of count. If you pass a parameter n, it returns the top n keys and their counts; if you pass nothing, it returns all of them.
for elem, cnt in counter.most_common():
    print(elem, cnt)
a 3
c 2
b 1
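A small sketch of the most_common(n) behavior described above, using the same six elements (held in a variable named items so it does not shadow the built-in list). It also shows that a Counter supports ordinary dictionary lookups and, unlike a plain dict, returns 0 for a missing key instead of raising KeyError.

```python
from collections import Counter

items = ['a', 'b', 'c', 'a', 'a', 'c']
counter = Counter(items)

# most_common(n) returns a list of (element, count) tuples, highest count first.
top_two = counter.most_common(2)
print(top_two)            # [('a', 3), ('c', 2)]

# With no argument, every element is returned in descending order of count.
everything = counter.most_common()
print(everything)         # [('a', 3), ('c', 2), ('b', 1)]

# Counter is a dict subclass: normal key lookups work,
# and a missing key returns 0 instead of raising KeyError.
print(counter['a'])       # 3
print(counter['z'])       # 0
print(isinstance(counter, dict))  # True
```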

[๋”ฅ๋Ÿฌ๋‹] itertools ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

The itertools library provides functions for iteration.
import itertools
items = [1, 2, 3, 4, 5]
Combinations: extract every pair of elements from the list and print it.
for x in itertools.combinations(items, 2):
    print(x)
(1, 2) (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 4) (3, 5) (4, 5)  (each tuple is printed on its own line)
Combining into one continuous sequence: chain 'a', 'b', 'c' onto the list and print each element value.
for x in itertools.chain(items, ['a', 'b', 'c']):
    print(x)
1 2 3 4 5 a b c  (each value is printed on its own line)
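The number of pairs printed above is the binomial coefficient C(5, 2) = 10. The sketch below checks that with math.comb (Python 3.8+) and also shows itertools.chain.from_iterable, a standard-library variant of chain that takes a single iterable of iterables instead of separate arguments.

```python
import itertools
import math

items = [1, 2, 3, 4, 5]

# combinations(iterable, r) yields r-length tuples in order of input position,
# never reusing the same position twice within a tuple.
pairs = list(itertools.combinations(items, 2))
print(len(pairs), math.comb(len(items), 2))   # 10 10

# chain(*iterables) walks each iterable in turn as one flat sequence.
chained = list(itertools.chain(items, ['a', 'b', 'c']))
print(chained)   # [1, 2, 3, 4, 5, 'a', 'b', 'c']

# chain.from_iterable does the same when the iterables are already in a list.
nested = [items, ['a', 'b', 'c']]
flattened = list(itertools.chain.from_iterable(nested))
print(flattened == chained)   # True
```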

[๋”ฅ๋Ÿฌ๋‹] Numpy ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

The NumPy library handles numerical computation and matrix operations on arrays.
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('array=', array)
print('element data type:', array.dtype)
print('number of elements:', array.size)
print('number of dimensions:', array.ndim)
print('number of elements in each dimension:', array.shape)
div_array = array / 2
print('divide every element of the array by 2:', div_array)
div_array1 = array[0, 0] / 2
print('divide the first element of the array by 2:', div_array1)
array= [[1,2,3],[4,5,6],[7,8,9]..
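A sketch of the values the code above prints, written so they can be checked. The exact integer dtype depends on the platform (usually int64; it can be int32 on Windows with NumPy 1.x), so it is checked by kind rather than by name. True division with / always yields floats, while floor division with // keeps integers.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(array.dtype.kind)   # i  (signed integer; int64 on most platforms)
print(array.size)         # 9 elements in total
print(array.ndim)         # 2 dimensions
print(array.shape)        # (3, 3): 3 rows, 3 columns

# Element-wise true division returns a float array.
div_array = array / 2
print(div_array.dtype)    # float64
print(div_array[0])       # [0.5 1.  1.5]

# array[0, 0] indexes row 0, column 0 in one step (same value as array[0][0]).
div_array1 = array[0, 0] / 2
print(div_array1)         # 0.5

# Floor division keeps integer results.
floor_array = array // 2
print(floor_array.tolist())   # [[0, 1, 1], [2, 2, 3], [3, 4, 4]]
```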

Natural Language Processing (NLP) with Deep Learning

[[ 0  0  1  2]
 [ 0  0  0  3]
 [ 4  5  6  7]
 [ 0  8  9 10]
 [ 0 11 12 13]
 [ 0  0  0 14]
 [ 0  0  0 15]
 [ 0  0 16 17]
 [ 0  0 18 19]
 [ 0  0  0 20]]
Natural language: the speech or text we use in everyday life.
Natural Language Processing (NLP): having a computer recognize and process natural language.
Text preprocessing
Tokenization: the process of splitting input text into small pieces.
The text_to_word_sequence() function of Keras's text module: splits a sentence into words.
from tensorflow.keras.preprocessing.text import text_to_word_sequence
text ..
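To make tokenization concrete without installing TensorFlow, here is a pure-Python approximation of what text_to_word_sequence() does with its default arguments: lowercase the text, replace each character of its default filters string (punctuation plus tab and newline) with a space, and split on spaces. A second helper mimics the default "pre" padding and truncation of Keras's pad_sequences, which produces rows with zeros on the left, like those in the array at the start of this excerpt. Both helper names, simple_word_sequence and pad_pre, are hypothetical; in practice the real Keras functions should be used.

```python
# Default `filters` string used by Keras' text_to_word_sequence
# (punctuation plus tab and newline).
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def simple_word_sequence(text, filters=DEFAULT_FILTERS, lower=True, split=" "):
    """Hypothetical approximation of text_to_word_sequence's default behavior."""
    if lower:
        text = text.lower()
    # Map every filtered character to the split character, then split
    # and drop the empty strings left by consecutive separators.
    table = str.maketrans({ch: split for ch in filters})
    return [word for word in text.translate(table).split(split) if word]


def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper mimicking pad_sequences(padding='pre', truncating='pre')."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]   # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # pad on the left
    return padded


words = simple_word_sequence("Deep learning is fun. Deep learning, really!")
print(words)
# ['deep', 'learning', 'is', 'fun', 'deep', 'learning', 'really']

print(pad_pre([[1, 2], [3], [4, 5, 6, 7]], maxlen=4))
# [[0, 0, 1, 2], [0, 0, 0, 3], [4, 5, 6, 7]]
```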

[๋”ฅ๋Ÿฌ๋‹] ์ด๋ฏธ์ง€ ์ธ์‹ , ์ปจ๋ณผ๋ฃจ์…˜ ์‹ ๊ฒฝ๋ง(CNN)

MNIST dataset
- Made by the US National Institute of Standards and Technology (NIST) from handwriting written by high school students, Census Bureau employees, and others
- A dataset of 70,000 character images, each labeled with a digit from 0 to 9
What percentage of the handwritten images can we classify correctly?
The MNIST data can be loaded with Keras.
mnist.load_data() function: loads the data to use
X: the loaded image data
Y_class: the label from 0 to 9 attached to each image
• Part used for training: X_train, Y_class_train
• Part used for testing: X_test, Y_class_test
from keras.datasets import mnist
(X_train, Y_class_train), (X_test, Y_c..
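mnist.load_data() returns two (images, labels) tuples, which is why the assignment above unpacks into four names. The sketch below uses a hypothetical stand-in, fake_mnist_load_data, that returns zero-filled arrays with the shapes and dtype Keras documents for MNIST (60,000 training and 10,000 test images of 28x28 pixels, stored as uint8), so the unpacking can run without downloading anything. The two parts add up to the 70,000 images mentioned above.

```python
import numpy as np


def fake_mnist_load_data():
    """Hypothetical stand-in mirroring the structure of mnist.load_data().

    Real MNIST images are uint8 grayscale arrays of 28x28 pixels with
    labels 0-9. Zeros are used here purely to illustrate the shapes.
    """
    x_train = np.zeros((60000, 28, 28), dtype=np.uint8)
    y_train = np.zeros(60000, dtype=np.uint8)
    x_test = np.zeros((10000, 28, 28), dtype=np.uint8)
    y_test = np.zeros(10000, dtype=np.uint8)
    return (x_train, y_train), (x_test, y_test)


# Same unpacking pattern as in the post.
(X_train, Y_class_train), (X_test, Y_class_test) = fake_mnist_load_data()

print("training images:", X_train.shape)          # (60000, 28, 28)
print("training labels:", Y_class_train.shape)    # (60000,)
print("test images:", X_test.shape)               # (10000, 28, 28)
print("total images:", len(X_train) + len(X_test))  # 70000
```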

[๋”ฅ๋Ÿฌ๋‹] ์„ ํ˜• ํšŒ๊ท€ ์ ์šฉํ•˜๊ธฐ

๋ฐ์ดํ„ฐ ํ™•์ธ import pandas as pd df = pd.read_csv(”../dataset/housing.csv”, delim_whitespace=True, header=None) print(df.info()) Range Index:506 entries,0 to 505 Data columns (total 14 columns): 0 506 non-null float64 1 506 non-null float64 … … … … 13 506 non-null float64 Dtypes: float64(12), int64(2) memory usage: 55.4 KB Index 506๊ฐœ= ์ด ์ƒ˜ํ”Œ์˜ ์ˆ˜๋Š” 506๊ฐœ ์ปฌ๋Ÿผ 14๊ฐœ= 13๊ฐœ์˜ ์†์„ฑ๊ณผ 1๊ฐœ์˜ ํด๋ž˜์Šค 0 1 2 3 … 12 13 0 0.00632 18..

[๋”ฅ๋Ÿฌ๋‹] ์™€์ธ์˜ ์ข…๋ฅ˜ ์˜ˆ์ธกํ•˜๊ธฐ

Load the data into a variable named df_pre.
sample() function: specifies what fraction of the original data to use and randomly draws that fraction of rows from it.
frac = 1: use 100% of the original data, returned in shuffled order (with frac = 0.5, only a random 50% is drawn).
df_pre = pd.read_csv('../dataset/wine.csv', header=None)
df = df_pre.sample(frac=1)
df.info()
Data columns (total 13 columns):
0 6497 non-null float64
1 6497 non-null float64
2 6497 non-null float64
3 6497 non-null float64
4 6497 non-nul..
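A sketch of what sample() does, using a small made-up DataFrame instead of wine.csv. random_state is an optional sample() argument that fixes the random seed so the result is reproducible. With frac=1 every row survives but the order is shuffled, and each row keeps its original index label; reset_index(drop=True) renumbers the rows if the old labels are not needed.

```python
import pandas as pd

# Made-up stand-in for df_pre: 10 rows, 2 columns.
df_pre = pd.DataFrame({"x": range(10), "label": [0, 1] * 5})

# frac=1: draw 100% of the rows, i.e. return all rows in random order.
df = df_pre.sample(frac=1, random_state=42)
print(len(df) == len(df_pre))                    # True
print(sorted(df.index) == list(df_pre.index))    # True: same rows, shuffled

# frac=0.5: draw a random half of the rows (without replacement by default).
half = df_pre.sample(frac=0.5, random_state=42)
print(len(half))                                 # 5

# Renumber the shuffled rows 0..n-1 if the old index is not needed.
df = df.reset_index(drop=True)
print(df.index.tolist() == list(range(10)))      # True
```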

[๋”ฅ๋Ÿฌ๋‹] ์ดˆ์ŒํŒŒ ๊ด‘๋ฌผ ๋ฐ์ดํ„ฐ : ๊ณผ์ ํ•ฉ ํ”ผํ•˜๊ธฐ

import pandas as pd
df = pd.read_csv('../dataset/sonar.csv', header=None)
df.info()
RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
0   208 non-null float64
1   208 non-null float64
…
59  208 non-null float64
60  208 non-null object
dtypes: float64(60), object(1)
memory usage: 99.2+ KB
The index has 208 entries, so there are 208 samples in total; there are 61 columns, so the data consists of 60 attributes and 1 class.
All columns are floating-point (flo..
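Column 60 is the only non-float column: in the UCI sonar data it holds a letter, R for rock or M for mine (metal cylinder). String labels like these are usually converted to numbers before training. The sketch below maps labels to integers in sorted order, the same ordering scikit-learn's LabelEncoder uses; encode_labels is a hypothetical helper and the label list is made up.

```python
def encode_labels(labels):
    """Hypothetical helper: map each distinct string label to an integer.

    Classes are numbered in sorted order ('M' -> 0, 'R' -> 1 for sonar),
    matching the ordering used by scikit-learn's LabelEncoder.
    """
    classes = sorted(set(labels))
    index = {cls: i for i, cls in enumerate(classes)}
    return [index[label] for label in labels], classes


# Made-up sample of the object-typed class column (column 60).
raw = ['R', 'R', 'M', 'R', 'M', 'M']
encoded, classes = encode_labels(raw)
print(classes)   # ['M', 'R']
print(encoded)   # [1, 1, 0, 1, 0, 0]
```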

[๋”ฅ๋Ÿฌ๋‹] ๋‹ค์ค‘ ๋ถ„๋ฅ˜ ๋ฌธ์ œ : ๋ถ“๊ฝƒ(Iris) ํ’ˆ์ข… ๋ถ„๋ฅ˜

Number of samples: 150
Number of attributes: 4
- Attribute 1: sepal length (cm)
- Attribute 2: sepal width (cm)
- Attribute 3: petal length (cm)
- Attribute 4: petal width (cm)
Classes: Iris-setosa, Iris-versicolor, Iris-virginica
There are 3 classes.
- Unlike binary classification, where the answer is either true (1) or false (0), the task is to predict which of several options is the answer.
- A classification problem that picks one of several possible answers is called multiclass classification (multi-class classification).
Correlation plot
import pandas as pd
df = pd.read..
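To illustrate the difference from binary classification: with three classes, a label is commonly represented as a one-hot vector, and a model outputs one score per class. Softmax, the usual output activation in this setting, turns the three scores into probabilities that sum to 1, and the predicted answer is the class with the largest probability. The scores below are made up, and one_hot and softmax are hypothetical helpers written with NumPy.

```python
import numpy as np

CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def one_hot(label, classes=CLASSES):
    """Hypothetical helper: 1.0 at the label's position, 0.0 elsewhere."""
    vec = np.zeros(len(classes))
    vec[classes.index(label)] = 1.0
    return vec


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()


print(one_hot("Iris-versicolor"))       # [0. 1. 0.]

scores = np.array([0.5, 2.0, 1.0])     # made-up model outputs, one per class
probs = softmax(scores)
print(round(float(probs.sum()), 6))     # 1.0
print(CLASSES[int(np.argmax(probs))])   # Iris-versicolor
```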

[๋”ฅ๋Ÿฌ๋‹] ํ”ผ๋งˆ ์ธ๋””์–ธ ๋‹น๋‡จ๋ณ‘ ์˜ˆ์ธกํ•˜๊ธฐ

The Pima Indians are of 'Mongoloid' descent, sharing roots with Koreans. Their staple foods were plants such as beans and squash, and food was never plentiful. Because of this, they came to carry genes that store food in the body so they could survive on very little. However, after they migrated to the United States and their diet became Westernized, the genes that had adapted to scarcity led to obesity, which in turn caused various lifestyle diseases. Since then, the incidence of lifestyle diseases such as obesity and diabetes has risen among their children and grandchildren. The Pima have become the people with the highest rate of diabetes in the world, and a recent report stated that 63% of Pima men and 70% of Pima women have diabetes. pima-indians-diabetes.csv: from 768 Indians, 8 pieces of information and 1..

๋ฐ˜์‘ํ˜•