๋ฐ˜์‘ํ˜•

All posts (493)

git, github ์›๊ฒฉ์—์„œ ์ฝ”๋“œ ์—…๋ฐ์ดํŠธ ํ•˜๋Š”๋ฒ•

github์—์„œ ๊ณ„์ • ํ† ํฐ ๋ฐœํ–‰ $ git init $ git remote add origin [์›๊ฒฉ์ €์žฅ์†Œ ์ฃผ์†Œ] - [์›๊ฒฉ์ €์žฅ์†Œ ์ฃผ์†Œ] ๋Š” code์—์„œ ๋‚˜์˜ค๋Š” https://~~~git //๋ธŒ๋žœ์น˜ ์ด๋ฆ„ ๋ฐ”๊พธ๊ธฐ $ git branch -m master main //ํŒŒ์ผ ์—…๋กœ๋“œ - add → commit → push ์ˆœ์„œ //์›๊ฒฉ ์ €์žฅ์†Œ์˜ ํŒŒ์ผ ๊ฐ€์ ธ์˜ค๊ธฐ $ git pull (๋˜๋Š” git pull origin [๋ธŒ๋žœ์น˜ ์ด๋ฆ„]) main //๋ชจ๋“  ๋ณ€๊ฒฝ์‚ฌํ•ญ์„ ์˜ฌ๋ฆฌ๋Š” ๊ฒฝ์šฐ $ git add . //ํŠน์ •ํ•œ ํŒŒ์ผ๋งŒ ์˜ฌ๋ฆฌ๋Š” ๊ฒฝ์šฐ $ git add [ํŒŒ์ผ/๋””๋ ‰ํ† ๋ฆฌ] $ git commit -m "commit message" $ git push (๋˜๋Š” git push origin [๋ธŒ๋žœ์น˜ ์ด๋ฆ„]) //์ถ”๊ฐ€์ ์ธ ๋ช…๋ น์–ด //..

[์ฝ”๋”ฉํ…Œ์ŠคํŠธ] ๋ฌธ๋‹จ์—์„œ ๊ฐ€์žฅ ํ”ํ•œ ๋‹จ์–ด ์ฐพ๊ธฐ - re.sub, counter ๊ฐ์ฒด

[๋ฌธ์ œ] paragraph์—์„œ ๋Œ€์†Œ๋ฌธ์ž, ์‰ผํ‘œ ๊ตฌ๋‘์ ๋“ฑ์„ ๋ฌด์‹œํ•˜๊ณ , banned ๋‹จ์–ด์— ํฌํ•จ๋˜์ง€ ์•Š์€ ๋‹จ์–ด ์ค‘ ๊ฐ€์žฅ ๋งŽ์ด ๋“ฑ์žฅํ•œ ๋‹จ์–ด ๋ฐ˜ํ™˜ Example 1: Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"] Output: "ball" Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph..

๊ฐ์ • ๋ถ„๋ฅ˜ ๋ชจ๋ธ ๋งŒ๋“ค๊ณ  ์„ฑ๋Šฅ ๊ฐœ์„ ๊นŒ์ง€ (BERT, GPT2, RoBERTa, DistilBERT)

๊ฐ„๋‹จํ•œ ๊ธ๋ถ€์ • ์ด์ง„ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์„ ๋งŒ๋“ค์—ˆ๋‹ค. ์ „์ฒด์ฝ”๋“œ๋Š” ๊นƒํ—™์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋‹ค! https://github.com/Juyoung-b/Improving-the-Performance-of-Sentiment-Classification GitHub - Juyoung-b/Improving-the-Performance-of-Sentiment-Classification Contribute to Juyoung-b/Improving-the-Performance-of-Sentiment-Classification development by creating an account on GitHub. github.com ์˜์–ด๋กœ ๋œ ๋ ˆ์Šคํ† ๋ž‘ ๋ฆฌ๋ทฐ๋ฅผ ๊ฐ€์ง€๊ณ , ๊ธ์ •(1), ๋ถ€์ •(0)์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฐ„๋‹จํ•œ task ๋ชจ๋ธ์ด๋‹ค. ์ด๋ฒˆ ํ”„๋กœ์ ํŠธ์—์„ ..

Einstein summation convention

Einstein summation convention ์„ ์‚ฌ์šฉํ•˜๋ฉด, ํ–‰๋ ฌ์˜ ๊ณฑ์…ˆ์„ ์กฐ๊ธˆ ๋” ๋‹จ์ˆœํ•˜๊ฒŒ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค. ์•„์ธ์Šˆํƒ€์ธ ํ‘œ๊ธฐ๋ฒ• ๋˜๋Š” ์•„์ธ์Šˆํƒ€์ธ์˜ ํ•ฉ ๊ทœ์•ฝ(Einstein summation convention) ์€ ์„ ํ˜•๋Œ€์ˆ˜ํ•™์„ ๋ฌผ๋ฆฌํ•™์— ์‘์šฉํ•˜๋ฉด์„œ ์ขŒํ‘œ๊ณ„์— ๊ด€ํ•œ ๊ณต์‹์„ ๋‹ค๋ฃฐ ๋•Œ ์œ ์šฉํ•œ ํ‘œ๊ธฐ ๊ทœ์น™์ด๋‹ค. ์•Œ๋ฒ ๋ฅดํŠธ ์•„์ธ์Šˆํƒ€์ธ์ด ์ด ํ‘œ๊ธฐ๋ฒ•์„ 1916๋…„์— ์ฒ˜์Œ ์†Œ๊ฐœํ•˜์˜€๋‹ค. ์ถœ์ฒ˜ : ์œ„ํ‚ค๋ฐฑ๊ณผ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ–‰๋ ฌ์˜ ๊ณฑ์…ˆ์—์„œ *a11์—์„œ ์™ผ์ชฝ 1์€ ํ–‰, ์˜ค๋ฅธ์ชฝ1์€ ์—ด์„ ๋‚˜ํƒ€๋‚ด๋Š” ์ˆซ์ž์ด๋‹ค) ํ–‰๋ ฌ A์™€ ํ–‰๋ ฌ B๋ฅผ ๊ณฑํ•œ AB์—์„œ AB23์˜ ๊ณ„์‚ฐ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. AB23 = a21b13 + a22b23+...+ a2nbn3 ์ด๋ฅผ ๊ณต์‹ํ™”ํ•˜๋ฉด, ์ด๋ ‡๊ฒŒ ๋˜๊ณ , Einstein summation convention์„ ์‚ฌ์šฉํ•˜์—ฌ ํ‘œ..

A review of Colab Pro and Pro+

It has been about two weeks since I subscribed to Colab Pro+. The price was 49.99 dollars, roughly 64,850 won. If you asked whether I would recommend it to others after using it, I would want to say no..... Of course it depends on the size of your project. Colab subscriptions come in two tiers, Pro and Pro+. I used Pro from August and have been on Pro+ since January. There is a noticeable speed difference between Pro and Pro+: it is definitely faster and has more memory. However...... at some point a condition was added that grants compute units, 100 per month on Colab Pro and 500 on Pro+....... Even on Pro+, if you run training with a high number of epochs under various settings, that amount is gone in two or three days.... which is exactly where I am right now..... While I was running a few experiments..

Is switching between model.train() and model.eval() essential?

์ž์—ฐ์–ด์ฒ˜๋ฆฌ์—์„œ ๋‹ค์šด์ŠคํŠธ๋ฆผ ํƒœ์Šคํฌ ์ค‘ ์˜ˆ๋ฅผ ๋“ค๋ฉด ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฌธ์ œ์—์„œ, ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ฌ ๋•Œ train ๊ณผ validation ์œผ๋กœ ๋จผ์ € ์„ฑ๋Šฅ์„ ์ฑ„์ ํ•œ ํ›„,๋ ˆ์ด๋ธ”์ด ์—†๋Š” ์ƒˆ๋กœ์šด ์ธํ’‹์œผ๋กœ test์…‹์„ ๋„ฃ์–ด ์˜ˆ์ธก๋œ ๋ ˆ์ด๋ธ” ๊ฐ’์„ ์–ป๋Š”๋‹ค. train ๊ณผ validation์„ ํ•˜๋Š” ๊ณผ์ •์—์„œ, train์„ ํ•˜๊ธฐ์ „ model.train() ์œผ๋กœ train์ƒํƒœ๋กœ ๋งŒ๋“ค์–ด์ฃผ๊ณ , train์ด ๋๋‚˜๋ฉด model.eval()๋กœ ์Šค์œ„์นญํ•˜์—ฌ ๊ฒ€์ฆ์„ ํ•˜๊ณ  ๋‹ค์‹œ train- eval ํ•˜๋Š” ์‹์œผ๋กœ ์—ํฌํฌ ๋งŒํผ ๋Œ๊ฒŒ ๋œ๋‹ค. ์ด ๋•Œ , train ํ•  ๋•Œ๋Š” ๋ฌด์กฐ๊ฑด train mode, validation ํ•  ๋•Œ๋Š” ๋ฌด์กฐ๊ฑด validation ๋ชจ๋“œ์— ์žˆ์–ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์Šค์œ„์นญ์€ ํ•„์ˆ˜ ์ด๋‹ค. ์ฝ”๋“œ์—์„œ๋„ ์ด๋ฅผ ์ˆ˜๋™์œผ๋กœ ๋ช…์‹œํ•ด์•ผ ํ•˜๋Š”์ง€ ๊ถ๊ธˆํ–ˆ์—ˆ๋Š”๋ฐ ๋งŒ์•ฝ e..

์ž์—ฐ์–ด ์ฒ˜๋ฆฌ์—์„œ์˜ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์ข…๋ฅ˜, ์„ค์ •

ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ๋ž€ ์‚ฌ๋žŒ์ด ์ˆ˜๋™์œผ๋กœ ์กฐ์ •ํ•˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ์ด๋‹ค. ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ž˜ ์กฐ์ •ํ•˜๋ฉด ์„ฑ๋Šฅ์ด ๊ฐœ์„ ๋  ์ˆ˜ ์žˆ๋‹ค. ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ข…๋ฅ˜๊ฐ€ ์žˆ๋‹ค. ์ด๋ฒˆ์—๋Š” ๋”ฅ๋Ÿฌ๋‹์—์„œ ์šฐ๋ฆฌ๊ฐ€ ์กฐ์ž‘ํ•  ์ˆ˜ ์žˆ๋Š” ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ณ  ํŠœ๋‹(์กฐ์ •)ํ•˜๋Š” ๋ฐฉ๋ฒ•๋“ค๋„ ์ •๋ฆฌํ•ด๋ณด๊ณ ์ž ํ•œ๋‹ค. Model-free hyperparameters ( ๋ชจ๋ธ๊ณผ ๊ด€๋ จ ์—†๋Š” ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ) learning rate pytorch ์—์„œ๋Š” learning rate scheduler๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ์ด ๋‘ ๊ฐ€์ง€๊ฐ€ ์žˆ๋‹ค. 1. LambdaLR 2. stepLR batch size - ํด์ˆ˜๋ก ์ข‹๋‹ค, ์ฃผ์–ด์ง„ GPU์˜ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋ณด๊ณ  ์ตœ์žฌ์˜ ๋ฐฐ์น˜์‚ฌ์ด์ฆˆ๋ฅผ ์„ค์ •ํ•˜๋Š” ๊ฒƒ์ด ์„ฑ๋Šฅ์ด ์ข‹๋‹ค๊ณ  ์•Œ๋ ค์ ธ ์žˆ๋‹ค. ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๊ฐ€ ํฌ๋ฉด, ํ•œ๋ฒˆ ํ•™์Šตํ• ๋•Œ ๋ณด๋Š” ์ด๋ฏธ์ง€/ํ…์ŠคํŠธ ..

numpy argsort ์˜ ์˜๋ฏธ์™€ ์‚ฌ์šฉ๋ฒ• ์ •๋ฆฌ

While reviewing some code I came across np.argsort; it obviously sorts something, but I was curious exactly how, so I wrote this up. It returns the indices that would sort the array. numpy.argsort(a, axis=-1, kind=None, order=None). b = [0,1,2,3,10,9,8]; x = np.argsort(b); print(x) → result: [0 1 2 3 6 5 4]. x2 = np.argsort(b)[::-1] # sort in descending order → result: [4 5 6 3 2 1 0]. a = [[0,1,2,3],[1,2,3]]; print(np.argsort([len(aa) for aa in a])) → [1 0]. x = np.array([[0, 3], [2, 2]]); np.argsort(x, axis=0) → array([[0, 1], [1,..
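A runnable version of the snippets above (variable names follow the excerpt):

```python
import numpy as np

b = [0, 1, 2, 3, 10, 9, 8]
print(np.argsort(b))          # [0 1 2 3 6 5 4] -> indices that sort b in ascending order
print(np.argsort(b)[::-1])    # [4 5 6 3 2 1 0] -> reversed, i.e. descending order

a = [[0, 1, 2, 3], [1, 2, 3]]
print(np.argsort([len(aa) for aa in a]))  # [1 0] -> order of sub-lists by length

x = np.array([[0, 3], [2, 2]])
print(np.argsort(x, axis=0))  # sorting indices computed down each column
```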

์ž์—ฐ์–ด์ฒ˜๋ฆฌ ๋ชจ๋ธ์ด ํƒœ์Šคํฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์€? (์ธ ์ปจํ…์ŠคํŠธ ๋Ÿฌ๋‹, ์ œ๋กœ์ƒท, ์›์ƒท ํ“จ์ƒท ๋Ÿฌ๋‹)

We take various pretrained models such as BERT and GPT and use them to perform some task, for example document classification, sentiment analysis, question answering, named entity recognition, and so on. These are called downstream tasks. Conversely, an upstream task can be understood as the training objective used for pretraining. data1 -> model -> upstream task (e.g. MLM, next-word prediction) (pretrain); data2 -> model (the same model as above; this is called transfer learning) -> downstream task (NER, QA, text classification). The way a downstream task is learned..
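A minimal sketch of the pretrain -> downstream transfer described above, using the Hugging Face transformers library; the checkpoint name and two-label setup are just examples, not from the post.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Reuses the pretrained (upstream) weights and attaches a fresh classification head
# for the downstream task; the head is then fine-tuned on labeled data (data2).
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("The food was great!", return_tensors="pt")
logits = model(**inputs).logits   # meaningless until the head is fine-tuned
print(logits.shape)               # torch.Size([1, 2])
```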

์–ธ์–ด๋ชจ๋ธ GPT

BERT ๊ฐ€ ํŠธ๋žœ์Šคํฌ๋จธ์˜ ์ธ์ฝ”๋”๋ฅผ ํ™œ์šฉํ–ˆ๋‹ค๋ฉด, GPT๋Š” ํŠธ๋žœ์Šคํฌ๋จธ์˜ ๋””์ฝ”๋”๋งŒ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋””์ฝ”๋” ์ค‘์—์„œ๋„ encoder-decoder attention์ด ๋น ์ง„ ๋””์ฝ”๋”๋งŒ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. Masked Multi-Head Attention์—์„œ ์ผ์–ด๋‚˜๋Š” ์ผ์„ ๋ณด๋ฉด, ์ œ๊ฐ€ ๊ณ„์† ์˜ˆ์‹œ๋กœ ๋“œ๋Š” ๋ฌธ์žฅ์„ ๊ฐ€์ ธ์™€ ์ ์šฉํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. '๋‚˜๋Š” ํ† ๋ผ๋ฅผ ํ‚ค์›Œ. ๋ชจ๋“  ์‚ฌ๋žŒ์ด ๊ทธ๋ฅผ ์ข‹์•„ํ•ด'๋ผ๋Š” ๋ฌธ์žฅ์—์„œ ์ฒ˜์Œ์—๋Š” ๋‚˜๋Š”์„ ๋บด๊ณ  ๋ชจ๋‘ ๋งˆ์Šคํ‚น์ฒ˜๋ฆฌํ•ด์ค๋‹ˆ๋‹ค. ๋‚˜๋Š” ๋งŒ๋ณด๊ณ  ํ† ๋ผ๋ฅผ ์„ ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ธฐ ์œ„ํ•ด์„œ ํ† ๋ผ๋ฅผ์— ํ™•๋ฅ ์„ ๋†’์ด๋Š” ์‹์œผ๋กœ ์—…๋ฐ์ดํ„ฐํ•˜๋ฉฐ ํ•™์Šต์ด ์ง„ํ–‰๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋‚˜๋Š” ํ† ๋ผ๋ฅผ ๋งŒ์œผ๋กœ ํ‚ค์›Œ๋ฅผ ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ๊ฒŒ , ํ‚ค์›Œ์— ํ™•๋ฅ ์„ ๋†’์ด๋Š” ๋ฐฉ์‹์œผ๋กœ ํ•™์Šต์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. BERT์—์„œ๋Š” ๊ฐ€์šด๋ฐ ๋‹จ์–ด๋ฅผ [MASK]๋กœ ์ฒ˜๋ฆฌํ•˜๊ณ  ์•ž๊ณผ ๋’ค ๋‹จ์–ด๋“ค์„ ๋ณด๊ณ ..

๋ฐ˜์‘ํ˜•