์ž์—ฐ์–ด ์ฒ˜๋ฆฌ/Today I learned :

์ž์—ฐ์–ด์ฒ˜๋ฆฌ ๋ชจ๋ธ์ด ํƒœ์Šคํฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์€? (์ธ ์ปจํ…์ŠคํŠธ ๋Ÿฌ๋‹, ์ œ๋กœ์ƒท, ์›์ƒท ํ“จ์ƒท ๋Ÿฌ๋‹)

์ฃผ์˜ ๐Ÿฑ 2023. 1. 17. 17:15
728x90
๋ฐ˜์‘ํ˜•

We take various pretrained models such as BERT and GPT and use them to perform tasks: document classification, sentiment analysis, question answering, named entity recognition, and so on. Tasks like these are called downstream tasks. Conversely, an upstream task can be understood as the training objective the model is pretrained on.

data1 -> model -> upstream task (e.g. MLM, next-word prediction) (pretraining)

data2 -> model (the same model as above; reusing it like this is called transfer learning) -> downstream task (NER, QA, text classification)
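
In code, transfer learning is just reloading the pretrained upstream weights into a model with a new task head. A minimal sketch with Hugging Face's transformers, assuming the bert-base-uncased checkpoint:

from transformers import AutoModelForSequenceClassification

# The pretrained upstream weights are reused; only the new 2-way
# classification head on top starts from random initialization.
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)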

 

How downstream tasks are learned

- Fine-tuning, prompt tuning, in-context learning

 

ํŒŒ์ธํŠœ๋‹ 

If you have studied BERT, fine-tuning will feel familiar. For example, when running a BERT example that classifies the sentiment of movie reviews, the downstream task is text classification; to perform it, we train BERT further on a dataset of reviews paired with positive/negative labels, so that it can judge whether a new incoming review is positive or negative. In other words, fine-tuning performs a downstream task by training on the entire downstream dataset and updating all of the model's weights.
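
To make this concrete, here is a minimal fine-tuning sketch using Hugging Face's transformers, assuming the bert-base-uncased checkpoint and a toy two-review dataset standing in for a real labeled dataset:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Toy stand-in for a real review dataset (made-up examples).
texts = ["I love this movie", "I don't like this movie"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, return_tensors='pt')
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # real fine-tuning runs many batches over the full dataset
    loss = model(**batch, labels=labels).loss  # gradients flow to every weight
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Note how the whole model is updated, not just the new classification head; that is what distinguishes fine-tuning from the in-context learning below.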

 

 ์ธ ์ปจํ…์ŠคํŠธ ๋Ÿฌ๋‹ In-context learning

์˜ค๋Š˜ ์ค‘์ ์ ์œผ๋กœ ๋ณผ ๊ฒƒ์€ ์ด๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ธ ์ปจํ…์ŠคํŠธ ๋Ÿฌ๋‹์ด๋ž€ downstream task์˜ ๋ฐ์ดํ„ฐ ์ค‘ ์ผ๋ถ€๋งŒ ์‚ฌ์šฉํ•˜๊ณ , ๋ชจ๋ธ์„ ์—…๋ฐ์ดํŠธํ•˜์ง€ ์•Š๊ณ  downstream task๋ฅผ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. 

 

 ์ธ ์ปจํ…์ŠคํŠธ ๋Ÿฌ๋‹ In-context learning ์˜ 3๊ฐ€์ง€ ๋ฐฉ์‹

์ธ ์ปจํ…์ŠคํŠธ ๋Ÿฌ๋‹์—๋Š” ์ œ๋กœ์ƒท, ์›์ƒท ํ“จ์ƒท ๋Ÿฌ๋‹ ์ด 3๊ฐ€์ง€ ๋ฐฉ์‹์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ œ๋กœ ์› ํ“จ๋Š” downstream task์˜ ๋ฐ์ดํ„ฐ ์ค‘ ์ฐธ๊ณ ํ•˜๋Š” ๊ฑด์ˆ˜๋ฅผ ๋œปํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์— ์งˆ๋ฌธ์„ ํ•ด์„œ ๋‹ต์„ ์–ป๊ณ  ์‹ถ์„๋•Œ downstream task์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์•„๋ฌด๊ฒƒ๋„ ์•ˆ๋ณด์—ฌ์ฃผ๊ณ  ๋ƒ…๋‹ค ์งˆ๋ฌธ์„ ๋˜์ง€๋ฉด ์ œ๋กœ์ƒท ๋Ÿฌ๋‹, ํ•œ๊ฐ€์ง€ ์˜ˆ์‹œ๋ฅผ ๋ณด์—ฌ์ฃผ๊ณ  ์งˆ๋ฌธํ•˜๋ฉด ์›์ƒท๋Ÿฌ๋‹, ๋ช‡๊ฐœ ๋ณด์—ฌ์ฃผ๊ณ  ์งˆ๋ฌธ์„ ํ•˜๋ฉด ํ“จ์ƒท๋Ÿฌ๋‹์ด ๋˜๋Š” ๊ฒƒ์ด์ฃ 

Unlike BERT, GPT-3 has so many parameters that fine-tuning it is impractical, so it adopts this approach instead.

GPT ์˜ ์ž…๋ ฅ์€ BERT์™€๋Š” ์กฐ๊ธˆ ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ๊ธฐ๋ณธ์ ์œผ๋กœ task description๊ณผ prompt๊ฐ€ ๋“ค์–ด๊ฐ€๊ณ   example์€ ์•„์˜ˆ ์•ˆ๋„ฃ์œผ๋ฉด ์ œ๋กœ์ƒท๋Ÿฌ๋‹, example์ด ํ•œ๊ฐœ๋ฉด ์›์ƒท, ์—ฌ๋Ÿฌ๊ฐœ๋ฉด ํ“จ์ƒท ๋Ÿฌ๋‹์ธ ๊ฒƒ์ด์ฃ 

 

GPT ์˜ ์ž…๋ ฅ์˜ ์˜ˆ์‹œ๋ฅผ ๋“ค์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์˜์–ด๋‹จ์–ด๋ฅผ ์ž…๋ ฅํ•˜๋ฉด ํ•œ๊ตญ์–ด ๋‹จ์–ด ๋ฒˆ์—ญํ•˜๊ณ  ์‹ถ์„ ๋•Œ, ์ž…๋ ฅ์„ ์ด๋Ÿฐ์‹์œผ๋กœ ์ค๋‹ˆ๋‹ค. 

 

Translate English to Korean: (this is the task description)

cheese => (this is the prompt)

 

If the output then comes out as exactly '์น˜์ฆˆ' (the Korean word for cheese), that is zero-shot learning.

 

 

One-shot and few-shot learning give the model a few examples first.

 

Translate English to Korean: (task description)

peppermint => ํŽ˜ํผ๋ฏผํŠธ (example)

parsley => ํŒŒ์Šฌ๋ฆฌ (example)

cheese => (prompt)

 

example์ด ํ•œ๊ฐœ๋ฉด ์›์ƒท, ์—ฌ๋Ÿฌ๊ฐœ๋ฉด ํ“จ ์ƒท์ž…๋‹ˆ๋‹ค. 

 

How well the model performs with just this can be checked in the following graph.

์„ฑ๋Šฅ์„ ์ธก์ •ํ•œ ๊ฒฐ๊ณผ ํ“จ ์ƒท์ด 50% ์ด์ƒ์˜ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค๋Š” ๊ฒƒ์œผ๋กœ ๋ณด์•„ ๋ช‡๊ฐ€์ง€ ์˜ˆ๋ฅผ ์ฃผ๋Š” ๊ฒƒ๋งŒ์œผ๋กœ๋„ ์•„์›ƒํ’‹์„ ๋‚ผ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ GPT๊ฐ€ ์•Œ๋ ค์ค€ ๊ฒƒ์ด์ฃ 

As for GPT-3, it seems you can try it by requesting API access from OpenAI.

 

 

์ธ ์ปจํ…์ŠคํŠธ ๋Ÿฌ๋‹ ๊ตฌํ˜„

 

gpt-2์—์„œ ํ•œ๋ฒˆ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด ์ฒดํ—˜ํ•ด๋ณด๋„๋ก ํ•ฉ์‹œ๋‹ค. huggingface ์—์„œ transformers ๋กœ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 

pip install transformers
 
 
ํ•„์š”ํ•œ ํ† ํฌ๋‚˜์ด์ €์™€ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ต์‹œ๋‹ค. 

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2-medium')
model = GPT2LMHeadModel.from_pretrained('gpt2-medium')

 
 
BERT์—์„œ ํ–ˆ๋˜ ๊ฒƒ๊ณผ ๊ฐ™๊ฒŒ ๋ฆฌ๋ทฐ๋ฅผ ๋„ฃ์œผ๋ฉด ๊ทธ๊ฒƒ์ด ๊ธ์ •์ธ์ง€ ๋ถ€์ •์ธ์ง€๋ฅผ ๋ฆฌํ„ดํ•˜๋Š” ํƒœ์Šคํฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋„๋ก ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. 
 

import torch

review = input("Write down review: ")

# One-shot prompt: a task description, one labeled example, then the query.
prompt = f"""\
Classify the given review into positive or negative.
I don't like this movie. This review is negative
{review.strip()}. This review is\
"""

# The model and the inputs must live on the same device, or generate() fails.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

encodings = tokenizer(prompt, return_tensors='pt')
encodings = {key: value.to(device) for key, value in encodings.items()}

outputs = model.generate(max_length=40, **encodings)
outputs = outputs[0, encodings['input_ids'].shape[1]:]  # drop the prompt tokens
outputs = tokenizer.decode(outputs).strip().split('\n', 1)[0]  # keep only the first generated line
print(outputs)

 

When I ran it and entered I love the actor, I got:

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

positive

The "Setting `pad_token_id`..." message can be ignored for now, and you can see that positive came out correctly.
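
If the warning bothers you, explicitly passing the EOS token id as the padding token should silence it; this is just a one-line tweak to the generate() call above:

outputs = model.generate(max_length=40, pad_token_id=tokenizer.eos_token_id, **encodings)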

 

 
ํ”„๋กฌํฌํŠธ ๋ถ€๋ถ„๋งŒ ๋”ฐ๋กœ ๋–ผ์–ด์„œ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. 
 

prompt = f"""\
Classify the given review into positive or negative.
I don't like this movie. This review is negative
{review.strip()}. This review is\
"""

"Classify the given review into positive or negative." is the task description part, and

"I don't like this movie. This review is negative" is the example we gave. Since there is just one example, this is one-shot.
 
 
 
์˜ˆ์‹œ๋ฅผ ์ด๊ฒƒ๋ณด๋‹ค ๋” ๋งŽ์ด ์“ฐ๋ฉด ํ“จ์ƒท์ด ๋˜๊ณ , ํ•˜๋‚˜๋„ ์•ˆ์“ฐ๋ฉด ์ œ๋กœ์ƒท์ด ๋ฉ๋‹ˆ๋‹ค. 
 
 ๊ทธ๋ž˜ํ”„์—์„œ ๋ดค๋“ฏ์ด ์˜ˆ์‹œ๋ฅผ ๋” ๋งŽ์ด ์จ์ฃผ๋ฉด(ํ“จ์ƒท) ํ‹€๋ฆฐ ์˜ˆ์ธก์„ ํ•  ํ™•๋ฅ ์ด ์ค„์–ด๋“ค ๊ฒ๋‹ˆ๋‹ค. 
 
 
ํฅ๋ฏธ๋กœ์šด ๊ฒƒ์€,  prompt๋ฅผ ์–ด๋–ป๊ฒŒ ์“ฐ๋Š๋ƒ์— ๋”ฐ๋ผ ์„ฑ๋Šฅ์ด ๋‹ค๋ฅผ ์ˆ˜ ์žˆ๊ณ ,  ์ด์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๋„ ํ™œ๋ฐœํžˆ ์ง„ํ–‰์ค‘์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค. 
 

 

On this topic, I found the following papers:

 

Large Language Models are Zero-Shot Reasoners

https://arxiv.org/abs/2205.11916

 

The Power of Scale for Parameter-Efficient Prompt Tuning

https://arxiv.org/pdf/2104.08691.pdf

 

์š” ๋‘ ๋…ผ๋ฌธ์— ๋Œ€ํ•ด์„œ ์ฝ๊ณ  ์ •๋ฆฌํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. เซฎ • ๏ปŒ -แƒ ♥

๋ฐ˜์‘ํ˜•