๋ฐ˜์‘ํ˜•

์ž์—ฐ์–ด ์ฒ˜๋ฆฌ 35

Bag-of-words

Bag-of-words (BoW) is a statistical language model used to analyze text and documents based on word counts. BoW is built on word frequency: word embeddings based on the BoW assumption ignore the order in which words appear in a sentence. Applying BoW turns a document into a count matrix; as the name suggests, order is not considered, so you can simply think of it as counting words: {'it': 6, 'I': 5, 'the': 4, 'to': 3, ...}. In Python, such a matrix is easy to build with the Counter class from the collections module or with CountVectorizer from scikit-learn. TF-IDF addresses the fact that a word that simply appears often is not necessarily important..

PyTorch cheatsheet

Imports

General
import torch  # root package
from torch.utils.data import Dataset, DataLoader  # dataset representation and loading

Neural Network API
import torch.autograd as autograd  # computation graph
from torch import Tensor  # tensor node in the computation graph
import torch.nn as nn  # neural networks
import torch.nn.functional as F  # layers, activations and more
import torch.optim as optim ..

ํฌํ„ธ์˜ ๊ฒ€์ƒ‰์—”์ง„ ์ž‘๋™ ์›๋ฆฌ, ๊ตฌ๊ธ€์˜ pagerank ์•Œ๊ณ ๋ฆฌ์ฆ˜

๋„ค์ด๋ฒ„๋‚˜ ๊ตฌ๊ธ€๊ฐ™์€ ํฌํ„ธ์—๊ฒ€์ƒ‰์„ ํ•˜๋ฉด 1์ดˆ๋„ ์•ˆ๋ผ์„œ ์—„์ฒญ๋‚˜๊ฒŒ ๋งŽ์€ ๊ฒ€์ƒ‰๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜จ๋‹ค. ๊ฒ€์ƒ‰ ์„œ๋น„์Šค๋Š” ์–ด๋–ค ์›๋ฆฌ๋กœ ์ž‘๋™ํ•˜๋Š”๊ฒƒ์ผ๊นŒ? ๊ฒ€์ƒ‰์—”์ง„์ด ์ •๋ณด๋ฅผ ์ˆ˜์ง‘ํ•˜๊ณ  ์ €์žฅํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฌผ๋ก  ๊ฐ ๊ฒ€์ƒ‰ ์„œ๋น„์Šค๋งˆ๋‹ค ์ฐจ์ด๋Š” ์žˆ๊ฒ ์ง€๋งŒ ์ž‘๋™ ๋ฐฉ์‹์—๋Š” ์ด 3๊ฐ€์ง€ ๋‹จ๊ณ„๊ฐ€ ์žˆ๋‹ค๊ณ  ํ• ์ˆ˜์žˆ๋‹ค. 1. ํฌ๋กค๋ง - Crawling 2. ์ธ๋ฑ์‹ฑ - Indexing 3. ๋žญํ‚น - Ranking 1. ํฌ๋กค๋ง ๋„ค์ด๋ฒ„์˜ ํ™”๋ฉด์„ ๋ณด๋ฉด ๋ฐฐ๋„ˆ ๋ฉ”์ผ ์นดํŽ˜ ๋ธ”๋กœ๊ทธ ์‡ผํ•‘ ๋‰ด์Šค ์ฆ๊ถŒ ๋‚ ์”จ ๋“ฑ๋“ฑ ์ˆ˜๋งŽ์€ ๋งํฌ๋“ค์ด ์ˆจ์–ด์žˆ๋‹ค. ํ™”๋ฉด์˜ ๊ธ€์”จ์—๋Š” ํด๋ฆญํ•˜๋ฉด ๋‹ค๋ฅธ ํŽ˜์ด์ง€๋กœ ๋„˜์–ด๊ฐ€๋Š” ๋งํฌ๋“ค์ด ์ˆจ์–ด์žˆ๋Š”๊ฒƒ์ด๋‹ค. ํฌ๋กค๋ง์€ ์›น ํŽ˜์ด์ง€๋ฅผ ๊ทธ๋Œ€๋กœ ๊ฐ€์ ธ์™€์„œ ๊ฑฐ๊ธฐ์„œ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•˜์—ฌ ์ˆ˜์ง‘ํ•˜๋Š” ๊ฒƒ์œผ๋กœ, ํฌ๋กค๋ง์„ ์ˆ˜ํ–‰ํ•˜๋ฉด ์ด ๋ชจ๋“  ๋งํฌ๋“ค์„ ๋‹ค ๋Œ์•„๋‹ค๋‹ˆ๊ณ  ํŽ˜์ด์ง€ ์•ˆ์— ์žˆ๋Š” ์ •๋ณด๋ฅผ ์ฝ์–ด๋“ค์ด๊ฒŒ ๋œ๋‹ค. ..

Korean-text data augmentation techniques I have tried

๋ถ„๋ฅ˜ ๋ฌธ์ œ์—์„œ ํ•ญ์ƒ ๋งž๋‹ฅ๋œจ๋ฆฌ๊ฒŒ ๋˜๋Š” ๋ฌธ์ œ๋Š” ์•„๋ฌด๋ž˜๋„ ๋ฐ์ดํ„ฐ ๋ถˆ๊ท ํ˜•์ด ์•„๋‹๊นŒ ์‹ถ๋‹ค. ํ•œ ์นดํ…Œ๊ณ ๋ฆฌ๋งŒ ๋„ˆ๋ฌด ๋งŽ๊ฑฐ๋‚˜ ๋„ˆ๋ฌด ์ ์–ด ๋‹ค๋ฅธ ์นดํ…Œ๊ณ ๋ฆฌ์™€ ์ฐจ์ด๊ฐ€ ๋‚˜๋Š” ์ƒํ™ฉ์—์„œ ๊ทธ๋Œ€๋กœ ํ•™์Šต์‹œํ‚ค๋ฉด ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์€ ์นดํ…Œ๊ณ ๋ฆฌ์— ๋Œ€ํ•ด์„œ๋Š” ์ž˜ ๋ถ„๋ฅ˜๊ฐ€ ๋˜์ง€ ์•Š๋Š”๋‹ค. ๋…ผ๋ฌธ์„ ์“ธ ๋•Œ๋„ ์ฒ˜์Œ์—๋Š” 17๋ถ„๋ฅ˜๋ฅผ ํ•˜๊ณ ์žํ–ˆ์ง€๋งŒ ๋ฐ์ดํ„ฐ ์ฐจ์ด๊ฐ€ ๋„ˆ๋ฌด ์ปค 5๊ฐ€์ง€ ํฐ ๋ถ„๋ฅ˜๋กœ ๋‚˜๋ˆด์—ˆ๋‹ค. ๊ทธ๋ž˜๋„ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์ด ํ•„์š”ํ•ด์„œ ๋ช‡๊ฐ€์ง€ ํ…Œ์ŠคํŒ…์„ ํ–ˆ๋˜ ๊ธฐ์–ต์ด ๋‚œ๋‹ค. ์•„๋ฌดํŠผ ์ง€๊ธˆ๊นŒ์ง€ ํ•ด๋ณธ ์—ฌ๋Ÿฌ ์‹œ๋„๋“ค์„ ์ •๋ฆฌํ•ด๋ณด๊ณ ์ž ํ•œ๋‹ค. ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ์—์„  ๋Œ€ํ‘œ์ ์œผ๋กœ ์•„๋ž˜์™€ ๊ฐ™์€ ๋„ค๊ฐ€์ง€ ์ฆ๊ฐ• ๊ธฐ๋ฒ•์ด ์กด์žฌํ•œ๋‹ค. - SR (Synonym Replacement): ํŠน์ • ๋‹จ์–ด๋ฅผ ๋น„์Šทํ•œ ์˜๋ฏธ์˜ ์œ ์˜์–ด๋กœ ๊ต์ฒด - RI (Random Insertion): ์ž„์˜์˜ ๋‹จ์–ด๋ฅผ ์‚ฝ์ž… - RS (Random Swap..

์ฝ”๋žฉ์—์„œ mecab ์„ค์น˜ํ•˜๊ธฐ

๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ฝ”๋“œ ์‹คํ–‰ !pip install konlpy from konlpy.tag import Mecab !git clone https://github.com/SOMJANG/Mecab-ko-for-Google-Colab.git %cd Mecab-ko-for-Google-Colab/ !bash install_mecab-ko_on_colab190912.sh ์‹คํ–‰ ํ›„ ๋งˆ์ง€๋ง‰ ์ค„์„ ๋ณด๋ฉด ๋Ÿฐํƒ€์ž„์„ ์žฌ์‹คํ–‰ํ•˜๋ผ๊ณ  ํ•œ๋‹ค. ์žฌ์‹คํ–‰ํ›„ ์œ„์˜ ์ฝ”๋“œ๋ฅผ ๋‹ค์‹œ ์‹คํ–‰ํ•˜์ง€ ์•Š๊ณ  mecab์„ ๋ถˆ๋Ÿฌ์˜ค๋ฉด ๋œ๋‹ค. Successfully Installed Now you can use Mecab from konlpy.tag import Mecab mecab = Mecab() ์‚ฌ์šฉ์ž ์‚ฌ์ „ ์ถ”๊ฐ€ ๋ฐฉ๋ฒ• : https://bit.ly/3k0ZH5..

[๋งˆ์ผ€ํŒ…์„ ์œ„ํ•œ ๋ฐ์ดํ„ฐ๋ถ„์„3-2] ๊ธฐ์ˆ ์  ๋ฐฉ๋ฒ•(CP,RFM, KPI)

Customer Profit (CP) - A way to track which customers are profitable - Often used in B2B marketing to manage accounts, but it can also be used in B2C to segment customers. The 80/20 rule says that 20% of customers generate 80% of the profit; customer profitability focuses on verifying whether that actually holds. The key insight: good customers require little effort and spend a lot, while bad customers require a lot of effort and spend little. CP Score = annual revenue earned from a customer - annual cost of supporting that customer. How do we calculate it? 1. Define customer cost (marketing cost, service, returns, shipping, ...) 2. Define customer spend (from CRM..
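The CP Score formula above is just revenue minus support cost per customer. A minimal sketch with hypothetical per-customer figures (names and numbers are made up for illustration):

```python
# Hypothetical one-year figures per customer
customers = {
    "A": {"revenue": 1200.0, "cost": 200.0},  # cost = marketing + service + returns + shipping
    "B": {"revenue": 300.0,  "cost": 350.0},
}

# CP Score = annual revenue from the customer - annual cost of supporting them
cp_scores = {name: c["revenue"] - c["cost"] for name, c in customers.items()}
print(cp_scores)  # {'A': 1000.0, 'B': -50.0}
```

A negative score (customer B) flags a customer who costs more to support than they bring in — exactly the "bad customer" the 80/20 analysis tries to surface.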

[NLP 1-2] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper review - 3

#Notes from my own study. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 (original paper) This post continues from the previous one. - Introduction & Related Works - Pre-training - Fine-tuning - Experiment - Conclusion + koBERT Fine-tuning takes the pre-trained weights, which already encode contextual information about sentences, brings over the pre-trained BERT parameters as-is, and updates the model so it can be applied to tasks such as document classification and named entity recognition. Fine-tuning is pre-tr..

[NLP 1-1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper review - 2

#Notes from my own study. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 (original paper) This post continues from the previous one. - Introduction & Related Works - Pre-training - Fine-tuning - Experiment - Conclusion + koBERT BERT is a general-purpose language representation model that makes contextual embedding possible. BERT can be divided broadly into two stages: pre-training and fine-tuning. Pre-training..

[NLP 1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper review - Introduction & Related Works

#Notes from my own study. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 (original paper) This is the BERT paper, one of the most fundamental and important papers in NLP. The explanation follows the original paper, with a few Korean-language examples researched and added to aid understanding! I will probably split the explanation into five parts: - Introduction & Related Works - Pre-training - Fine-tuning - Experiment - Conclusion + koBERT BERT is an NLP pre-training model developed by Google, a technique not limited to a specific domain..

[NLP] Self-attention

์…€ํ”„ ์–ดํ…์…˜ ์ˆ˜ํ–‰ ๋Œ€์ƒ = ์ž…๋ ฅ ์‹œํ€€์Šค ์ „์ฒด ๊ฐœ๋ณ„ ๋‹จ์–ด์™€ ์ „์ฒด ์ž…๋ ฅ ์‹œํ€€์Šค๋ฅผ ๋Œ€์ƒ์œผ๋กœ ์–ดํ…์…˜ ๊ณ„์‚ฐ์„ ์ˆ˜ํ–‰ํ•ด ๋ฌธ๋งฅ ์ „์ฒด๋ฅผ ๊ณ ๋ คํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ง€์—ญ์ ์ธ ๋ฌธ๋งฅ๋งŒ ๋ณด๋Š” CNN๊ณผ ์ฐจ์ด๊ฐ€ ์žˆ์Œ ๋ชจ๋“  ๊ฒฝ์šฐ์˜ ์ˆ˜๋ฅผ ๊ณ ๋ ค(๋‹จ์–ด๋“ค ์„œ๋กœ๊ฐ€ ์„œ๋กœ๋ฅผ 1๋Œ€ 1๋กœ ๋ฐ”๋ผ๋ณด๊ฒŒ ํ•จ)ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์‹œํ€€์Šค ๊ธธ์ด๊ฐ€ ๊ธธ์–ด์ง€๋”๋ผ๋„ ์ •๋ณด๋ฅผ ์žŠ๊ฑฐ๋‚˜ ์™œ๊ณกํ•  ์—ผ๋ ค๊ฐ€ ์—†๋‹ค๋Š” ์ ์—์„œ RNN๊ณผ ์ฐจ์ด ์–ดํ…์…˜๊ณผ ์…€ํ”„ ์–ดํ…์…˜ ์ฐจ์ด ์–ดํ…์…˜์€ ์†Œ์Šค ์‹œํ€€์Šค ์ „์ฒด ๋‹จ์–ด๋“ค(์–ด์ œ, ์นดํŽ˜, …, ๋งŽ๋”๋ผ)๊ณผ ํƒ€๊นƒ ์‹œํ€€์Šค ๋‹จ์–ด ํ•˜๋‚˜(cafe) ์‚ฌ์ด๋ฅผ ์—ฐ๊ฒฐํ•˜๋Š” ๋ฐ ์“ฐ์ž…๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด ์…€ํ”„ ์–ดํ…์…˜์€ ์ž…๋ ฅ ์‹œํ€€์Šค ์ „์ฒด ๋‹จ์–ด๋“ค ์‚ฌ์ด๋ฅผ ์—ฐ๊ฒฐํ•ฉ๋‹ˆ๋‹ค. ์–ดํ…์…˜์€ RNN ๊ตฌ์กฐ ์œ„์—์„œ ๋™์ž‘ํ•˜์ง€๋งŒ ์…€ํ”„ ์–ดํ…์…˜์€ RNN ์—†์ด ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค. ํƒ€๊นƒ ์–ธ์–ด์˜ ๋‹จ์–ด๋ฅผ 1๊ฐœ ์ƒ์„ฑํ•  ๋•Œ ์–ดํ…์…˜์€ 1ํšŒ ์ˆ˜ํ–‰ํ•˜์ง€๋งŒ ์…€ํ”„์–ดํ…์…˜..

๋ฐ˜์‘ํ˜•