μžμ—°μ–΄ 처리/Today I learned :

[μžμ—°μ–΄ 처리] νŒŒμ΄ν† μΉ˜ LSTM κ΅¬ν˜„

주영 🐱 2023. 1. 5. 17:40

μ΄λ²ˆμ—λŠ” μ‹€μ œλ‘œ LSTM을 κ΅¬ν˜„ν•΄λ³΄κ² μŠ΅λ‹ˆλ‹€. 

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
You can create an LSTM cell using `torch.nn` as follows.

* `input_size` : The number of expected features in the input x

* `hidden_size` : The number of features in the hidden state h



# An LSTM is created with nn.LSTM(input_size, hidden_size).
# Here we set input_size=3 and hidden_size=3.
lstm = nn.LSTM(3, 3)

LSTM cell 을 μƒμ„±ν•œ ν›„μ—λŠ”, μž…λ ₯으둜 λ“€μ–΄κ°ˆ input x, hidden state h, cell state c λ₯Ό 생성해야 ν•©λ‹ˆλ‹€.

μœ„μ—μ„œ μ •ν•œ input_size 와 hidden_size λ₯Ό κ³ λ €ν•˜μ—¬ inputs 와 hidden (h 와 c) 을 생성해 λ΄…μ‹œλ‹€.

# Create an input with sequence length 5.
# Since input_size was set to 3, we need five 3-dimensional vectors.
inputs = [torch.randn(1, 3) for _ in range(5)]

# The lstm takes both the input x and the hidden state h, so we also create the hidden state.
# Since hidden_size was set to 3, each is a 3-dimensional vector.
# The h passed to the lstm consists of the hidden state (as in an RNN) and the cell state,
# a concept introduced by the LSTM, so hidden must consist of two 3-dimensional vectors.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))

Method 1: For the input of sequence length 5, pass one element at a time through the LSTM, as in the sketch below.
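
A minimal sketch of Method 1, following the referenced PyTorch tutorial: each element is reshaped to (1, 1, 3) before being passed through the LSTM, and the returned hidden is fed back in at the next step.

for i in inputs:
    # Step through the sequence one element at a time; after each step,
    # hidden contains the updated (hidden state, cell state) pair.
    out, hidden = lstm(i.view(1, 1, -1), hidden)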

Method 2: You can also pass the entire sequence through at once.

The first value returned by the LSTM is the hidden states for every step of the sequence, and the second value is the hidden state (and cell state) of the last step. Compare the sizes of out and hidden.

inputs = torch.cat(inputs).view(len(inputs), 1, -1)  # Concatenate the list of inputs into a single tensor to apply Method 2.
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # Re-initialize hidden to apply Method 2.
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
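
To compare the sizes explicitly, a quick check (expected shapes shown as comments, assuming the sizes chosen above):

print(out.shape)        # torch.Size([5, 1, 3]) -- one hidden state per time step
print(hidden[0].shape)  # torch.Size([1, 1, 3]) -- hidden state of the last step
print(hidden[1].shape)  # torch.Size([1, 1, 3]) -- cell state of the last step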

LSTM 을 μ΄μš©ν•΄ Part-of-Speech (PoS) Tagging 을 ν•˜κΈ° μœ„ν•΄ ν•™μŠ΅ 데이터λ₯Ό μ€€λΉ„ν•©λ‹ˆλ‹€.

  • training_data μ—λŠ” 단어 μ‹œν€€μŠ€μ™€ 각 λ‹¨μ–΄μ˜ ν’ˆμ‚¬ νƒœκ·Έλ₯Ό μ€€λΉ„ν•΄μ•Ό ν•©λ‹ˆλ‹€.
  • word_to_ix: λͺ¨λΈμ˜ μž…λ ₯으둜 μ‚¬μš©ν•˜κΈ° μœ„ν•΄ 각 단어λ₯Ό id 둜 mapping ν•©λ‹ˆλ‹€.
  • tag_to_ix: ν’ˆμ‚¬ νƒœκ·Έ λ˜ν•œ id 둜 mapping ν•©λ‹ˆλ‹€.
def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


training_data = [
    # Tags are: DET - determiner; NN - noun; V - verb
    # For example, the word "The" is a determiner
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
# For each words-list (sentence) and tags-list in each tuple of training_data
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:  # word has not been assigned an index yet
            word_to_ix[word] = len(word_to_ix)  # Assign each word with a unique index
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}  # Assign each tag with a unique index

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6

{'The': 0, 'dog': 1, 'ate': 2, 'the': 3, 'apple': 4, 'Everybody': 5, 'read': 6, 'that': 7, 'book': 8}

 

We define the LSTMTagger module, which contains an embedding layer, an LSTM, and an output layer.

  • embeds: input id λ₯Ό embedding layer 둜 encode ν•˜μ—¬ input 에 ν•΄λ‹Ήν•˜λŠ” embedding μƒμ„±ν•©λ‹ˆλ‹€.
  • lstm_out: embedding 을 lstm 에 ν†΅κ³Όν•˜μ—¬ 전체 μ‹œν€€μŠ€μ— λŒ€ν•œ hidden state λ₯Ό μ €μž₯ν•©λ‹ˆλ‹€.
  • tag_space: lstm 의 output 인 hidden 을 μ΄μš©ν•΄ μ‘΄μž¬ν•˜λŠ” tag (DET, NN, V) κ³΅κ°„μœΌλ‘œ linear transform ν•©λ‹ˆλ‹€.
  • tag_scores: 이후 softmax λ₯Ό μ μš©ν•˜μ—¬ 각 tag κ°€ 될 score λ₯Ό μΈ‘μ •ν•©λ‹ˆλ‹€.
class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

model μ„ build ν•˜κ³ , ν•™μŠ΅μ— ν•„μš”ν•œ loss ν•¨μˆ˜μ™€ optimizer λ₯Ό μ„ μ–Έν•©λ‹ˆλ‹€.

model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

Now we train the model on the training data: pass each input through the LSTMTagger to predict the PoS tag of every word, compute the loss against the ground-truth tags, and backpropagate it to update the model parameters.

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        #  calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()
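
After training, we can sanity-check the predictions on the first training sentence. This is a minimal sketch following the referenced tutorial; inference is wrapped in torch.no_grad() since no gradients are needed here.

with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

    # Each row of tag_scores holds the log-softmax scores over the tags (DET, NN, V) for one word;
    # the predicted tag is the index with the maximum score.
    # With the toy data above, this typically converges to tensor([0, 1, 2, 0, 1]),
    # i.e. DET NN V DET NN for "The dog ate the apple".
    print(tag_scores.argmax(dim=1))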

 

 

To use a GRU instead of an LSTM, you can use nn.GRU.
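
A minimal sketch: nn.GRU takes the same (input_size, hidden_size) arguments, but a GRU has no cell state, so its hidden state is a single tensor rather than an (h, c) tuple.

gru = nn.GRU(3, 3)                       # same interface as nn.LSTM(3, 3)
h0 = torch.randn(1, 1, 3)                # only a hidden state -- no cell state
out, hn = gru(torch.randn(5, 1, 3), h0)  # out: (5, 1, 3), hn: (1, 1, 3)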

If the model is evaluated only on the training data, as above, it is hard to tell how well it generalizes. It should be evaluated on new data that it did not see during training, for example by splitting the data into train and test sets or by using cross-validation, as in the sketch below.
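
As an illustration only (the toy dataset above is far too small for a meaningful split), a hand-rolled split over a list of (sentence, tags) pairs might look like this; train_test_split here is a hypothetical helper, not a library function.

import random

def train_test_split(data, test_ratio=0.2, seed=1):
    # Shuffle a copy of the data and hold out the last test_ratio fraction as the test set.
    data = list(data)
    random.Random(seed).shuffle(data)
    split = int(len(data) * (1 - test_ratio))
    return data[:split], data[split:]

train_data, test_data = train_test_split(training_data, test_ratio=0.5)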

 

Reference: https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html

 

λ°˜μ‘ν˜•