[Natural Language Processing] Implementing an LSTM in PyTorch
This time, let's actually implement an LSTM.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1)  # fix the random seed for reproducibility
You can create an LSTM with `torch.nn` as follows.
* `input_size` : The number of expected features in the input x
* `hidden_size` : The number of features in the hidden state h
lstm = nn.LSTM(input_size, hidden_size)
# Create an LSTM with input_size=3 and hidden_size=3.
lstm = nn.LSTM(3, 3)
After creating the LSTM, we need to create the input x, hidden state h, and cell state c that will be fed into it.
Let's create inputs and hidden (h and c) according to the input_size and hidden_size chosen above.
# Create an input with sequence length 5.
# Since input_size was set to 3, this means five 3-dimensional vectors.
inputs = [torch.randn(1, 3) for _ in range(5)]
# The lstm takes both the input x and a hidden state h as inputs, so we also create the hidden state.
# Since hidden_size was set to 3, we create 3-dimensional vectors.
# The h fed into the lstm consists of the hidden state (as in an RNN) and the cell state, a concept introduced by the LSTM,
# so hidden must be a pair of 3-dimensional vectors.
hidden = (torch.randn(1, 1, 3),   # h_0, shape (num_layers, batch, hidden_size)
          torch.randn(1, 1, 3))   # c_0, shape (num_layers, batch, hidden_size)
Method 1: For the input of sequence length 5, pass one element at a time through the lstm (sketched below).
Method 2: Alternatively, pass the entire sequence through at once.
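As a quick sketch of Method 1, we can loop over the inputs list created above, feeding one element at a time and carrying the hidden state forward (this mirrors the loop in the referenced tutorial):
# Method 1: step through the sequence one element at a time.
# After each step, hidden contains the (h, c) state to feed into the next step.
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)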
When the whole sequence is passed at once, the first value returned by the LSTM is the hidden states over the entire sequence, and the second value is the (hidden state, cell state) pair from the last step. Compare the sizes of out and hidden.
inputs = torch.cat(inputs).view(len(inputs), 1, -1)  # For Method 2, concatenate the inputs into a single tensor instead of a list.
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # Re-initialize hidden for Method 2.
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
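For reference, out stacks one hidden state per time step, while hidden holds only the final step:
print(out.shape)        # torch.Size([5, 1, 3]): (seq_len, batch, hidden_size)
print(hidden[0].shape)  # torch.Size([1, 1, 3]): final hidden state h_n
print(hidden[1].shape)  # torch.Size([1, 1, 3]): final cell state c_n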
Next, we prepare training data for Part-of-Speech (PoS) tagging with the LSTM.
- training_data: pairs of word sequences and the PoS tag of each word.
- word_to_ix: maps each word to an id so it can be fed to the model.
- tag_to_ix: likewise maps each PoS tag to an id.
def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)
training_data = [
# Tags are: DET - determiner; NN - noun; V - verb
# For example, the word "The" is a determiner
("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
# For each words-list (sentence) and tags-list in each tuple of training_data
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:  # word has not been assigned an index yet
            word_to_ix[word] = len(word_to_ix)  # Assign each word with a unique index
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2} # Assign each tag with a unique index
# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6
{'The': 0, 'dog': 1, 'ate': 2, 'the': 3, 'apple': 4, 'Everybody': 5, 'read': 6, 'that': 7, 'book': 8}
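As a quick check, prepare_sequence maps a sentence to a tensor of word indices using this dictionary; for the first training sentence:
example_in = prepare_sequence(training_data[0][0], word_to_ix)
print(example_in)  # tensor([0, 1, 2, 3, 4]) for "The dog ate the apple"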
We define the LSTMTagger module, which contains the embedding layer, the LSTM, and the output layer.
- embeds: the input ids are encoded by the embedding layer to produce the embedding of each input word.
- lstm_out: the embeddings are passed through the lstm, and the hidden states for the entire sequence are kept.
- tag_space: the lstm hidden states are linearly transformed into the space of possible tags (DET, NN, V).
- tag_scores: log softmax is then applied to obtain a score for each tag.
class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
We build the model and declare the loss function and optimizer needed for training.
model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
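Before training, we can look at the model's scores for the first training sentence; since the weights are still random, element i, j of the output is an essentially arbitrary score for tag j of word i (this sanity check follows the referenced tutorial):
# See what the scores are before training.
# torch.no_grad() disables gradient tracking, since we are not training here.
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    print(tag_scores)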
Now we train the model on the training data: the input is passed through the LSTMTagger to predict the PoS tag of each word, the loss is computed against the ground-truth tags, and the loss is backpropagated to update the model parameters.
for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()
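After training, we can check the predictions for the first sentence; the predicted tag for each word is the index with the maximum score:
# See what the scores are after training.
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    # For "The dog ate the apple" this should come out as DET NN V DET NN,
    # i.e. tensor([0, 1, 2, 0, 1]), if training succeeded.
    print(tag_scores.argmax(dim=1))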
To use a GRU instead of an LSTM, you can use nn.GRU.
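A minimal sketch of swapping in a GRU: nn.GRU takes the same (input_size, hidden_size) arguments, but since a GRU has no cell state, its hidden state is a single tensor rather than an (h, c) tuple.
gru = nn.GRU(3, 3)                 # input_size=3, hidden_size=3
gru_inputs = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)
h0 = torch.randn(1, 1, 3)          # (num_layers, batch, hidden_size)
gru_out, h_n = gru(gru_inputs, h0)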
Evaluating the model only on the training data, as we did above, makes it hard to tell how well the model generalizes. The model should be evaluated on new data it has not seen during training, for example by splitting the given data into train and test sets or by using cross-validation.
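A minimal sketch of a random train/test split, assuming a larger tagged corpus in the same (sentence, tags) format as training_data:
import random

random.seed(1)
corpus = list(training_data)   # in practice, a much larger tagged corpus
random.shuffle(corpus)
n_train = int(0.8 * len(corpus))
train_split, test_split = corpus[:n_train], corpus[n_train:]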
Reference: https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html