CS224N Assignment A3: Dependency Parsing


1. Machine Learning & Neural Networks (8 points)

(a) Adam Optimizer

(i)

At each update, $m$ blends the accumulated update direction with the current gradient: components that point the same way reinforce each other and speed up progress, while components that disagree partially cancel and are damped. This momentum-style averaging reduces the oscillation (variance) of the updates, so gradient descent moves more steadily along the consistently useful direction, which leads to faster convergence.
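
For reference, the momentum update being discussed has roughly the form given in the handout, with $\beta_1$ controlling how much of the update history is kept:

$$\mathbf{m} \leftarrow \beta_1 \mathbf{m} + (1 - \beta_1)\nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta})$$

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha\, \mathbf{m}$$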

(ii)

Parameters whose gradients have historically been small (i.e. whose accumulated squared gradient $v$ is small) receive relatively larger updates, while parameters with a large accumulated squared gradient receive smaller ones. Dividing by $\sqrt{v}$ normalizes the step size per parameter, which avoids overshooting on steep directions and keeps progress on flat ones, effectively giving each parameter a moderated, adaptive learning rate.
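
The adaptive part adds a running average of squared gradients (written here in the handout's elementwise notation, where $\odot$ and the division are applied per parameter):

$$\mathbf{v} \leftarrow \beta_2 \mathbf{v} + (1 - \beta_2)\left(\nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta}) \odot \nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta})\right)$$

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha\, \mathbf{m} \,/\, \sqrt{\mathbf{v}}$$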

(b) Dropout

(i)

Masking with $d\odot h$ keeps each unit only with probability $1-p_{drop}$, so in expectation the hidden vector is scaled down by a factor of $1-p_{drop}$. To restore the original expected scale we need $\gamma = \frac{1}{1-p_{drop}}$.
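
As a quick illustration (a minimal NumPy sketch, not the assignment's code; `apply_dropout` is a made-up helper), inverted dropout applies exactly this $\gamma$ so the expected activation is unchanged:

import numpy as np

def apply_dropout(h, p_drop, rng=np.random.default_rng(0)):
    """Zero each unit with probability p_drop, then rescale the survivors by
    gamma = 1 / (1 - p_drop) so that E[h_drop] matches h."""
    d = (rng.random(h.shape) >= p_drop).astype(h.dtype)  # keep-mask
    gamma = 1.0 / (1.0 - p_drop)
    return gamma * d * h

h = np.ones(100000)
print(apply_dropout(h, p_drop=0.3).mean())  # close to 1.0 in expectation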

(ii)

Dropout is applied during training to make the model more robust and to prevent overfitting (it acts as a regularizer). At evaluation time we want the full, deterministic capacity of the trained model, so there is no need for dropout and it is turned off.
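
In PyTorch this distinction is handled by the module mode: nn.Dropout is stochastic in training mode and a no-op in evaluation mode. A small standalone check (not part of the assignment code):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()     # training mode: units dropped, survivors scaled by 1/(1 - 0.5) = 2
print(drop(x))   # e.g. tensor([2., 0., 2., 2., 0., 2., 0., 2.])

drop.eval()      # evaluation mode: dropout does nothing
print(drop(x))   # tensor([1., 1., 1., 1., 1., 1., 1., 1.])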

2. Neural Transition-Based Dependency Parsing (46 points)

(a)

| Stack | Buffer | New dependency | Transition |
| --- | --- | --- | --- |
| [ROOT] | [I, attend, lectures, in, the, NLP, class] |  | Initial Configuration |
| [ROOT, I] | [attend, lectures, in, the, NLP, class] |  | SHIFT |
| [ROOT, I, attend] | [lectures, in, the, NLP, class] |  | SHIFT |
| [ROOT, attend] | [lectures, in, the, NLP, class] | attend->I | LEFT-ARC |
| [ROOT, attend, lectures] | [in, the, NLP, class] |  | SHIFT |
| [ROOT, attend] | [in, the, NLP, class] | attend->lectures | RIGHT-ARC |
| [ROOT, attend, in] | [the, NLP, class] |  | SHIFT |
| [ROOT, attend, in, the] | [NLP, class] |  | SHIFT |
| [ROOT, attend, in, the, NLP] | [class] |  | SHIFT |
| [ROOT, attend, in, the, NLP, class] | [] |  | SHIFT |
| [ROOT, attend, in, the, class] | [] | class->NLP | LEFT-ARC |
| [ROOT, attend, in, class] | [] | class->the | LEFT-ARC |
| [ROOT, attend, class] | [] | class->in | LEFT-ARC |
| [ROOT, attend] | [] | attend->class | RIGHT-ARC |
| [ROOT] | [] | ROOT->attend | RIGHT-ARC |
| [ROOT] | [] |  | (parse complete) |

(b)

$2n$ steps. Each word is SHIFTed onto the stack exactly once and is later consumed exactly once as the dependent of a LEFT-ARC or RIGHT-ARC, giving $n$ SHIFTs plus $n$ arc transitions. For example, the 7-word sentence in part (a) takes $7 + 7 = 14$ transitions, matching the table above.

(c)

class PartialParse(object):
    def __init__(self, sentence):
        """Initializes this partial parse.

        @param sentence (list of str): The sentence to be parsed as a list of words.
                                       Your code should not modify the sentence.
        """
        # The sentence being parsed is kept for bookkeeping purposes. Do NOT alter it in your code.
        self.sentence = sentence

        ### YOUR CODE HERE (3 Lines)
        ### Your code should initialize the following fields:
        ###     self.stack: The current stack represented as a list with the top of the stack as the
        ###                 last element of the list.
        ###     self.buffer: The current buffer represented as a list with the first item on the
        ###                  buffer as the first item of the list
        ###     self.dependencies: The list of dependencies produced so far. Represented as a list of
        ###                        tuples where each tuple is of the form (head, dependent).
        ###                        Order for this list doesn't matter.
        ###
        ### Note: The root token should be represented with the string "ROOT"
        ### Note: If you need to use the sentence object to initialize anything, make sure to not directly
        ###       reference the sentence object. That is, remember to NOT modify the sentence object.

        self.stack = ['ROOT']
        self.buffer = sentence[:]  # shallow copy so the original sentence is never referenced directly
        self.dependencies = []
        ### END YOUR CODE


    def parse_step(self, transition):
        """Performs a single parse step by applying the given transition to this partial parse

        @param transition (str): A string that equals "S", "LA", or "RA" representing the shift,
                                 left-arc, and right-arc transitions. You can assume the provided
                                 transition is a legal transition.
        """
        ### YOUR CODE HERE (~7-12 Lines)
        ### TODO:
        ###     Implement a single parsing step, i.e. the logic for the following as
        ###     described in the pdf handout:
        ###         1. Shift
        ###         2. Left Arc
        ###         3. Right Arc
        if transition == "S":
            # SHIFT: move the first word of the buffer onto the top of the stack
            self.stack.append(self.buffer[0])
            self.buffer = self.buffer[1:]
        else:
            r = self.stack[-1]   # top of the stack
            l = self.stack[-2]   # second item on the stack
            if transition == "LA":
                # LEFT-ARC: the second item becomes a dependent of the top item
                self.dependencies.append((r, l))
                self.stack = self.stack[:-2]
                self.stack.append(r)
            else:
                # RIGHT-ARC: the top item becomes a dependent of the second item
                self.dependencies.append((l, r))
                self.stack = self.stack[:-1]
        ### END YOUR CODE

    def parse(self, transitions):
        """Applies the provided transitions to this PartialParse

        @param transitions (list of str): The list of transitions in the order they should be applied

        @return dependencies (list of string tuples): The list of dependencies produced when
                                                      parsing the sentence. Represented as a list of
                                                      tuples where each tuple is of the form (head, dependent).
        """
        for transition in transitions:
            self.parse_step(transition)
        return self.dependencies
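
As a sanity check (my own example, not part of the test script), replaying the transition sequence from part (a) with this class reproduces the dependencies in the table:

pp = PartialParse(["I", "attend", "lectures", "in", "the", "NLP", "class"])
deps = pp.parse(["S", "S", "LA", "S", "RA", "S", "S", "S", "S",
                 "LA", "LA", "LA", "RA", "RA"])
print(deps)
# [('attend', 'I'), ('attend', 'lectures'), ('class', 'NLP'),
#  ('class', 'the'), ('class', 'in'), ('attend', 'class'), ('ROOT', 'attend')]
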
parser_transitions.py:203: SyntaxWarning: "is" with a literal. Did you mean "=="?
return [("RA" if pp.stack[1] is "right" else "LA") if len(pp.buffer) == 0 else "S"
SHIFT test passed!
LEFT-ARC test passed!
RIGHT-ARC test passed!
parse test passed!

(d)

def minibatch_parse(sentences, model, batch_size):
    """Parses a list of sentences in minibatches using a model.

    @param sentences (list of list of str): A list of sentences to be parsed
                                            (each sentence is a list of words and each word is of type string)
    @param model (ParserModel): The model that makes parsing decisions. It is assumed to have a function
                                model.predict(partial_parses) that takes in a list of PartialParses as input and
                                returns a list of transitions predicted for each parse. That is, after calling
                                    transitions = model.predict(partial_parses)
                                transitions[i] will be the next transition to apply to partial_parses[i].
    @param batch_size (int): The number of PartialParses to include in each minibatch

    @return dependencies (list of dependency lists): A list where each element is the dependencies
                                                     list for a parsed sentence. Ordering should be the
                                                     same as in sentences (i.e., dependencies[i] should
                                                     contain the parse for sentences[i]).
    """
    dependencies = []

    ### YOUR CODE HERE (~8-10 Lines)
    ### TODO:
    ###     Implement the minibatch parse algorithm. Note that the pseudocode for this algorithm is given in the pdf handout.
    ###
    ###     Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
    ###           unfinished_parses = partial_parses[:].
    ###           Here `unfinished_parses` is a shallow copy of `partial_parses`.
    ###           In Python, a shallow copied list like `unfinished_parses` does not contain new instances
    ###           of the object stored in `partial_parses`. Rather both lists refer to the same objects.
    ###           In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
    ###           contains references to the same objects. Thus, you should NOT use the `del` operator
    ###           to remove objects from the `unfinished_parses` list. This will free the underlying memory that
    ###           is being accessed by `partial_parses` and may cause your code to crash.

    # One PartialParse per sentence, in the original order.
    partial_parses = [PartialParse(sentence) for sentence in sentences]
    # Shallow copy: both lists refer to the same PartialParse objects.
    unfinished_parses = partial_parses[:]
    while len(unfinished_parses) > 0:
        # Take the first `batch_size` unfinished parses and advance each by one transition.
        minibatch = unfinished_parses[:batch_size]
        transitions = model.predict(minibatch)
        for transition, unfinished_parse in zip(transitions, minibatch):
            unfinished_parse.parse_step(transition)
        # A parse is finished once its buffer is empty and only ROOT remains on its stack.
        unfinished_parses = [
            m for m in unfinished_parses if not (len(m.buffer) == 0 and len(m.stack) == 1)
        ]
    ### END YOUR CODE

    for p in partial_parses:
        dependencies.append(p.dependencies)
    return dependencies
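
For a quick end-to-end check, here is a toy stand-in for the model (similar in spirit to the dummy model used by the test script, but not identical to it):

class AllShiftThenRightArcs(object):
    """Toy `model`: SHIFT while the buffer is non-empty, then emit RIGHT-ARCs."""
    def predict(self, partial_parses):
        return ["S" if len(pp.buffer) > 0 else "RA" for pp in partial_parses]

sentences = [["right", "arcs", "only"], ["again"]]
print(minibatch_parse(sentences, AllShiftThenRightArcs(), batch_size=2))
# [[('arcs', 'only'), ('right', 'arcs'), ('ROOT', 'right')], [('ROOT', 'again')]]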

parser_transitions.py:203: SyntaxWarning: "is" with a literal. Did you mean "=="?
return [("RA" if pp.stack[1] is "right" else "LA") if len(pp.buffer) == 0 else "S"
minibatch_parse test passed!

(e)

class ParserModel(nn.Module):
    """ Feedforward neural network with an embedding layer and two hidden layers.
    The ParserModel will predict which transition should be applied to a
    given partial parse configuration.

    PyTorch Notes:
        - Note that "ParserModel" is a subclass of the "nn.Module" class. In PyTorch all neural networks
          are a subclass of this "nn.Module".
        - The "__init__" method is where you define all the layers and parameters
          (embedding layers, linear layers, dropout layers, etc.).
        - "__init__" gets automatically called when you create a new instance of your class, e.g.
          when you write "m = ParserModel()".
        - Other methods of ParserModel can access variables that have "self." prefix. Thus,
          you should add the "self." prefix layers, values, etc. that you want to utilize
          in other ParserModel methods.
        - For further documentation on "nn.Module" please see https://pytorch.org/docs/stable/nn.html.
    """
    def __init__(self, embeddings, n_features=36,
                 hidden_size=200, n_classes=3, dropout_prob=0.5):
        """ Initialize the parser model.

        @param embeddings (ndarray): word embeddings (num_words, embedding_size)
        @param n_features (int): number of input features
        @param hidden_size (int): number of hidden units
        @param n_classes (int): number of output classes
        @param dropout_prob (float): dropout probability
        """
        super(ParserModel, self).__init__()
        self.n_features = n_features
        self.n_classes = n_classes
        self.dropout_prob = dropout_prob
        self.embed_size = embeddings.shape[1]
        self.hidden_size = hidden_size
        self.embeddings = nn.Parameter(torch.tensor(embeddings))

        ### YOUR CODE HERE (~9-10 Lines)
        ### TODO:
        ###     1) Declare `self.embed_to_hidden_weight` and `self.embed_to_hidden_bias` as `nn.Parameter`.
        ###        Initialize weight with the `nn.init.xavier_uniform_` function and bias with `nn.init.uniform_`
        ###        with default parameters.
        ###     2) Construct `self.dropout` layer.
        ###     3) Declare `self.hidden_to_logits_weight` and `self.hidden_to_logits_bias` as `nn.Parameter`.
        ###        Initialize weight with the `nn.init.xavier_uniform_` function and bias with `nn.init.uniform_`
        ###        with default parameters.
        ###
        ### Note: Trainable variables are declared as `nn.Parameter` which is a commonly used API
        ###       to include a tensor into a computational graph to support updating w.r.t its gradient.
        ###       Here, we use Xavier Uniform Initialization for our Weight initialization.
        ###       It has been shown empirically, that this provides better initial weights
        ###       for training networks than random uniform initialization.
        ###       For more details checkout this great blogpost:
        ###       http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
        ###
        ### Please see the following docs for support:
        ###     nn.Parameter: https://pytorch.org/docs/stable/nn.html#parameters
        ###     Initialization: https://pytorch.org/docs/stable/nn.init.html
        ###     Dropout: https://pytorch.org/docs/stable/nn.html#dropout-layers
        ###
        ### See the PDF for hints.

        # The input dimension is n_features * embed_size.
        # Weight: maps from the input dimension to the hidden dimension.
        self.embed_to_hidden_weight = nn.Parameter(torch.empty(self.embed_size * self.n_features, self.hidden_size))
        # Equivalently:
        # self.embed_to_hidden_weight = nn.Parameter(nn.init.xavier_uniform_(...))
        nn.init.xavier_uniform_(self.embed_to_hidden_weight)
        # The bias matches the hidden dimension.
        self.embed_to_hidden_bias = nn.Parameter(torch.empty(self.hidden_size))
        nn.init.uniform_(self.embed_to_hidden_bias)
        # Dropout layer.
        self.dropout = nn.Dropout(p=dropout_prob)
        self.hidden_to_logits_weight = nn.Parameter(torch.empty(self.hidden_size, self.n_classes))
        nn.init.xavier_uniform_(self.hidden_to_logits_weight)
        self.hidden_to_logits_bias = nn.Parameter(torch.empty(self.n_classes))
        nn.init.uniform_(self.hidden_to_logits_bias)
        ### END YOUR CODE

    def embedding_lookup(self, w):
        """ Utilize `w` to select embeddings from embedding matrix `self.embeddings`
        @param w (Tensor): input tensor of word indices (batch_size, n_features)

        @return x (Tensor): tensor of embeddings for words represented in w
                            (batch_size, n_features * embed_size)
        """
        ### YOUR CODE HERE (~1-4 Lines)
        ### TODO:
        ###     1) For each index `i` in `w`, select `i`th vector from self.embeddings
        ###     2) Reshape the tensor using `view` function if necessary
        ###
        ### Note: All embedding vectors are stacked and stored as a matrix. The model receives
        ###       a list of indices representing a sequence of words, then it calls this lookup
        ###       function to map indices to sequence of embeddings.
        ###
        ###       This problem aims to test your understanding of embedding lookup,
        ###       so DO NOT use any high level API like nn.Embedding
        ###       (we are asking you to implement that!). Pay attention to tensor shapes
        ###       and reshape if necessary. Make sure you know each tensor's shape before you run the code!
        ###
        ### Pytorch has some useful APIs for you, and you can use either one
        ### in this problem (except nn.Embedding). These docs might be helpful:
        ###     Index select: https://pytorch.org/docs/stable/torch.html#torch.index_select
        ###     Gather: https://pytorch.org/docs/stable/torch.html#torch.gather
        ###     View: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view
        ###     Flatten: https://pytorch.org/docs/stable/generated/torch.flatten.html

        # torch.index_select picks out the embedding rows given by the indices.
        x = []
        for batch in w:
            # dim=0 selects rows; `batch` holds the feature indices for one example.
            em = torch.index_select(self.embeddings, dim=0, index=batch)
            # After selection the shape is (n_features, embed_size).
            x.append(em)
        # The list elements are tensors, so convert via numpy before stacking,
        # then flatten each example to n_features * embed_size.
        x = torch.tensor([item.detach().numpy() for item in x]).view(w.shape[0], -1)
        # One-liner alternative:
        # x = torch.tensor([torch.index_select(self.embeddings, dim=0, index=batch).detach().numpy() for batch in w])
        ### END YOUR CODE
        return x


    def forward(self, w):
        """ Run the model forward.

        Note that we will not apply the softmax function here because it is included in the loss function nn.CrossEntropyLoss

        PyTorch Notes:
            - Every nn.Module object (PyTorch model) has a `forward` function.
            - When you apply your nn.Module to an input tensor `w` this function is applied to the tensor.
              For example, if you created an instance of your ParserModel and applied it to some `w` as follows,
              the `forward` function would be called on `w` and the result would be stored in the `output` variable:
                  model = ParserModel()
                  output = model(w) # this calls the forward function
            - For more details checkout: https://pytorch.org/docs/stable/nn.html#torch.nn.Module.forward

        @param w (Tensor): input tensor of tokens (batch_size, n_features)

        @return logits (Tensor): tensor of predictions (output after applying the layers of the network)
                                 without applying softmax (batch_size, n_classes)
        """
        ### YOUR CODE HERE (~3-5 lines)
        ### TODO:
        ###     Complete the forward computation as described in write-up. In addition, include a dropout layer
        ###     as declared in `__init__` after ReLU function.
        ###
        ### Note: We do not apply the softmax to the logits here, because
        ### the loss function (torch.nn.CrossEntropyLoss) applies it more efficiently.
        ###
        ### Please see the following docs for support:
        ###     Matrix product: https://pytorch.org/docs/stable/torch.html#torch.matmul
        ###     ReLU: https://pytorch.org/docs/stable/nn.html?highlight=relu#torch.nn.functional.relu

        # Forward pass: look up embeddings to form the input.
        emb = self.embedding_lookup(w)
        emb = self.dropout(emb)
        # @ is matrix multiplication; no detach().numpy() is needed here.
        hidden_state = torch.nn.functional.relu(emb @ self.embed_to_hidden_weight + self.embed_to_hidden_bias)
        logits = hidden_state @ self.hidden_to_logits_weight + self.hidden_to_logits_bias
        ### END YOUR CODE
        return logits
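
A small shape check (hypothetical sizes: a 100-word vocabulary with 50-dimensional embeddings and a batch of 4 configurations):

import numpy as np
import torch

embeddings = np.random.randn(100, 50).astype(np.float32)
model = ParserModel(embeddings)        # n_features=36, n_classes=3 by default
w = torch.randint(0, 100, (4, 36))     # 4 configurations, 36 feature indices each
logits = model(w)
print(logits.shape)                    # torch.Size([4, 3]): one score per transition (S, LA, RA)
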
def train(parser, train_data, dev_data, output_path, batch_size=1024, n_epochs=10, lr=0.0005):
    """ Train the neural dependency parser.

    @param parser (Parser): Neural Dependency Parser
    @param train_data ():
    @param dev_data ():
    @param output_path (str): Path to which model weights and results are written.
    @param batch_size (int): Number of examples in a single batch
    @param n_epochs (int): Number of training epochs
    @param lr (float): Learning rate
    """
    best_dev_UAS = 0


    ### YOUR CODE HERE (~2-7 lines)
    ### TODO:
    ###      1) Construct Adam Optimizer in variable `optimizer`
    ###      2) Construct the Cross Entropy Loss Function in variable `loss_func` with `mean`
    ###         reduction (default)
    ###
    ### Hint: Use `parser.model.parameters()` to pass optimizer
    ###       necessary parameters to tune.
    ### Please see the following docs for support:
    ###     Adam Optimizer: https://pytorch.org/docs/stable/optim.html
    ###     Cross Entropy Loss: https://pytorch.org/docs/stable/nn.html#crossentropyloss
    optimizer = torch.optim.Adam(parser.model.parameters(), lr=lr)
    loss_func = nn.CrossEntropyLoss()

    ### END YOUR CODE

    for epoch in range(n_epochs):
        print("Epoch {:} out of {:}".format(epoch + 1, n_epochs))
        dev_UAS = train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size)
        if dev_UAS > best_dev_UAS:
            best_dev_UAS = dev_UAS
            print("New best dev UAS! Saving model.")
            torch.save(parser.model.state_dict(), output_path)
        print("")


def train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size):
    """ Train the neural dependency parser for single epoch.

    Note: In PyTorch we can signify train versus test and automatically have
    the Dropout Layer applied and removed, accordingly, by specifying
    whether we are training, `model.train()`, or evaluating, `model.eval()`

    @param parser (Parser): Neural Dependency Parser
    @param train_data ():
    @param dev_data ():
    @param optimizer (nn.Optimizer): Adam Optimizer
    @param loss_func (nn.CrossEntropyLoss): Cross Entropy Loss Function
    @param batch_size (int): batch size

    @return dev_UAS (float): Unlabeled Attachment Score (UAS) for dev data
    """
    parser.model.train()  # Places model in "train" mode, i.e. apply dropout layer
    n_minibatches = math.ceil(len(train_data) / batch_size)
    loss_meter = AverageMeter()

    with tqdm(total=(n_minibatches)) as prog:
        for i, (train_x, train_y) in enumerate(minibatches(train_data, batch_size)):
            optimizer.zero_grad()  # remove any baggage in the optimizer
            loss = 0.  # store loss for this batch here
            train_x = torch.from_numpy(train_x).long()
            train_y = torch.from_numpy(train_y.nonzero()[1]).long()

            ### YOUR CODE HERE (~4-10 lines)
            ### TODO:
            ###      1) Run train_x forward through model to produce `logits`
            ###      2) Use the `loss_func` parameter to apply the PyTorch CrossEntropyLoss function.
            ###         This will take `logits` and `train_y` as inputs. It will output the CrossEntropyLoss
            ###         between softmax(`logits`) and `train_y`. Remember that softmax(`logits`)
            ###         are the predictions (y^ from the PDF).
            ###      3) Backprop losses
            ###      4) Take step with the optimizer
            ### Please see the following docs for support:
            ###     Optimizer Step: https://pytorch.org/docs/stable/optim.html#optimizer-step
            logits = parser.model.forward(train_x)
            # nn.CrossEntropyLoss takes the raw logits and the integer class labels.
            loss = loss_func(logits, train_y)
            loss.backward()
            optimizer.step()

            ### END YOUR CODE
            prog.update(1)
            loss_meter.update(loss.item())

    print("Average Train Loss: {}".format(loss_meter.avg))

    print("Evaluating on dev set",)
    parser.model.eval()  # Places model in "eval" mode, i.e. don't apply dropout layer
    dev_UAS, _ = parser.parse(dev_data)
    print("- dev UAS: {:.2f}".format(dev_UAS * 100.0))
    return dev_UAS
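
The wiring above (Adam + CrossEntropyLoss, then one backward/step per minibatch) can be exercised on random tensors, independent of the parser. A standalone sketch, with nn.Linear standing in for parser.model:

import torch
import torch.nn as nn

model = nn.Linear(10, 3)                                   # stand-in for parser.model
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
loss_func = nn.CrossEntropyLoss()                          # mean reduction by default

x = torch.randn(8, 10)                                     # fake minibatch of features
y = torch.randint(0, 3, (8,))                              # fake transition labels
optimizer.zero_grad()
loss = loss_func(model(x), y)                              # softmax + NLL in one call
loss.backward()
optimizer.step()
print(loss.item())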

================================================================================
INITIALIZING
================================================================================
Loading data...
took 2.34 seconds
Building parser...
took 1.25 seconds
Loading pretrained embeddings...
took 3.19 seconds
Vectorizing data...
took 1.98 seconds
Preprocessing training data...
took 51.43 seconds
took 0.03 seconds

================================================================================
TRAINING
================================================================================
Epoch 1 out of 10
0%| | 0/1848 [00:00<?, ?it/s]D
:\软件\安装目录\pycharm\projects\a3\parser_model.py:128: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at C:\acti
ons-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_new.cpp:233.)
x = torch.tensor([item.detach().numpy() for item in x]).view(w.shape[0],-1)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [08:49<00:00, 3.49it/s]
Average Train Loss: 0.26055007827069077
Evaluating on dev set
1445850it [00:00, 35055374.72it/s]
- dev UAS: 79.67
New best dev UAS! Saving model.

Epoch 2 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [09:00<00:00, 3.42it/s]
Average Train Loss: 0.16393936111378077
Evaluating on dev set
1445850it [00:00, 46227698.79it/s]
- dev UAS: 82.79
New best dev UAS! Saving model.

Epoch 3 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [09:05<00:00, 3.38it/s]
Average Train Loss: 0.14417635859897385
Evaluating on dev set
1445850it [00:00, 30844015.60it/s]
- dev UAS: 83.73
New best dev UAS! Saving model.

Epoch 4 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [08:45<00:00, 3.51it/s]
Average Train Loss: 0.13362910604967185
Evaluating on dev set
1445850it [00:00, 28795236.69it/s]
- dev UAS: 84.39
New best dev UAS! Saving model.

Epoch 5 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [09:00<00:00, 3.42it/s]
Average Train Loss: 0.12646352259434146
Evaluating on dev set
1445850it [00:00, 36683288.00it/s]
- dev UAS: 85.27
New best dev UAS! Saving model.

Epoch 6 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [08:46<00:00, 3.51it/s]
Average Train Loss: 0.12200444055825987
Evaluating on dev set
1445850it [00:00, 32126541.28it/s]
- dev UAS: 85.61
New best dev UAS! Saving model.

Epoch 7 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [08:46<00:00, 3.51it/s]
Average Train Loss: 0.11807776755729923
Evaluating on dev set
1445850it [00:00, 35243415.11it/s]
- dev UAS: 86.00
New best dev UAS! Saving model.

Epoch 8 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [08:50<00:00, 3.48it/s]
Average Train Loss: 0.11562635477841933
Evaluating on dev set
1445850it [00:00, 32848902.51it/s]
- dev UAS: 85.93

Epoch 9 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [08:41<00:00, 3.54it/s]
Average Train Loss: 0.1125802615318786
Evaluating on dev set
1445850it [00:00, 38039985.19it/s]
- dev UAS: 86.69
New best dev UAS! Saving model.

Epoch 10 out of 10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1848/1848 [08:17<00:00, 3.72it/s]
Average Train Loss: 0.11088591745099077
Evaluating on dev set
1445850it [00:00, 38262233.51it/s]
- dev UAS: 86.71
New best dev UAS! Saving model.

================================================================================
TESTING
================================================================================
Restoring the best model weights found on the dev set
Final evaluation on test set
2919736it [00:00, 68380481.23it/s]
- test UAS: 86.79
Done!


dev UAS: 86.71
test UAS: 86.79

(f)

(i)

  • Error type: Prepositional Phrase Attachment Error
  • Incorrect dependency: concerns -> risks
  • Correct dependency: citing -> risks

(ii)

  • Error type: Modifier Attachment Error
  • Incorrect dependency: left -> early
  • Correct dependency: afternoon -> early

(iii)

  • Error type: Verb Phrase Attachment Error
  • Incorrect dependency: declined -> decision
  • Correct dependency: comment -> decision

(iv)

  • Error type: Coordination Attachment Error
  • Incorrect dependency: affects -> one
  • Correct dependency: plants -> one

(g)

They improve the model's ability to distinguish the parts of speech of words, and give it some capacity to learn basic phrase-level collocations.

