Sentiment Prediction on Text with Deep Learning Models

Data and Prediction Goal

Data: IMDB movie review sentiment classification (25,000 training reviews and 25,000 test reviews, each labeled positive or negative)
Benchmark: 88% (RNN)
Goal: above 90%

Parameter Settings

Data Feature Selection

max_features <- 20000  # vocabulary size: number of most frequent words kept
maxlen <- 500          # sequence length: word indices per review
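
The listings below assume x_train, y_train, x_test, and y_test already exist; a minimal sketch of that preprocessing step, using the keras package's built-in IMDB loader:

library(keras)

# Reviews come pre-encoded as integer word indices; keep only the
# max_features most frequent words
imdb <- dataset_imdb(num_words = max_features)

# Pad (or truncate) every review to exactly maxlen indices
x_train <- pad_sequences(imdb$train$x, maxlen = maxlen)
y_train <- imdb$train$y
x_test  <- pad_sequences(imdb$test$x, maxlen = maxlen)
y_test  <- imdb$test$y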

Deep Learning Model Parameters

  • The models include a GRU (a gated recurrent unit similar to the LSTM but with fewer parameters, and therefore faster). To keep training time manageable, the number of epochs is set to only 3; the batch size is raised accordingly and is set to 100 here.
  • The prediction target is the polarity of text sentiment, i.e. y_label only takes the values 0 and 1, so the loss function is set to binary_crossentropy (binary classification), and the output layer's activation function must accordingly be sigmoid (a form of logistic regression); the formulas are given below.
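
For reference, the sigmoid maps the output unit's raw score $z$ to a probability $\hat{p}$, and binary cross-entropy compares $\hat{p}$ with the 0/1 label $y$:

$$\hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \mathrm{loss}(y, \hat{p}) = -\bigl(y \log \hat{p} + (1 - y)\log(1 - \hat{p})\bigr)$$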
epochs <- 3                    # number of training epochs
batch_size <- 100              # training batch size
validation_split <- 0.2        # fraction of training data held out for validation

loss <- "binary_crossentropy"  # loss function
optimizer <- "adam"            # optimizer
metrics <- "accuracy"          # evaluation metric

Multilayer Perceptron (MLP)

Model Architecture

Code

# Initialize model
model <- keras_model_sequential()
model %>%
  # Dense embedding layer; outputs a 3D tensor
  # with shape (batch_size, sequence_length, output_dim)
  layer_embedding(input_dim = max_features,
                  output_dim = 128,
                  input_length = maxlen) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# Compile: loss, optimizer, metric
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)

# Train model
model %>% fit(
  x_train, y_train,
  batch_size = 100,
  epochs = 3,
  validation_split = 0.2
)

# Prediction & confusion matrix
dnn_y_pred <- model %>% predict(x_test)
dnn_y_pred <- as.numeric(dnn_y_pred > 0.5)
dnn_results <- model %>% evaluate(x_test, y_test)
dnn_cMatrix <- table(dnn_y_pred, y_test)

Training

Train on 20000 samples, validate on 5000 samples
Epoch 1/3
20000/20000 [==============================] - 6s 280us/step - loss: 0.3990 - acc: 0.8002 - val_loss: 0.3048 - val_acc: 0.8710
Epoch 2/3
20000/20000 [==============================] - 5s 239us/step - loss: 0.0590 - acc: 0.9800 - val_loss: 0.3775 - val_acc: 0.8622
Epoch 3/3
20000/20000 [==============================] - 5s 237us/step - loss: 0.0033 - acc: 0.9998 - val_loss: 0.4416 - val_acc: 0.8704

Confusion Matrix

MLP                 Actual
Predicted           0       1
0               11060    1914
1                1440   10586
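
As a sanity check, test-set accuracy is the sum of the diagonal (correct predictions) over the total, which reproduces the MLP's 86.584% in the summary table at the end:

sum(diag(dnn_cMatrix)) / sum(dnn_cMatrix)
# (11060 + 10586) / 25000 = 0.86584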

Bidirectional GRU (Bi-GRU)

  • The GRU is a gated recurrent unit closely related to the LSTM; it trains faster with comparable predictive performance (the gate equations below show why it is lighter).
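
For reference, the GRU has only an update gate $z_t$ and a reset gate $r_t$, versus the LSTM's three gates plus a separate cell state, which is where the parameter and speed savings come from (bias terms omitted):

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1})\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$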

Model Architecture

Code

# Initialize model
model <- keras_model_sequential()
model %>%
  # Dense embedding layer; outputs a 3D tensor
  # with shape (batch_size, sequence_length, output_dim)
  layer_embedding(input_dim = max_features,
                  output_dim = 128,
                  input_length = maxlen) %>%
  # return_sequences = TRUE keeps the per-timestep outputs,
  # which the global max pooling layer below requires
  bidirectional(layer_gru(units = 32, return_sequences = TRUE)) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(units = 20, activation = "relu") %>%
  layer_dropout(rate = 0.05) %>%
  layer_dense(units = 1, activation = "sigmoid")

# Compile: loss, optimizer, metric
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)

# Train model
model %>% fit(
  x_train, y_train,
  batch_size = 100,
  epochs = 3,
  # validation_data = list(x_test, y_test)
  validation_split = 0.2
)

# Prediction & confusion matrix
b_lstm_y_pred <- model %>% predict(x_test)
b_lstm_y_pred <- as.numeric(b_lstm_y_pred > 0.5)
b_lstm_results <- model %>% evaluate(x_test, y_test)
b_lstm_cMatrix <- table(b_lstm_y_pred, y_test)

Training

Train on 20000 samples, validate on 5000 samples
Epoch 1/3
20000/20000 [==============================] - 573s 29ms/step - loss: 0.3693 - acc: 0.8347 - val_loss: 0.2629 - val_acc: 0.8902
Epoch 2/3
20000/20000 [==============================] - 570s 29ms/step - loss: 0.1591 - acc: 0.9407 - val_loss: 0.3182 - val_acc: 0.8810
Epoch 3/3
20000/20000 [==============================] - 575s 29ms/step - loss: 0.0619 - acc: 0.9801 - val_loss: 0.3378 - val_acc: 0.8860

Confusion Matrix

Bi_GRU              Actual
Predicted           0       1
0               10779    1341
1                1721   11159

Convolutional Neural Network (CNN) + Bidirectional GRU

Model Architecture

Code

# Initialize model
model <- keras_model_sequential()
model %>%
  # Dense embedding layer; outputs a 3D tensor
  # with shape (batch_size, sequence_length, output_dim)
  layer_embedding(input_dim = max_features,
                  output_dim = 256,
                  input_length = maxlen) %>%
  layer_conv_1d(filters = 128,
                padding = "valid",
                kernel_size = 5) %>%
  layer_dropout(rate = 0.2) %>%
  bidirectional(layer_gru(units = 32, return_sequences = TRUE)) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(units = 20, activation = "relu") %>%
  layer_dropout(rate = 0.05) %>%
  layer_dense(units = 1, activation = "sigmoid")

# Compile: loss, optimizer, metric
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)

# Train model
model %>% fit(
  x_train, y_train,
  batch_size = 100,
  epochs = 3,
  # validation_data = list(x_test, y_test)
  validation_split = 0.2
)

# Prediction & confusion matrix
cnn_lstm_y_pred <- model %>% predict(x_test)
cnn_lstm_y_pred <- as.numeric(cnn_lstm_y_pred > 0.5)
cnn_lstm_results <- model %>% evaluate(x_test, y_test)
cnn_lstm_cMatrix <- table(cnn_lstm_y_pred, y_test)

Training

Train on 20000 samples, validate on 5000 samples
Epoch 1/3
20000/20000 [==============================] - 581s 29ms/step - loss: 0.3985 - acc: 0.8095 - val_loss: 0.2526 - val_acc: 0.9016
Epoch 2/3
20000/20000 [==============================] - 580s 29ms/step - loss: 0.1638 - acc: 0.9393 - val_loss: 0.2626 - val_acc: 0.8952
Epoch 3/3
20000/20000 [==============================] - 584s 29ms/step - loss: 0.0729 - acc: 0.9766 - val_loss: 0.3134 - val_acc: 0.8956

Confusion Matrix

CNN+Bi_GRU          Actual
Predicted           0       1
0               11428    1912
1                1072   10588

Merging Multiple CNN Branches (merge-CNN)

Model Architecture

Code

# Initialize model (functional API, to allow parallel branches)
input <- layer_input(shape = c(maxlen))
model <- input %>%
  # Dense embedding layer; outputs a 3D tensor
  # with shape (batch_size, sequence_length, output_dim)
  layer_embedding(max_features, 300, input_length = maxlen, trainable = TRUE)

# Three parallel convolution branches with kernel sizes 3, 4, and 5,
# each detecting n-gram patterns of a different length
conv1 <- model %>%
  layer_conv_1d(filters = 128,
                kernel_size = 3,
                padding = "valid",
                activation = "relu") %>%
  layer_global_max_pooling_1d()

conv2 <- model %>%
  layer_conv_1d(filters = 128,
                kernel_size = 4,
                padding = "valid",
                activation = "relu") %>%
  layer_global_max_pooling_1d()

conv3 <- model %>%
  layer_conv_1d(filters = 128,
                kernel_size = 5,
                padding = "valid",
                activation = "relu") %>%
  layer_global_max_pooling_1d()

# Concatenate the three branches
merge.layer <- layer_concatenate(c(conv1, conv2, conv3))

output <- merge.layer %>%
  layer_dense(units = 1, activation = "sigmoid")

model <- keras_model(
  inputs = input,
  outputs = output
)

# Compile: loss, optimizer, metric
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)

# Train model
model %>% fit(
  x_train, y_train,
  batch_size = 100,
  epochs = 3,
  validation_split = 0.2
)

# Prediction & confusion matrix
m3_cnn_y_pred <- model %>% predict(x_test)
m3_cnn_y_pred <- as.numeric(m3_cnn_y_pred > 0.5)
m3_cnn_results <- model %>% evaluate(x_test, y_test)
m3_cnn_cMatrix <- table(m3_cnn_y_pred, y_test)

Training

Train on 22500 samples, validate on 2500 samples
Epoch 1/3
22500/22500 [==============================] - 16s 726us/step - loss: 0.3514 - acc: 0.8429 - val_loss: 0.2651 - val_acc: 0.8936
Epoch 2/3
22500/22500 [==============================] - 15s 685us/step - loss: 0.1357 - acc: 0.9516 - val_loss: 0.2513 - val_acc: 0.9056
Epoch 3/3
22500/22500 [==============================] - 15s 679us/step - loss: 0.0278 - acc: 0.9943 - val_loss: 0.2620 - val_acc: 0.9136

Confusion Matrix

merge_CNN           Actual
Predicted           0       1
0               11388    1404
1                1112   11096

Merging Multiple GRU Branches (merge-GRU)

Model Architecture

Code

# Initialize model (functional API, to allow parallel branches)
input <- layer_input(shape = c(maxlen))
model <- input %>%
  # Dense embedding layer; outputs a 3D tensor
  # with shape (batch_size, sequence_length, output_dim)
  layer_embedding(max_features, 300, input_length = maxlen, trainable = TRUE)

# Three parallel GRU branches of different sizes; return_sequences = TRUE
# keeps the per-timestep outputs so the branches can be concatenated
gru1 <- model %>% layer_gru(units = 128, return_sequences = TRUE)
gru2 <- model %>% layer_gru(units = 64, return_sequences = TRUE)
gru3 <- model %>% layer_gru(units = 32, return_sequences = TRUE)

# Concatenate the three branches along the feature axis
merge.layer <- layer_concatenate(c(gru1, gru2, gru3))

output <- merge.layer %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 1, activation = "sigmoid")

model <- keras_model(
  inputs = input,
  outputs = output
)

# Compile: loss, optimizer, metric
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)

# Train model
model %>% fit(
  x_train, y_train,
  batch_size = 100,
  epochs = 3,
  validation_split = 0.2
)

# Prediction & confusion matrix
m3_gru_y_pred <- model %>% predict(x_test)
m3_gru_y_pred <- as.numeric(m3_gru_y_pred > 0.5)
m3_gru_results <- model %>% evaluate(x_test, y_test)
m3_gru_cMatrix <- table(m3_gru_y_pred, y_test)

Training

Train on 22500 samples, validate on 2500 samples
Epoch 1/3
22500/22500 [==============================] - 896s 40ms/step - loss: 0.3408 - acc: 0.8475 - val_loss: 0.2549 - val_acc: 0.8984
Epoch 2/3
22500/22500 [==============================] - 899s 40ms/step - loss: 0.1502 - acc: 0.9449 - val_loss: 0.2571 - val_acc: 0.9060
Epoch 3/3
22500/22500 [==============================] - 898s 40ms/step - loss: 0.0587 - acc: 0.9817 - val_loss: 0.3179 - val_acc: 0.9020

Confusion Matrix

merge_GRU           Actual
Predicted           0       1
0               11233    1669
1                1267   10831

Ensemble Learning

Method (Majority Vote)
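
Each test review is classified by all five models above, and the ensemble predicts class 1 when at least three of the five agree. A minimal sketch of the vote, assuming the 0/1 prediction vectors computed in the previous sections:

# Majority vote: positive when at least 3 of the 5 models predict 1
votes <- dnn_y_pred + b_lstm_y_pred + cnn_lstm_y_pred +
         m3_cnn_y_pred + m3_gru_y_pred
ensemble_y_pred <- as.numeric(votes >= 3)
ensemble_cMatrix <- table(ensemble_y_pred, y_test)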

Confusion Matrix

Ensemble            Actual
Predicted           0       1
0               11423    1359
1                1077   11141

Prediction Results

Model                                       Test-set accuracy
MLP                                         86.584%
Bi-GRU                                      87.752%
CNN+Bi-GRU                                  88.064%
merge-GRU                                   88.256%
merge-CNN                                   89.936%
Ensemble (majority vote: at least 3 of 5)   90.256%