
Similar task has some problem #3

Open
Tigeryang93 opened this issue Apr 15, 2018 · 5 comments

Comments

@Tigeryang93

Hi,
I am working on a task that is very similar to yours, except that its inputs and outputs are Chinese. The framework is also seq2seq, and I wrote code essentially the same as yours. When I train, the training accuracy gets very high, but when I test, the decoded output is always "的的的的的的的的的的的的的的", "哪哪哪哪哪哪哪哪哪哪哪哪", or "PADPADPADPADPADPADPADPADPADPADPAD". I have no idea why.
The model code looks like this:

encoder model

from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

embedding_size = 50
encoder_inputs = Input(shape=(None,))
en_x = Embedding(vocab_size, embedding_size)(encoder_inputs)

encoder = LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder(en_x)
encoder_states = [state_h, state_c]  # used to initialise the decoder

decoder model

decoder_inputs = Input(shape=(None,))
dex = Embedding(vocab_size, embedding_size)
final_dex = dex(decoder_inputs)
decoder_lstm = LSTM(50, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(final_dex, initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

and the batch data is generated like this:
def mygenerator(batch_size):
    max_batch_index = len(trainx) // batch_size
    i = 0
    while 1:
        batch_trainy_categ = to_categorical(trainy[i*batch_size:(i+1)*batch_size].reshape(batch_size*max_sentB_len),
                                            num_classes=vocab_size)
        batch_trainy_categ = np.array(batch_trainy_categ).reshape(-1, max_sentB_len, vocab_size)
        batch_trainx = trainx[i*batch_size:(i+1)*batch_size]
        batch_trainy = trainy[i*batch_size:(i+1)*batch_size]
        i += 1
        i = i % max_batch_index
        # print('batch data:')
        # print(batch_trainx[:1])
        # print(batch_trainy[:1])
        # print(batch_trainy_categ[:1])
        yield ([batch_trainx, batch_trainy], batch_trainy_categ)

model.fit_generator(mygenerator(128), steps_per_epoch=len(trainx) // 128, epochs=1, verbose=1,
                    validation_data=([testx, testy], testy_catey))
Can you give me some advice on how to debug this, or what the likely cause is? Thank you.

@chen0040
Owner

@babyhuzi111 One possibility may be the tokenizer. If you can share your Chinese training text file with me, I can try it with my models and let you know.
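
For Chinese input, the default word-level tokenization (splitting on spaces) usually collapses a whole sentence into one token, so a character-level tokenizer is worth trying first. A rough sketch using Keras' Tokenizer with char_level=True (texts here is a placeholder list of Chinese strings, not a variable from the code above):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# char_level=True treats every character as a token, a reasonable baseline
# for Chinese text that has no spaces between words.
tokenizer = Tokenizer(char_level=True, filters='')
tokenizer.fit_on_texts(texts)                      # texts: list of Chinese strings (placeholder)

sequences = tokenizer.texts_to_sequences(texts)    # lists of character indices
padded = pad_sequences(sequences, padding='post')  # pad to a common length

vocab_size = len(tokenizer.word_index) + 1         # index 0 is reserved for padding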

@Tigeryang93
Author

Tigeryang93 commented Apr 17, 2018 via email

@Tigeryang93
Author

Tigeryang93 commented Apr 17, 2018 via email

@fdujuan

fdujuan commented Sep 6, 2018

@babyhuzi111 My test output is also a single word repeated over and over. Did you manage to solve this problem?

@kevin369ml

@babyhuzi111 My test output is also a single word repeated over and over. Did you manage to solve this problem?

How are you doing the prediction? I think the problem may be in your prediction step.
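
For reference, the usual way to run prediction with this kind of model is to build separate encoder/decoder inference models and decode one token at a time (greedy decoding), rather than calling the training model directly, because the ground-truth decoder inputs are not available at test time. Below is a rough sketch against the layers defined in the model code above (encoder_inputs, encoder_states, decoder_inputs, dex, decoder_lstm, decoder_dense); start_token_id, end_token_id and max_sentB_len are assumptions about how the data was prepared:

from keras.models import Model
from keras.layers import Input
import numpy as np

# Encoder inference model: input sequence -> final LSTM states.
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder inference model: previous token + previous states -> next-token probabilities + new states.
decoder_state_input_h = Input(shape=(50,))
decoder_state_input_c = Input(shape=(50,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

dec_emb = dex(decoder_inputs)  # reuse the trained decoder embedding layer
dec_out, dec_state_h, dec_state_c = decoder_lstm(dec_emb, initial_state=decoder_states_inputs)
dec_out = decoder_dense(dec_out)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [dec_out, dec_state_h, dec_state_c])

def decode_sequence(input_seq):
    # Greedy decoding: start from the start token and feed back the argmax at every step.
    states = encoder_model.predict(input_seq)
    target = np.array([[start_token_id]])   # assumed start-of-sequence index
    decoded = []
    for _ in range(max_sentB_len):
        probs, h, c = decoder_model.predict([target] + states)
        token_id = int(np.argmax(probs[0, -1, :]))
        if token_id == end_token_id:         # assumed end-of-sequence index
            break
        decoded.append(token_id)
        target = np.array([[token_id]])
        states = [h, c]
    return decoded

Under teacher forcing the training accuracy can look high even when free-running decoding collapses to a single frequent token, which would match the repeated 的/哪/PAD outputs described above.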
