Attention Before Transformers (easy)
A seq2seq model without attention translates a 30-word English sentence to French. The encoder compresses the entire source sentence into a single 512-dimensional context vector c, and at each step the decoder receives that same c as its only view of the source. A researcher argues this is "like asking someone to translate a paragraph after reading it once with no notes." What specific failure mode does this describe?
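For concreteness, the following is a minimal PyTorch sketch of the architecture the question describes. It is an illustrative assumption, not a specific published model: the class name `FixedContextSeq2Seq` is hypothetical, and the GRU encoder/decoder is one common choice. The point it makes visible is that the only summary of the source the decoder ever sees is one fixed vector `c`.

```python
# Minimal sketch (hypothetical names): a no-attention seq2seq where the
# encoder compresses the whole source into one fixed vector c, and the
# decoder consumes that same c at every target step.
import torch
import torch.nn as nn

class FixedContextSeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        # Decoder input = target token embedding concatenated with c.
        self.decoder = nn.GRU(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: the ONLY summary of the 30-word source is the final
        # hidden state c, a single (1, batch, dim) vector -- the bottleneck.
        _, c = self.encoder(self.src_emb(src))      # c: (1, B, dim)
        c_step = c.transpose(0, 1)                  # (B, 1, dim)
        # Decode: the SAME c is fed at every target position; no step
        # can look back at individual source words.
        c_rep = c_step.expand(-1, tgt.size(1), -1)  # (B, T, dim)
        dec_in = torch.cat([self.tgt_emb(tgt), c_rep], dim=-1)
        h, _ = self.decoder(dec_in, c)              # initial state is c too
        return self.out(h)                          # (B, T, tgt_vocab)

model = FixedContextSeq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 30))   # batch of 30-word source sentences
tgt = torch.randint(0, 120, (2, 25))   # shifted target tokens
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 25, 120])
```

Nothing in this decoder can index back into individual source positions: every target word is generated from the same fixed 512-dimensional summary, which is exactly the bottleneck the researcher's analogy points at.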