Attention Before Transformers (easy)
A seq2seq model without attention translates a 30-word English sentence to French. The encoder compresses the entire source sentence into a single 512-dimensional context vector c, and at each step the decoder receives that same c as its only view of the source. A researcher argues this is "like asking someone to translate a paragraph after reading it once with no notes." What specific failure mode does this describe?
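For concreteness, the following is a minimal PyTorch sketch of the architecture the question describes. It is an illustrative assumption, not a specific published model: the class name `FixedContextSeq2Seq` is hypothetical, and the GRU encoder/decoder is one common choice. The point it makes visible is that the only summary of the source the decoder ever sees is one fixed vector `c`.

```python
# Minimal sketch (hypothetical names): a no-attention seq2seq where the
# encoder compresses the whole source into one fixed vector c, and the
# decoder consumes that same c at every target step.
import torch
import torch.nn as nn

class FixedContextSeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        # Decoder input = target token embedding concatenated with c.
        self.decoder = nn.GRU(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: the ONLY summary of the 30-word source is the final
        # hidden state c, a single (1, batch, dim) vector -- the bottleneck.
        _, c = self.encoder(self.src_emb(src))      # c: (1, B, dim)
        c_step = c.transpose(0, 1)                  # (B, 1, dim)
        # Decode: the SAME c is fed at every target position; no step
        # can look back at individual source words.
        c_rep = c_step.expand(-1, tgt.size(1), -1)  # (B, T, dim)
        dec_in = torch.cat([self.tgt_emb(tgt), c_rep], dim=-1)
        h, _ = self.decoder(dec_in, c)              # initial state is c too
        return self.out(h)                          # (B, T, tgt_vocab)

model = FixedContextSeq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 30))   # batch of 30-word source sentences
tgt = torch.randint(0, 120, (2, 25))   # shifted target tokens
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 25, 120])
```

Nothing in this decoder can index back into individual source positions: every target word is generated from the same fixed 512-dimensional summary, which is exactly the bottleneck the researcher's analogy points at.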