[Paper Reading] LXMERT: Learning Cross-Modality Encoder Representations from Transformers (2019.08)
Original Paper) https://arxiv.org/abs/1908.07490 — LXMERT: Learning Cross-Modality Encoder Representations from Transformers. Reads smoothly..
Experiences & Study/VQA
2023. 11. 27. 11:52