Jason Choi's Dev Blog
[NeurIPS 2021] VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
VATT- Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text.pdf (3708.6 KB)
[…] a bit, so let's get going again. Fighting~
Papers from the VIS side feel different again from papers on the deep learning side,,,