Original post: The Best and Most Current of Modern Natural Language Processing
Author: Victor Sanh
Over the past two years, the NLP community has witnessed an acceleration of progress across a wide variety of tasks and applications 🚀. This progress was enabled by a shift in the way we traditionally build NLP systems: for a long time, we used pre-trained word embeddings such as word2vec or GloVe to initialize the first layer of a neural network, and then trained a task-specific architecture on a single dataset in a supervised way.
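To make that older recipe concrete, here is a minimal sketch (my own illustration, not code from the original post) in PyTorch: the first layer of a classifier is initialized from pre-trained GloVe vectors, and the task-specific architecture on top is trained with supervised learning. The GloVe file path and the toy vocabulary are hypothetical placeholders.

```python
# A minimal sketch of the "pre-trained word embeddings + task-specific architecture"
# recipe. Hypothetical placeholders: the GloVe file path and the toy vocabulary.
import numpy as np
import torch
import torch.nn as nn

vocab = {"<pad>": 0, "<unk>": 1, "movie": 2, "great": 3}  # toy vocabulary
embedding_dim = 300

# Build the embedding matrix, copying in GloVe vectors for words we know.
# GloVe's text format is: "word v1 v2 ... v300" (one word per line).
embedding_matrix = np.random.normal(scale=0.1, size=(len(vocab), embedding_dim))
with open("glove.840B.300d.txt", encoding="utf-8") as f:  # hypothetical path
    for line in f:
        parts = line.rstrip().split(" ")
        word, vector = parts[0], parts[1:]
        if word in vocab:
            embedding_matrix[vocab[word]] = np.asarray(vector, dtype=np.float32)

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        # First layer initialized from the pre-trained word embeddings
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix, dtype=torch.float), freeze=False
        )
        # Task-specific architecture trained in a supervised way on a single dataset
        self.encoder = nn.LSTM(embedding_dim, 128, batch_first=True)
        self.head = nn.Linear(128, 2)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)     # (batch, seq_len, 300)
        _, (hidden, _) = self.encoder(embedded)  # hidden[-1]: (batch, 128)
        return self.head(hidden[-1])             # (batch, num_classes)
```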
Recently, several works have shown that we can learn hierarchical contextualized representations on web-scale datasets 📖 by leveraging unsupervised (or semi-supervised) signals such as language modeling, and then transfer this pre-training to downstream tasks (transfer learning). Excitingly, this shift has led to significant advances on a wide range of downstream applications, from question answering, to natural language inference, to syntactic parsing.
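By contrast, a minimal sketch of the transfer-learning recipe might look like the following (again my own illustration, written with the current Hugging Face `transformers` library rather than anything prescribed by the post; the model name, example sentence, and label are arbitrary): load a Transformer pre-trained with a language-modeling objective and fine-tune it on a downstream classification task.

```python
# A minimal sketch of transfer learning from a pre-trained language model:
# load the pre-trained weights, add a classification head, fine-tune end to end.
# Requires `pip install transformers`.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# One toy fine-tuning step on a single sentiment example (label 1 = positive)
inputs = tokenizer(["a surprisingly good movie"], return_tensors="pt",
                   padding=True, truncation=True)
labels = torch.tensor([1])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**inputs, labels=labels)  # classification loss is computed internally
outputs.loss.backward()
optimizer.step()
```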
Which papers should I read to catch up on the latest trends in modern NLP?
A few weeks ago, a friend of mine decided to dive into NLP. He already has a background in machine learning and deep learning, so he sincerely asked me: "Which papers should I read to catch up on the latest trends in modern NLP?" 👩‍🎓👨‍🎓
That's a really good question, especially when you consider that NLP conferences (and machine learning conferences in general) are seeing exponential growth in the number of submissions: NAACL received 80% more submissions in 2019 than in 2018, and ACL 90% more.
I compiled this list of papers and resources 📚 for him, and I thought it would be great to share it with the NLP community, as I believe it can help many more people.
Disclaimer: this list is not meant to be exhaustive, and it does not cover every area of NLP (for example, there is nothing on semantic parsing, adversarial learning, or reinforcement learning applied to NLP). It gathers the latest and most influential works of the past few years and months (as of May 2019), mostly biased by what I have been reading.
In general, a good way to start is to read introductory or summary blog posts (for instance, this post or this one), which give you enough background from a high-level perspective before you actually invest time in reading papers ✋.
🌊 A New Paradigm: Transfer Learning
-
Deep contextualized word representations (NAACL 2018)
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
-
Universal Language Model Fine-tuning for Text Classification (ACL 2018)
Jeremy Howard, Sebastian Ruder
-
Improving Language Understanding by Generative Pre-Training
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
-
Language Models are Unsupervised Multitask Learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019)
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
-
Cloze-driven Pretraining of Self-attention Networks (arXiv 2019)
Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli
-
Unified Language Model Pre-training for Natural Language Understanding and Generation (arXiv 2019)
Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
-
MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019)
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
🖼 Representation Learning
-
What you can cram into a single vector: Probing sentence embeddings for linguistic properties (ACL 2018)
Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
-
No Training Required: Exploring Random Encoders for Sentence Classification (ICLR 2019)
John Wieting, Douwe Kiela
-
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (ICLR 2019)
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
and
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (arXiv 2019)
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
-
Linguistic Knowledge and Transferability of Contextual Representations (NAACL 2019)
Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith
-
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (arXiv 2019)
Matthew Peters, Sebastian Ruder, Noah A. Smith
🗣 Neural Dialogue
-
A Neural Conversational Model (ICML Deep Learning Workshop 2015)
Oriol Vinyals, Quoc Le
-
A Persona-Based Neural Conversation Model (ACL 2016)
Jiwei Li, Michel Galley, Chris Brockett, Georgios P. Spithourakis, Jianfeng Gao, Bill Dolan
-
A Simple, Fast Diverse Decoding Algorithm for Neural Generation (arXiv 2017)
Jiwei Li, Will Monroe, Dan Jurafsky
-
Neural Approaches to Conversational AI (arXiv 2018)
Jianfeng Gao, Michel Galley, Lihong Li
-
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents (NeurIPS 2018 CAI Workshop)
Thomas Wolf, Victor Sanh, Julien Chaumond, Clement Delangue
Disclaimer: I am one of the authors of this work.
Step-by-step explanatory blog post
-
Wizard of Wikipedia: Knowledge-Powered Conversational agents (ICLR 2019)
Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston
-
Learning to Speak and Act in a Fantasy Text Adventure Game (arXiv 2019)
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston
🍱 Take Your Pick
-
Pointer Networks (NIPS 2015)
Oriol Vinyals, Meire Fortunato, Navdeep Jaitly
-
End-To-End Memory Networks (NIPS 2015)
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
-
Get To The Point: Summarization with Pointer-Generator Networks (ACL 2017)
Abigail See, Peter J. Liu, Christopher D. Manning
-
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data (EMNLP 2017)
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes
-
End-to-end Neural Coreference Resolution (EMNLP 2017)
Kenton Lee, Luheng He, Mike Lewis, Luke Zettlemoyer
-
StarSpace: Embed All The Things! (AAAI 2018)
Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston
-
The Natural Language Decathlon: Multitask Learning as Question Answering (arXiv 2018)
Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
-
Character-Level Language Modeling with Deeper Self-Attention (arXiv 2018)
Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
-
Linguistically-Informed Self-Attention for Semantic Role Labeling (EMNLP 2018)
Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum
-
Phrase-Based & Neural Unsupervised Machine Translation (EMNLP 2018)
Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc’Aurelio Ranzato
-
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning (ICLR 2018)
Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal
-
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv 2019)
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
-
Universal Transformers (ICLR 2019)
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser
-
An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models (NAACL 2019)
Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos
-
…For older papers, citation counts are usually a reasonable guide when choosing what to read.
A good rule of thumb: read the works that interest you and spark joy! 🤷🌟
🌍 General Resources
There are also plenty of resources to choose from that are not necessarily papers, for example:
Books:
-
Speech and Language Processing (3rd ed. draft)
Dan Jurafsky and James H. Martin
-
Neural Network Methods for Natural Language Processing
Yoav Goldberg
Course materials:
-
Natural Language Understanding and Computational Semantics with Katharina Kann and Sam Bowman at NYU
-
CS224n: Natural Language Processing with Deep Learning with Chris Manning and Abigail See at Stanford
-
Contextual Word Representations: A Contextual Introduction from Noah A. Smith’s teaching material at UW
Blogs and podcasts:
-
NLP Highlights hosted by Matt Gardner and Waleed Ammar
Other:
-
Twitter🐦
-
arXiv daily newsletter
-
Survey papers
-
…
🎅 Final Words
That's it! Reading a selection of these resources should give you a good picture of the latest trends in modern NLP and, hopefully, help you build your own NLP systems! 🎮
One last thing that I haven't talked much about in this post, but that I find extremely important (and sometimes overlooked): hands-on practice beats pure reading! 👩‍💻 You will often learn more by diving into the code that accompanies a paper or by trying to implement something yourself. Practical resources include the amazing blog posts and courses from fast.ai, as well as our open-source library 🤗.
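As a first hands-on experiment, something as small as the following already teaches you a lot (a sketch of my own, assuming the Hugging Face `transformers` library is installed; the example sentence is arbitrary):

```python
# A tiny starting point: run a pre-trained model through the high-level pipeline
# API and inspect what it returns before digging into the internals.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pre-trained model
print(classifier("Reading papers is great, but running the code teaches you even more!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```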
What do you think? Which works have influenced you the most? Let us know! ⌨️
As always, if you liked this post, 👏 let us know and spread the word around you!
Many thanks to Lysandre Debut, Clément Delangue, Thibault Févry, Peter Martigny, Anthony Moi and Thomas Wolf for their comments and feedback.