NLP
Resources
CS388: Natural Language Processing
SpaCy - Industrial-strength Natural Language Processing (NLP) with Python and Cython. (HN: SpaCy 3.0 (2021) )
Adding voice control to your projects
Course materials for “Natural Language” course
NLP progress - Track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art across the most common NLP tasks and their corresponding datasets. (Web )
Natural - General natural language facilities for Node.
YSDA Natural Language Processing course (2018)
PyText - Natural language modeling framework based on PyTorch.
FlashText - Extract keywords from sentences or replace keywords in sentences.
BERT PyTorch implementation
LASER Language-Agnostic SEntence Representations - Library to calculate and use multilingual sentence embeddings.
StanfordNLP - Python NLP Library for Many Human Languages.
nlp-tutorial - Tutorial for those studying NLP (Natural Language Processing) using TensorFlow and PyTorch.
Better Language Models and Their Implications (2019)
gpt-2 - Code for the paper “Language Models are Unsupervised Multitask Learners”.
Lingvo - Framework for building neural networks in Tensorflow, particularly sequence models.
Fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Stanford CS224N: NLP with Deep Learning (2019) - Course page . (HN )
Advanced NLP with spaCy: A free online course (Web )
Code for Stanford Natural Language Understanding course, CS224u (2019)
Awesome Reinforcement Learning for Natural Language Processing
ParlAI - Framework for training and evaluating AI models on a variety of openly available dialogue datasets.
Training language GANs from Scratch (2019)
Olivia - Your new best friend built with an artificial neural network.
Learn-Natural-Language-Processing-Curriculum
This repository recorded my NLP journey
Project Alias - Open-source parasite to train custom wake-up names for smart home devices while disturbing their built-in microphone.
Cornell Tech NLP Code
Cornell Tech NLP Publications
Thinc - SpaCy’s Machine Learning library for NLP in Python. (Docs )
Knowledge is embedded in language neural networks but can they reason? (2019)
NLP Best Practices
Transfer NLP library - Framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP.
FARM - Fast & easy transfer learning for NLP. Harvesting language models for the industry.
Transformers - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. (Web )
NLP Roadmap 2019
Flair - Very simple framework for state-of-the-art NLP. Developed by Zalando Research.
Unsupervised Data Augmentation - Semi-supervised learning method which achieves state-of-the-art results on a wide variety of language and vision tasks.
Rasa - Open source machine learning framework to automate text- and voice-based conversations.
T5 - Text-To-Text Transfer Transformer.
100 Must-Read NLP Papers (HN )
Awesome NLP
NLP Library - Curated collection of papers for the NLP practitioner.
spacy-transformers - spaCy pipelines for pre-trained BERT, XLNet and GPT-2.
AllenNLP - Open-source NLP research library, built on PyTorch. (Announcing AllenNLP 1.0 )
GloVe - Global Vectors for Word Representation.
Botpress - Open-source Virtual Assistant platform.
Mycroft - Hackable open source voice assistant. (HN )
VizSeq - Visual Analysis Toolkit for Text Generation Tasks.
Awesome Natural Language Generation
How I used NLP (Spacy) to screen Data Science Resume (2019)
Introduction to Natural Language Processing book - Survey of computational methods for understanding, generating, and manipulating human language, which offers a synthesis of classical representations and algorithms with contemporary machine learning techniques.
Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning (Code )
Tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production. (Article )
Example Notebook using BERT for NLP with Keras (2020)
NLP 2019/2020 Highlights
Overview of Modern Deep Learning Techniques Applied to Natural Language Processing
Language Identification from Very Short Strings (2019)
SentenceRepresentation - Code accompanying the paper ‘Learning Sentence Representations from Unlabelled Data’ by Felix Hill, KyungHyun Cho and Anna Korhonen (2016).
Deep Learning for Language Processing course
Megatron LM - Ongoing research training transformer language models at scale, including: BERT & GPT-2.
XLNet - New unsupervised language representation learning method based on a novel generalized permutation language modeling objective.
ALBERT - Lite BERT for Self-supervised Learning of Language Representations.
BERT - TensorFlow code and pre-trained models for BERT.
Multilingual Denoising Pre-training for Neural Machine Translation (2020)
List of NLP tutorials built on PyTorch
sticker - Sequence labeler that uses either recurrent neural networks, transformers, or dilated convolution networks.
sticker-transformers - Pretrained transformer models for sticker.
pke - Python Keyphrase Extraction module.
How to train a new language model from scratch using Transformers and Tokenizers (2020)
Interactive Attention Visualization - Small example of an interactive visualization for attention values as used by transformer language models like GPT2 and BERT.
The Annotated GPT-2 (2020)
GluonNLP - Toolkit that enables easy text preprocessing, dataset loading and neural model building to help you speed up your NLP research.
Finetune - Scikit-learn style model finetuning for NLP.
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages (2020) (HN )
NLP Newsletter
NLP Paper Summaries
Advanced NLP with spaCy
Myle Ott’s research
Natural Language Toolkit (NLTK) - Suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. (Web ) (Book )
NLP 100 Exercise - Bootcamp designed for learning skills for programming, data analysis, and research activities. (Code )
The Transformer Family (2020)
Minimalist Implementation of a BERT Sentence Classifier
fastText - Library for efficient text classification and representation learning. (Code )
Awesome NLP Paper Discussions - Papers & presentations from Hugging Face’s weekly science day.
SynST: Syntactically Supervised Transformers
The Cost of Training NLP Models: A Concise Overview (2020)
Tutorial - Transformers (Tweet )
TTS - Deep learning for Text to Speech.
MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer (2020)
gpt-2-simple - Python package to easily retrain OpenAI’s GPT-2 text-generating model on new texts.
BERTScore - BERT score for text generation.
ML and NLP Paper Discussions
NLP Datasets
Word Embeddings (2017)
NLP from Scratch: Annotated Attention (2020)
This Word Does Not Exist - Allows people to train a variant of GPT-2 that makes up words, definitions and examples from scratch. (Code ) (HN )
Ultimate guide to choosing an online course covering practical NLP (2020)
HuggingFace nlp library - Quick overview (2020) (Twitter )
aitextgen - Robust Python tool for text-based AI training and generation using GPT-2. (HN )
Self Supervised Representation Learning in NLP (2020) (HN )
Synthetic and Natural Noise Both Break Neural Machine Translation (2017)
Inferbeddings - Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation.
UCL Natural Language Processing group
Interactive Lecture Notes, Slides and Exercises for Statistical NLP
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
CMU LTI Low Resource NLP Bootcamp 2020
GPT-3: Language Models Are Few-Shot Learners (2020) (HN ) (Code )
nlp - Lightweight and extensible library to easily share and access datasets and evaluation metrics for NLP.
Brainsources for NLP enthusiasts
Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper )
NLP Resources
TaBERT: Learning Contextual Representations for Natural Language Utterances and Structured Tables (Article ) (HN )
vtext - NLP in Rust with Python bindings.
Language Technology Lab @ University of Cambridge
The Natural Language Processing Dictionary
Introduction to NLP using Fastai (2020)
Gwern on GPT-3 (HN )
Semantic Machines - Solving conversational artificial intelligence. Part of Microsoft.
The Reformer – Pushing the limits of language modeling (HN )
GPT-3 Creative Fiction (2020) (HN )
Classifying 200k articles in 7 hours using NLP (2020) (HN )
HN: Using GPT-3 to generate user interfaces (2020)
Thread of GPT-3 use cases (2020)
GPT-3 Code Experiments (Examples )
How GPT3 Works - Visualizations and Animations (2020) (Lobsters ) (HN )
What is GPT-3? written in layman’s terms (2020) (HN )
DQI: Measuring Data Quality in NLP (2020)
Humanloop - Train and deploy NLP. (HN )
Do NLP Beyond English (2020) (HN )
Giving GPT-3 a Turing Test (2020) (HN )
Neural Network Methods for Natural Language Processing (2017)
Tempering Expectations for GPT-3 and OpenAI’s API (2020)
Philosophers on GPT-3 (2020) (HN )
GPT-3 Explorer - Power tool for experimenting with GPT-3. (Code )
Recent Advances in Natural Language Processing (2020) (HN )
Project Insight - NLP as a Service. (Forum post )
Bob Coecke: Quantum Natural Language Processing (QNLP) (2020) (Article )
Language-Agnostic BERT Sentence Embedding (2020)
Language Interpretability Tool (LIT) - Interactively analyze NLP models for model understanding in an extensible and framework agnostic interface.
Booste Pre Trained Models - Free-to-use GPT-2 API. (HN )
Context-theoretic Semantics for Natural Language: an Algebraic Framework (2007)
THUNLP (Natural Language Processing Lab at Tsinghua University) research
AI training method exceeds GPT-3 performance with fewer parameters (2020) (HN )
BERT Attention Analysis
Neural Modules and Models for Conversational AI (2020)
BERTopic - Topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
NLP Pandect - Comprehensive reference for all topics related to Natural Language Processing.
Practical Natural Language Processing book (Code )
NLP Research Project: Best Practices for Finetuning Large Transformer Language models (2020)
Deep Learning for NLP notes (2020)
Modern Practical Natural Language Processing course
LXMERT: Learning Cross-Modality Encoder Representations from Transformers in PyTorch
Awesome software for Text ML
Pretrained Transformers for Text Ranking: BERT and Beyond (2020)
SpaCy v3.0 Nightly (2020) (HN ) (Tweet )
Explore trained spaCy v3.0 pipelines
spacy-streamlit - spaCy building blocks for Streamlit apps. (Tweet )
Informers - State-of-the-art natural language processing for Ruby.
How to Structure and Manage Natural Language Processing (NLP) Projects (2020)
Sentence-BERT for spaCy - Wraps sentence-transformers (also known as sentence-BERT) directly in spaCy.
Lingua Franca - Mycroft’s multilingual text parsing and formatting library.
Simple Transformers - Based on the Transformers library by HuggingFace. Lets you quickly train and evaluate Transformer models.
Deep Bidirectional Transformers for Language Understanding (2020) - Explains a legendary paper, BERT. (HN )
EasyTransfer - Designed to make the development of transfer learning in NLP applications easier.
LambdaBERT - Transformers-style implementation of BERT using LambdaNetworks instead of self-attention.
DialoGPT - State-of-the-Art Large-scale Pretrained Response Generation Model.
Neural reading comprehension and beyond - Danqi Chen’s Thesis (2020) (Code )
LAMA: LAnguage Model Analysis - Probe for analyzing the factual and commonsense knowledge contained in pretrained language models.
awesome-2vec - Curated list of 2vec-type embedding models.
Rethinking Attention with Performers (2020) (HN )
BERT Research - Key Concepts & Sources (2019)
The Pile - Large, diverse, open source language modelling data set that consists of many smaller datasets combined together.
Bort - Companion code for the paper “Optimal Subarchitecture Extraction for BERT.”
Vector AI - Encode And Deploy Vectors At The Edge. (Code )
KeyBERT - Minimal keyword extraction with BERT. (Web )
Multimodal Transformer for Unaligned Multimodal Language Sequences - In PyTorch.
The Illustrated GPT-2 (Visualizing Transformer Language Models) (2020)
A Primer in BERTology: What we know about how BERT works (2020) (HN )
GPT Neo - Open-source GPT model, with pretrained 1.3B & 2.7B weight models. (HN )
Text Synth - Text completion using the GPT-2 language model.
How to Go from NLP in 1 Language to NLP in N Languages in One Shot (2020)
Contextualized Topic Models - Family of topic models that use pre-trained representations of language (e.g., BERT) to support topic modeling.
Language Style Transfer - Code for Style Transfer from Non-Parallel Text by Cross-Alignment paper.
NLU - Power of Spark NLP, the Simplicity of Python. 1 line for hundreds of NLP models and algorithms.
PyTorch Implementation of Google BERT
High Performance Natural Language Processing (2020)
duoBERT - Multi-stage passage ranking: monoBERT + duoBERT.
Awesome GPT-3
SMAC3 - Sequential Model-based Algorithm Configuration.
Semantic Experiences by Google - Experiments in understanding language.
Long-Range Arena - Systematic evaluation of efficient transformer models.
PaddleHub - Awesome pre-trained models toolkit based on PaddlePaddle.
DeepSPIN (Deep Structured Prediction in Natural Language Processing) (GitHub )
Multi-Task Learning in NLP
FastSeq - Provides efficient implementations of popular sequence models (e.g. Bart, ProphetNet) for text generation, summarization, and translation tasks.
Sentence Embeddings with BERT & XLNet
FastFormers - Provides a set of recipes and methods to achieve highly efficient inference of Transformer models for Natural Language Understanding (NLU).
Adversarial NLI - Adversarial Natural Language Inference Benchmark.
textract - Extract text from any document. No muss. No fuss. (Docs )
NLP and Named Entity Recognition (2020)
Big Bird: Transformers for Longer Sequences
NLP PyTorch Tutorial
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations (2019) (Code )
Does GPT-2 Know Your Phone Number? (2020)
Towards Fully Automated Manga Translation (2020)
Text Classification Models - All kinds of text classification models and more with deep learning.
Awesome Text Summarization
Shortformer: Better Language Modeling using Shorter Inputs (2020) (HN )
huggingface_hub - Client library to download and publish models and other files on the huggingface.co hub.
Embeddings from the Ground Up (2020)
Ecco - Tools to visualize and explore NLP language models. (Web ) (HN )
Interfaces for Explaining Transformer Language Models (2020)
DALL·E: Creating Images from Text (2021) (HN ) (Reddit )
CLIP: Connecting Text and Images (2021) (HN ) (Paper ) (Code )
OpenNRE - Open-Source Package for Neural Relation Extraction (NRE).
Princeton NLP Group (GitHub )
Must-read papers on neural relation extraction (NRE)
FewRel Dataset, Toolkits and Baseline Models
Tree Transformer: Integrating Tree Structures into Self-Attention (2019) (Code )
SentEval: evaluation toolkit for sentence embeddings
gpt-scrolls - Collaborative collection of open-source safe GPT-3 prompts that work well.
SLING - A natural language frame semantics parser - Built to learn to read and understand Wikipedia articles in many languages for the purpose of knowledge base completion.
Awesome Neural Adaptation in NLP
Natural language generation: The commercial state of the art in 2020 (HN )
Non-Autoregressive Generation Progress
Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
VecMap - Framework to learn cross-lingual word embedding mappings.
Kiri - Natural Language Engine. (Web )
GPT3 List - List of things that people are claiming are enabled by GPT3.
DeBERTa - Decoding-enhanced BERT with Disentangled Attention.
Sockeye - Open-source sequence-to-sequence framework for Neural Machine Translation based on Apache MXNet. (Docs )
Robustness Gym - Python evaluation toolkit for natural language processing.
State-of-the-Art Conversational AI with Transfer Learning
GPT-Neo - GPT-3-sized model, open source and free. (HN ) (Code )
Deep Daze - Simple command line tool for text to image generation using OpenAI’s CLIP and Siren (Implicit neural representation network).
Notebooks using the Hugging Face libraries
NLP Cloud - Serve spaCy pre-trained models, and your own custom models, through a RESTful API.
CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters (2020) (Code )
Must-read Papers on Textual Adversarial Attack and Defense
Reranker - Build Text Rerankers with Deep Language Models.
rust-bert - Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,…).
rust-tokenizers - Offers high-performance tokenizers for modern language models.
Replicating GPT-2 at Home (2021) (HN )
Shifterator - Interpretable data visualizations for understanding how texts differ at the word level.
CMU Neural Networks for NLP Course (2021) (Videos )
minnn - Exercise in developing a minimalist neural network toolkit for NLP.
Controllable Sentence Simplification (2019) (Code )
Awesome Relation Extraction
retext - Natural language processor powered by plugins, part of the unified collective.
GPT-3 Demo - GPT-3 Examples, Demos, Showcase, and NLP Use-cases.
Big Sleep - Simple command line tool for text to image generation, using OpenAI’s CLIP and a BigGAN.
Beyond the Imitation Game Benchmark (BIG-bench) - Collaborative benchmark intended to probe large language models, and extrapolate their future capabilities.
AutoNLP - Automatic way to train, evaluate and deploy state-of-the-art NLP models for different tasks.
DeText - Deep Neural Text Understanding Framework for Ranking and Classification Tasks.
Paragraph Vectors in PyTorch
NeuSpell: A Neural Spelling Correction Toolkit
Natural Language YouTube Search - Search inside YouTube videos using natural language.
Accelerate - Simple way to train and use NLP models with multi-GPU, TPU, mixed-precision.
Classical Language Toolkit (CLTK) - Python library offering natural language processing (NLP) for pre-modern languages. (Web )
Guide: Finetune GPT2-XL
GENRE (Generative ENtity REtrieval) - Uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on fine-tuned BART architecture.
DensePhrases - Provides answers to your natural language questions from the entire Wikipedia in real-time.
How to use GPT-3 recursively to solve general problems (2021)
Podium - Framework agnostic Python NLP library for data loading and preprocessing.
Prompts - Advanced GPT-3 playground. (Code )
TextFlint - Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing.
Awesome Text Summarization
The GPT-3 Architecture, on a Napkin
How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources
GPT in 60 Lines of NumPy
BERT