NLP
Resources
- CS388: Natural Language Processing
- SpaCy - Industrial-strength Natural Language Processing (NLP) with Python and Cython. (HN: SpaCy 3.0 (2021))
- Adding voice control to your projects
- Course materials for "Natural Language" course
- NLP progress - Track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art across the most common NLP tasks and their corresponding datasets. (Web)
- Natural - General natural language facilities for Node.
- YSDA Natural Language Processing course (2018)
- PyText - Natural language modeling framework based on PyTorch.
- FlashText - Extract Keywords from sentence or Replace keywords in sentences.
- BERT PyTorch implementation
- LASER Language-Agnostic SEntence Representations - Library to calculate and use multilingual sentence embeddings.
- StanfordNLP - Python NLP Library for Many Human Languages.
- nlp-tutorial - Tutorial for those studying NLP (Natural Language Processing) using TensorFlow and PyTorch.
- Better Language Models and Their Implications (2019)
- gpt-2 - Code for the paper "Language Models are Unsupervised Multitask Learners".
- Lingvo - Framework for building neural networks in TensorFlow, particularly sequence models.
- Fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
- Stanford CS224N: NLP with Deep Learning (2019) - Course page. (HN)
- Advanced NLP with spaCy: A free online course (Web)
- Code for Stanford Natural Language Understanding course, CS224u (2019)
- Awesome Reinforcement Learning for Natural Language Processing
- ParlAI - Framework for training and evaluating AI models on a variety of openly available dialogue datasets.
- Training language GANs from Scratch (2019)
- Olivia - Your new best friend built with an artificial neural network.
- Learn-Natural-Language-Processing-Curriculum
- This repository recorded my NLP journey
- Project Alias - Open-source parasite to train custom wake-up names for smart home devices while disturbing their built-in microphone.
- Cornell Tech NLP Code
- Cornell Tech NLP Publications
- Thinc - SpaCy's Machine Learning library for NLP in Python. (Docs)
- Knowledge is embedded in language neural networks but can they reason? (2019)
- NLP Best Practices
- Transfer NLP library - Framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP.
- FARM - Fast & easy transfer learning for NLP. Harvesting language models for the industry.
- Transformers - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. (Web)
- NLP Roadmap 2019
- Flair - Very simple framework for state-of-the-art NLP. Developed by Zalando Research.
- Unsupervised Data Augmentation - Semi-supervised learning method which achieves state-of-the-art results on a wide variety of language and vision tasks.
- Rasa - Open source machine learning framework to automate text- and voice-based conversations.
- T5 - Text-To-Text Transfer Transformer.
- 100 Must-Read NLP Papers (HN)
- Awesome NLP
- NLP Library - Curated collection of papers for the NLP practitioner.
- spacy-transformers - spaCy pipelines for pre-trained BERT, XLNet and GPT-2.
- AllenNLP - Open-source NLP research library, built on PyTorch. (Announcing AllenNLP 1.0)
- GloVe - Global Vectors for Word Representation.
- Botpress - Open-source Virtual Assistant platform.
- Mycroft - Hackable open source voice assistant. (HN)
- VizSeq - Visual Analysis Toolkit for Text Generation Tasks.
- Awesome Natural Language Generation
- How I used NLP (Spacy) to screen Data Science Resume (2019)
- Introduction to Natural Language Processing book - Survey of computational methods for understanding, generating, and manipulating human language, which offers a synthesis of classical representations and algorithms with contemporary machine learning techniques.
- Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning (Code)
- Tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production. (Article)
- Example Notebook using BERT for NLP with Keras (2020)
- NLP 2019/2020 Highlights
- Overview of Modern Deep Learning Techniques Applied to Natural Language Processing
- Language Identification from Very Short Strings (2019)
- SentenceRepresentation - Code accompanying the paper 'Learning Sentence Representations from Unlabelled Data' by Felix Hill, KyungHyun Cho and Anna Korhonen (2016).
- Deep Learning for Language Processing course
- Megatron LM - Ongoing research training transformer language models at scale, including: BERT & GPT-2.
- XLNet - New unsupervised language representation learning method based on a novel generalized permutation language modeling objective.
- ALBERT - Lite BERT for Self-supervised Learning of Language Representations.
- BERT - TensorFlow code and pre-trained models for BERT.
- Multilingual Denoising Pre-training for Neural Machine Translation (2020)
- List of NLP tutorials built on PyTorch
- sticker - Sequence labeler that uses either recurrent neural networks, transformers, or dilated convolution networks.
- sticker-transformers - Pretrained transformer models for sticker.
- pke - Python Keyphrase Extraction module.
- How to train a new language model from scratch using Transformers and Tokenizers (2020)
- Interactive Attention Visualization - Small example of an interactive visualization for attention values as used by transformer language models like GPT2 and BERT.
- The Annotated GPT-2 (2020)
- GluonNLP - Toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your NLP research.
- Finetune - Scikit-learn style model finetuning for NLP.
- Stanza: A Python Natural Language Processing Toolkit for Many Human Languages (2020) (HN)
- NLP Newsletter
- NLP Paper Summaries
- Advanced NLP with spaCy
- Myle Ott's research
- Natural Language Toolkit (NLTK) - Suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. (Web) (Book)
- NLP 100 Exercise - Bootcamp designed for learning skills for programming, data analysis, and research activities. (Code)
- The Transformer Family (2020)
- Minimalist Implementation of a BERT Sentence Classifier
- fastText - Library for efficient text classification and representation learning. (Code)
- Awesome NLP Paper Discussions - Papers & presentations from Hugging Face's weekly science day.
- SynST: Syntactically Supervised Transformers
- The Cost of Training NLP Models: A Concise Overview (2020)
- Tutorial - Transformers (Tweet)
- TTS - Deep learning for Text to Speech.
- MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer (2020)
- gpt-2-simple - Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts.
- BERTScore - BERT score for text generation.
- ML and NLP Paper Discussions
- NLP Datasets
- Word Embeddings (2017)
- NLP from Scratch: Annotated Attention (2020)
- This Word Does Not Exist - Allows people to train a variant of GPT-2 that makes up words, definitions and examples from scratch. (Code) (HN)
- Ultimate guide to choosing an online course covering practical NLP (2020)
- HuggingFace nlp library - Quick overview (2020) (Twitter)
- aitextgen - Robust Python tool for text-based AI training and generation using GPT-2. (HN)
- Self Supervised Representation Learning in NLP (2020) (HN)
- Synthetic and Natural Noise Both Break Neural Machine Translation (2017)
- Inferbeddings - Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation.
- UCL Natural Language Processing group
- Interactive Lecture Notes, Slides and Exercises for Statistical NLP
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList
- CMU LTI Low Resource NLP Bootcamp 2020
- GPT-3: Language Models Are Few-Shot Learners (2020) (HN) (Code)
- nlp - Lightweight and extensible library to easily share and access datasets and evaluation metrics for NLP.
- Brainsources for NLP enthusiasts
- Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper)
- NLP Resources
- TaBERT: Learning Contextual Representations for Natural Language Utterances and Structured Tables (Article) (HN)
- vtext - NLP in Rust with Python bindings.
- Language Technology Lab @ University of Cambridge
- The Natural Language Processing Dictionary
- Introduction to NLP using Fastai (2020)
- Gwern on GPT-3 (HN)
- Semantic Machines - Solving conversational artificial intelligence. Part of Microsoft.
- The Reformer – Pushing the limits of language modeling (HN)
- GPT-3 Creative Fiction (2020) (HN)
- Classifying 200k articles in 7 hours using NLP (2020) (HN)
- HN: Using GPT-3 to generate user interfaces (2020)
- Thread of GPT-3 use cases (2020)
- GPT-3 Code Experiments (Examples)
- How GPT3 Works - Visualizations and Animations (2020) (Lobsters) (HN)
- What is GPT-3? written in layman's terms (2020) (HN)
- GPT3 Examples (HN)
- DQI: Measuring Data Quality in NLP (2020)
- Humanloop - Train and deploy NLP. (HN)
- Do NLP Beyond English (2020) (HN)
- Giving GPT-3 a Turing Test (2020) (HN)
- Neural Network Methods for Natural Language Processing (2017)
- Tempering Expectations for GPT-3 and OpenAI’s API (2020)
- Philosophers on GPT-3 (2020) (HN)
- GPT-3 Explorer - Power tool for experimenting with GPT-3. (Code)
- Recent Advances in Natural Language Processing (2020) (HN)
- Project Insight - NLP as a Service. (Forum post)
- Bob Coecke: Quantum Natural Language Processing (QNLP) (2020) (Article)
- Language-Agnostic BERT Sentence Embedding (2020)
- Language Interpretability Tool (LIT) - Interactively analyze NLP models for model understanding in an extensible and framework agnostic interface.
- Booste Pre Trained Models - Free-to-use GPT-2 API. (HN)
- Context-theoretic Semantics for Natural Language: an Algebraic Framework (2007)
- THUNLP (Natural Language Processing Lab at Tsinghua University) research
- AI training method exceeds GPT-3 performance with fewer parameters (2020) (HN)
- BERT Attention Analysis
- Neural Modules and Models for Conversational AI (2020)
- BERTopic - Topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
- NLP Pandect - Comprehensive reference for all topics related to Natural Language Processing.
- Practical Natural Language Processing book (Code)
- NLP Research Project: Best Practices for Finetuning Large Transformer Language models (2020)
- Deep Learning for NLP notes (2020)
- Modern Practical Natural Language Processing course
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers in PyTorch
- Awesome software for Text ML
- Pretrained Transformers for Text Ranking: BERT and Beyond (2020)
- SpaCy v3.0 Nightly (2020) (HN) (Tweet)
- Explore trained spaCy v3.0 pipelines
- spacy-streamlit - spaCy building blocks for Streamlit apps. (Tweet)
- Informers - State-of-the-art natural language processing for Ruby.
- How to Structure and Manage Natural Language Processing (NLP) Projects (2020)
- Sentence-BERT for spaCy - Wraps sentence-transformers (also known as sentence-BERT) directly in spaCy.
- Lingua Franca - Mycroft's multilingual text parsing and formatting library.
- Simple Transformers - Based on the Transformers library by HuggingFace. Lets you quickly train and evaluate Transformer models.
- Deep Bidirectional Transformers for Language Understanding (2020) - Explains a legendary paper, BERT. (HN)
- EasyTransfer - Designed to make the development of transfer learning in NLP applications easier.
- LambdaBERT - Transformers-style implementation of BERT using LambdaNetworks instead of self-attention.
- DialoGPT - State-of-the-Art Large-scale Pretrained Response Generation Model.
- Neural reading comprehension and beyond - Danqi Chen's Thesis (2020) (Code)
- LAMA: LAnguage Model Analysis - Probe for analyzing the factual and commonsense knowledge contained in pretrained language models.
- awesome-2vec - Curated list of 2vec-type embedding models.
- Rethinking Attention with Performers (2020) (HN)
- BERT Research - Key Concepts & Sources (2019)
- The Pile - Large, diverse, open source language modelling data set that consists of many smaller datasets combined together.
- Bort - Companion code for the paper "Optimal Subarchitecture Extraction for BERT."
- Vector AI - Encode And Deploy Vectors At The Edge. (Code)
- KeyBERT - Minimal keyword extraction with BERT. (Web)
- Multimodal Transformer for Unaligned Multimodal Language Sequences - In PyTorch.
- The Illustrated GPT-2 (Visualizing Transformer Language Models) (2020)
- A Primer in BERTology: What we know about how BERT works (2020) (HN)
- GPT Neo - Open-source GPT model, with pretrained 1.3B & 2.7B weight models. (HN)
- Text Synth - Text completion using the GPT-2 language model.
- How to Go from NLP in 1 Language to NLP in N Languages in One Shot (2020)
- Contextualized Topic Models - Family of topic models that use pre-trained representations of language (e.g., BERT) to support topic modeling.
- Language Style Transfer - Code for Style Transfer from Non-Parallel Text by Cross-Alignment paper.
- NLU - Power of Spark NLP, the Simplicity of Python. 1 line for hundreds of NLP models and algorithms.
- PyTorch Implementation of Google BERT
- High Performance Natural Language Processing (2020)
- duoBERT - Multi-stage passage ranking: monoBERT + duoBERT.
- Awesome GPT-3
- SMAC3 - Sequential Model-based Algorithm Configuration.
- Semantic Experiences by Google - Experiments in understanding language.
- Long-Range Arena - Systematic evaluation of efficient transformer models.
- PaddleHub - Awesome pre-trained models toolkit based on PaddlePaddle.
- DeepSPIN (Deep Structured Prediction in Natural Language Processing) (GitHub)
- Multi-Task Learning in NLP
- FastSeq - Provides efficient implementation of popular sequence models (e.g. Bart, ProphetNet) for text generation, summarization, translation tasks etc.
- Sentence Embeddings with BERT & XLNet
- FastFormers - Provides a set of recipes and methods to achieve highly efficient inference of Transformer models for Natural Language Understanding (NLU).
- Adversarial NLI - Adversarial Natural Language Inference Benchmark.
- textract - Extract text from any document. No muss. No fuss. (Docs)
- NLP and Named Entity Recognition (2020)
- Big Bird: Transformers for Longer Sequences
- NLP PyTorch Tutorial
- EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
- CrossWeigh: Training Named Entity Tagger from Imperfect Annotations (2019) (Code)
- Does GPT-2 Know Your Phone Number? (2020)
- Towards Fully Automated Manga Translation (2020)
- Text Classification Models - All kinds of text classification models and more with deep learning.
- Awesome Text Summarization
- Shortformer: Better Language Modeling using Shorter Inputs (2020) (HN)
- huggingface_hub - Client library to download and publish models and other files on the huggingface.co hub.
- Embeddings from the Ground Up (2020)
- Ecco - Tools to visualize and explore NLP language models. (Web) (HN)
- Interfaces for Explaining Transformer Language Models (2020)
- DALL·E: Creating Images from Text (2021) (HN) (Reddit)
- CLIP: Connecting Text and Images (2021) (HN) (Paper) (Code)
- OpenNRE - Open-Source Package for Neural Relation Extraction (NRE).
- Princeton NLP Group (GitHub)
- Must-read papers on neural relation extraction (NRE)
- FewRel Dataset, Toolkits and Baseline Models
- Tree Transformer: Integrating Tree Structures into Self-Attention (2019) (Code)
- SentEval: evaluation toolkit for sentence embeddings
- gpt-scrolls - Collaborative collection of open-source safe GPT-3 prompts that work well.
- SLING - A natural language frame semantics parser - Built to learn to read and understand Wikipedia articles in many languages for the purpose of knowledge base completion.
- Awesome Neural Adaptation in NLP
- Natural language generation: The commercial state of the art in 2020 (HN)
- Non-Autoregressive Generation Progress
- Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
- VecMap - Framework to learn cross-lingual word embedding mappings.
- Kiri - Natural Language Engine. (Web)
- GPT3 List - List of things that people are claiming are enabled by GPT3.
- DeBERTa - Decoding-enhanced BERT with Disentangled Attention.
- Sockeye - Open-source sequence-to-sequence framework for Neural Machine Translation based on Apache MXNet. (Docs)
- Robustness Gym - Python evaluation toolkit for natural language processing.
- State-of-the-Art Conversational AI with Transfer Learning
- GPT-Neo - GPT-3-sized model, open source and free. (HN) (Code)
- Deep Daze - Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network).
- Notebooks using the Hugging Face libraries
- NLP Cloud - Serve spaCy pre-trained models, and your own custom models, through a RESTful API.
- CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters (2020) (Code)
- jiant - Multitask and transfer learning toolkit for NLP. (Web)
- Must-read Papers on Textual Adversarial Attack and Defense
- Reranker - Build Text Rerankers with Deep Language Models.
- rust-bert - Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...).
- rust-tokenizers - Offers high-performance tokenizers for modern language models.
- Replicating GPT-2 at Home (2021) (HN)
- Shifterator - Interpretable data visualizations for understanding how texts differ at the word level.
- CMU Neural Networks for NLP Course (2021) (Videos)
- minnn - Exercise in developing a minimalist neural network toolkit for NLP.
- Controllable Sentence Simplification (2019) (Code)
- Awesome Relation Extraction
- retext - Natural language processor powered by plugins, part of the unified collective.
- CLIP Playground - Try OpenAI's CLIP model in your browser.
- GPT-3 Demo - GPT-3 Examples, Demos, Showcase, and NLP Use-cases.
- Big Sleep - Simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN.
- Beyond the Imitation Game Benchmark (BIG-bench) - Collaborative benchmark intended to probe large language models, and extrapolate their future capabilities.
- AutoNLP - Automatic way to train, evaluate and deploy state-of-the-art NLP models for different tasks.
- DeText - Deep Neural Text Understanding Framework for Ranking and Classification Tasks.
- Paragraph Vectors in PyTorch
- NeuSpell: A Neural Spelling Correction Toolkit
- Natural Language YouTube Search - Search inside YouTube videos using natural language.
- Accelerate - Simple way to train and use NLP models with multi-GPU, TPU, mixed-precision.
- Classical Language Toolkit (CLTK) - Python library offering natural language processing (NLP) for pre-modern languages. (Web)
- Guide: Finetune GPT2-XL
- GENRE (Generative ENtity REtrieval) - Uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on fine-tuned BART architecture.
- Teachable NLP - GPT-2 Training as a Service.
- DensePhrases - Provides answers to your natural language questions from the entire Wikipedia in real-time.
- How to use GPT-3 recursively to solve general problems (2021)
- Podium - Framework agnostic Python NLP library for data loading and preprocessing.
- Prompts - Advanced GPT-3 playground. (Code)
- TextFlint - Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing.
- The GPT-3 Architecture, on a Napkin
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources
- GPT in 60 Lines of NumPy
BERT