This lesson is part of The Carpentries Incubator, a place to share and use each other's Carpentries-style lessons. This lesson has not been reviewed by and is not endorsed by The Carpentries.
Welcome to the Text Analysis workshop for Python! Below is the list of lessons, each with a brief summary. Work through the modules in order, following the video lessons alongside the full-text lessons. Be sure to complete the setup instructions at the bottom of this page before you start the modules.
Setup | Download files required for the lesson | |
00:00 | 1. Introduction to Natural Language Processing | What is Natural Language Processing? What tasks can be done by Natural Language Processing? What does a workflow for an NLP project look like? |
00:35 | 2. Corpus Development- Text Data Collection | How do I evaluate what kind of data to use for my project? What do I need to consider when building my corpus? |
01:15 | 3. Preparing and Preprocessing Your Data | How can I prepare data for NLP? What are tokenization, casing and lemmatization? |
01:35 | 4. Vector Space and Distance | How can we model documents effectively? How can we measure similarity between documents? What’s the difference between cosine similarity and distance? |
02:15 | 5. Document Embeddings and TF-IDF | What is a document embedding? What is TF-IDF? |
02:45 | 6. Latent Semantic Analysis | What is topic modeling? What is Latent Semantic Analysis (LSA)? |
03:15 | 7. Intro to Word Embeddings | How can we extract vector representations of individual words rather than documents? What sort of research questions can be answered with word embedding models? |
04:00 | 8. The Word2Vec Algorithm | How does the Word2Vec model produce meaningful word embeddings? How is a Word2Vec model trained? |
04:45 | 9. Training Word2Vec | How can we train a Word2Vec model? When is it beneficial to train a Word2Vec model on a specific dataset? |
05:50 | 10. Finetuning LLMs | How can I fine-tune preexisting LLMs for my own research? How do I pick the right data format? How do I create my own labels? How do I feed my data into a model for fine-tuning? How do I evaluate success at my task? |
07:50 | 11. Ethics and Text Analysis | Is text analysis artificial intelligence? How can training data influence results? What are the risk zones to consider when using text analysis for research? |
08:30 | Finish |