
Jul 29, 2022 58m
How do you process and classify text documents in Python? What are the fundamental techniques and building blocks for Natural Language Processing (NLP)? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, talks about how machine learning (ML) models understand text.
Jodie explains how ML models require data in a structured format, which involves transforming text documents into columns and rows. She covers the most straightforward approach, called binary vectorization. We discuss the bag-of-words method and the tools of stemming, lemmatization, and count vectorization.
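To make the rows-and-columns idea concrete, here is a minimal sketch of count and binary vectorization in plain Python. It is not code from the episode: the function names, the sample sentences, and the simple whitespace tokenizer are all assumptions chosen for illustration. Each document becomes a row, and each vocabulary word becomes a column.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (raw.strip(PUNCTUATION) for raw in text.lower().split())
    return [tok for tok in tokens if tok]


def build_vocabulary(docs):
    """Return a sorted list of every distinct token across all documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vectorize(docs, vocab):
    """One row per document, one column per vocabulary word, holding counts."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in vocab])
    return rows


def binary_vectorize(docs, vocab):
    """Same layout as count_vectorize, but 1 if the word appears, else 0."""
    return [[1 if count > 0 else 0 for count in row]
            for row in count_vectorize(docs, vocab)]


# Hypothetical sample documents, invented for this sketch.
docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocab = build_vocabulary(docs)
count_rows = count_vectorize(docs, vocab)
binary_rows = binary_vectorize(docs, vocab)

print(vocab)
print(count_rows)
print(binary_rows)
```

Word order is discarded in both representations, which is why the approach is called a bag of words. Stemming or lemmatization would slot in as an extra step inside `tokenize`, so that related word forms share a single column.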
We jump into word embedding models next. Jodie talks about WordNet, Natural Language Toolkit (NLTK), word2vec, and Gensim. Our conversation lays a foundation for starting with text classification, implementing sentiment analysis, and building projects using these tools. Jodie also shares multiple resources to help you continue exploring NLP and modeling.
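Word embeddings represent each word as a dense vector, so that words used in similar contexts end up close together. The sketch below illustrates only the comparison step, using cosine similarity. The three-dimensional vectors are made up for illustration and are not output from word2vec; a trained model would supply vectors with many more dimensions, and the helper names here are hypothetical.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to the given word."""
    target = vectors[word]
    candidates = (other for other in vectors if other != word)
    return max(candidates,
               key=lambda other: cosine_similarity(target, vectors[other]))


print(most_similar("king", toy_vectors))
```

With real embeddings, the same kind of similarity ranking is what lets you look up related words or average word vectors into document features for a classifier.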
Course Spotlight: Learn Text Classification With Python and Keras
In this course, you’ll learn about Python text classification with Keras, working your way from a bag-of-words model with logistic regression to more advanced methods, such as convolutional neural networks. You’ll see how you can use pretrained word embeddings, and you’ll squeeze more performance out of your model through hyperparameter optimization.
Topics:
- 00:00:00 – Introduction
- 00:02:47 – Exploring the topic
- 00:06:00 – Perceived sentience of LaMDA
- 00:10:24 – How do we get started?
- 00:11:16 – What are classification and sentiment analysis?
- 00:13:03 – Transforming text into rows and columns
- 00:14:47 – Sponsor: Snyk
- 00:15:27 – Bag-of-words method
- 00:19:12 – Stemming and lemmatization
- 00:22:05 – Capturing N-grams
- 00:25:34 – Count vectorization
- 00:27:14 – Stop words
- 00:28:46 – Term Frequency / Inverse Document Frequency (TFIDF) vectorization
- 00:32:28 – Potential projects for bag-of-words techniques
- 00:34:07 – Video Course Spotlight
- 00:35:20 – WordNet and NLTK package
- 00:37:27 – Word embeddings and word2vec
- 00:45:30 – Previous training and too many dimensions
- 00:50:07 – How to use word2vec and Gensim?
- 00:51:26 – What types of projects for word2vec and Gensim?
- 00:54:41 – Getting into GPT and BERT in another episode
- 00:56:11 – How to follow Jodie’s work?
- 00:57:36 – Thanks and goodbye
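Of the topics listed above, TFIDF vectorization is the one that benefits most from a worked example. The following is a minimal sketch, not code from the episode. It assumes one common textbook variant, where term frequency is the count divided by document length and inverse document frequency is the natural log of the number of documents over the number of documents containing the term. Libraries often use smoothed variants, so their numbers may differ. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for pre-tokenized documents.

    docs is a list of token lists. Returns one {term: weight} dict per
    document, using tf = count / len(doc) and idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, already tokenized.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
weights = tf_idf(docs)

for row in weights:
    print(row)
```

In this variant, a word that appears in every document, such as "the" here, gets a weight of zero, while words unique to a single document get the highest weights. That is the intuition behind using TFIDF to downweight very common words.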
Show Links: