00:00:00 | Basic bag-of-words (BoW) |
00:00:22 | The need for vectors |
00:00:53 | Selecting and extracting features from our data |
00:04:04 | Idea: similar documents share similar vocabulary |
00:04:46 | Turning a corpus into a BoW matrix |
00:07:10 | What vectorization helps us accomplish |
00:08:20 | Measuring document similarity |
00:11:09 | Shortcomings of basic BoW |
00:12:37 | Capturing a bit of context with n-grams |
00:14:10 | DEMO: creating basic BoW with scikit-learn and spaCy, measuring document similarity, and creating n-grams |
00:19:35 | Basic BoW recap |