This little function returns a vector of every word found immediately before the end of a sentence. I wrote it for a pet project to help with the babble function in the ngram R package.
The resulting vector can be used to find good spots to terminate sentences in the generated babbles, and the function can be adjusted to fit your needs.
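A minimal sketch of the idea in base R, assuming a sentence ends at `.`, `!` or `?`. The name `get_termination_words` and the sample text are made up for illustration, and this is not necessarily how the original function was written.

```r
# Hypothetical helper: returns the word immediately before each
# sentence-ending punctuation mark (., ! or ?).
get_termination_words <- function(text) {
  # Collapse the input so a character vector is treated as one text.
  text <- paste(text, collapse = " ")
  # Letters/apostrophes directly followed by ., ! or ?
  # (perl = TRUE is needed for the lookahead).
  matches <- gregexpr("[[:alpha:]']+(?=[.!?])", text, perl = TRUE)
  tolower(regmatches(text, matches)[[1]])
}

# Made-up sample text, standing in for a real corpus.
sample_text <- c("The cat sat on the mat. Where did it go?",
                 "Nobody knows! It came back to the mat.")
termination_words <- get_termination_words(sample_text)
print(termination_words)
```

Abbreviations such as "e.g." or "Mr." will be picked up as sentence endings too, so the pattern likely needs tuning for real text.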
    stops <- data.frame(table(termination_words))
    stops <- stops[which(stops$Freq > 10), ]
The stringr package has similar functionality, but I couldn't get str_extract to work the way I wanted. Probably just me though.
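For anyone who wants to try the stringr route, here is a rough sketch, assuming the stringr package is installed and that a lookahead pattern is close enough to what is needed. It uses `str_extract_all` rather than `str_extract`, since `str_extract` only returns the first match per string. The sample text is made up.

```r
library(stringr)

# Made-up sample text.
text <- "The cat sat on the mat. Where did it go? Nobody knows!"

# str_extract_all returns a list with one element per input string,
# so unlist() flattens it into a plain character vector.
termination_words <- unlist(str_extract_all(text, "\\w+(?=[.!?])"))
print(termination_words)
```

Note that `\\w` also matches digits and underscores, and a word like "don't" would be cut down to "t", so the pattern may need adjusting.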