I continued my text experiments from last week, this time using Hidden Markov Models to analyze the same combination of texts. I am still using the IndoEuropean Tokenizer, in addition to the Regex Tokenizer.
I made some slight modifications to the code to switch the output between nouns and verbs (the NN/VB tags).
I found that the model trained on the Brown Corpus (files ck01-ck27) gave me dubious results, which I assume stems from the hidden state probabilities of the Markov model.
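For reference, here is a minimal sketch of this pipeline in NLTK. The sample sentence, the `\w+` regex, and the Lidstone smoothing are illustrative assumptions rather than the exact code from my experiments; the smoothing matters because an unsmoothed HMM assigns zero probability to words it never saw in training, which can derail the whole tag sequence:

```python
import nltk
from nltk.corpus import brown
from nltk.probability import LidstoneProbDist
from nltk.tag import hmm
from nltk.tokenize import RegexpTokenizer

nltk.download("brown", quiet=True)

# Train on the Brown general-fiction files ck01-ck27.
train_sents = brown.tagged_sents(fileids=[f"ck{i:02d}" for i in range(1, 28)])

# Lidstone smoothing keeps unseen words from zeroing out the Viterbi
# path, one common cause of an HMM tagger going haywire on new text.
trainer = hmm.HiddenMarkovModelTrainer()
tagger = trainer.train_supervised(
    train_sents,
    estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins),
)

# Tokenize with a simple regex tokenizer, tag, and split by POS.
tokens = RegexpTokenizer(r"\w+").tokenize(
    "The rain whispered over the empty station."
)
tagged = tagger.tag(tokens)

nouns = [w for w, t in tagged if t.startswith("NN")]
verbs = [w for w, t in tagged if t.startswith("VB")]
print("Nouns:", nouns)
print("Verbs:", verbs)
```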
In each run, words that are obviously either verbs or nouns are being tokenized, chunked, and tagged as different parts of speech.
I am not sure why the precision is so low; I am still trying to wrap my head around the code, and perhaps the ck (general fiction) portion of the Brown Corpus is not the most appropriate training source.
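One way to move past eyeballing individual tags is to score the tagger against held-out Brown sentences. Continuing from the sketch above (the choice of ck28-ck29 as a test slice is an arbitrary assumption):

```python
from nltk.corpus import brown

# Score the tagger from the sketch above on held-out general-fiction files.
held_out = brown.tagged_sents(fileids=["ck28", "ck29"])
print("accuracy:", tagger.accuracy(held_out))  # .evaluate() on NLTK < 3.6
```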
The visualizations are based on random RGB values.
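A minimal sketch of that idea with matplotlib (the word list and layout here are hypothetical stand-ins for the tagged output above):

```python
import random
import matplotlib.pyplot as plt

# Draw each word in a randomly chosen RGB color; in practice the words
# would come from the noun/verb lists produced by the tagger.
words = ["rain", "station", "whisper", "run"]  # hypothetical sample
fig, ax = plt.subplots(figsize=(6, 2))
ax.axis("off")
for i, word in enumerate(words):
    rgb = (random.random(), random.random(), random.random())
    ax.text(0.05 + (i % 2) * 0.5, 0.65 - (i // 2) * 0.4, word,
            color=rgb, fontsize=18)
plt.show()
```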