Python, Firefox programming and Irish Whiskey.

Sunday, November 15, 2009

word count

How do you count the words in an HTML document? In JavaScript? When you have the document's DOM, JavaScript and the whole of Firefox at your disposal? Of course, there is the brute-force method of walking through all the DOM nodes, but that's rather silly. I found this: (source)
Nice, but it has two problems. First, it doesn't split words well: if you have something like word1,word2,word3, it is counted as a single word. So let us split not on spaces but on something more complex:

Unfortunately this is still not enough, because it includes the words from between the script tags. What now? Let's simply subtract the scripts and we are done.

The trouble is that the result can now also contain entries that are just spaces. That does not bother me much, though, as I need to go through all the words one by one anyway and can discard the unsuitable ones then.
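The snippets from the original post are not reproduced here, so what follows is only a rough sketch of the idea, not the code I actually used. The function names (splitWords, visibleText, countWords) are made up for illustration. Instead of subtracting the script text afterwards, this variant removes the script and style elements from a copy of the body before reading its text, which is a slightly different technique with the same aim. It also assumes a JavaScript engine that supports Unicode property escapes in regular expressions.

```javascript
// Hypothetical helpers, not the code from the original post.

// Split on runs of characters that are not letters, digits or apostrophes,
// so "word1,word2,word3" gives three words instead of one.
// Empty entries produced by leading or trailing separators are dropped.
function splitWords(text) {
  return text.split(/[^\p{L}\p{N}']+/u).filter(function (w) {
    return w.length > 0;
  });
}

// Return the text of the page without the contents of <script> and <style>.
// Works on a clone, so the live document is not modified.
function visibleText(doc) {
  var clone = doc.body.cloneNode(true);
  var junk = clone.querySelectorAll("script, style");
  for (var i = 0; i < junk.length; i++) {
    junk[i].parentNode.removeChild(junk[i]);
  }
  return clone.textContent;
}

// Count the words in a document object such as window.document.
function countWords(doc) {
  return splitWords(visibleText(doc)).length;
}
```

In a page or an extension you would call countWords(document). Filtering out the empty entries inside splitWords is optional; if you walk the words one by one anyway, you can just as well skip them there.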

Originally at wordpress, 2009-10-06.
