Monday 20 February 2012

Word count in JavaScript

I notice that the top Google results (at the time of writing) for word count in JavaScript all offer something like:
function countWrong(textarea){
 // Naive approach: split the text on single spaces only
 var words=textarea.value.split(" ");
 alert(words.length+" words");
}
This is Wrong!


Why? Because a space isn't the only thing that can separate words. If the user hits Return at the end of a word, this inserts a newline, which the above code won't count as a word separator. And if the user adds commas or other punctuation not followed by a space, or adds a comma with a space on either side of it, the count will also be wrong.


Even worse, if the user has more than one space between words, or extra space at the end of the input, each extra space inflates the word count, making it even more inaccurate.
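
To see those failures concretely, here is a small sketch that runs the same split on plain strings instead of a textarea. The naiveCount helper is hypothetical, made up for illustration only:

```javascript
// Hypothetical helper: the naive split-on-space count applied to a plain string.
function naiveCount(text) {
  return text.split(" ").length;
}

console.log(naiveCount("one two three"));   // 3 (correct)
console.log(naiveCount("one\ntwo three"));  // 2: the newline is not treated as a separator
console.log(naiveCount("one,two three"));   // 2: a comma without a space joins two words
console.log(naiveCount("one , two"));       // 3: the lone comma is counted as a word
console.log(naiveCount("one  two three"));  // 4: the double space yields an empty "word"
console.log(naiveCount("one two three "));  // 4: so does the trailing space
console.log(naiveCount(""));                // 1: empty input still reports one word
```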


Fortunately there's an easy way to count words using regular expressions. The pattern '\w+' matches a run of one or more word characters (letters, digits and the underscore), so each match is one word no matter what separates it from the next. With this technique, there's no need to worry about trimming the string to remove excess space, either.

So try this instead:


function wordCount(textarea){
 var text=textarea.value,
 chars=text.length,
 // match() returns null when nothing matches, so fall back to an empty array
 words=(text.match(/\w+/g)||[]).length;
 alert(words+" words\n"+chars+" characters");
}


for a much more accurate result.
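
If you want to check the pattern without a textarea, the following is a rough sketch on plain strings. The countWords name is hypothetical, and the example assumes the same /\w+/g pattern. Note that \w only covers ASCII letters, digits and the underscore, so an apostrophe or hyphen inside a word splits it in two:

```javascript
// Hypothetical helper: counts runs of \w characters (ASCII letters, digits, underscore).
function countWords(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = text.match(/\w+/g) || [];
  return matches.length;
}

console.log(countWords("one\ntwo  three "));  // 3: newlines and extra spaces are ignored
console.log(countWords("one , two"));         // 2: the lone comma is not a word
console.log(countWords("one - two"));         // 2: neither is a dash surrounded by spaces
console.log(countWords(""));                  // 0: empty input is handled
console.log(countWords("don't stop"));        // 3: "don't" is split into "don" and "t"
```

Whether that last case matters depends on how exact the count needs to be; for a quick textarea counter it is usually close enough.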

1 comment:

  1. Thank you very much for this script. It is the first word counter that works as I expect it to work. Next to scripts that just split by spaces there are also some that use RegEx but still count things like dashes surrounded by spaces, and that's wrong. Thank you again!
