mike mcintyre's

.Net Journal


Count Occurrences of Words and Terms in a String

This article with source code shows how to create a class that can be used to count the occurrences of words and terms in a string.

 

There are many reasons to count the occurrences of words and terms in a string. A page author may count them in a web page to determine which words and terms search engines will recognize on that page. A web spider may count them in a page it is crawling as part of determining what ranking the page should be given.

 

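The 'WordAndTermCounter' and 'TermCount' classes shown below can count the occurrences of words and terms in a string. GetTermCount takes the string to search (pString) and an optional comma-separated list of terms (pTerms). If pTerms is empty, it counts every space-delimited word, treating upper and lower case as different words; otherwise it counts each listed term without regard to case.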

 

Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Class WordAndTermCounter

    ' Counts occurrences in pString.
    '   pTerms empty: counts every space-delimited word (case-sensitive).
    '   pTerms given: counts each comma-separated term (case-insensitive).
    Public Shared Function GetTermCount(ByVal pString As String, ByVal pTerms As String) As List(Of TermCount)

        ' Maps each word or term to its TermCount object.
        Dim termCountHashTable As New Hashtable

        ' If no terms were passed into this function,
        '   count the occurrences of every word in pString.
        If String.IsNullOrEmpty(pTerms) Then
            ' Get an array of all the words in pString.
            Dim terms() As String = pString.Split(" "c)

            ' Loop through the terms array.
            For i As Integer = 0 To terms.GetUpperBound(0)
                ' Take the word at index i, without surrounding white space.
                Dim word As String = terms(i).Trim

                If word <> String.Empty Then
                    ' If the termCountHashTable already contains a TermCount object for the word,
                    If termCountHashTable.ContainsKey(word) Then
                        ' increment that object's TermCount property.
                        CType(termCountHashTable.Item(word), TermCount).IncrementTermCount()
                    Else
                        ' Otherwise add a new TermCount object;
                        '   its initial count will be set to 1.
                        termCountHashTable.Add(word, New TermCount(word))
                    End If
                End If
            Next

        Else
            ' If terms were passed into this function,
            '   count the occurrences of each term.

            ' Get an array of all the terms in pTerms.
            Dim terms() As String = pTerms.Split(","c)

            ' Loop through the terms array.
            For ii As Integer = 0 To terms.GetUpperBound(0)
                ' Take the term at index ii, trimmed and upper-cased
                '   so that matching ignores case.
                Dim term As String = terms(ii).Trim.ToUpper

                ' Skip empty terms and terms that were already counted;
                '   Hashtable.Add throws an ArgumentException on a duplicate key.
                If term = String.Empty OrElse termCountHashTable.ContainsKey(term) Then
                    Continue For
                End If

                ' Find every match of the term in the upper-cased string.
                '   Regex.Escape makes the term match literally, so characters
                '   such as "." or "+" are not read as regular expression syntax.
                Dim theMatches As MatchCollection = Regex.Matches(pString.ToUpper, Regex.Escape(term))

                ' Add a new TermCount object holding the number of matches.
                termCountHashTable.Add(term, New TermCount(term, theMatches.Count))
            Next

        End If

        ' Copy the TermCount objects from the hash table into a list.
        Dim listOfTermCount As New List(Of TermCount)
        For Each x As DictionaryEntry In termCountHashTable
            listOfTermCount.Add(CType(x.Value, TermCount))
        Next

        Return listOfTermCount
    End Function

End Class

 

Public Class TermCount
    Implements IComparable(Of TermCount)

    ' Term property: the word or term being counted.
    Private m_Term As String
    Public Property Term() As String
        Get
            Return m_Term
        End Get
        Set(ByVal value As String)
            m_Term = value
        End Set
    End Property

    ' TermCount property: the number of occurrences found so far.
    Private m_TermCount As Integer
    Public Property TermCount() As Integer
        Get
            Return m_TermCount
        End Get
        Set(ByVal value As Integer)
            m_TermCount = value
        End Set
    End Property

    ' Creates a TermCount for a word seen for the first time; the count starts at 1.
    Public Sub New(ByVal word As String)
        Me.m_Term = word
        Me.IncrementTermCount()
    End Sub

    ' Creates a TermCount with a count that has already been computed.
    Public Sub New(ByVal word As String, ByVal wordCount As Integer)
        Me.m_Term = word
        Me.m_TermCount = wordCount
    End Sub

    Public Sub IncrementTermCount()
        Me.m_TermCount += 1
    End Sub

    ' Comparing other to Me (rather than Me to other) reverses the usual order,
    '   so sorting a List(Of TermCount) puts the highest counts first.
    Public Function CompareTo(ByVal other As TermCount) As Integer Implements System.IComparable(Of TermCount).CompareTo
        Return other.TermCount.CompareTo(Me.TermCount)
    End Function
End Class
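
The CompareTo function above compares other to Me instead of the other way around, so a sorted list of TermCount objects runs from the highest count to the lowest. The following is only a minimal sketch of that effect. The Tally class and the SortSketch module are hypothetical stand-ins written for this illustration and are not part of the article's source code; Tally copies just the comparison logic.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for TermCount, reduced to what sorting needs.
Public Class Tally
    Implements IComparable(Of Tally)

    Public ReadOnly Term As String
    Public ReadOnly Count As Integer

    Public Sub New(ByVal termText As String, ByVal termCount As Integer)
        Me.Term = termText
        Me.Count = termCount
    End Sub

    ' Same idea as TermCount.CompareTo: other is compared to Me,
    '   which reverses the default ascending order.
    Public Function CompareTo(ByVal other As Tally) As Integer Implements IComparable(Of Tally).CompareTo
        Return other.Count.CompareTo(Me.Count)
    End Function
End Class

Public Module SortSketch

    ' Returns the terms ordered from most frequent to least frequent.
    Public Function TermsByCount(ByVal tallies As List(Of Tally)) As List(Of String)
        Dim sorted As New List(Of Tally)(tallies)
        sorted.Sort() ' List.Sort calls Tally.CompareTo
        Dim names As New List(Of String)
        For Each t As Tally In sorted
            names.Add(t.Term)
        Next
        Return names
    End Function

    Public Sub Main()
        Dim tallies As New List(Of Tally)
        tallies.Add(New Tally("cat", 1))
        tallies.Add(New Tally("the", 3))
        tallies.Add(New Tally("sat", 2))
        ' prints "the, sat, cat"
        Console.WriteLine(String.Join(", ", TermsByCount(tallies).ToArray()))
    End Sub

End Module
```

With the article's classes the equivalent step would presumably be to call Sort() on the list that GetTermCount returns. Terms with equal counts may appear in any order, since List.Sort is not a stable sort.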

Click the download link above to get Visual Basic source code in a Visual Studio Windows solution that demonstrates how to use the class above in an application.
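
As a rough, standalone illustration of the same counting technique, the sketch below uses a Dictionary(Of String, Integer) in place of the Hashtable. It is a variation, not the downloadable sample: the module and function names (WordCountSketch, CountWords, CountTerm) are hypothetical, words are found with the "\w+" pattern so punctuation is ignored, and word counts ignore case. Both of those choices differ from GetTermCount, which splits on spaces and counts words case-sensitively.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module WordCountSketch

    ' Counts every word in text, ignoring case and punctuation.
    Public Function CountWords(ByVal text As String) As Dictionary(Of String, Integer)
        ' The case-insensitive comparer makes "The" and "the" share one entry.
        Dim counts As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        For Each m As Match In Regex.Matches(text, "\w+")
            Dim current As Integer = 0
            counts.TryGetValue(m.Value, current) ' current stays 0 for a new word
            counts(m.Value) = current + 1
        Next
        Return counts
    End Function

    ' Counts non-overlapping, case-insensitive occurrences of a literal term.
    Public Function CountTerm(ByVal text As String, ByVal term As String) As Integer
        If String.IsNullOrEmpty(term) Then Return 0
        ' Regex.Escape keeps characters such as "." from acting as wildcards.
        Return Regex.Matches(text, Regex.Escape(term), RegexOptions.IgnoreCase).Count
    End Function

    Public Sub Main()
        Dim counts As Dictionary(Of String, Integer) = CountWords("The cat saw the other cat.")
        Console.WriteLine(counts("the")) ' prints 2
        Console.WriteLine(counts("cat")) ' prints 2
        ' prints 2: "axb" is not counted because the "." is matched literally
        Console.WriteLine(CountTerm("a.b a.b axb", "a.b"))
    End Sub

End Module
```

The word "other" contains "the" but is counted as its own word here, because whole words are matched. CountTerm, like the term branch of GetTermCount, counts substrings, so the term "the" would also be found inside "other".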

 

mike mcintyre  http://www.getdotnetcode.com

 

posted on Monday, December 11, 2006 4:06 PM