House of Fusion
ColdFusion Talk (CF-Talk) mailing list

Strip multiple words from string

Author:
Douglas Fentiman
05/15/2004 10:38 PM

Any suggestions on how to strip repeated occurrences of a short list (4-8) of words from a string? The first occurrence of each word must be preserved at its position. Using CF5.
Thanks,
Doug

Author:
Claude Schneegans
05/15/2004 11:29 PM

Hmmmm, I think I would use the Replace function to first replace the first occurrence of the word with some string unlikely to be found in the text, say "%*%*%*", then replace all remaining occurrences with nothing, using "all", then put the word back at the place where %*%*%* is.
If you don't have too many words, the solution is workable.
--
_______________________________________
See some cool custom tags here: http://www.contentbox.com/claude/customtags/tagstore.cfm
Please send any spam to this address: piegeacon@internetique.com
Thanks.

Author:
DougF
05/16/2004 03:04 PM

Thanks Claude,
Taking your suggestion I put this function together. My effort works in a simple way, but it is not a total solution to my problem...
What I'm attempting to do is stitch together a number of different strings to create a keyword list. After the strings are assembled I need to strip out any duplicate words or phrases delimited by commas. Is there a way to do this with having to specify the words/phrases to search for? RegExps maybe? Any suggestions?
-Doug
-----------------------
<cfscript>
function CleanKeywords(str){
  // Trim non-alphanumeric characters (spaces, punctuation) from the front and rear.
  str = REReplaceNoCase(str, "^[^[:alnum:]]*", "");
  str = REReplaceNoCase(str, "[^[:alnum:]]*$", "");
  // Replace the first occurrence of each word with a placeholder.
  str = ReplaceNoCase(str, "Word_1", "1x1x1x1");
  str = ReplaceNoCase(str, "Word_2", "2x2x2x2");
  str = ReplaceNoCase(str, "Word_3", "3x3x3x3");
  // Add additional words to find here.
  // Delete all remaining occurrences of each word.
  str = ReplaceNoCase(str, "Word_1", "", "all");
  str = ReplaceNoCase(str, "Word_2", "", "all");
  str = ReplaceNoCase(str, "Word_3", "", "all");
  // Restore each original word at its placeholder (the strings must match the placeholders above).
  str = ReplaceNoCase(str, "1x1x1x1", "Word_1");
  str = ReplaceNoCase(str, "2x2x2x2", "Word_2");
  str = ReplaceNoCase(str, "3x3x3x3", "Word_3");
  // Add additional words to restore here.
  return str;
}
</cfscript>

Author:
Claude Schneegans
05/16/2004 03:14 PM

>>What I'm attempting to do is stitch together a number of different strings to create a keyword list.

Ah ah! Now this is a bit different. Hmmm, to do this, I would:
1. remove all punctuation marks, CR, LF, etc.,
2. consider the text as a space-delimited list,
3. in a loop, create a new list by adding any word from the first list which is not already in the new one and which has more than 3 chars or so.
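The three steps above could be sketched roughly as follows. This is only an illustration written for CF5-style cfscript, not code from the thread: the function name `UniqueWords` and its `minLen` parameter are hypothetical, and "punctuation" is assumed to mean any run of non-alphanumeric characters.

```cfml
<cfscript>
// Hypothetical helper (not from the thread): builds a space-delimited
// list of unique words that are at least minLen characters long.
function UniqueWords(str, minLen) {
  var cleaned = "";
  var result = "";
  var word = "";
  var i = 1;
  // Step 1: replace punctuation, CR, LF, etc. with single spaces.
  cleaned = REReplace(str, "[^[:alnum:]]+", " ", "ALL");
  // Step 2: treat the cleaned text as a space-delimited list.
  for (i = 1; i LTE ListLen(cleaned, " "); i = i + 1) {
    word = ListGetAt(cleaned, i, " ");
    // Step 3: keep a word only if it is long enough and not already kept.
    if (Len(word) GTE minLen AND NOT ListFindNoCase(result, word, " ")) {
      result = ListAppend(result, word, " ");
    }
  }
  return result;
}
</cfscript>
<cfoutput>#UniqueWords("A string, or maybe not a string. Who knows? And who cares?", 4)#</cfoutput>
```

With a minimum length of 4 ("more than 3 chars"), the sample sentence should reduce to `string maybe knows cares`. The comparison is case-insensitive, so the first spelling encountered is the one that is kept.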

Author:
DougF
05/16/2004 03:31 PM

Sorry, should have said:
"Is there a way to do this with 'OUT' having to specify the words/phrases to search for?"
-Doug

Author:
Jim McAtee
05/16/2004 04:05 PM

If, as you said, you're just putting together a keyword list, then take Claude's last suggestion. However, distinguishing between a 'word' and a 'phrase' without knowing what constitutes a phrase (that is, without already having a dictionary of what you want to consider to be phrases) is going to be difficult or impossible.

To answer your question... yes. You'd first put together a keyword list and then remove duplicates of those keywords from your string. But from the sounds of what you're trying to do, once you've put together the keyword list you're finished. Try this:

<cfset s = "A string, or maybe not a string. Who knows? And who cares?">
<cfset keywords = "">
<cfloop index="w" list="#s#" delimiters=" .,?!;:%$&""'/|[]{}()">
  <cfif not ListFindNoCase(keywords, w)>
    <cfset keywords = ListAppend(keywords, LCase(w))>
  </cfif>
</cfloop>
<cfoutput>
<pre>
s:        #s#
keywords: #keywords#
</pre>
</cfoutput>

Author:
DougF
05/16/2004 04:56 PM

Part of the difficulty is describing the problem... sometimes the description evolves as unanticipated results materialize.

Better description of problem:
Assemble a number of different strings with the final result being a single string of words and phrases that are delimited by commas. Strip out duplicate words or phrases in the result.

Singular words are allowed in the phrases. Could not the comma be used to distinguish between words and phrases?

It would be difficult to create a dictionary of all words that may be in the original string.

Will play with both Claude's and your suggestions.
Thanks,
Doug

Author:
Jim McAtee
05/16/2004 05:27 PM

> Part of the difficulty is describing the problem... sometimes
> the description evolves as unanticipated results materialize.

Sometimes. But it's seldom that you'll find a solution to a problem that you're unable to define.

> Better description of problem:
> Assemble a number of different strings with the final result
> being a single string of words and phrases that are delimited
> by commas. Strip out duplicate words or phrases in the result.

Just consider that there may be a big difference in the algorithm and the processing time between the two approaches of a) stripping duplicates and b) not adding duplicates to the assembled string in the first place.

> Singular words are allowed in the phrases. Could not the comma
> be used to distinguish between words and phrases?

In the original material? No. In the final list a comma is as good a delimiter as any. What are you calling a phrase? Check out the types and examples of English phrases at the link below and note that commas seldom delineate a phrase.
http://grammar.uoregon.edu/phrases/phrases.html

> Would be difficult to create a dictionary of all words that may
> be in original string.

No doubt. But words are fairly easy to parse: generally anything delimited by white space or punctuation. Short of creating a parser for English grammar, though, I'm not sure how you'd pull out phrases.

If you want to call any string of words between punctuation marks a "phrase", then loop over your original string as a list, but don't include white space characters among the delimiters.

Author:
DougF
05/16/2004 06:20 PM

> Just consider that there may be a big difference in the algorithm and the
> processing time between the two approaches of a) stripping duplicates and
> b) not adding duplicates to the assembled string in the first place.

Duplicates result from the assembly of strings. They need to be removed after they are assembled.

> What are you calling a phrase?

A phrase in this case would be two or three words separated from other phrases and words by a comma, i.e. "word1, phrase one, word2, phrase two, phrase three".

> [...]

I feel this solution is too complex for what is needed and would also be a processing time concern.

> If you want to call any string of words between punctuation marks a
> "phrase", then loop over your original string as a list, but don't include
> white space characters among the delimiters.

Will try the loop approach.
Thanks,
Doug
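For the comma-delimited case described in this post, the loop approach might look something like the sketch below. It is not code from the thread: the function name `DedupeKeywordList` is hypothetical, and it assumes that each word or phrase is one comma-delimited item, that leading and trailing spaces are not significant, and that comparison should ignore case.

```cfml
<cfscript>
// Hypothetical helper (not from the thread): removes duplicate items
// from a comma-delimited list, keeping the first occurrence of each.
function DedupeKeywordList(keywordList) {
  var result = "";
  var item = "";
  var i = 1;
  for (i = 1; i LTE ListLen(keywordList); i = i + 1) {
    // Trim so that "word1" and " word1" count as the same item.
    item = Trim(ListGetAt(keywordList, i));
    // ListFindNoCase matches whole items, so "phrase" does not match "phrase one".
    if (Len(item) AND NOT ListFindNoCase(result, item)) {
      result = ListAppend(result, item);
    }
  }
  return result;
}
</cfscript>
<cfoutput>#DedupeKeywordList("word1, phrase one, word2, phrase one, Word1, phrase two")#</cfoutput>
```

For the sample list this should output `word1,phrase one,word2,phrase two`. Because whole items are compared, a single word that also appears inside a phrase is left alone, which matches the requirement that singular words are allowed in the phrases.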

Author:
Claude Schneegans
05/16/2004 06:29 PM

>>Sometimes. But it's seldom that you'll find a solution to a problem that you're unable to define.

Maybe, but sometimes it is much easier to find the solution first, then the problem ;-))

