Strip multiple words from string
Author: Claude Schneegans
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163317
>>Sometimes. But it's seldom that you'll find a solution to a problem that
>>you're unable to define.
Maybe, but sometimes it is much easier to find the solution first, then the
problem ;-))
--
_______________________________________
See some cool custom tags here:
http://www.contentbox.com/claude/customtags/tagstore.cfm
Please send any spam to this address: piegeacon@internetique.com
Thanks.
Author: DougF
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163316
> Just consider that there may be a big difference in the algorithm and the
> processing time between the two approaches of a) stripping duplicates and
> b) not adding duplicates to the assembled string in the first place.
Duplicates result from the assembly of strings. They need to be removed
after they are assembled.
> What are you calling a phrase?
A phrase in this case would be two or three words separated from other
phrases and words by a comma, i.e. "word1, phrase one, word2, phrase two,
phrase three".
----- Excess quoted text cut - see Original Post for more -----
I feel this solution is too complex for what is needed and would also be a
processing time concern.
> If you want to call any string of words between punctuation marks a
> "phrase", then loop over your original string as a list, but don't include
> white space characters among the delimiters.
Will try the loop approach.
Thanks,
Doug
Author: Jim McAtee
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163312
> Part of the difficulty is describing the problem... sometimes
> the description evolves as unanticipated results materialize.
Sometimes. But it's seldom that you'll find a solution to a problem that
you're unable to define.
> Better description of problem:
> Assemble a number of different strings with the final result
> being a single string of words and phrases that are delimited
> by commas. Strip out duplicate words or phrases in the result.
Just consider that there may be a big difference in the algorithm and the
processing time between the two approaches of a) stripping duplicates and
b) not adding duplicates to the assembled string in the first place.
> Singular words are allowed in the phrases. Could not the comma
> be used to distinguish between words and phrases?
In the original material? No. In the final list a comma is as good a
delimiter as any.
What are you calling a phrase? Check out the types and examples of
English phrases at the link below and note that commas seldom delineate a
phrase.
http://grammar.uoregon.edu/phrases/phrases.html
> Would be difficult to create a dictionary of all words that may
> be in original string.
No doubt. But words are fairly easy to parse - generally anything
delimited by white space or punctuation. Short of creating a parser for
English grammar, though, I'm not sure how you'd pull out phrases.
If you want to call any string of words between punctuation marks a
"phrase", then loop over your original string as a list, but don't include
white space characters among the delimiters.
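A minimal sketch of that loop, assuming the strings have already been assembled into one comma-delimited value. The variable names (assembled, result, item) and the sample data are made up for illustration. Only the comma is passed as a delimiter, so multi-word phrases survive as single list elements; each element is trimmed and appended only if the result does not already contain it (case-insensitive).

```cfm
<!--- Hypothetical input: an assembled, comma-delimited keyword string --->
<cfset assembled = "word1, phrase one, word2, Phrase One, word1, phrase two">
<cfset result = "">

<!--- Only the comma is a delimiter, so spaces inside a phrase are kept --->
<cfloop index="item" list="#assembled#" delimiters=",">
    <!--- Drop the white space left around each element --->
    <cfset item = Trim(item)>
    <!--- Skip empty elements and anything already in the result --->
    <cfif Len(item) and not ListFindNoCase(result, item)>
        <cfset result = ListAppend(result, item)>
    </cfif>
</cfloop>

<cfoutput>
<pre>
assembled: #assembled#
result:    #result#
</pre>
</cfoutput>
```

The first spelling of each word or phrase is the one that is kept, and it stays at its original position in the list.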
Author: DougF
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163308
Part of the difficulty is describing the problem... sometimes the
description evolves as unanticipated results materialize.
Better description of problem:
Assemble a number of different strings with the final result being a single
string of words and phrases that are delimited by commas. Strip out duplicate
words or phrases in the result. Singular words are allowed in the phrases.
Could not the comma be used to distinguish between words and phrases? Would
be difficult to create a dictionary of all words that may be in original
string.
Will play with both Claude's and your suggestions.
Thanks,
Doug
Author: Jim McAtee
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163307
If, as you said, you're just putting together a keyword list, then take
Claude's last suggestion. However, distinguishing between a 'word' and a
'phrase', without knowing what constitutes a phrase (that is, without
already having a dictionary of what you want to consider to be phrases) is
going to be difficult or impossible.
To answer your question... yes. You'd first put together a keyword list
and then remove duplicates of those keywords from your string. But from
the sounds of what you're trying to do, once you've put together the
keyword list you're finished.
Try this:
<cfset s = "A string, or maybe not a string. Who knows? And who cares?">
<cfset keywords = "">
<cfloop index="w" list="#s#" delimiters=" .,?!;:%$&""'/|[]{}()">
    <cfif not ListFindNoCase(keywords, w)>
        <cfset keywords = ListAppend(keywords, LCase(w))>
    </cfif>
</cfloop>
<cfoutput>
<pre>
s: #s#
keywords: #keywords#
</pre>
</cfoutput>
> Sorry should have said:
> "Is there a way to do this with 'OUT' having to specify the
> words/phrases to search for"
> -Doug
Author: DougF
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163306
Sorry should have said:
"Is there a way to do this with 'OUT' having to specify the words/phrases to
search for"
-Doug
> Thanks Claude,
> Taking your suggestion I put this function together. My effort works in a
> simple way, but it is not a total solution to my problem...
> What I'm attempting to do is stitch together a number of different strings
> to create a keyword list. After the strings are assembled I need to strip
> out any duplicate words or phrases delimited by commas. Is there a way to do
> this with having to specify the words/phrases to search for. RegExp's maybe?
Author: Claude Schneegans
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163305
>>What I'm attempting to do is stitch together a number of different strings
>>to create a keyword list.
Ah ah! Now this is a bit different.
Hmmm, to do this, I would
1. remove all punctuation marks, CR, LF etc.,
2. consider the text as a space-delimited list,
3. in a loop, create a new list by adding any word from the first list which is
not already in the new one and which has more than 3 chars or so.
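Those three steps could be sketched roughly as below. This is only an illustration: the names rawText, cleaned and minLength are invented, and "more than 3 chars" is taken literally as a minimum length of 4. Note that the pattern treats every non-alphanumeric character as a separator, so apostrophes and hyphens split words too.

```cfm
<!--- Hypothetical sample text --->
<cfset rawText = "A string, or maybe not a string. Who knows? And who cares?">

<!--- Step 1: turn punctuation, CR, LF etc. into single spaces --->
<cfset cleaned = REReplace(rawText, "[^[:alnum:]]+", " ", "ALL")>

<!--- Steps 2 and 3: walk the space-delimited list and keep each word that
      is long enough and not already in the new list --->
<cfset minLength = 4>
<cfset keywords = "">
<cfloop index="w" list="#cleaned#" delimiters=" ">
    <cfif Len(w) gte minLength and not ListFindNoCase(keywords, w)>
        <cfset keywords = ListAppend(keywords, LCase(w))>
    </cfif>
</cfloop>

<cfoutput>
<pre>
keywords: #keywords#
</pre>
</cfoutput>
```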
Author: DougF
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163304
Thanks Claude,
Taking your suggestion I put this function together. My effort works in a
simple way, but it is not a total solution to my problem...
What I'm attempting to do is stitch together a number of different strings
to create a keyword list. After the strings are assembled I need to strip
out any duplicate words or phrases delimited by commas. Is there a way to do
this with having to specify the words/phrases to search for. RegExp's maybe?
Any suggestions?
-Doug
-----------------------
<cfscript>
function CleanKeywords(str){
    str = ReReplaceNoCase(str,"^[^[:alnum:]]*", "");// trims non-alphanumeric characters at front
    str = ReReplaceNoCase(str,"[^[:alnum:]]*$", "");// trims non-alphanumeric characters at rear
    str = ReplaceNoCase(str,"Word_1","1x1x1x1");// replace first occurrence of word with placeholder word.
    str = ReplaceNoCase(str,"Word_2","2x2x2x2");
    str = ReplaceNoCase(str,"Word_3","3x3x3x3");
    // add additional words to find.
    str = ReplaceList(str,"Word_1,Word_2,Word_3",", , ,");// delete additional occurrences of word.
    str = ReplaceNoCase(str,"1x1x1x1","Word_1");// restore original word at placeholder word.
    str = ReplaceNoCase(str,"2x2x2x2","Word_2");
    str = ReplaceNoCase(str,"3x3x3x3","Word_3");
    // add additional words to restore.
    return str;
}
</cfscript>
> Hmmmm, I think I would use the replace function to first replace the first
> occurrence of the word by some string unlikely to be found in the text,
> like say "%*%*%*", then replace all remaining words by nothing, using "all",
> then put back the word at the place %*%*%* is.
> If you don't have too many words, the solution is workable.
Author: Claude Schneegans
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163302
Hmmmm, I think I would use the replace function to first replace the first
occurrence of the word by some string unlikely to be found in the text, like say
"%*%*%*", then replace all remaining words by nothing, using "all", then put
back the word at the place %*%*%* is.
If you don't have too many words, the solution is workable.
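As a rough sketch of that placeholder idea, written as a CF5-style cfscript function: the function name KeepFirstOccurrence, its arguments and the sample call are invented for illustration, and the marker is assumed never to appear in the real text.

```cfm
<cfscript>
// Hypothetical helper: keeps the first occurrence of each word in wordList
// and removes every later occurrence from str.
function KeepFirstOccurrence(str, wordList) {
    var i = 1;
    var word = "";
    var marker = "%*%*%*"; // assumed not to occur in the text
    for (i = 1; i lte ListLen(wordList); i = i + 1) {
        word = ListGetAt(wordList, i);
        // 1. swap the first occurrence for the placeholder
        str = ReplaceNoCase(str, word, marker, "ONE");
        // 2. delete every remaining occurrence
        str = ReplaceNoCase(str, word, "", "ALL");
        // 3. put the word back where the placeholder sits
        str = Replace(str, marker, word, "ONE");
    }
    return str;
}
</cfscript>

<cfoutput>#KeepFirstOccurrence("red, blue, red, green, blue", "red,blue")#</cfoutput>
```

Two caveats with this sketch: ReplaceNoCase matches substrings, so a short word would also be removed from inside longer words, and deleting a word leaves its surrounding commas and spaces behind, which would need a separate cleanup pass.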
Author: Douglas Fentiman
Short Link: http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:32533#163300
Any suggestions how to strip multiple occurrences of a short list (4-8) of words
from a string? The first occurrence of each word must be preserved at its
position. Using CF5.
Thanks,
Doug
May 24, 2012