House of Fusion

Find string with more than numbers between two other strings

Author:
Ian Skinner
03/07/2011 01:41 PM

I'm struggling with a regular expression to match this:

<SITE_LOC_ID><!--help--></SITE_LOC_ID>

Where the help content is any string that contains one or more characters that are NOT digits.

I know [^0-9] would match a single non-digit character. But I don't know how to allow one or more such characters mixed in with zero or more digit characters.

I need to return the strings that match this so that I can replace them. Plus, this is an XML file that will have upwards of 75,000 record nodes, each with one of these <SITE_LOC_ID>...</SITE_LOC_ID> nodes alongside several other nodes of each record. So I want to make sure I match only the content of a single node.

TIA

P.S. The file is too large to parse into an XML data structure, so I am using simple string replace() and rereplace() functions to modify the XML text file.

Author:
Barney Boisvert
03/07/2011 02:41 PM

Get a better parser. SAX, which is designed for stream processing, is exactly what you need. The DOM-centric CF XML stuff is great for simple stuff, but as you've found, it only works for small documents. I haven't checked CF9, but CF8 uses the Apache XML tooling, which includes a SAX parser. I'd expect CF9 to be the same.

If you can't/won't and must pursue the regex approach, you'll again need to get a better RegEx engine than what CF ships with. Specifically, one that supports lookahead and lookbehind to anchor yourself. Again, you already have what you need, as the java.util.regex package provides all this functionality for you (CF uses ORO, which doesn't have it, instead of the Java-native stuff).

The gist is this (which will work from CFML as-is):

newXmlString = xmlString.replaceAll(regex, replaceString);

cheers,
barneyb
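
A rough sketch of the lookaround anchoring suggested above, written against java.util.regex. The thread does not give an actual pattern at this point, so the regex, the class name LookaroundSketch, and the method replaceNonNumericIds are all assumptions made for illustration. It also assumes the element contains plain text only (no child elements, comments, or CDATA).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original thread.
final class LookaroundSketch {
    // (?<=...) and (?=...) check for the surrounding tags without consuming them,
    // so only the element content is matched and replaced.
    // [^0-9<] demands at least one character that is neither a digit nor '<';
    // [^<]* on either side keeps the match inside a single element.
    private static final Pattern NON_NUMERIC_ID = Pattern.compile(
        "(?<=<SITE_LOC_ID>)[^<]*[^0-9<][^<]*(?=</SITE_LOC_ID>)");

    // Replaces the content of every SITE_LOC_ID element that is not purely digits.
    static String replaceNonNumericIds(String xml, String replacement) {
        // quoteReplacement stops '$' and '\' in the replacement from being
        // interpreted as group references.
        return NON_NUMERIC_ID.matcher(xml)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```

From CFML the same pattern string could presumably be passed straight to xmlString.replaceAll(regex, replaceString) as shown above, since that call uses the Java engine.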

Author:
Ian Skinner
03/07/2011 05:02 PM

On 3/7/2011 11:40 AM, Barney Boisvert wrote:

Well, I'm not really looking for a new XML parser at this time, as ColdFusion is not expected to parse the file. I am only trying to clean up an example demonstration file that can then be used for other testing purposes.

> Specifically one that supports lookahead and lookbehind to anchor
> yourself.

OK, but I cannot see how to use lookahead (which does exist without going into Java) or lookbehind to do what I need. As best as I can tell, those would be great for getting some of the numbers if there were non-digit characters in the string. But I do not see how to match the ENTIRE string IF one or more of the characters in the string is a non-digit character.

I.E.
<tag>19984798</tag> NOT a match.
<tag>18435A89</tag> IS a match, return 18435A89.
<tag>Z8457920</tag> IS a match, return Z8457920.
<tag>7493841-</tag> IS a match, return 7493841-
ETC.

Author:
Barney Boisvert
03/07/2011 05:14 PM

Actually, now that I think about it a little more....

Is it safe to assume that you have well-formed XML (even though you don't want to parse it), that the <tag> element has no child elements, and finally that there are no comments or CDATA blocks in the file? If so, you can look for

<tag>([^<]*[^0-9][^<]*)</tag>

and that should get you what you want.

I still think a better solution would be to SAX it through a pipeline to modify it stream-wise, because any time you're manipulating XML in a non-XML-aware fashion you're just asking for pain. But if my stated assumptions are valid and you're confident that they will always remain so, that regex should work.

If those assumptions aren't valid or you don't feel comfortable relying on them, you're gonna need an XML-aware mechanism to process the XML.

cheers,
barneyb
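
To check that regex against the four examples given earlier in the thread, here is a minimal sketch using java.util.regex. The pattern is the one quoted above; the class name NonNumericTagContent and the method findNonNumeric are made up for illustration, and the sketch relies on the same assumptions (well-formed XML, no child elements, no comments or CDATA).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original thread.
final class NonNumericTagContent {
    // The pattern from the post: [^0-9] forces at least one non-digit,
    // and [^<]* on either side stops the match at the next tag.
    private static final Pattern TAG_WITH_NON_DIGIT =
        Pattern.compile("<tag>([^<]*[^0-9][^<]*)</tag>");

    // Returns the content (capture group 1) of every <tag> element
    // whose text contains at least one non-digit character.
    static List<String> findNonNumeric(String xml) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG_WITH_NON_DIGIT.matcher(xml);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

Fed the four sample lines, this should return 18435A89, Z8457920 and 7493841- while skipping 19984798. Note that [^0-9] can also match '<', which is harmless only as long as the stated assumptions hold.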

