ColdFusion Talk (CF-Talk)
Extracting an email address out of a text file
Ben Densmore 05/21/04 11:14 AM

I have about 1800 text files that I exported from Outlook that are all bounced emails. I need to extract the email addresses that were bounced back. Unfortunately, not all mail servers produce bounce messages that look alike.

I was thinking I could just read through each text file and find where the email address is, but a lot of the files contain three addresses: that of the person the email was originally from, that of the server that bounced the email, and the one it was originally sent to. The only time these files are consistent is when they were bounced by the same mail server software.

I was thinking maybe I will loop through the ones that are in the same format, i.e. where the bad email address looks like (someemail@somedomain.com), and strip out everything but the email address. That's the only method I can come up with at the moment. Has anyone done something similar, or does anyone have a better solution? I really don't want to manually go through each text file and copy and paste the email address.

Thanks,
Ben

Claude Schneegans 05/21/04 11:21 AM

>> I really don't want to manually go through each text file and copy and paste the email

For this I would try to use CF_REextract to extract the address in its context. The address is variable, but the context is not.
See CF_REextract at http://www.contentbox.com/claude/customtags/tagstore.cfm

--
_______________________________________
See some cool custom tags here: http://www.contentbox.com/claude/customtags/tagstore.cfm
Please send any spam to this address: piegeacon@internetique.com
Thanks.

Bryan F. Hogan 05/21/04 11:23 AM

I can vouch for cf_reextract. I use it as a spider for a web cvs I wrote and it works wonderfully.

Jochem van Dieten 05/21/04 11:52 AM

Ben Densmore wrote:
> [quoted text cut; see the original post above]

That is indeed the way to go. In some cases it is easy. For instance, if you want to get all AOL bounces, you can do a simple regex replace on your email:

    REReplaceNoCase(email,"^.*----- The following addresses had permanent fatal errors -----(.+)----- Transcript of session follows -----.*$","\1")

Simple patterns like these should find you most of the addresses; just look for 5xx (permanent) error codes next to an email address. But it is a case of diminishing returns, and at some point it is easier to do the rest by hand.

Jochem

Scott Weikert 05/21/04 12:11 PM

There's also a tag on the DevEx, "StripEmail": you pass it a string and it'll pull out email addresses and return them as a list. Works great for me.

Ben Densmore 05/21/04 12:19 PM

Thanks Claude, I bought the tag and it seems to work OK. I'm wondering if someone can help me with a regex. Just playing around with the tag, I was trying:

    <CF_REextract
        INPUTMODE="File"
        INPUT="c:\cfusionmx\wwwroot\email_blast\#name#"
        OUTPUTMODE="output"
        RE1="([[:alnum:]_\.\-]+@([[:alnum:]_\.\-]))"
        RE2="\.+[[:alpha:]]{2,4}"
    >

I thought this would extract email addresses, but it doesn't seem to be working. Since I suck at regexes, I'm sure I'm doing something wrong with it. Can anyone modify this regex a little for me?

Thanks,
Ben

Claude Schneegans 05/21/04 12:38 PM

>> I bought the tag

Congratulations: you're on the right track! ;-)

>> if someone can help me with a reg ex. Just playing around with the tag I was trying:

Sure, try the REWizard: same author, same address, and this one is online and free. Make sure you work with the CF syntax, not the JavaScript one.

Ben Densmore 05/21/04 02:04 PM

Thanks Claude. According to the REWizard my regular expression works, but in the REExtract tag it doesn't. Is there any way to not use the RE2 option?

Thanks,
Ben

Claude Schneegans 05/21/04 02:29 PM

>> According to the REWizard my regular expression works

You mean RE1 works? What about RE2?

>> Is there any way to not use the RE2 option?

No, because without RE2 the tag does not know where the zone it has to extract stops. You have to make sure that both RE1 and RE2 match something in your file. When this is done, REExtract will return what's between the two occurrences.
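
For readers without the custom tag, the "fixed context, variable address" idea from the thread can be roughly sketched with built-in CFML functions only. This is a minimal sketch, not how CF_REextract itself works: extractBetween and findEmails are hypothetical helper names, the two marker strings are borrowed from Jochem's AOL example, the directory is the one from Ben's tag call, the *.txt filter is an assumption about how the files were exported, and the address pattern is a deliberately loose approximation.

```cfm
<!--- Hypothetical helper: return the text between the first match of startRE
      and the next match of endRE, or "" if either marker is missing. --->
<cffunction name="extractBetween" returntype="string" output="false">
  <cfargument name="text" type="string" required="true">
  <cfargument name="startRE" type="string" required="true">
  <cfargument name="endRE" type="string" required="true">
  <cfset var startHit = REFindNoCase(arguments.startRE, arguments.text, 1, true)>
  <cfset var from = 0>
  <cfset var endHit = "">
  <cfif startHit.pos[1] EQ 0>
    <cfreturn "">
  </cfif>
  <!--- The zone starts right after the opening marker. --->
  <cfset from = startHit.pos[1] + startHit.len[1]>
  <cfif from GT Len(arguments.text)>
    <cfreturn "">
  </cfif>
  <cfset endHit = REFindNoCase(arguments.endRE, arguments.text, from, true)>
  <cfif endHit.pos[1] LTE from>
    <cfreturn "">
  </cfif>
  <cfreturn Mid(arguments.text, from, endHit.pos[1] - from)>
</cffunction>

<!--- Hypothetical helper: return a comma-delimited list of everything in
      the text that looks like an email address (loose pattern, an assumption). --->
<cffunction name="findEmails" returntype="string" output="false">
  <cfargument name="text" type="string" required="true">
  <cfset var emailRE = "[[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alpha:]]{2,4}">
  <cfset var found = "">
  <cfset var startAt = 1>
  <cfset var hit = REFindNoCase(emailRE, arguments.text, startAt, true)>
  <cfloop condition="hit.pos[1] GT 0">
    <cfset found = ListAppend(found, Mid(arguments.text, hit.pos[1], hit.len[1]))>
    <!--- Continue searching after the address just found. --->
    <cfset startAt = hit.pos[1] + hit.len[1]>
    <cfif startAt GT Len(arguments.text)>
      <cfbreak>
    </cfif>
    <cfset hit = REFindNoCase(emailRE, arguments.text, startAt, true)>
  </cfloop>
  <cfreturn found>
</cffunction>

<!--- Directory taken from Ben's tag call; the *.txt filter is an assumption. --->
<cfset bounceDir = "c:\cfusionmx\wwwroot\email_blast">
<cfdirectory action="list" directory="#bounceDir#" filter="*.txt" name="bounceFiles">

<cfset badAddresses = "">
<cfloop query="bounceFiles">
  <cffile action="read" file="#bounceDir#\#bounceFiles.name#" variable="body">
  <!--- Only look inside the zone between the two fixed markers. --->
  <cfset zone = extractBetween(body,
      "----- The following addresses had permanent fatal errors -----",
      "----- Transcript of session follows -----")>
  <cfloop list="#findEmails(zone)#" index="addr">
    <!--- Skip addresses already collected from an earlier file. --->
    <cfif NOT ListFindNoCase(badAddresses, addr)>
      <cfset badAddresses = ListAppend(badAddresses, addr)>
    </cfif>
  </cfloop>
</cfloop>

<cfoutput>#HTMLEditFormat(badAddresses)#</cfoutput>
```

Files from other mail servers would need their own pair of markers. Restricting the search to the zone between the markers is what keeps the original sender's and the bouncing server's addresses out of the result.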
February 09, 2012