House of Fusion
Search over 2,500 ColdFusion resources here
  
Home of the ColdFusion Community

Mailing Lists
Home /  Groups /  ColdFusion Talk (CF-Talk)

Checking for duplicates in list being imported

  << Previous Post |  RSS |  Sort Oldest First |  Sort Latest First |  Subscribe to this Group Next >> 
Top  |   Reply  |   Original Post  |   RSS Feed  |   Subscribe to this Group
Author:
Les Mizzell
05/22/2008 04:12 PM

I've got an application that a client uses to import email list from Excel spreadsheets into a database for use to send e-newsletters to specific groups. The database has grown quite large, and when importing list to specific group at this point, they're forgetting if a particular email address had already been added, and needs something that prevents duplicates from being added to the list. I can think of a couple of clumsy (resource intensive) ways to check the addresses being imported against what's already in the database, but there must be an optimal way to do it. <cf_check_duplicates_before_insert> ?? Suggestions?

Top  |   Parent  |   Reply  |   Original Post  |   RSS Feed  |   Subscribe to this Group
Author:
Sonny Savage
05/23/2008 09:28 AM

You you using a data loader native to the db, or are you inserting the records one at a time?  Native loaders generally have a way to prevent duplicates.  If you're inserting one at a time, just do a query first to see if it exists.  You'll definitely want an index on that column, possibly even a unique index. On Thu, May 22, 2008 at 4:10 PM, Les Mizzell <lesmizz@bellsouth.net> wrote: ----- Excess quoted text cut - see Original Post for more -----

Top  |   Parent  |   Reply  |   Original Post  |   RSS Feed  |   Subscribe to this Group
Author:
Scott Weikert
05/23/2008 09:51 AM

I had to deal with this same issue some time back... Here's what I do. First, I create a temporary table for the import, and do a dump into that, straight up. I create an identity field so that every record has a unique integer value to use later. I run updates on the fields in that table to do a little data cleanup - trimming off bounding quotes, extra spaces or odd invisible characters on the ends, etc. Then, I'll run a query to pull out just the records with duplicate emails. <cfquery name="qryDupeList"> SELECT    Email, ImportID FROM    (import table) GROUP BY Email, ImportID HAVING    COUNT(Email) > 1 ORDER BY Email </cfquery. I then loop over this query, skipping the first occurance of each email addy, and tucking the remaining matches' ImportIDs into a list. <cfset dupeImportIDList = ""> <cfset curEmail = ""> <cfloop query="qryDupeList">     <cfif Email IS NOT curEmail>         <cfset curEmail = Email>     <cfelse>         <cfset dupeImportIDList = ListAppend(dupeImportIDList, ImportID)>     </cfif> </cfloop> If there are any dupes, I just delete all records with those ImportIDs. <cfif ListLen(dupeImportIDList)>     <cfquery>         DELETE FROM (import table)         WHERE ImportID IN (#dupeImportIDList#)     </cfquery> </cfif> The only real trick with this is, the duplicate records deleted are kinda random. So if you have other data associated with those email addresses, and that data varies from record to record, you're basically flipping coins to see which one you end up with. You could choose to display the data on a page for the user to pick and choose which record out of the duped records they want to keep, naturally.

Top  |   Parent  |   Reply  |   Original Post  |   RSS Feed  |   Subscribe to this Group
Author:
Jeff Price
05/23/2008 11:02 AM

Depending on what database you are using, the database may be able to handle this quite easily. MS SQL Server can handle this via a unique index. Take a look at INGORE_DUP_KEY. I don't know if other DB's have access to something like this, but I'd have to think most of them do. -jeff ----- Excess quoted text cut - see Original Post for more -----


<< Previous Thread Today's Threads Next Thread >>

Search cf-talk

May 24, 2012

<<   <   Today   >   >>
Su Mo Tu We Th Fr Sa
     1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31     

Designer, Developer and mobile workflow conference