Welcome to the { mindfrost82.com } forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact us.

Go Back   { mindfrost82.com } > Gadget Corner > Tech Newsgroups > Programming > Databases > General SQL Server Support

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-06-2008, 03:33 PM
DaveL
 
Posts: n/a
Deduping

Hi Hows it going , I need a Fast way to check for duplicate rows in a single
table

the identity keys and Primary Keys dont play in this scenario


i have like 10 columns
Column1 thru column10 as the columns that make it a duplicate row,

any help is welcome
Thanks


Reply With Quote
  #2 (permalink)  
Old 08-06-2008, 04:24 PM
Plamen Ratchev
 
Posts: n/a
Re: Deduping

To check you can use something like this:

SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, COUNT(*)
AS cnt
FROM Table
GROUP BY col1, col2, col3, col4, col5, col6, col7, col8, col9, col10
HAVING COUNT(*) > 1;

To remove duplicates, on SQL Server 2005 you can use this query:

WITH Dups
AS
(SELECT *, ROW_NUMBER() OVER(
PARTITION BY col1, col2, col3, col4, col5, col6, col7, col8,
col9, col10
ORDER BY col1) AS seq
FROM Table)
DELETE Dups
WHERE seq > 1;


Plamen Ratchev
http://www.SQLStudio.com

Reply With Quote
  #3 (permalink)  
Old 08-06-2008, 06:34 PM
DaveL
 
Posts: n/a
Re: Deduping

Thank you very much
DaveL

"Plamen Ratchev" <Plamen@SQLStudio.com> wrote in message
news:2NmdncBSn-24XwTVnZ2dnUVZ_v_inZ2d@speakeasy.net...
> To check you can use something like this:
>
> SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9, col10,
> COUNT(*) AS cnt
> FROM Table
> GROUP BY col1, col2, col3, col4, col5, col6, col7, col8, col9, col10
> HAVING COUNT(*) > 1;
>
> To remove duplicates, on SQL Server 2005 you can use this query:
>
> WITH Dups
> AS
> (SELECT *, ROW_NUMBER() OVER(
> PARTITION BY col1, col2, col3, col4, col5, col6, col7, col8,
> col9, col10
> ORDER BY col1) AS seq
> FROM Table)
> DELETE Dups
> WHERE seq > 1;
>
>
> Plamen Ratchev
> http://www.SQLStudio.com



Reply With Quote
  #4 (permalink)  
Old 08-27-2008, 08:25 PM
DaveL
 
Posts: n/a
Re: Deduping

The Responses gets counts and that lets me know i do have duplicates

What i really am after is the best Way to traverse the
duplicates to mark them as Rejected, i can't delete them
i also need to keep the oldest and reject the newest records

since i have a thread calling the Storedproc and this thread is a
MultiThread 1 to many

for instance thread 1 mite call rowset based on groupid
thread 1 groupid = 11111111
thread 2 groupid = 44444444
etc

each group has a set of rows 1 to 5000

the main table has millions of rows

so the query not only has the 10 columns but returns datecreated also

Im thinking a TmpFile is too much resource on the machine since its a multi
threaded application...
maybe a TmpFile named via the Threadid is better

Select all the dups into a tmp (physical) table and
work from there,

I would like to here all ideas pertaining to this
Thanks Very much
DaveL




"DaveL" <dvs_bis@sbcglobal.net> wrote in message
news:uximk.9259$vn7.3415@flpi147.ffdc.sbc.com...
> Hi Hows it going , I need a Fast way to check for duplicate rows in a
> single table
>
> the identity keys and Primary Keys dont play in this scenario
>
>
> i have like 10 columns
> Column1 thru column10 as the columns that make it a duplicate row,
>
> any help is welcome
> Thanks
>



Reply With Quote
  #5 (permalink)  
Old 08-27-2008, 08:55 PM
--CELKO--
 
Posts: n/a
Re: Deduping

>> What i really am after is the best Way to traverse [traverse? in a set oriented language?] the duplicates to mark them as Rejected, I can't delete them I also need to keep the oldest and reject the newest records [sic: rows are not records] <<

You are talking about a sequential file system and not SQL. We select
sets; we do not traverse records. Apparently, there is a column
which holds a timestamp from which we can determine which row in each
grouping is the oldest.

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are.

If you know how, follow ISO-11179 data element naming conventions and
formatting rules. Put temporal data in ISO-8601 formats. Avoid
proprietary syntax when standard syntax is available.

Sample data is also a good idea, along with clear specifications. It
is very hard to debug code when you do not let us see it. If you want
to learn how to ask a question on a Newsgroup, look at:
http://www.catb.org/~esr/faqs/smart-questions.html


Reply With Quote
  #6 (permalink)  
Old 08-27-2008, 09:05 PM
Plamen Ratchev
 
Posts: n/a
Re: Deduping

It is not clear what your table structure is, but you can use the same
technique for removing duplicates to update:

WITH Dups
AS
(SELECT *, ROW_NUMBER() OVER(
PARTITION BY col1, col2, col3, col4, col5,
col6, col7, col8, col9, col10
ORDER BY col1) AS seq
FROM Table)
UPDATE Dups
SET status = 'Rejected'
WHERE seq > 1;

--
Plamen Ratchev
http://www.SQLStudio.com
Reply With Quote
  #7 (permalink)  
Old 08-27-2008, 09:28 PM
DaveL
 
Posts: n/a
Re: Deduping

I will do some testing with the scrips u provided,
i haveen't used the With / As clause
i'll have to work with it to understand it more

thanks alot
DaveL


"Plamen Ratchev" <Plamen@SQLStudio.com> wrote in message
news:vNWdnc78arcbLijVnZ2dnUVZ_jKdnZ2d@speakeasy.ne t...
> It is not clear what your table structure is, but you can use the same
> technique for removing duplicates to update:
>
> WITH Dups
> AS
> (SELECT *, ROW_NUMBER() OVER(
> PARTITION BY col1, col2, col3, col4, col5,
> col6, col7, col8, col9, col10
> ORDER BY col1) AS seq
> FROM Table)
> UPDATE Dups
> SET status = 'Rejected'
> WHERE seq > 1;
>
> --
> Plamen Ratchev
> http://www.SQLStudio.com



Reply With Quote
  #8 (permalink)  
Old 08-27-2008, 09:29 PM
Plamen Ratchev
 
Posts: n/a
Re: Deduping

The WITH keyword defines a Common Table Expression (available only on
SQL Server 2005/2008). It is pretty much the same as using a derived
table, with a few extra additions like reusing multiple times within the
query, stacking multiple CTEs, and recursive features.

--
Plamen Ratchev
http://www.SQLStudio.com
Reply With Quote
  #9 (permalink)  
Old 08-27-2008, 11:09 PM
DaveL
 
Posts: n/a
Re: Deduping

thank you very much
DaveL

"Plamen Ratchev" <Plamen@SQLStudio.com> wrote in message
news:vNWdncj8areDJCjVnZ2dnUVZ_jKdnZ2d@speakeasy.ne t...
> The WITH keyword defines a Common Table Expression (available only on SQL
> Server 2005/2008). It is pretty much the same as using a derived table,
> with a few extra additions like reusing multiple times within the query,
> stacking multiple CTEs, and recursive features.
>
> --
> Plamen Ratchev
> http://www.SQLStudio.com



Reply With Quote
Reply

  { mindfrost82.com } > Gadget Corner > Tech Newsgroups > Programming > Databases > General SQL Server Support


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT. The time now is 12:03 PM.


Powered by vBulletin, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.1.0 ©2007, Crawlability, Inc.
© 1999-2008 mindfrost82.com v11.0


Sponsors:
Mortgage | Home Equity Loan | Per Insurance | Credit Card | Loans



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114