![]() |
|
|
Welcome to the { mindfrost82.com } forums. You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today! If you have any problems with the registration process or your account login, please contact contact us. |
|
|||||||
![]() |
|
|
LinkBack | Thread Tools | Search this Thread | Display Modes |
|
|||
|
Deduping
Hi Hows it going , I need a Fast way to check for duplicate rows in a single
table the identity keys and Primary Keys dont play in this scenario i have like 10 columns Column1 thru column10 as the columns that make it a duplicate row, any help is welcome Thanks |
|
|||
|
Re: Deduping
To check you can use something like this:
SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, COUNT(*) AS cnt FROM Table GROUP BY col1, col2, col3, col4, col5, col6, col7, col8, col9, col10 HAVING COUNT(*) > 1; To remove duplicates, on SQL Server 2005 you can use this query: WITH Dups AS (SELECT *, ROW_NUMBER() OVER( PARTITION BY col1, col2, col3, col4, col5, col6, col7, col8, col9, col10 ORDER BY col1) AS seq FROM Table) DELETE Dups WHERE seq > 1; Plamen Ratchev http://www.SQLStudio.com |
|
|||
|
Re: Deduping
Thank you very much
DaveL "Plamen Ratchev" <Plamen@SQLStudio.com> wrote in message news:2NmdncBSn-24XwTVnZ2dnUVZ_v_inZ2d@speakeasy.net... > To check you can use something like this: > > SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, > COUNT(*) AS cnt > FROM Table > GROUP BY col1, col2, col3, col4, col5, col6, col7, col8, col9, col10 > HAVING COUNT(*) > 1; > > To remove duplicates, on SQL Server 2005 you can use this query: > > WITH Dups > AS > (SELECT *, ROW_NUMBER() OVER( > PARTITION BY col1, col2, col3, col4, col5, col6, col7, col8, > col9, col10 > ORDER BY col1) AS seq > FROM Table) > DELETE Dups > WHERE seq > 1; > > > Plamen Ratchev > http://www.SQLStudio.com |
|
|||
|
Re: Deduping
The Responses gets counts and that lets me know i do have duplicates
What i really am after is the best Way to traverse the duplicates to mark them as Rejected, i can't delete them i also need to keep the oldest and reject the newest records since i have a thread calling the Storedproc and this thread is a MultiThread 1 to many for instance thread 1 mite call rowset based on groupid thread 1 groupid = 11111111 thread 2 groupid = 44444444 etc each group has a set of rows 1 to 5000 the main table has millions of rows so the query not only has the 10 columns but returns datecreated also Im thinking a TmpFile is too much resource on the machine since its a multi threaded application... maybe a TmpFile named via the Threadid is better Select all the dups into a tmp (physical) table and work from there, I would like to here all ideas pertaining to this Thanks Very much DaveL "DaveL" <dvs_bis@sbcglobal.net> wrote in message news:uximk.9259$vn7.3415@flpi147.ffdc.sbc.com... > Hi Hows it going , I need a Fast way to check for duplicate rows in a > single table > > the identity keys and Primary Keys dont play in this scenario > > > i have like 10 columns > Column1 thru column10 as the columns that make it a duplicate row, > > any help is welcome > Thanks > |
|
|||
|
Re: Deduping
>> What i really am after is the best Way to traverse [traverse? in a set oriented language?] the duplicates to mark them as Rejected, I can't delete them I also need to keep the oldest and reject the newest records [sic: rows are not records] <<
You are talking about a sequential file system and not SQL. We select sets; we do not traverse records. Apparently, there is a column which holds a timestamp from which we can determine which row in each grouping is the oldest. Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules. Put temporal data in ISO-8601 formats. Avoid proprietary syntax when standard syntax is available. Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html |
|
|||
|
Re: Deduping
It is not clear what your table structure is, but you can use the same
technique for removing duplicates to update: WITH Dups AS (SELECT *, ROW_NUMBER() OVER( PARTITION BY col1, col2, col3, col4, col5, col6, col7, col8, col9, col10 ORDER BY col1) AS seq FROM Table) UPDATE Dups SET status = 'Rejected' WHERE seq > 1; -- Plamen Ratchev http://www.SQLStudio.com |
|
|||
|
Re: Deduping
I will do some testing with the scrips u provided,
i haveen't used the With / As clause i'll have to work with it to understand it more thanks alot DaveL "Plamen Ratchev" <Plamen@SQLStudio.com> wrote in message news:vNWdnc78arcbLijVnZ2dnUVZ_jKdnZ2d@speakeasy.ne t... > It is not clear what your table structure is, but you can use the same > technique for removing duplicates to update: > > WITH Dups > AS > (SELECT *, ROW_NUMBER() OVER( > PARTITION BY col1, col2, col3, col4, col5, > col6, col7, col8, col9, col10 > ORDER BY col1) AS seq > FROM Table) > UPDATE Dups > SET status = 'Rejected' > WHERE seq > 1; > > -- > Plamen Ratchev > http://www.SQLStudio.com |
|
|||
|
Re: Deduping
The WITH keyword defines a Common Table Expression (available only on
SQL Server 2005/2008). It is pretty much the same as using a derived table, with a few extra additions like reusing multiple times within the query, stacking multiple CTEs, and recursive features. -- Plamen Ratchev http://www.SQLStudio.com |
|
|||
|
Re: Deduping
thank you very much
DaveL "Plamen Ratchev" <Plamen@SQLStudio.com> wrote in message news:vNWdncj8areDJCjVnZ2dnUVZ_jKdnZ2d@speakeasy.ne t... > The WITH keyword defines a Common Table Expression (available only on SQL > Server 2005/2008). It is pretty much the same as using a derived table, > with a few extra additions like reusing multiple times within the query, > stacking multiple CTEs, and recursive features. > > -- > Plamen Ratchev > http://www.SQLStudio.com |
![]() |
|
| Thread Tools | Search this Thread |
| Display Modes | |
|
|