![]() |
|
|
Welcome to the { mindfrost82.com } forums. You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today! If you have any problems with the registration process or your account login, please contact contact us. |
|
|||||||
![]() |
|
|
LinkBack | Thread Tools | Search this Thread | Display Modes |
|
|||
|
Finding duplicate records
Hello to all,
I have a small or big problem with a customer data base, where during a change of system we might have created duplicate records. This should be easy to find, you might think, but, we are talking about roughly 10000 records or less in a total volume of 1 MIO records or more. I have considered a strategy: The station ID and a field with something like a sequence number are supposed to be unique during that period. The sequence number will repeat roughly every two - four years, so in the period of three months, which is my problematic time, the sequence number (SeqNo) must be unique for each of the 40 stations (StationID) that I record in this data base table. I did run a simple 'create index' on these two fields, and, as to be expected, the routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write a query which lists to me all pairs (StationID, SeqNo) of data, where two or more rows are identically? If it makes things easier, I have a copy of that data base on a separate server, not on the live system I mean. I could strip off all data which is not related to that period of time. Can someone give to me a hint? Or a select statement? TS |
|
|||
|
Re: Finding duplicate records
On Aug 1, 2:07*pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de>
wrote: > Hello to all, > > I have a small or big problem with a customer data base, where during a change of > system we might have created duplicate records. This should be easy to find, you > might think, but, we are talking about roughly 10000 records or less in atotal > volume of 1 MIO records or more. > > I have considered a strategy: The station ID and a field with something like a > sequence number are supposed to be unique during that period. The sequence number > will repeat roughly every two - four years, so in the period of three months, > which is my problematic time, the sequence number (SeqNo) must be unique for each > of the 40 stations (StationID) that I record in this data base table. > > I did run a simple 'create index' on these two fields, and, as to be expected, the > routine fails, blahblah, meaning, there are duplicates. Fine. So, how canI write > a query which lists to me all pairs (StationID, SeqNo) of data, where twoor more > rows are identically? > > If it makes things easier, I have a copy of that data base on a separate server, > not on the live system I mean. I could strip off all data which is not related to > that period of time. > > Can someone give to me a hint? Or a select statement? > > TS You need to first figure out what you classify as a duplicate record? If only one field in a record is different from the next record is it a therefor not duplicate? If the only problem is a particular field over a given period, then you simply run a procedure to count the number of occurances for every ID over a given time. Or better yet run a procedure that only returns fields that have a count greater than 1. |
|
|||
|
Re: Finding duplicate records
Try this:
SELECT SeqNo, StationID, COUNT(*) FROM Table GROUP BY SeqNo, StationID HAVING COUNT(*) > 1; Plamen Ratchev http://www.SQLStudio.com |
|
|||
|
Re: Finding duplicate records
To classify what is a duplicate record: If (StationID,SeqNo) is repeating. Mainly,
if there is more then one record of the same SeqNo for one StationId. Each of the two values individually may repeat: Many records do come from the same station, and every station starts with the same sequence Number (SeqNo), it starts with "1", increments with every record, continues for years, until it is reset to "1" again some nice and warm day, when the accountant does decide this. For the period of three months, where I want to find my problem, for sure there is no regular way to have two or more transactions from the same station with a repeated SeqNo. The SeqNo must be unique for each station during this time. TS rhaazy wrote: > On Aug 1, 2:07 pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de> > wrote: >> Hello to all, >> >> I have a small or big problem with a customer data base, where during a change of >> system we might have created duplicate records. This should be easy to find, you >> might think, but, we are talking about roughly 10000 records or less in a total >> volume of 1 MIO records or more. >> >> I have considered a strategy: The station ID and a field with something like a >> sequence number are supposed to be unique during that period. The sequence number >> will repeat roughly every two - four years, so in the period of three months, >> which is my problematic time, the sequence number (SeqNo) must be unique for each >> of the 40 stations (StationID) that I record in this data base table. >> >> I did run a simple 'create index' on these two fields, and, as to be expected, the >> routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write >> a query which lists to me all pairs (StationID, SeqNo) of data, where two or more >> rows are identically? >> >> If it makes things easier, I have a copy of that data base on a separate server, >> not on the live system I mean. I could strip off all data which is not related to >> that period of time. >> >> Can someone give to me a hint? Or a select statement? >> >> TS > > You need to first figure out what you classify as a duplicate record? > If only one field in a record is different from the next record is it > a therefor not duplicate? If the only problem is a particular field > over a given period, then you simply run a procedure to count the > number of occurances for every ID over a given time. Or better yet > run a procedure that only returns fields that have a count greater > than 1. |
|
|||
|
Re: Finding duplicate records
Works !
Unfortunately it proved also, that my idea of duplicate records was false. TS Plamen Ratchev wrote: > Try this: > > SELECT SeqNo, StationID, COUNT(*) > FROM Table > GROUP BY SeqNo, StationID > HAVING COUNT(*) > 1; > > > Plamen Ratchev > http://www.SQLStudio.com |
![]() |
|
| Thread Tools | Search this Thread |
| Display Modes | |
|
|