Go Back   { mindfrost82.com } > Gadget Corner > Tech Newsgroups > Microsoft > MS Office > Excel

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 07-24-2008, 11:18 AM
=?Utf-8?B?RHVndWlkMQ==?=
 
Posts: n/a
Finding Similar entries in one column in another

Hi,
I am relatively new to using Excel with vast amounts of data and was after
any suggestions which could help me complete a problem that I have..

I have two columns with descriptions in (approximately 10 words). I need to
find descriptions from one column (Sheet 1,column C) which are similar to the
other column (Sheet 2,Column A).For example,if I could display the number of
words which are similar to both columns I could then find the ones with the
largest number of similar words.

For example, the two columns could look like...

Sheet1,column C....
Description
300g Fish fillets

Sheet2,column A
Description
Fish fingers 300g

This would then give the result that there were 2 similar words.

Any help would be much appreciated as to how to carry this out...

Reply With Quote
  #2 (permalink)  
Old 07-24-2008, 03:35 PM
JP
 
Posts: n/a
Re: Finding Similar entries in one column in another

How about this page?

http://www.dailydoseofexcel.com/arch...zy-text-match/


HTH,
JP

On Jul 24, 7:18*am, Duguid1 <Dugu...@discussions.microsoft.com> wrote:
> Hi,
> I am relatively new to using Excel with vast amounts of data and was after
> any suggestions which could help me complete a problem that I have..
>
> I have two columns with descriptions in (approximately 10 words). I need to
> find descriptions from one column (Sheet 1,column C) which are similar tothe
> other column (Sheet 2,Column A).For example,if I could display the *number of
> words which are similar to both columns I could then find the ones with the
> largest number of similar words.
>
> For example, the two columns could look like...
>
> Sheet1,column C....
> Description
> 300g Fish fillets
>
> Sheet2,column A
> Description
> Fish fingers 300g
>
> This would then give the result that there were 2 similar words.
>
> Any help would be much appreciated as to how to carry this out...


Reply With Quote
  #3 (permalink)  
Old 07-25-2008, 05:39 PM
Recursive Error
 
Posts: n/a
Re: Finding Similar entries in one column in another

On Jul 24, 7:18*am, Duguid1 <Dugu...@discussions.microsoft.com> wrote:
> Hi,
> I am relatively new to using Excel with vast amounts of data and was after
> any suggestions which could help me complete a problem that I have..
>
> I have two columns with descriptions in (approximately 10 words). I need to
> find descriptions fromonecolumn(Sheet 1,columnC) which aresimilarto the
> othercolumn(Sheet 2,ColumnA).For example,if I could display the *numberof
> words which aresimilarto both columns I could then find the ones with the
> largest number ofsimilarwords.
>
> For example, the two columns could look like...
>
> Sheet1,columnC....
> Description
> 300g Fish fillets
>
> Sheet2,columnA
> Description
> Fish fingers 300g
>
> This would then give the result that there were 2similarwords.
>
> Any help would be much appreciated as to how to carry this out...


Solution:

Tokenize the word lists [using custom VBA -- see footnote -- or a
series of FIND functions] so you have one word per column.

Now, each row in list one needs a search result for each row in list
2.

This will return the hit count of words in List1 found in each row of
List2 (array formula)

{=SUM(COUNTIF(OFFSET(List2,J$1-1,0),($B2:$G2)))}

List2 is a named range that points to /only the first row/ of the
tokenized list of words from the second list.

The OFFSET piece increments which row is actually scanned. J$1 etc.
contains the row offset. 1 means scan the first row of List2. 2 means
scan the second row if List2, etc.

$B2:$G2 is the tokenized word list for List1.

Don't forget it's an array formula! (Ctrl+Shift+Enter).

Footnote : I can post the VBA code module and a workbook that
demonstrates how the formulas work on my website later today if you
like. The VBA code is not strictly necessary, but it makes tokenizing
the words much easier.
Reply With Quote
  #4 (permalink)  
Old 07-25-2008, 09:58 PM
smartin
 
Posts: n/a
Re: Finding Similar entries in one column in another

Recursive Error wrote:
> On Jul 24, 7:18 am, Duguid1 <Dugu...@discussions.microsoft.com> wrote:
>> Hi,
>> I am relatively new to using Excel with vast amounts of data and was after
>> any suggestions which could help me complete a problem that I have..
>>
>> I have two columns with descriptions in (approximately 10 words). I need to
>> find descriptions fromonecolumn(Sheet 1,columnC) which aresimilarto the
>> othercolumn(Sheet 2,ColumnA).For example,if I could display the number of
>> words which aresimilarto both columns I could then find the ones with the
>> largest number ofsimilarwords.
>>
>> For example, the two columns could look like...
>>
>> Sheet1,columnC....
>> Description
>> 300g Fish fillets
>>
>> Sheet2,columnA
>> Description
>> Fish fingers 300g
>>
>> This would then give the result that there were 2similarwords.
>>
>> Any help would be much appreciated as to how to carry this out...

>
> Solution:
>
> Tokenize the word lists [using custom VBA -- see footnote -- or a
> series of FIND functions] so you have one word per column.
>
> Now, each row in list one needs a search result for each row in list
> 2.
>
> This will return the hit count of words in List1 found in each row of
> List2 (array formula)
>
> {=SUM(COUNTIF(OFFSET(List2,J$1-1,0),($B2:$G2)))}
>
> List2 is a named range that points to /only the first row/ of the
> tokenized list of words from the second list.
>
> The OFFSET piece increments which row is actually scanned. J$1 etc.
> contains the row offset. 1 means scan the first row of List2. 2 means
> scan the second row if List2, etc.
>
> $B2:$G2 is the tokenized word list for List1.
>
> Don't forget it's an array formula! (Ctrl+Shift+Enter).
>
> Footnote : I can post the VBA code module and a workbook that
> demonstrates how the formulas work on my website later today if you
> like. The VBA code is not strictly necessary, but it makes tokenizing
> the words much easier.


I have posted a VBA-free version of a workbook that demonstrates the
solution. The code module that contains the handy tokenizing function
(called "GetString") is offered separately for your inspection before use.

http://vfdrake.home.comcast.net/file...word_count.xls
http://vfdrake.home.comcast.net/file...t_handlers.bas

[Yes, I am Recursive Error too...]

Enjoy.
Reply With Quote
  #5 (permalink)  
Old 07-26-2008, 09:17 PM
smartin
 
Posts: n/a
Re: Finding Similar entries in one column in another

smartin wrote:
> Recursive Error wrote:
>> On Jul 24, 7:18 am, Duguid1 <Dugu...@discussions.microsoft.com> wrote:
>>> Hi,
>>> I am relatively new to using Excel with vast amounts of data and was
>>> after
>>> any suggestions which could help me complete a problem that I have..
>>>
>>> I have two columns with descriptions in (approximately 10 words). I
>>> need to
>>> find descriptions fromonecolumn(Sheet 1,columnC) which aresimilarto the
>>> othercolumn(Sheet 2,ColumnA).For example,if I could display the
>>> number of
>>> words which aresimilarto both columns I could then find the ones with
>>> the
>>> largest number ofsimilarwords.
>>>
>>> For example, the two columns could look like...
>>>
>>> Sheet1,columnC....
>>> Description
>>> 300g Fish fillets
>>>
>>> Sheet2,columnA
>>> Description
>>> Fish fingers 300g
>>>
>>> This would then give the result that there were 2similarwords.
>>>
>>> Any help would be much appreciated as to how to carry this out...

>>
>> Solution:
>>
>> Tokenize the word lists [using custom VBA -- see footnote -- or a
>> series of FIND functions] so you have one word per column.
>>
>> Now, each row in list one needs a search result for each row in list
>> 2.
>>
>> This will return the hit count of words in List1 found in each row of
>> List2 (array formula)
>>
>> {=SUM(COUNTIF(OFFSET(List2,J$1-1,0),($B2:$G2)))}
>>
>> List2 is a named range that points to /only the first row/ of the
>> tokenized list of words from the second list.
>>
>> The OFFSET piece increments which row is actually scanned. J$1 etc.
>> contains the row offset. 1 means scan the first row of List2. 2 means
>> scan the second row if List2, etc.
>>
>> $B2:$G2 is the tokenized word list for List1.
>>
>> Don't forget it's an array formula! (Ctrl+Shift+Enter).
>>
>> Footnote : I can post the VBA code module and a workbook that
>> demonstrates how the formulas work on my website later today if you
>> like. The VBA code is not strictly necessary, but it makes tokenizing
>> the words much easier.

>
> I have posted a VBA-free version of a workbook that demonstrates the
> solution. The code module that contains the handy tokenizing function
> (called "GetString") is offered separately for your inspection before use.
>
> http://vfdrake.home.comcast.net/file...word_count.xls
> http://vfdrake.home.comcast.net/file...t_handlers.bas
>
> [Yes, I am Recursive Error too...]
>
> Enjoy.


You will find the solution I gave returns undesired results when the
lists have varying word counts--which is likely. This is because any
blank cells in the tokenized List1 will match blank cells in the
tokenized List2. The following formula will return only word matches,
excluding the blank cells:

{=SUM(COUNTIF(OFFSET(List2,J$1-1,0),($B2:$G2)))-SUM(COUNTIF(OFFSET(List2,J$1-1,0),""))*SUM(COUNTIF($B2:$G2,""))}
Reply With Quote
Reply

  { mindfrost82.com } > Gadget Corner > Tech Newsgroups > Microsoft > MS Office > Excel


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT. The time now is 05:27 AM.


Powered by vBulletin, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.1.0 ©2007, Crawlability, Inc.
© 1999-2008 mindfrost82.com v11.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109