C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

IsAnagram

 
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C Language
View previous topic :: View next topic  
Author Message
Martin
Guest





PostPosted: Thu May 10, 2007 7:36 am    Post subject: IsAnagram Reply with quote



Hi all!

I am looking for the best algorithm to check if two words are
anagrams. Here the signature of the method :

bool IsAnagram(char[] wordOne, char[] wordTwo)

Here are the strategies I am thinking of :
- Brute-Force (One loop inside the other one) O(n^2)
- Sort both words and compare both arrays. O(n^2) considering that I
use BubbleSort and words are both short (less than 30 characters).
- Find a way to create a signature for each word (e.g. A = 1, B = 2,
etc.) However, I can have a problem here... The sum for both words can
be the same with different letters.

So I am looking for the most efficient algorithm. Do you have an idea?
Thank you!
Back to top
Peter Nilsson
Guest





PostPosted: Thu May 10, 2007 7:53 am    Post subject: Re: IsAnagram Reply with quote



Martin <martinga...@gmail.com> wrote:
Quote:
I am looking for the best algorithm to check if two words
are anagrams.

Do you have a question about the C language?

Quote:
Here the signature of the method :

bool IsAnagram(char[] wordOne, char[] wordTwo)

I see you do, although you don't realise it. The signature in
C would be closer to...

bool IsAnagram(char wordOne[], char wordTwo[])

Note that bool is not standard in C90.

Quote:
Here are the strategies I am thinking of :
- Brute-Force (One loop inside the other one) O(n^2)

How would this work in practice?

Quote:
- Sort both words and compare both arrays. O(n^2)
considering that I use BubbleSort and words are both
short (less than 30 characters).

A simple binary tree would be quicker.

Quote:
- Find a way to create a signature for each word
(e.g. A = 1, B = 2, etc.) However, I can have a problem
here... The sum for both words can be the same with
different letters.

You could use the first 26 prime numbers and a bignum
library...

sig = 1;
if (*p == 'a') mul(sig, 2);
if (*p == 'b') mul(sig, 3);
if (*p == 'c') mul(sig, 5);
...

Quote:
So I am looking for the most efficient algorithm.
Do you have an idea?

Assume you're only considering 26 letters. Create an
array of counts for each letter. Do the same for both
words and compare the arrays. O(N).

If you have a problem implementing that in C, then
feel free to ask a C question. If you have problems
with the algorithm, please post in a more appropriate
newsgroup.

--
Peter
Back to top
CBFalconer
Guest





PostPosted: Thu May 10, 2007 9:11 am    Post subject: Re: IsAnagram Reply with quote



Martin wrote:
Quote:

I am looking for the best algorithm to check if two words are
anagrams. Here the signature of the method :

bool IsAnagram(char[] wordOne, char[] wordTwo)

Here are the strategies I am thinking of :
- Brute-Force (One loop inside the other one) O(n^2)
- Sort both words and compare both arrays. O(n^2) considering that I
use BubbleSort and words are both short (less than 30 characters).
- Find a way to create a signature for each word (e.g. A = 1, B = 2,
etc.) However, I can have a problem here... The sum for both words
can be the same with different letters.

So I am looking for the most efficient algorithm. Do you have an idea?

Sort both sounds like the most efficient. You should probably use
a simple sort such as shell sort, which is much faster than bubble,
and is better than O(N*N). If one argument comes from a table then
it can be stored in sorted order.

Bear in mind that you are not passing arrays, you are passing
pointers to the first item in the array. So you need to copy the
array first. Assuming they are strings (zero terminated) you will
find the lengths in the process.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>
<http://kadaitcha.cx/vista/dogsbreakfast/index.html>
cbfalconer at maineline dot net


--
Posted via a free Usenet account from http://www.teranews.com
Back to top
Richard Bos
Guest





PostPosted: Thu May 10, 2007 9:11 am    Post subject: Re: IsAnagram Reply with quote

CBFalconer <cbfalconer (AT) yahoo (DOT) com> wrote:

Quote:
Martin wrote:

I am looking for the best algorithm to check if two words are
anagrams. Here the signature of the method :

- Sort both words and compare both arrays. O(n^2) considering that I
use BubbleSort and words are both short (less than 30 characters).
- Find a way to create a signature for each word (e.g. A = 1, B = 2,
etc.) However, I can have a problem here... The sum for both words
can be the same with different letters.

Sort both sounds like the most efficient. You should probably use
a simple sort such as shell sort, which is much faster than bubble,
and is better than O(N*N).

For a max of 30 letters? Insertion or selection. At that size it doesn't
pay to be complicated.

Alternatively, do create a signature, with unsigned long longs, using
not addition but multiplication. If the signatures don't match, they're
not anagrams. Since this is the most usual case, you win a little. If
they _do_ match, fall back to the sorting solution. Yes, the signature
may wrap around for large words, but that's not going to matter because
you need the final check anyway, and all large anagrams are going to
wrap around to the same signature. You do need _unsigned_ integers for
this, mind you. And do check whether it's actually faster or not.

And yet another: if you're going to do great numbers of these checks,
consider using a database. For each word you're being asked to check,
check to see if it's already in the DB. If not, sort it _once_, then add
it to the DB - next time you don't need to sort, you can just fetch the
sorted version from the DB. If you hit two words you've already seen
before, all you need is two DB checks and a single compare. Drawback:
for this to save you anything, you really do need _lots_ of checks, and
a good database.

Richard
Back to top
Chris Dollin
Guest





PostPosted: Thu May 10, 2007 9:11 am    Post subject: Re: IsAnagram Reply with quote

Martin wrote:

Quote:
I am looking for the best algorithm to check if two words are
anagrams. Here the signature of the method :

bool IsAnagram(char[] wordOne, char[] wordTwo)

Assuming you're going to write this algorithm in C (why else post
/here/?), a couple of points to watch out for:

(a) C doesn't have a `bool` type by default. C90 -- the older but
widely supported standard -- doesn't have it at all; C99 --
the spanking-new [1] but spottily supported standard -- only
has it if you #include <stdbool.h>.

(b) It's `char wordOne[]`, not `char [] wordOne`; C is not Java.

(c) Actually `char wordOne[]` /as a function argument/ means
`char *wordOne`, ie a pointer to one (or more) character(s).
When you try to pass an array as an argument [2] it decays
into a pointer to its first element.

(d) It follows that once you're inside `isAnagram` [3] there's
no way to know /how long the words are/ without extra
information. If the words are C strings, you're OK, because
they will end with the nul character (the one with value 0).
If not, you'll have to smuggle the length in somehow -- an
extra argument or two will do. (Of course, if their lengths
are different, they're not anagrams.)

I have no advice on the algorithm, since that's not a C question;
comp.programming's perhaps the place for that.

[1] Hell's teeth, eight years already?

[2] Or use it in all but a couple of specialised contexts.

[3] I've gratuitously decapitalised your function name because
my Style Rules reserve initial caps for (some) #defines and
type names -- function names, not so. Your stylage may vary.

--
`x.sort = y.sort` Hedgehog
"It took a very long time, much longer than the most generous estimates."
- James White, /Sector General/
Back to top
Jack Klein
Guest





PostPosted: Thu May 10, 2007 9:11 am    Post subject: Re: IsAnagram Reply with quote

On 9 May 2007 19:36:45 -0700, Martin <martingagne (AT) gmail (DOT) com> wrote in
comp.lang.c:

Quote:
Hi all!

I am looking for the best algorithm to check if two words are
anagrams. Here the signature of the method :

If you're looking for an algorithm, you should ask in an algorithm
group. news:comp.programming comes to mind.

Quote:
bool IsAnagram(char[] wordOne, char[] wordTwo)

Here are the strategies I am thinking of :
- Brute-Force (One loop inside the other one) O(n^2)
- Sort both words and compare both arrays. O(n^2) considering that I
use BubbleSort and words are both short (less than 30 characters).
- Find a way to create a signature for each word (e.g. A = 1, B = 2,
etc.) However, I can have a problem here... The sum for both words can
be the same with different letters.

So I am looking for the most efficient algorithm. Do you have an idea?
Thank you!

There is nothing at all about C in your question. An algorithm is
independent of the programming language used to implement it.

Once you have decided on an algorithm, if you have difficulty writing
correct standard C code to implement it, post your problem code here
and ask for help.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
Back to top
Chris Dollin
Guest





PostPosted: Fri May 11, 2007 9:12 am    Post subject: Re: IsAnagram Reply with quote

Peter Nilsson wrote:

Quote:
Chris Dollin <chris.dol...@hp.com> wrote:
Joachim Schmitz wrote:
wouldn't isAnagram invade the standard namespace?

(fx:shock)
(fx:peek)

No, because `A` isn't a lower-case letter.

(fx:mops-brow)

But C90 needn't be case sensitive with respect to linkage of
external identifiers. I've yet to see one that wasn't, but C90
still allows such implementations.

Heavens, you're right. Also there's the dread 6-character limit.

Bummer.

Well ... hm. I wouldn't use such an implementation unless there
were a gun-like to my head-like.

Quote:
Try is_anagram.

I'm leaning towards `iz` ... somebody stop me.

--
I have an apt signature for that, but this isn't it.

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN
Back to top
Display posts from previous:   
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C Language All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.