C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

strings sorting

 
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated)
View previous topic :: View next topic  
Author Message
Raghavendra Mahuli
Guest





PostPosted: Wed Aug 18, 2004 6:22 pm    Post subject: strings sorting Reply with quote



Hi,
I have many strings. I have to sort them. But sorting is not according to
ascii but according to a different format (which keeps varying) . So i can
sort them based on sort-order. It is not much of a problem.

But the complex part is "equivalent-values". it can be defined that "ai" and
"ae" are equivalent-values. Then, while sorting two strings, i have to
consider that "ai" and "ae" are equal......only if they come at same
position in both the strings....
For EX1:
str1 - dfahaesdhfkj
str2 - dfahaisds
In str1, ae appears at position 4 and 5
In str2, ai appears also at position 4 and 5....

So they are equivalent.....And i can replace ae in str1 with ai......


EX2:
str3 - haehasdg
str4 - hhsdaiskdn
In str3, ae appears at position 1 and 2
In str4, ai appears also at position 4 and 5....
So here the rule "equivalent -values" does not apply......So i cant replace
"ae" in str3 with "ai".......


In short, the "equivalent-values" applies only when the
"position-matches"...

To add to that such equivalent values are many......
So can you please suggest me some way of solving this problem ....

regards,
Raghavendra



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Back to top
John Dibling
Guest





PostPosted: Fri Aug 20, 2004 12:29 am    Post subject: Re: strings sorting Reply with quote




It seems to me that the fundamental problem is the fact that character
groups like "ae" are in fact a single token, but you have them stored
as 2 tokens; e.g., 2 individual chars. This might be considered a
design flaw. That is to say, the "e" in "ae" has no meaning of its
own. Neither does the "a". Only when taken as a group -- "ae" --
does the token have meaning. If you had strings of tokens, rather
than strings of chars, you could use STL's built-in sorting tools with
little trouble. Moreover, a valid token could possibly consist of
just "a" or just "e" as well. So now we have a list of 3 different
valid tokens: "a", "e", "ae". It might also be possible that you
could have one token "a" followed immediately by another token "e",
which would be completely different that a single token "ae". I don't
know much about your particular situation, but this seems in the realm
of possibility. If this is the case, you are going to have a lot of
difficultly knowing "a" and "e" from "ae" unless you have stored some
of the original context.

If my analysis is correct, I think you have at least 3 possibilities.
First, change the way you store your strings. Instead of storing
strings of chars, store strings of tokens. You might use
vector<token> to accomplish this instead of string, where you define
what token is. Doing this, you can use sort() in the most intuitive
way, using a sort functor of your own devising.

The first solution is likely to be impossible for you, since it is a
fundamental change to what is possibly a low-level data structure.
Another solution would be to take a 3-step approach to sorting the
strings. Step 1: copy the source strings to a temporary collection
of tokens, using a hand-rolled transform()-like algorithm. You could
write your transform() to intelligently pick whole tokens from the
source string, rather than just single chars. Step 2: sort() the
temporary collection of tokens. Step 3: "flatten" the temporary
collection of tokens back to the source collection, replacing the
original contents. This would have the effect you are looking for,
with no architectural implications for the rest of your system. The
main downfall is the obvious inefficiency of doing 2 copies in
addition to the sort you have to do anyway. If you don't care about
this inefficiency (perhaps because you don't have to constantly be
sorting), this would probably be the easiest solution to implement.

The third solution avoids the inefficiencies above, but is hard to
implement. Write your own sort() routine. You could write a version
of sort() that sends the "current" and "last" iterator to the sorting
functor, rather than a reference to the actual value. Or you could
build the buisness logic of identifying tokens right in to sort().

- John


On 18 Aug 2004 14:22:25 -0400, "Raghavendra Mahuli"
<raghavendra.ma (AT) in (DOT) bosch.com> wrote:

Quote:
Hi,
I have many strings. I have to sort them. But sorting is not according to
ascii but according to a different format (which keeps varying) . So i can
sort them based on sort-order. It is not much of a problem.

But the complex part is "equivalent-values". it can be defined that "ai" and
"ae" are equivalent-values. Then, while sorting two strings, i have to
consider that "ai" and "ae" are equal......only if they come at same
position in both the strings....
For EX1:
str1 - dfahaesdhfkj
str2 - dfahaisds
In str1, ae appears at position 4 and 5
In str2, ai appears also at position 4 and 5....

So they are equivalent.....And i can replace ae in str1 with ai......


EX2:
str3 - haehasdg
str4 - hhsdaiskdn
In str3, ae appears at position 1 and 2
In str4, ai appears also at position 4 and 5....
So here the rule "equivalent -values" does not apply......So i cant replace
"ae" in str3 with "ai".......


In short, the "equivalent-values" applies only when the
"position-matches"...

To add to that such equivalent values are many......
So can you please suggest me some way of solving this problem ....

regards,
Raghavendra


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Daniel K. O.
Guest





PostPosted: Fri Aug 20, 2004 2:07 pm    Post subject: Re: strings sorting Reply with quote



John Dibling wrote:
Quote:
First, change the way you store your strings. Instead of storing
strings of chars, store strings of tokens. You might use
vector<token> to accomplish this instead of string, where you define
what token is. Doing this, you can use sort() in the most intuitive
way, using a sort functor of your own devising.

Isn't easier then to use:

---
struct Token { /* ... */ };
typedef basic_string<Token> Tstring;
---

?


Daniel K. O.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Fri Aug 20, 2004 2:42 pm    Post subject: Re: strings sorting Reply with quote

"Raghavendra Mahuli" <raghavendra.ma (AT) in (DOT) bosch.com> wrote


Quote:
I have many strings. I have to sort them. But sorting is not according
to ascii but according to a different format (which keeps varying) .
So i can sort them based on sort-order. It is not much of a problem.

But the complex part is "equivalent-values". it can be defined that
"ai" and "ae" are equivalent-values. Then, while sorting two strings,
i have to consider that "ai" and "ae" are equal......only if they come
at same position in both the strings....

For EX1:
str1 - dfahaesdhfkj
str2 - dfahaisds
In str1, ae appears at position 4 and 5
In str2, ai appears also at position 4 and 5....

So they are equivalent.....And i can replace ae in str1 with ai......

First you talked about sorting, now you are speaking about replacing.
Do you only want to sort, or do you want to modify the strings as well?

Quote:
EX2:
str3 - haehasdg
str4 - hhsdaiskdn
In str3, ae appears at position 1 and 2
In str4, ai appears also at position 4 and 5....
So here the rule "equivalent -values" does not apply......So i cant replace
"ae" in str3 with "ai".......

In short, the "equivalent-values" applies only when the
"position-matches"...

To add to that such equivalent values are many......
So can you please suggest me some way of solving this problem ....

For sorting, the solution is relatively simple, since you can pass your
own comparison routine to std::sort. In your sorting routing, of course,
you'll need some sort of special handling for the special cases. One
classical approch is to generate canonical forms for each string, and
compare them. If you do this a lot, it might be worthwhile to maintain
the canonical representations along side of the original string, rather
than regenerate them each time.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
John Dibling
Guest





PostPosted: Sat Aug 21, 2004 3:26 am    Post subject: Re: strings sorting Reply with quote

On 20 Aug 2004 10:42:59 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:

Quote:
For sorting, the solution is relatively simple, since you can pass your
own comparison routine to std::sort. In your sorting routing, of course,
you'll need some sort of special handling for the special cases.

This won't necesarrily work for at least a couple of reasons.

First, the sort comparator takes only 2 paramaters. The OP did not
state how many individual chars make up a single token. It may be 2,
it may be 5.

Second, the standard doesn't guarantee that the first paramater sent
to the comparator occurs first in the sequence, or the second appears
after the first. All the standard says is that the comparator should
compare the 2 and return true if the first is less than the second.

Consider the case where 'aaa' and 'aab' compare equal, and 'aac'
precedes both of them. There is no way to use STL's built-in sort
algorithm and a custom-built comparator to sort these tokens correctly
if the strings are strings of individual chars, and not strings of
atomic tokens.

- John


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
John Dibling
Guest





PostPosted: Sat Aug 21, 2004 3:28 am    Post subject: Re: strings sorting Reply with quote

On 20 Aug 2004 10:07:50 -0400, "Daniel K. O." <danielko1 (AT) bol (DOT) com.br>
wrote:

Quote:
Isn't easier then to use:

---
struct Token { /* ... */ };
typedef basic_string<Token> Tstring;
---

?


Yeah, maybe it is. But I'm personally more comfortable with
vector<Token>, only becasue Iv'e never used basic_string over anything
that isn't like a char. But Token is probably at least an aggregate of
some kind, maybe even more. I wonder how basic_string<Token>::c_str()
would behave, for instance. Interesting question...

- John


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Mon Aug 23, 2004 10:14 pm    Post subject: Re: strings sorting Reply with quote

John Dibling <jdibling (AT) computer (DOT) org> wrote


Quote:
On 20 Aug 2004 10:42:59 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:

For sorting, the solution is relatively simple, since you can pass
your own comparison routine to std::sort. In your sorting routing, of
course, you'll need some sort of special handling for the special
cases.

This won't necesarrily work for at least a couple of reasons.

First, the sort comparator takes only 2 paramaters.

Yes. Two std::string, in the case in question.

Quote:
The OP did not state how many individual chars make up a single token.
It may be 2, it may be 5.

True, but you're not sorting tokens, you're sorting strings.

Quote:
Second, the standard doesn't guarantee that the first paramater sent
to the comparator occurs first in the sequence, or the second appears
after the first. All the standard says is that the comparator should
compare the 2 and return true if the first is less than the second.

So where is the problem.

Quote:
Consider the case where 'aaa' and 'aab' compare equal, and 'aac'
precedes both of them. There is no way to use STL's built-in sort
algorithm and a custom-built comparator to sort these tokens correctly
if the strings are strings of individual chars, and not strings of
atomic tokens.

Again, when sorting strings, the comparison function is passed two
strings. The sort routing can do pretty much anything it wants with
these two strings, provided the results are consistent, and define an
ordering relationship. (A function where "aab" < "aac", "aac" < "aaa"
and "aaa" < "aab", for example, won't cut it.)

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Mon Aug 23, 2004 10:15 pm    Post subject: Re: strings sorting Reply with quote

John Dibling <jdibling (AT) computer (DOT) org> wrote


Quote:
On 20 Aug 2004 10:07:50 -0400, "Daniel K. O." <danielko1 (AT) bol (DOT) com.br
wrote:

Isn't easier then to use:

---
struct Token { /* ... */ };
typedef basic_string ---

?

Yeah, maybe it is. But I'm personally more comfortable with
vector<Token>, only becasue Iv'e never used basic_string over anything
that isn't like a char. But Token is probably at least an aggregate of
some kind, maybe even more. I wonder how basic_string<Token>::c_str()
would behave, for instance. Interesting question...

It should append Token() to the end, and return a pointer to the array.
This probably won't mean anything to a C program, but then, nor would
basic_string<Token>::data(), even if you gave it the length.

If the function doen't mean anything, just don't use it:-).

I've posted a few problems in another posting. One additional problem
is that one very wide spread implementation doesn't ensure correct
alignment for anything bigger than a size_t. Probably not a problem
with Token, but you never know. On the other hand, basic_string<double>
is good for core dumps. I reported the bug, but I did suggest that it
wasn't of the highest priority:-).

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Mon Aug 23, 2004 10:15 pm    Post subject: Re: strings sorting Reply with quote

"Daniel K. O." <danielko1 (AT) bol (DOT) com.br> wrote

Quote:
John Dibling wrote:

First, change the way you store your strings. Instead of storing
strings of chars, store strings of tokens. You might use
vector<token> to accomplish this instead of string, where you define
what token is. Doing this, you can use sort() in the most intuitive
way, using a sort functor of your own devising.

Isn't easier then to use:

---
struct Token { /* ... */ };
typedef basic_string<Token> Tstring;
---

?

I doubt it. First, of course, Token must be a POD for this to even
work. Secondly, there is no char_traits< Token > ; you'll have to
specialize for it. And of course, there are a couple of indirectly
documented requirements, like the fact that Token() must correspond to
something which you can use in the place of ''.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Allan W
Guest





PostPosted: Mon Aug 23, 2004 11:03 pm    Post subject: Re: strings sorting Reply with quote

Quote:
On 20 Aug 2004 10:42:59 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
For sorting, the solution is relatively simple, since you can pass your
own comparison routine to std::sort. In your sorting routing, of course,
you'll need some sort of special handling for the special cases.

John Dibling <jdibling (AT) computer (DOT) org> wrote
Quote:
This won't necesarrily work for at least a couple of reasons.

First, the sort comparator takes only 2 paramaters. The OP did not
state how many individual chars make up a single token. It may be 2,
it may be 5.

There's a lot that the OP did not state. I think that until we hear
differently, we have to assume that the OP's code can parse the
tokens.

What the OP is trying to do, is to translate character strings into
modified character strings that sort correctly. I do believe that this
can be made to work, but there's a much simpler method.

Quote:
Second, the standard doesn't guarantee that the first paramater sent
to the comparator occurs first in the sequence, or the second appears
after the first. All the standard says is that the comparator should
compare the 2 and return true if the first is less than the second.

Which shouldn't be a problem if the OP knows how to compare strings.

Quote:
Consider the case where 'aaa' and 'aab' compare equal, and 'aac'
precedes both of them. There is no way to use STL's built-in sort
algorithm and a custom-built comparator to sort these tokens correctly
if the strings are strings of individual chars, and not strings of
atomic tokens.

Just have the compare utility decide which is less.

Going back to the case where ae should equal ai (but what happens to
af and ag and ah -- do they compare as greater than ae or less than ai?),
we might end up with something like this:

// Untested
bool isLess(const char*lhs, const char*rhs) {
for(;Wink {
while (*lhs == *rhs && *lhs)
++lhs, ++rhs;
if (''==*lhs) return true; // Short strings are lower
if (''==*rhs) return false;
// We have found a difference that is NOT at the end.
// Is it letters that we consider to be the same?
if (('e'!=*lhs && 'i'!=*lhs) || ('e'!=*rhs && 'i'!=*rhs))
// They're not considered the same, so it's a "real" difference
return *lhs < *rhs;

// Move to next character
++lhs, ++rhs;
} }

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Display posts from previous:   
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated) All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.