 |
C++Talk.NET C++ language newsgroups
|
| View previous topic :: View next topic |
| Author |
Message |
Raghavendra Mahuli Guest
|
Posted: Wed Aug 18, 2004 6:22 pm Post subject: strings sorting |
|
|
Hi,
I have many strings. I have to sort them. But sorting is not according to
ascii but according to a different format (which keeps varying) . So i can
sort them based on sort-order. It is not much of a problem.
But the complex part is "equivalent-values". it can be defined that "ai" and
"ae" are equivalent-values. Then, while sorting two strings, i have to
consider that "ai" and "ae" are equal......only if they come at same
position in both the strings....
For EX1:
str1 - dfahaesdhfkj
str2 - dfahaisds
In str1, ae appears at position 4 and 5
In str2, ai appears also at position 4 and 5....
So they are equivalent.....And i can replace ae in str1 with ai......
EX2:
str3 - haehasdg
str4 - hhsdaiskdn
In str3, ae appears at position 1 and 2
In str4, ai appears also at position 4 and 5....
So here the rule "equivalent -values" does not apply......So i cant replace
"ae" in str3 with "ai".......
In short, the "equivalent-values" applies only when the
"position-matches"...
To add to that such equivalent values are many......
So can you please suggest me some way of solving this problem ....
regards,
Raghavendra
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Dibling Guest
|
Posted: Fri Aug 20, 2004 12:29 am Post subject: Re: strings sorting |
|
|
It seems to me that the fundamental problem is the fact that character
groups like "ae" are in fact a single token, but you have them stored
as 2 tokens; e.g., 2 individual chars. This might be considered a
design flaw. That is to say, the "e" in "ae" has no meaning of its
own. Neither does the "a". Only when taken as a group -- "ae" --
does the token have meaning. If you had strings of tokens, rather
than strings of chars, you could use STL's built-in sorting tools with
little trouble. Moreover, a valid token could possibly consist of
just "a" or just "e" as well. So now we have a list of 3 different
valid tokens: "a", "e", "ae". It might also be possible that you
could have one token "a" followed immediately by another token "e",
which would be completely different that a single token "ae". I don't
know much about your particular situation, but this seems in the realm
of possibility. If this is the case, you are going to have a lot of
difficultly knowing "a" and "e" from "ae" unless you have stored some
of the original context.
If my analysis is correct, I think you have at least 3 possibilities.
First, change the way you store your strings. Instead of storing
strings of chars, store strings of tokens. You might use
vector<token> to accomplish this instead of string, where you define
what token is. Doing this, you can use sort() in the most intuitive
way, using a sort functor of your own devising.
The first solution is likely to be impossible for you, since it is a
fundamental change to what is possibly a low-level data structure.
Another solution would be to take a 3-step approach to sorting the
strings. Step 1: copy the source strings to a temporary collection
of tokens, using a hand-rolled transform()-like algorithm. You could
write your transform() to intelligently pick whole tokens from the
source string, rather than just single chars. Step 2: sort() the
temporary collection of tokens. Step 3: "flatten" the temporary
collection of tokens back to the source collection, replacing the
original contents. This would have the effect you are looking for,
with no architectural implications for the rest of your system. The
main downfall is the obvious inefficiency of doing 2 copies in
addition to the sort you have to do anyway. If you don't care about
this inefficiency (perhaps because you don't have to constantly be
sorting), this would probably be the easiest solution to implement.
The third solution avoids the inefficiencies above, but is hard to
implement. Write your own sort() routine. You could write a version
of sort() that sends the "current" and "last" iterator to the sorting
functor, rather than a reference to the actual value. Or you could
build the buisness logic of identifying tokens right in to sort().
- John
On 18 Aug 2004 14:22:25 -0400, "Raghavendra Mahuli"
<raghavendra.ma (AT) in (DOT) bosch.com> wrote:
| Quote: | Hi,
I have many strings. I have to sort them. But sorting is not according to
ascii but according to a different format (which keeps varying) . So i can
sort them based on sort-order. It is not much of a problem.
But the complex part is "equivalent-values". it can be defined that "ai" and
"ae" are equivalent-values. Then, while sorting two strings, i have to
consider that "ai" and "ae" are equal......only if they come at same
position in both the strings....
For EX1:
str1 - dfahaesdhfkj
str2 - dfahaisds
In str1, ae appears at position 4 and 5
In str2, ai appears also at position 4 and 5....
So they are equivalent.....And i can replace ae in str1 with ai......
EX2:
str3 - haehasdg
str4 - hhsdaiskdn
In str3, ae appears at position 1 and 2
In str4, ai appears also at position 4 and 5....
So here the rule "equivalent -values" does not apply......So i cant replace
"ae" in str3 with "ai".......
In short, the "equivalent-values" applies only when the
"position-matches"...
To add to that such equivalent values are many......
So can you please suggest me some way of solving this problem ....
regards,
Raghavendra
|
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Daniel K. O. Guest
|
Posted: Fri Aug 20, 2004 2:07 pm Post subject: Re: strings sorting |
|
|
John Dibling wrote:
| Quote: | First, change the way you store your strings. Instead of storing
strings of chars, store strings of tokens. You might use
vector<token> to accomplish this instead of string, where you define
what token is. Doing this, you can use sort() in the most intuitive
way, using a sort functor of your own devising.
|
Isn't easier then to use:
---
struct Token { /* ... */ };
typedef basic_string<Token> Tstring;
---
?
Daniel K. O.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Fri Aug 20, 2004 2:42 pm Post subject: Re: strings sorting |
|
|
"Raghavendra Mahuli" <raghavendra.ma (AT) in (DOT) bosch.com> wrote
| Quote: | I have many strings. I have to sort them. But sorting is not according
to ascii but according to a different format (which keeps varying) .
So i can sort them based on sort-order. It is not much of a problem.
But the complex part is "equivalent-values". it can be defined that
"ai" and "ae" are equivalent-values. Then, while sorting two strings,
i have to consider that "ai" and "ae" are equal......only if they come
at same position in both the strings....
For EX1:
str1 - dfahaesdhfkj
str2 - dfahaisds
In str1, ae appears at position 4 and 5
In str2, ai appears also at position 4 and 5....
So they are equivalent.....And i can replace ae in str1 with ai......
|
First you talked about sorting, now you are speaking about replacing.
Do you only want to sort, or do you want to modify the strings as well?
| Quote: | EX2:
str3 - haehasdg
str4 - hhsdaiskdn
In str3, ae appears at position 1 and 2
In str4, ai appears also at position 4 and 5....
So here the rule "equivalent -values" does not apply......So i cant replace
"ae" in str3 with "ai".......
In short, the "equivalent-values" applies only when the
"position-matches"...
To add to that such equivalent values are many......
So can you please suggest me some way of solving this problem ....
|
For sorting, the solution is relatively simple, since you can pass your
own comparison routine to std::sort. In your sorting routing, of course,
you'll need some sort of special handling for the special cases. One
classical approch is to generate canonical forms for each string, and
compare them. If you do this a lot, it might be worthwhile to maintain
the canonical representations along side of the original string, rather
than regenerate them each time.
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Dibling Guest
|
Posted: Sat Aug 21, 2004 3:26 am Post subject: Re: strings sorting |
|
|
On 20 Aug 2004 10:42:59 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
| Quote: | For sorting, the solution is relatively simple, since you can pass your
own comparison routine to std::sort. In your sorting routing, of course,
you'll need some sort of special handling for the special cases.
|
This won't necesarrily work for at least a couple of reasons.
First, the sort comparator takes only 2 paramaters. The OP did not
state how many individual chars make up a single token. It may be 2,
it may be 5.
Second, the standard doesn't guarantee that the first paramater sent
to the comparator occurs first in the sequence, or the second appears
after the first. All the standard says is that the comparator should
compare the 2 and return true if the first is less than the second.
Consider the case where 'aaa' and 'aab' compare equal, and 'aac'
precedes both of them. There is no way to use STL's built-in sort
algorithm and a custom-built comparator to sort these tokens correctly
if the strings are strings of individual chars, and not strings of
atomic tokens.
- John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Dibling Guest
|
Posted: Sat Aug 21, 2004 3:28 am Post subject: Re: strings sorting |
|
|
On 20 Aug 2004 10:07:50 -0400, "Daniel K. O." <danielko1 (AT) bol (DOT) com.br>
wrote:
| Quote: | Isn't easier then to use:
---
struct Token { /* ... */ };
typedef basic_string<Token> Tstring;
---
?
|
Yeah, maybe it is. But I'm personally more comfortable with
vector<Token>, only becasue Iv'e never used basic_string over anything
that isn't like a char. But Token is probably at least an aggregate of
some kind, maybe even more. I wonder how basic_string<Token>::c_str()
would behave, for instance. Interesting question...
- John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Mon Aug 23, 2004 10:14 pm Post subject: Re: strings sorting |
|
|
John Dibling <jdibling (AT) computer (DOT) org> wrote
| Quote: | On 20 Aug 2004 10:42:59 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
For sorting, the solution is relatively simple, since you can pass
your own comparison routine to std::sort. In your sorting routing, of
course, you'll need some sort of special handling for the special
cases.
This won't necesarrily work for at least a couple of reasons.
First, the sort comparator takes only 2 paramaters.
|
Yes. Two std::string, in the case in question.
| Quote: | The OP did not state how many individual chars make up a single token.
It may be 2, it may be 5.
|
True, but you're not sorting tokens, you're sorting strings.
| Quote: | Second, the standard doesn't guarantee that the first paramater sent
to the comparator occurs first in the sequence, or the second appears
after the first. All the standard says is that the comparator should
compare the 2 and return true if the first is less than the second.
|
So where is the problem.
| Quote: | Consider the case where 'aaa' and 'aab' compare equal, and 'aac'
precedes both of them. There is no way to use STL's built-in sort
algorithm and a custom-built comparator to sort these tokens correctly
if the strings are strings of individual chars, and not strings of
atomic tokens.
|
Again, when sorting strings, the comparison function is passed two
strings. The sort routing can do pretty much anything it wants with
these two strings, provided the results are consistent, and define an
ordering relationship. (A function where "aab" < "aac", "aac" < "aaa"
and "aaa" < "aab", for example, won't cut it.)
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Mon Aug 23, 2004 10:15 pm Post subject: Re: strings sorting |
|
|
John Dibling <jdibling (AT) computer (DOT) org> wrote
| Quote: | On 20 Aug 2004 10:07:50 -0400, "Daniel K. O." <danielko1 (AT) bol (DOT) com.br
wrote:
Isn't easier then to use:
---
struct Token { /* ... */ };
typedef basic_string
---
?
Yeah, maybe it is. But I'm personally more comfortable with
vector<Token>, only becasue Iv'e never used basic_string over anything
that isn't like a char. But Token is probably at least an aggregate of
some kind, maybe even more. I wonder how basic_string<Token>::c_str()
would behave, for instance. Interesting question...
|
It should append Token() to the end, and return a pointer to the array.
This probably won't mean anything to a C program, but then, nor would
basic_string<Token>::data(), even if you gave it the length.
If the function doen't mean anything, just don't use it:-).
I've posted a few problems in another posting. One additional problem
is that one very wide spread implementation doesn't ensure correct
alignment for anything bigger than a size_t. Probably not a problem
with Token, but you never know. On the other hand, basic_string<double>
is good for core dumps. I reported the bug, but I did suggest that it
wasn't of the highest priority:-).
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Mon Aug 23, 2004 10:15 pm Post subject: Re: strings sorting |
|
|
"Daniel K. O." <danielko1 (AT) bol (DOT) com.br> wrote
| Quote: | John Dibling wrote:
First, change the way you store your strings. Instead of storing
strings of chars, store strings of tokens. You might use
vector<token> to accomplish this instead of string, where you define
what token is. Doing this, you can use sort() in the most intuitive
way, using a sort functor of your own devising.
Isn't easier then to use:
---
struct Token { /* ... */ };
typedef basic_string<Token> Tstring;
---
?
|
I doubt it. First, of course, Token must be a POD for this to even
work. Secondly, there is no char_traits< Token > ; you'll have to
specialize for it. And of course, there are a couple of indirectly
documented requirements, like the fact that Token() must correspond to
something which you can use in the place of ' '.
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Allan W Guest
|
Posted: Mon Aug 23, 2004 11:03 pm Post subject: Re: strings sorting |
|
|
| Quote: | On 20 Aug 2004 10:42:59 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
For sorting, the solution is relatively simple, since you can pass your
own comparison routine to std::sort. In your sorting routing, of course,
you'll need some sort of special handling for the special cases.
|
John Dibling <jdibling (AT) computer (DOT) org> wrote
| Quote: | This won't necesarrily work for at least a couple of reasons.
First, the sort comparator takes only 2 paramaters. The OP did not
state how many individual chars make up a single token. It may be 2,
it may be 5.
|
There's a lot that the OP did not state. I think that until we hear
differently, we have to assume that the OP's code can parse the
tokens.
What the OP is trying to do, is to translate character strings into
modified character strings that sort correctly. I do believe that this
can be made to work, but there's a much simpler method.
| Quote: | Second, the standard doesn't guarantee that the first paramater sent
to the comparator occurs first in the sequence, or the second appears
after the first. All the standard says is that the comparator should
compare the 2 and return true if the first is less than the second.
|
Which shouldn't be a problem if the OP knows how to compare strings.
| Quote: | Consider the case where 'aaa' and 'aab' compare equal, and 'aac'
precedes both of them. There is no way to use STL's built-in sort
algorithm and a custom-built comparator to sort these tokens correctly
if the strings are strings of individual chars, and not strings of
atomic tokens.
|
Just have the compare utility decide which is less.
Going back to the case where ae should equal ai (but what happens to
af and ag and ah -- do they compare as greater than ae or less than ai?),
we might end up with something like this:
// Untested
bool isLess(const char*lhs, const char*rhs) {
for(; {
while (*lhs == *rhs && *lhs)
++lhs, ++rhs;
if (' '==*lhs) return true; // Short strings are lower
if (' '==*rhs) return false;
// We have found a difference that is NOT at the end.
// Is it letters that we consider to be the same?
if (('e'!=*lhs && 'i'!=*lhs) || ('e'!=*rhs && 'i'!=*rhs))
// They're not considered the same, so it's a "real" difference
return *lhs < *rhs;
// Move to next character
++lhs, ++rhs;
} }
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|