C++Talk.NET C++ language newsgroups
Aaron Bentley Guest
Posted: Sat Aug 23, 2003 8:21 am
Post subject: Implementing an iterator on an odd collection
I'd like to implement a random-access iterator on top of a strange
collection. Because the internal representation of this class is
different from its API, it provides no direct access to its element
type. (It's the ICU UnicodeString, which has a UTF-32 API, but UTF-16
storage.)
An example of my solution is below. Basically, it requires a CharRef
class that supports assignment from and conversion to the true element
type. (I know this example iterator isn't STL-compliant yet.)
1. Is this the right way to do this?
2. Are there any gotchas I should watch out for?
3. Should all iterators support operator->, even for POD types?
4. Is there such a thing as a read-only random-access iterator?
5. I'll provide the standard iterator typedefs, but what about the
inevitable user who assumes HobbledIterator::operator* returns char?
6. Is the interface of Irksome "minimal but complete"? :-)
Thanks,
Aaron
--
Aaron Bentley
www.aaronbentley.com
#include <cstddef>
#include <iostream>
#include <string>
using std::size_t;
using std::string;
// Stand-in for a collection that only exposes get/set element access.
class Irksome: private string
{
public:
    Irksome(string const &arg)
    :
        string(arg)
    {}
    char getAt(size_t i) const
    {
        return operator[](i);
    }
    void setAt(size_t i, char c)
    {
        operator[](i)=c;
    }
    size_t size() const
    {
        return string::size();
    }
    const char* c_str() const
    {
        return string::c_str();
    }
};
class HobbledIterator
{
    // Proxy standing in for a reference to one element.
    class CharRef
    {
        Irksome &is;
        size_t pos;
    public:
        CharRef(Irksome &a_is, size_t a_pos)
        :
            is(a_is), pos(a_pos)
        {}
        CharRef &operator=(char c)
        {
            is.setAt(pos, c);
            return *this;
        }
        operator const char() const
        {
            return is.getAt(pos);
        }
        void moveby(size_t i)
        {
            pos+=i;
        }
    };
    CharRef cr;
public:
    HobbledIterator(Irksome &is, size_t pos=0)
    :
        cr(is, pos)
    {}
    CharRef const &operator*() const
    {
        return cr;
    }
    CharRef &operator*()
    {
        return cr;
    }
    HobbledIterator & operator+=(size_t i)
    {
        cr.moveby(i);
        return *this;
    }
};
int main(void)
{
    using std::cout;
    Irksome irk(string("Jello Iterator?"));
    HobbledIterator hi(irk);
    *hi='H';
    hi+=14;
    *hi='!';
    cout<<irk.c_str()<<std::endl;
    return 0;
}
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
kanze@gabi-soft.fr Guest
Posted: Tue Aug 26, 2003 7:04 pm
Post subject: Re: Implementing an iterator on an odd collection
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote
| Quote: | I'd like to implement a random-access iterator on top of a strange
collection. Because the internal representation of this class is
different from its API, it provides no direct access to its element
type. (It's the ICU UnicodeString, which has a UTF-32 API, but UTF-16
storage.)
|
I presume you mean the class documented at
http://oss.software.ibm.com/icu/apiref/classUnicodeString.html. Are you
sure that a random access iterator is possible? How do you implement
operator+= in constant time, for example? (I do not think that
moveIndex32 executes in constant time.)
| Quote: | An example of my solution is below. Basically, it requires a CharRef
class that supports assignment from and conversion to the true element
type. (I know this example iterator isn't STL-compliant yet.)
|
I can't figure out how it really relates to ICU UnicodeString, either.
| Quote: | 1. Is this the right way to do this?
2. Are there any gotchas I should watch out for?
|
I don't think that there is one "right way". I'm also unsure about
possible gotchas -- I'd say that the biggest gotcha is that the standard
doesn't really spell out the requirements as well as it should. In
§24.1/1, for example, it says "All iterators i support the expression
*i, resulting in a VALUE of some class, enumeration, or built-in type T,
called the value type of the iterator" (emphasis added), whereas
§24.1/4 says that "Besides its category, a forward, bidirectional, or
random access iterator can also be mutable or constant depending on
whether the result of the expression *i behaves as a reference or as a
reference to a constant. Constant iterators do not satisfy the
requirements for output iterators, and the result of the expression *i
(for constant iterator i) cannot be used in an expression where an
lvalue is required." (This last sentence means that &*i is not
guaranteed for constant iterators.)
Constant iterators can easily be implemented by caching the current
value in the iterator, and returning it, or a const reference to it.
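As a rough illustration of the caching approach, here is a minimal sketch. GetOnly, CachingConstIterator and copy_out are hypothetical names invented for this example (they come from neither ICU nor the posted code), and the iterator is deliberately not a complete STL iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a collection that only offers get-style access.
class GetOnly {
    std::string data_;
public:
    explicit GetOnly(const std::string& s) : data_(s) {}
    char getAt(std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

// Constant iterator: operator* refreshes a cached copy of the element and
// returns a const reference to it, so read-only access needs no proxy class.
class CachingConstIterator {
    const GetOnly* owner_;
    std::size_t pos_;
    mutable char cache_;
public:
    CachingConstIterator(const GetOnly& owner, std::size_t pos)
        : owner_(&owner), pos_(pos), cache_() {}
    const char& operator*() const {
        assert(pos_ < owner_->size());   // dereferencing end() is an error
        cache_ = owner_->getAt(pos_);
        return cache_;
    }
    CachingConstIterator& operator++() { ++pos_; return *this; }
    CachingConstIterator& operator+=(std::size_t n) { pos_ += n; return *this; }
    bool operator==(const CachingConstIterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const CachingConstIterator& o) const { return pos_ != o.pos_; }
};

// Walks the collection through the iterator and rebuilds its contents.
std::string copy_out(const GetOnly& g) {
    std::string out;
    const CachingConstIterator end(g, g.size());
    for (CachingConstIterator it(g, 0); it != end; ++it) out += *it;
    return out;
}
```

One consequence of this design is that the returned reference refers to storage inside the iterator, so it is only meaningful while that iterator is alive and has not been dereferenced again at a different position.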
For the rest, the standard is very vague. The second passage quoted
would seem to suggest that some sort of proxy (as you suggest) is
permissible; the first passage forbids it, maybe. And Table 74 in
§24.1.3 definitely says that the return type of *a (where a is a forward
iterator) must be T& (where T is the value type). On the other hand,
the last column of the same table explicitly requires "*a = t" to be
valid if X (the iterator type) is mutable -- a statement which is
useless if *a must have the type T&. And of course, why have
iterator_traits<>::reference, if it *must* in fact be
iterator_traits<>::value_type&?
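The standard library itself ships one iterator whose reference type is a proxy: std::vector<bool>::iterator. The sketch below uses it only to illustrate the value_type/reference distinction raised above; assign_through and example_traits are names made up for the example, and the static_asserts assume a C++11 or later compiler.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

typedef std::vector<bool>::iterator BitIter;
typedef std::iterator_traits<BitIter>::value_type BitValue;
typedef std::iterator_traits<BitIter>::reference  BitRef;

// value_type is plain bool, but reference is a proxy class, not bool&.
static_assert(std::is_same<BitValue, bool>::value, "value_type is bool");
static_assert(!std::is_same<BitRef, BitValue&>::value, "reference is a proxy");

// Generic code that names iterator_traits<It>::reference, rather than
// value_type&, works with real references and with proxies alike.
template <typename It>
void assign_through(It it, typename std::iterator_traits<It>::value_type v) {
    typename std::iterator_traits<It>::reference r = *it;
    r = v;
}

void example_traits() {
    std::vector<bool> bits(3, false);
    assign_through(bits.begin() + 1, true);   // goes through the proxy
    assert(bits[1] && !bits[0]);

    std::vector<int> ints(2, 0);
    assign_through(ints.begin(), 7);          // goes through a real int&
    assert(ints[0] == 7);
}
```

Code that spells the type as value_type& instead would compile for the int case and fail for the bool case, which is the kind of breakage a proxy-based iterator can expect from careless callers.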
| Quote: | 3. Should all iterators support operator->, even for POD types?
|
According to the standard (§24.1/1): "All iterators i for which the
expression (*i).m is well-defined support the expression i->m with the
same semantics as (*i).m." Since std::vector<int>::iterator doesn't
support operator->, why should you?
| Quote: | 4. Is there such a thing as a read-only random-access iterator?
|
Do you mean, like std::vector<>::const_iterator? See above as well; the
standard specifically speaks of iterator types being mutable or
non-mutable.
| Quote: | 5. I'll provide the standard iterator typedefs, but what about the
inevitable user who assumes HobbledIterator::operator* returns char?
|
In your real case (ICU UnicodeString), why should he suppose such a
thing? Presumably, you will provide two iterators, one for UChar, and
another one (the one we are talking about here) for UChar32. If a user
chooses to use an iterator for UChar32, rather than one for UChar, it
would seem surprising that he would expect it to return anything but a
UChar32. (Also, given the naming conventions, I would expect the actual
type name of the iterator to contain the text string "32". I can't
think of a stronger hint.)
| Quote: | 6. Is the interface of Irksome "minimal but complete"?
|
Since I don't know what Irksome is supposed to do, I can't say. My
comments address your presentation of the problem: the ICU
UnicodeString. This class definitely has a larger interface than your
Irksome:-). For practical purposes, however, I think that get and set
are the minimal interface in this case.
Note that the ICU UnicodeString has no setChar32At function, which will
make a mutable iterator somewhat difficult. You could probably get
the same effect using the replace function; the specification isn't that
detailed, so it isn't clear whether you will have to check the length
yourself. But this introduces a serious gotcha -- an expression like
"*i = someChar32" could end up invalidating other iterators.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
Aaron Bentley Guest
Posted: Wed Aug 27, 2003 10:17 pm
Post subject: Re: Implementing an iterator on an odd collection
kanze (AT) gabi-soft (DOT) fr wrote:
You're right -- I expect moveIndex32() is O(n), which makes it more
like a list. I've already written an input iterator for this class, and
I was hoping to be able to use other STL algorithms by implementing a
random access iterator on it. But it may simply be a bad idea.
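To make the cost concrete, here is a small sketch of stepping forward by code points over UTF-16 storage. It uses std::u16string (C++11) as a stand-in for UnicodeString's internal buffer; advance_code_points and the helper names are hypothetical, and this is not a description of how ICU implements moveIndex32().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool is_lead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_trail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Code-unit offset reached by stepping `n` code points forward from `unit`.
// Each step has to look at the storage, so the cost is linear in `n`.
std::size_t advance_code_points(const std::u16string& s,
                                std::size_t unit, std::size_t n) {
    while (n > 0 && unit < s.size()) {
        if (is_lead(s[unit]) && unit + 1 < s.size() && is_trail(s[unit + 1]))
            unit += 2;   // surrogate pair: one code point, two code units
        else
            unit += 1;
        --n;
    }
    return unit;
}

void example_advance() {
    // 'a', U+1D11E (stored as two code units), 'b'
    const std::u16string s = u"a\U0001D11Eb";
    assert(s.size() == 4);
    assert(advance_code_points(s, 0, 1) == 1);
    assert(advance_code_points(s, 0, 2) == 3);   // skipped a surrogate pair
    assert(advance_code_points(s, 0, 3) == 4);
}
```

An iterator whose operator+= is built on a loop like this can still model a bidirectional iterator; it is the constant-time requirement of the random access category that it cannot meet.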
| Quote: | [snip] (I know this example iterator isn't STL-compliant yet.)
I can't figure out how it really relates to ICU UnicodeString, either.
|
Ow. I was confident that the interfaces were equivalent, and I thought
it would be more valuable to provide example code that was easily
compiled. The only tricky one was Irksome::setAt(), which I thought
could be achieved with moveIndex32() and replace (int32_t start, int32_t
length, UChar32 srcChar).
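A sketch of what such a setAt has to cope with on UTF-16 storage follows. Again std::u16string (C++11) stands in for the real class; encode, set_code_point and example_set are invented for the illustration and do not reproduce ICU's replace() behaviour in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode one code point as UTF-16 (a surrogate pair above the BMP).
std::u16string encode(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}

// Overwrite the code point that starts at code-unit offset `unit` and
// return by how much every later code-unit offset has shifted.
std::ptrdiff_t set_code_point(std::u16string& s, std::size_t unit, char32_t cp) {
    const bool lead = s[unit] >= 0xD800 && s[unit] <= 0xDBFF;
    const std::size_t old_len = (lead && unit + 1 < s.size()) ? 2 : 1;
    const std::u16string repl = encode(cp);
    s.replace(unit, old_len, repl);
    return static_cast<std::ptrdiff_t>(repl.size())
         - static_cast<std::ptrdiff_t>(old_len);
}

void example_set() {
    std::u16string s = u"abc";
    // Some other iterator remembers that 'c' lives at code-unit offset 2.
    const std::size_t other = 2;
    assert(s[other] == u'c');
    // Writing U+1D11E over 'b' needs two code units instead of one.
    const std::ptrdiff_t shift = set_code_point(s, 1, 0x1D11E);
    assert(shift == 1 && s.size() == 4);
    assert(s[other] != u'c');           // the remembered offset is now stale
    assert(s[other + shift] == u'c');
}
```

The non-zero shift is why an assignment through one iterator can leave other iterators that store code-unit offsets pointing at the wrong place.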
| Quote: | 1. Is this the right way to do this?
2. Are there any gotchas I should watch out for?
I don't think that there is one "right way". I'm also unsure about
possible gotchas -- I'd say that the biggest gotcha is that the standard
doesn't really spell out the requirements as well as it should.
[snip apparent Standard contradictions]
For the rest, the standard is very vague. The second passage quoted
would seem to suggest that some sort of proxy (as you suggest) is
permissible; the first passage forbids it, maybe. And Table 74 in
§24.1.3 definitely says that the return type of *a (where a is a forward
iterator) must be T& (where T is the value type). On the other hand,
the last column of the same table explicitly requires "*a = t" to be
valid if X (the iterator type) is mutable -- a statement which is
useless if *a must have the type T&. And of course, why have
iterator_traits<>::reference, if it *must* in fact be
iterator_traits<>::value_type&?
|
At least I now know my confusion is well-founded! I'm not comfortable
with such proxies, since I assume iterator_traits<>::value_type& will
inevitably be used at some point where iterator_traits<>::reference is
wanted.
| Quote: | 5. I'll provide the standard iterator typedefs, but what about the
inevitable user who assumes HobbledIterator::operator* returns char?
In your real case (ICU UnicodeString), why should he suppose such a
thing?
|
Perhaps we're talking across each other here. I mean that he may well
define
void set(UChar32 &var, UChar32 val)
{
    var=val;
}
and then call
set(*it32, 'A');
Which would invoke operator UChar32(). But I guess those fears were
ill-founded, since that operator returns a const value.
| Quote: | Presumably, you will provide two iterators, one for UChar, and
another one (the one we are talking about here) for UChar32. If a user
chooses to use an iterator for UChar32, rather than one for UChar, it
would seem surprising that he expect it to return anything but a
UChar32. (Also, given the naming conventions, I would expect the actual
type name of the iterator to contain the text string "32". I can't
think of a stronger hint.)
|
I don't expect we'll have much need of utf-16, so I was intending to
treat it as though it was not the internal representation, e.g. the same
level of support as we'll provide for utf-8.
| Quote: | 6. Is the interface of Irksome "minimal but complete"? :-)
Since I don't know what Irksome is supposed to do, I can't say. My
comments address your presentation of the problem: the ICU
UnicodeString. This class definitely has a larger interface than your
Irksome:-).
|
Also, I had just been reading Sutter's Monoliths "Unstrung" GotW, in
which he points out that only 32 of basic_string's 103 methods actually
needed to be members. What is it about string classes?
| Quote: | Note that the ICU UnicodeString has no setChar32At function, which will
make a mutable iterator somewhat difficult. You could probably get
the same effect using the replace function; the specification isn't that
detailed, so it isn't clear whether you will have to check the length
yourself. But this introduces a serious gotcha -- an expression like
"*i = someChar32" could end up invalidating other iterators.
|
Good point. We could avoid that problem by using code point indices,
rather than code unit indices. . .
const UChar32 operator[](int32_t i) const
{
    return char32At(moveIndex32(0, i));
}
. . . but then performance would degrade to O(n) in the length of the
string, which is probably unacceptable.
We're internationalising our program, and I had (probably naively) hoped
we could standardise on a single string type. But it's starting to look
like
1. we can increase speed and STL compatibility by using
basic_string<UChar32> at the cost of memory efficiency
or
2. we can expose UnicodeString more, and lose the abstraction we were
getting
or
3. we can have 8-bit strings and 32-bit strings and try to program around
it intelligently.
Thanks for your comments.
--
Aaron Bentley
www.aaronbentley.com
kanze@gabi-soft.fr Guest
Posted: Thu Aug 28, 2003 8:31 pm
Post subject: Re: Implementing an iterator on an odd collection
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote
| Quote: | kanze (AT) gabi-soft (DOT) fr wrote:
|
[...]
| Quote: | 5. I'll provide the standard iterator typedefs, but what about the
inevitable user who assumes HobbledIterator::operator* returns char?
In your real case (ICU UnicodeString), why should he suppose such a
thing?
Perhaps we're talking across each other here. I mean that he may well
define
void set(UChar32 &var, UChar32 val)
{
    var=val;
}
and then call
set(*it32, 'A');
Which would invoke operator UChar32(). But I guess those fears were
ill-founded, since that operator returns a const value.
|
Above all, it returns a value, not a reference. And it is a function.
So the results are NOT an lvalue, and cannot be bound to a non-const
reference.
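The binding rule can be checked at compile time. The following is a minimal sketch with a hypothetical CharProxy (not the CharRef from the posted code) and plain char in place of UChar32; it assumes C++11 for static_assert and <type_traits>.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical proxy: converts to the element type by value.
struct CharProxy {
    char stored;
    operator char() const { return stored; }
};

// The conversion yields a temporary, so it can initialise a char or a
// const char&, but never a char&.
static_assert(std::is_convertible<CharProxy, char>::value,
              "by value: accepted");
static_assert(std::is_convertible<CharProxy, const char&>::value,
              "const reference: accepted");
static_assert(!std::is_convertible<CharProxy, char&>::value,
              "non-const reference: rejected");

void set(char& var, char val) { var = val; }

// Reading through a const reference is fine; the temporary lives long enough.
char read_through(const CharProxy& p) {
    const char& c = p;
    return c;
}

void example_binding() {
    CharProxy p = { 'x' };
    assert(read_through(p) == 'x');
    char plain = 'a';
    set(plain, 'b');        // a real lvalue binds to char&
    assert(plain == 'b');
    // set(p, 'A');         // would not compile: no conversion to char&
}
```

So a call like set(*it32, 'A') fails to compile rather than silently writing to a temporary.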
| Quote: | Presumably, you will provide two iterators, one for UChar, and
another one (the one we are talking about here) for UChar32. If a
user chooses to use an iterator for UChar32, rather than one for
UChar, it would seem surprising that he expect it to return anything
but a UChar32. (Also, given the naming conventions, I would expect
the actual type name of the iterator to contain the text string
"32". I can't think of a stronger hint.)
I don't expect we'll have much need of utf-16, so I was intending to
treat it as though it was not the internal representation, e.g. the
same level of support as we'll provide for utf-8.
|
OK. I only assumed that because I misunderstood the type of error you
were worried about.
| Quote: | 6. Is the interface of Irksome "minimal but complete"? :-)
Since I don't know what Irksome is supposed to do, I can't say. My
comments address your presentation of the problem: the ICU
UnicodeString. This class definitely has a larger interface than
your Irksome:-).
Also, I had just been reading Sutter's Monoliths "Unstrung" GotW, in
which he points out that only 32 of basic_string's 103 methods
actually needed to be members. What is it about string classes?
|
Or what is it about Herb, that he doesn't like member functions? :-)
The problem is complex -- you want the functionality to be minimal, but
complete. The problem is that for something as general as a string,
what I need in a minimal interface may be superfluous to you. My
pre-standard String class had things like pad and trim, which seemed
logical to me if you consider that string is an abstraction for a bit of
text (and not a container in the classical sense). On the other hand,
*everything* was implemented in terms of just two functions, extract and
replace; alternatively, I could have used extract and concatenation.
Originally, at least -- I later implemented a few things separately for
performance reasons.
I find the argument about whether it could be a non-member a
red herring. You should design your interface before starting the
implementation; at that point, you may not know what the private data
will be, and which functions can eventually be implemented as
non-members. On the other hand, there is some argument for a more
minimal interface than what I did: pad and trim are not fundamental
operations, and by their very nature, if the fundamental operations are
designed correctly, they do not need to be, and probably should not be,
members.
In the case of std::string, I suspect that the reason for so many
overloads of some of the functions is purely optimization concerns. A
char const* converts implicitly to an std::string, so logically, there
is no reason to provide both an "append( string const& )" and an
"append( char const* )". On the other hand, constructing the string for
the first could result in non-negligible runtime cost.
Other than that, of course, there is some contradiction in having all of
the find functions, which at least partially overlap with the various
searching functions (std::find, etc.) in <algorithm>. (I'll admit that
it never occurred to me that my String class should have find functions.)
| Quote: | Note that the ICU UnicodeString has no setChar32At function, which
will make a mutable iterator somewhat difficult. You could
probably get the same effect using the replace function; the
specification isn't that detailed, so it isn't clear whether you
will have to check the length yourself. But this introduces a
serious gotcha -- an expression like "*i = someChar32" could end up
invalidating other iterators.
Good point. We could avoid that problem by using code point indices,
rather than code unit indices. . .
|
In the iterators, you mean. I'd like to see how you do that, and still
get constant time for operator*.
| Quote: | const UChar32 operator[](int32_t i) const
{
    return char32At(moveIndex32(0, i));
}
. . . but then performance would degrade to O(n) in the length of the
string, which is probably unacceptable.
|
At least according to STL rules. I was more worried about operator* in
the iterator, however. There is no requirement at the STL level to
provide an operator[] for the container, but an iterator without an
operator* isn't going to cut it.
| Quote: | We're internationalising our program, and I had (probably naively)
hoped we could standardise on a single string type. But it's starting
to look like
1. we can increase speed and STL compatibility by using
basic_string<UChar32> at the cost of memory efficiency
|
Be careful. That path is full of surprises: you'll end up having to
re-implement substantial parts of <locale> as well.
Note that std::basic_string<UChar32> is not legal according to the
standard, either. It implicitly uses std::char_traits<UChar32>, which
simply doesn't exist. Or may exist, but do something unreasonable, or
just different from what you expect. Formally:
- The standard doesn't require a definition of std::char_traits<>,
just two specializations. Some current libraries do furnish a
definition, but the definitions I've seen aren't compatible.
- The standard forbids specializing a template unless one of the
parameters is a user defined class. UChar32 is a typedef to an
integral type, so you cannot provide a specialization for
std::char_traits< UChar32 >. That's not a big problem; you can
easily write your own class instead. On the other hand, if you're
doing any IO, you'll also need a specialization for std::codecvt;
using a user defined state type will allow this (and will probably
be necessary anyway, since you don't know what the one defined in
the library supports, if anything). If you want formatted IO,
you'll need specializations for std::ctype and std::numpunct -- and
neither of these have any other template parameter but the character
type, which means you cannot specialize them (legally, at least) on
UChar32. Practically speaking, to be usable, any character type
other than char and wchar_t must be a user defined type (a class or
an enum).
- Finally, regardless of what the standard says, the level of
conformance, particularly in <locale>, is very variable. If you can
standardize on the Dinkumware library everywhere, you can probably
make things work, modulo the above restrictions. Otherwise, you'll
doubtlessly end up with code full of #ifdef's to work around various
library problems.
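For the string part alone, "write your own class instead" might look roughly like the sketch below: a hand-written traits class passed explicitly as the second template argument, so that nothing in namespace std is specialised. CodePoint, CodePointTraits and CodePointString are hypothetical names, CodePoint is only a stand-in for ICU's UChar32, and none of this addresses the codecvt, ctype or numpunct problems described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>

typedef std::int32_t CodePoint;   // stand-in for UChar32 (assumption)

struct CodePointTraits {
    typedef CodePoint       char_type;
    typedef std::int64_t    int_type;    // wide enough to hold a distinct eof()
    typedef std::streamoff  off_type;
    typedef std::streampos  pos_type;
    typedef std::mbstate_t  state_type;

    static void assign(char_type& d, const char_type& s) { d = s; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }
    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (!eq(s[n], char_type())) ++n;   // stop at the zero terminator
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return 0;
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        std::memmove(d, s, n * sizeof(char_type));   // ranges may overlap
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        std::memcpy(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) d[i] = c;
        return d;
    }
    static int_type to_int_type(char_type c) { return c; }
    static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type eof() { return -1; }
    static int_type not_eof(int_type i) { return eq_int_type(i, eof()) ? 0 : i; }
};

typedef std::basic_string<CodePoint, CodePointTraits> CodePointString;

void example_code_point_string() {
    CodePointString s;
    s.push_back(0x1D11E);   // one element per code point, no surrogates
    s.push_back('a');
    assert(s.size() == 2 && s[0] == 0x1D11E);
    assert(s.find('a') == 1);
}
```

This covers storage and searching only; stream insertion, extraction and locale-dependent formatting for such a string would still have to be supplied separately.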
On the whole, it's not a path I'd recommend, unless you've a
particularly masochistic streak. If you're interested in pursuing this
path further, however, you might check out the code in the experimental
section at my site. I started this, just to see what it would take, but
ran out of time before I got very far; some of the basics (including the
codecvt instantiations) are there, however.
| Quote: | or
2. we can expose UnicodeString more, and lose the abstraction we were
getting
or
3. we can have 8-bit strings and 32-bit strings and try to program around
it intelligently.
|
I studied the question in depth about a year ago, and came to the
conclusion that if you want internationalized code, using guaranteed 32
bit Unicode characters, the easiest solution is probably to start from
scratch, and just forget about the standard library. Independently of
the character set, iostream is useless if the messages are to be
internationalised (since it doesn't support positional parameters): for
that, you need something along the lines of my GB_Format class (which
currently only supports narrow characters:-(). And the complexities of
implementing and integrating your own facets, tied with the fact that,
since you cannot specialize standard template classes on anything but a
user defined type, you must use a class type for the character itself,
rather than a typedef, means that for the most part, it is easier to
implement what you need from scratch.
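To show what positional parameters buy a translator, here is a deliberately small sketch. format_positional and its "%1".."%9" placeholder syntax are invented for this example; the interface of GB_Format is not shown in this thread and is not being reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replace "%1".."%9" in `pattern` with the matching entry of `args`.
// Unknown or out-of-range placeholders are copied through unchanged.
std::string format_positional(const std::string& pattern,
                              const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()) {
            const char d = pattern[i + 1];
            if (d >= '1' && d <= '9') {
                const std::size_t index = static_cast<std::size_t>(d - '1');
                if (index < args.size()) {
                    out += args[index];
                    ++i;          // also consume the digit
                    continue;
                }
            }
        }
        out += c;
    }
    return out;
}

void example_format() {
    std::vector<std::string> args;
    args.push_back("report.txt");
    args.push_back("12");
    // The translated pattern may reorder the arguments; the call site
    // stays the same.
    assert(format_positional("%1 has %2 lines", args) == "report.txt has 12 lines");
    assert(format_positional("%2 lines in %1", args) == "12 lines in report.txt");
}
```

With chained operator<< calls the argument order is fixed in the source code, which is what makes that style awkward once the message text has to change word order per language.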
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
Aaron Bentley Guest
Posted: Sun Aug 31, 2003 7:58 am
Post subject: Re: Implementing an iterator on an odd collection
kanze (AT) gabi-soft (DOT) fr wrote:
| Quote: | Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote in message
[snip]
Also, I had just been reading Sutter's Monoliths "Unstrung" GotW, in
which he points out that only 32 of basic_string's 103 methods
actually needed to be members. What is it about string classes?
Or what is it about Herb, that he doesn't like member functions?
|
It's Stroustrup and Meyers too. And Bentley, not that I'm an expert.
The first part of our Unicode project involved untangling our homegrown
string type's convenience methods from its representation and minimal
interface. Then we were able to move representation and minterface into
a (new) superclass, and finally we reimplemented the
representation/minterface superclass in terms of UnicodeString. But it
would have been far easier if the interface had been minimal to start with.
| Quote: | The problem is complex -- you want the functionality to be minimal, but
complete. The problem is that for something as general as a string,
what I need in a minimal interface may be superfluous to you.
|
It also matters what you consider to be part of the interface. It's
been pointed out to me recently that friend functions and friend
classes, plus convenience functions could all be considered part of the
interface.
| Quote: | My
pre-standard String class had things like pad and trim, which seemed
logical to me if you consider that string is an abstraction for a bit of
text (and not a container in the classical sense). On the other hand,
*everything* was implemented in terms of just two functions, extract and
replace; alternatively, I could have used extract and concatenation.
Originally, at least -- I later implemented a few things separately for
performance reasons.
|
Having two fundamental operations sounds like a pretty good design. I
wish I could say the same for my homegrown string type.
| Quote: | I find the argument about whether it could be a non-member a
red-herring. You should design your interface before starting the
implementation; at that point, you may not know what the private data
will be, and which functions can eventually be implemented as
non-members.
|
Focusing on a minimal but complete class interface loosens the coupling
between convenience functions and the types they use. It separates
concerns, and it makes it easier to reuse the convenience functions with
another type. It can also reduce the dependencies of your class, since
you don't need declarations (or forward declarations) for the types in
convenience function arguments.
[snip]
| Quote: | In the case of std::string, I suspect that the reason for so many
overloads of some of the functions is purely optimization concerns. A
char const* converts implicitly to an std::string, so logically, there
is no reason to provide both an "append( string const& )" and an
"append( char const* )". On the other hand, constructing the string for
the first could result in non-negligible runtime cost.
|
While append ( char const* ) may occasionally have a performance
advantage over append ( string const & ), I doubt either has an
advantage over append(InputIterator begin, InputIterator end). (And
since you have insert(), append() doesn't need to be a member.)
| Quote: | Note that the ICU UnicodeString has no setChar32At function, which
will make a mutable iterator somewhat difficult. You could
probably get the same effect using the replace function;
But this introduces a
serious gotcha -- an expression like "*i = someChar32" could end up
invalidating other iterators.
Good point. We could avoid that problem by using code point indices,
rather than code unit indices. . .
In the iterators, you mean. I'd like to see how you do that, and still
get constant time for operator*.
[snip]
There is no requirement at the STL level to
provide an operator[] for the container, but an iterator without an
operator* isn't going to cut it.
|
I wasn't serious (well, I was serious early in the morning, but then I
woke up more and thought better).
| Quote: | 1. we can increase speed and STL compatibility by using
basic_string<UChar32> at the cost of memory efficiency
Be careful. That path is full of surprises: you'll end up having to
re-implement substantial parts of <locale> as well.
Note that std::basic_string<UChar32> is not legal according to the
standard, either. It implicitly uses std::char_traits<UChar32>, which
simply doesn't exist. Or may exist, but do something unreasonable, or
just different from what you expect.
|
[snip]
| Quote: | On the whole, it's not a path I'd recommend, unless you've a
particularly masochistic streak.
|
Thanks for the warning. I'd assumed that basic_string<UChar32> was the
most correct approach, but it sounds like using vector<UChar32> would be
more appropriate, since we're using it as the storage representation,
not as a public interface.
| Quote: |
I studied the question in depth about a year ago, and came to the
conclusion that if you want internationalized code, using guaranteed 32
bit Unicode characters, the easiest solution is probably to start from
scratch, and just forget about the standard library.
|
I wouldn't say "the standard library" here, just "standard strings". I
see plenty of things in the library that are useful in re-stringing
ourselves. For instance, I've got a case-insensitive comparison that
uses the ICU C api with std::lexicographical_compare().
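The shape of such a comparison can be sketched as follows. The fold function here is only an ASCII stand-in built on std::tolower; a real version would call a Unicode case-folding routine (ICU's C API has u_foldCase, though which call the poster actually used is not stated), and FoldedLess and less_no_case are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Stand-in for a real Unicode case-folding call: folds ASCII only.
inline int fold(char c) {
    return std::tolower(static_cast<unsigned char>(c));
}

// Element-wise "less than" on the folded values.
struct FoldedLess {
    bool operator()(char a, char b) const { return fold(a) < fold(b); }
};

// True when `a` sorts before `b`, ignoring case.
bool less_no_case(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end(),
                                        FoldedLess());
}

void example_compare() {
    assert(less_no_case("apple", "Banana"));
    assert(!less_no_case("Banana", "apple"));
    // Strings that differ only in case compare as equivalent.
    assert(!less_no_case("HELLO", "hello") && !less_no_case("hello", "HELLO"));
}
```

Because std::lexicographical_compare only needs input iterators, the same call works over any string type that can hand out a pair of iterators, which fits the "behave like an STL container" approach.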
For the past couple of years, I've been mildly embarrassed that we used a
non-standard string in our program. But now that we want to change the
character type of those strings, I'm glad we have our own string. But
I'm not interested in implementing it by hand. Having it behave like an
STL container, not an STL string, seems like a good compromise.
Aaron
--
Aaron Bentley
www.aaronbentley.com
kanze@gabi-soft.fr Guest
Posted: Tue Sep 02, 2003 5:57 am
Post subject: Re: Implementing an iterator on an odd collection
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote
| Quote: | kanze (AT) gabi-soft (DOT) fr wrote:
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote in message
[snip]
Also, I had just been reading Sutter's Monoliths "Unstrung" GotW, in
which he points out that only 32 of basic_string's 103 methods
actually needed to be members. What is it about string classes?
Or what is it about Herb, that he doesn't like member functions? :-)
It's Stroustrup and Meyers too. And Bentley, not that I'm an expert.
|
My own philosophy is to use member functions where member functions are
appropriate, and non-member functions where non-member functions are
appropriate. Of course, determining what is and what is not appropriate
is a non-trivial task:-).
In the case of a string class, the problem is that we have many masters,
each with his own idea as to what is appropriate.
| Quote: | The first part of our Unicode project involved untangling our
homegrown string type's convenience methods from its representation
and minimal interface. Then we were able to move representation and
minterface into a (new) superclass, and finally we reimplemented the
representation/minterface superclass in terms of UnicodeString. But
it would have been far easier if the interface had been minimal to
start with.
|
Maybe. Suppose that the interface had been minimal, and that all of the
"helpers" had been non-members. Then you change the implementation, and
suddenly, some of the non-members should be members, if only for
performance reasons.
| Quote: | The problem is complex -- you want the functionality to be minimal,
but complete. The problem is that for something as general as a
string, what I need in a minimal interface may be superfluous to
you.
It also matters what you consider to be part of the interface. It's
been pointed out to me recently that friend functions and friend
classes, plus convenience functions could all be considered part of
the interface.
|
Quite. The real problem is deciding what the role of the class is.
After that, functions which are necessary to fulfill that role should be
members (whether they need access to private members or not); functions
which are not necessary are convenience functions, and can (and probably
should) be made global. Which are which is obvious to all users, so the
problem of having some functions which are called with the member
syntax, and others which are called with non-member syntax, is moot.
The problem with string is, as I said, it serves many masters. As a
result, the functions which are necessary for the role you assign to it
are secondary helpers for the role I assign to it. And nobody really
knows what its role is: is it to represent text, or is it a specialized
container? And if to represent text, are we concerned with multibyte
characters, or not?
| Quote: | My pre-standard String class had things like pad and trim, which
seemed logical to me if you consider that string is an abstraction
for a bit of text (and not a container in the classical sense). On
the other hand, *everything* was implemented in terms of just two
functions, extract and replace; alternatively, I could have used
extract and concatenation. Originally, at least -- I later
implemented a few things separately for performance reasons.
Having two fundamental operations sounds like a pretty good design. I
wish I could say the same for my homegrown string type.
|
In a certain sense (since it supports construction from iterators, or
with another string, plus a start position and length), std::string has
only one basic operator -- replace. Except, of course, that begin and
end are also basic operators.
But the real interest is in terms of conception -- I actually wrote all
of the other functions in the interface, be they member or not, in terms
of these two basic operators.
| Quote: | I find the argument about whether it could be a non-member a
red-herring. You should design your interface before starting the
implementation; at that point, you may not know what the private
data will be, and which functions can eventually be implemented as
non-members.
Focusing on a minimal but complete class interface loosens the
coupling between convenience functions and the types they use.
|
Question: what is a convenience function? In general, not just for the
specific case of string.
| Quote: | It separates concerns, and it makes it easier to reuse the convenience
functions with another type.
|
Again, what is a convenience function? Generalized algorithms should
definitely be free functions (and the find functions DON'T belong in
std::string -- the most useful ones are in boost::reg_exp anyway). But
is append any more a convenience function than replace, just because I
can write it in terms of replace (e.g. replace( size(), 0, s ) for
append( s ))? In my pre-standard string class, append was written in
terms of insert, and insert and remove were written in terms of replace;
I think that that is what I would do in an implementation of std::string
as well (although my interface was different, in that replace and the
others were const functions which returned a new string, rather than
doing the modification in place).
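The layering described above (append on insert, insert and remove on replace) can be sketched against std::string as non-member functions. This is only an illustration: the namespace text_ops and the free-function signatures are assumptions, and unlike the pre-standard class these modify the string in place through std::string::replace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace text_ops  // hypothetical namespace, for illustration only
{
    // insert: replace the empty range at pos with t.
    inline std::string& insert(std::string& s, std::size_t pos,
                               std::string const& t)
    {
        assert(pos <= s.size());
        return s.replace(pos, 0, t);
    }

    // remove: replace [pos, pos + len) with nothing.
    inline std::string& remove(std::string& s, std::size_t pos,
                               std::size_t len)
    {
        assert(pos <= s.size());
        return s.replace(pos, len, std::string());
    }

    // append: insert at the end, i.e. replace( size(), 0, t ).
    inline std::string& append(std::string& s, std::string const& t)
    {
        return insert(s, s.size(), t);
    }
}
```

Seen this way, all three are "convenience functions" in the sense that none needs anything beyond replace and size, which is the point of the question.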
| Quote: | It can also reduce the dependencies of your class, since you don't
need declarations (or forward declarations) for the types in
convenience function arguments.
|
I'd still like to see a good example of a convenience function. In the
context of std::string, for example, I don't see any functions whose
removal would reduce the number of external symbols needed.
| Quote: | [snip]
In the case of std::string, I suspect that the reason for so many
overloads of some of the functions is purely optimization concerns.
A char const* converts implicitly to an std::string, so logically,
there is no reason to provide both an "append( string const& )" and
an "append( char const* )". On the other hand, constructing the
string for the first could result in non-negligible runtime cost.
While append ( char const* ) may occasionally have a performance
advantage over append ( string const & ), I doubt either has an
advantage over append(InputIterator begin, InputIterator end). (And
since you have insert(), append() doesn't need to be a member.)
|
And since you have replace, insert doesn't need to be a member:-). And
append( string const& ) may compile a lot quicker than the form with
two iterators, which depends on meta-programming techniques to attain
acceptable speed for random-access iterators. But the real problem is
that append is implemented in terms of insert, insert in terms of
replace, and there isn't any replace( size_type, size_type,
ForwardIterator, ForwardIterator) .
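For what it is worth, the missing position-based overload can be approximated as a non-member on top of the iterator-based member replace that std::string does provide. The names replace_range and append_range below are made up for this sketch, and it clamps len the way the position-based members do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace text_ops  // hypothetical namespace, for illustration only
{
    // Position-based replace taking an iterator range as the new content.
    template <typename ForwardIterator>
    std::string& replace_range(std::string& s,
                               std::size_t pos, std::size_t len,
                               ForwardIterator first, ForwardIterator last)
    {
        assert(pos <= s.size());
        if (len > s.size() - pos)
            len = s.size() - pos;           // clamp to the end of the string
        std::string::iterator b = s.begin() + pos;
        return s.replace(b, b + len, first, last);
    }

    // append of an iterator range: replace the empty range at the end.
    template <typename ForwardIterator>
    std::string& append_range(std::string& s,
                              ForwardIterator first, ForwardIterator last)
    {
        return replace_range(s, s.size(), 0, first, last);
    }
}
```

The source range should not point into s itself, since the sketch makes no attempt to handle that aliasing case.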
| Quote: | [snip]
On the whole, it's not a path I'd recommend, unless you've a
particularly masochistic streak.
Thanks for the warning. I'd assumed that basic_string<UChar32> was
the most correct approach, but it sounds like using vector<UChar32>
would be more appropriate, since we're using it as the storage
representation, not as a public interface.
|
Using basic_string<UChar32> [...]
approach. And some forms of masochism do seem to be in vogue. But I
can assure you, vector<UChar32> will be a lot less work. (Of course, if
you're paid by the hour...)
| Quote: | I studied the question in depth about a year ago, and came to the
conclusion that if you want internationalized code, using
guaranteed 32 bit Unicode characters, the easiest solution is
probably to start from scratch, and just forget about the standard
library.
I wouldn't say "the standard library" here, just "standard strings".
I see plenty of things in the library that are useful in re-stringing
ourselves. For instance, I've got a case-insensitive comparison that
uses the ICU C API with std::lexicographical_compare().
|
Yes. I was thinking about the standard library in terms of interfaces:
std::basic_string, but also the std::iostream stuff and std::locale. I
would probably use std::vector for the underlying container, though.
And any of the functions in <algorithm> that made sense.
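As a rough illustration of combining std::vector with <algorithm> in this way, here is a case-insensitive ordering built on std::lexicographical_compare. It is a sketch under assumptions: Text32 and the function names are invented, char32_t (C++11) stands in for UChar32, and fold_ascii is a deliberately naive ASCII-only stand-in for a real Unicode case-folding call such as ICU's u_foldCase.

```cpp
#include <algorithm>
#include <vector>

typedef std::vector<char32_t> Text32;  // hypothetical storage type

// Stand-in for real case folding: handles ASCII letters only.
inline char32_t fold_ascii(char32_t c)
{
    return (c >= U'A' && c <= U'Z') ? c + (U'a' - U'A') : c;
}

// Element comparison on folded code points.
inline bool less_folded(char32_t a, char32_t b)
{
    return fold_ascii(a) < fold_ascii(b);
}

// Strict weak ordering over whole sequences, ignoring (ASCII) case.
inline bool less_ignoring_case(Text32 const& a, Text32 const& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end(),
                                        less_folded);
}
```

Code-point-by-code-point folding is itself a simplification; proper collation of internationalized text needs more than this, which is part of why the library's string interfaces are hard to reuse here.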
| Quote: | For the past couple of years, I've been mildly embarrassed that we used
a non-standard string in our program. But now that we want to change
the character type of those strings, I'm glad we have our own string.
But I'm not interested in implementing it by hand. Having it behave
like an STL container, not an STL string, seems like a good
compromise.
|
Having it behave like a string, that is a piece of text, rather than a
container, is probably an even better idea. Or would be, if there were
the slightest degree of consensus as to how a piece of text should
behave. Using an STL container (probably std::vector, although
std::deque is also a candidate) within the implementation is definitely a
good idea -- the way the std::string class requires the use of traits (for
assignment, no less) makes it infeasible for std::string, but I wouldn't
let the design errors in the standard library influence you for your own
classes. (In either direction -- there are many situations where traits
*ARE* a good idea. Just not here.)
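A minimal sketch of that arrangement, with the container kept strictly as an implementation detail and no traits involved. The class name UText and its members are assumptions for illustration, and char32_t (C++11) stands in for UChar32.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical text class: std::vector is the storage, not the interface.
class UText
{
public:
    UText() {}

    template <typename InputIterator>
    UText(InputIterator first, InputIterator last) : chars_(first, last) {}

    std::size_t length() const { return chars_.size(); }

    char32_t at(std::size_t i) const
    {
        assert(i < chars_.size());
        return chars_[i];
    }

    // The one modifying primitive: replace [pos, pos + len) by with.
    UText& replace(std::size_t pos, std::size_t len, UText const& with)
    {
        assert(pos <= chars_.size());
        if (len > chars_.size() - pos)
            len = chars_.size() - pos;
        std::vector<char32_t> copy(with.chars_);  // safe even if &with == this
        chars_.erase(chars_.begin() + pos, chars_.begin() + pos + len);
        chars_.insert(chars_.begin() + pos, copy.begin(), copy.end());
        return *this;
    }

private:
    std::vector<char32_t> chars_;  // could be std::deque without changing users
};
```

Because nothing in the public interface mentions the container, swapping std::vector for std::deque, or for a UTF-16 buffer, would not affect client code.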
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16