C++Talk.NET Forum Index
C++ language newsgroups
 
peek() and tellg()

 
wizofaus@hotmail.com
Posted: Wed Sep 28, 2005 10:13 am    Post subject: peek() and tellg()



Is there any reason according to the standard that calling tellg() on an
std::ifstream after a call to peek() could place the filebuf in an
inconsistent state?
I think it's a bug in the VC7 dinkumware implementation (and I've
reported to them as such), but the following code

std::ofstream ofs("test.txt");
ofs << "0123456789";
ofs.close();
std::wifstream ifs("test.txt");
std::wcout << wchar_t(ifs.peek());
ifs.tellg();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << std::endl;

Prints out 00246, when I would expect 00123. Remove the tellg() (or
move it to after a get) and it prints exactly that.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

P.J. Plauger
Posted: Wed Sep 28, 2005 5:39 pm    Post subject: Re: peek() and tellg()



<wizofaus (AT) hotmail (DOT) com> wrote


Quote:
> Is there any reason according to the standard that calling tellg() on an
> std::ifstream after a call to peek() could place the filebuf in an
> inconsistent state?
> I think it's a bug in the VC7 dinkumware implementation (and I've
> reported to them as such), but the following code
>
> std::ofstream ofs("test.txt");
> ofs << "0123456789";
> ofs.close();
> std::wifstream ifs("test.txt");
> std::wcout << wchar_t(ifs.peek());
> ifs.tellg();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << std::endl;
>
> Prints out 00246, when I would expect 00123. Remove the tellg() (or
> move it to after a get) and it prints exactly that.

Actually, I would expect 01234, and that's what our latest
library gives, both in our shipped product and VC++ V8
(Whidbey) which is soon to be formally released. V7.0
and earlier "fail" in a different way than V7.1.

I put "fail" in quotes because the above code is asking
for trouble. First, it writes a text line with no
terminating newline. That's not a problem here, but it
generally causes trouble. More important, it mixes two
different ways of accessing a stream:

-- as a one-pass input stream with limited pushback

-- as a random-access sequence with bookmarks

It has been known for decades that trying to access
the same stream both ways is fraught with peril.
Whether you call the resulting surprising behavior
buggy or regrettable is a matter of taste.

The biggest stress point in the code above is the
initial peek followed by a tell. It's hard enough
pushing back a character and still generating a
proper seek offset; if you push back a character
at the beginning of a file it's way harder to get
"right". The C I/O model, which underlies C++,
permits the implementation to discard any pushed
back characters when determining a seek offset.
That's why we read the "0" only once. It may still
not be what you want, but I believe that it's
defensible.

FWIW, you'll find this code terribly nonportable.
Other Standard C++ library implementations go off
in all sorts of interesting directions in this
area. If you want robust code, don't mix peek
and seek/tell.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



kanze
Posted: Thu Sep 29, 2005 8:20 pm    Post subject: Re: peek() and tellg()



wizofaus (AT) hotmail (DOT) com wrote:
Quote:
> Is there any reason according to the standard that calling
> tellg() on an std::ifstream after a call to peek() could place
> the filebuf in an inconsistent state?
>
> I think it's a bug in the VC7 dinkumware implementation (and
> I've reported to them as such), but the following code
>
> std::ofstream ofs("test.txt");
> ofs << "0123456789";
> ofs.close();
> std::wifstream ifs("test.txt");

Careful. You're reading a file written with narrow characters
as if it contained wide characters. Any results will depend on
the locale; whether they're useful or sensible is almost pure
luck.

On most modern machines, narrow characters use an encoding in
which all of the characters in the basic character set are
ASCII. If this is the case, imbuing a UTF-8 locale should allow
reading them correctly. Still, IMHO, if you want the file to
contain UTF-8, you should write it with a wofstream imbued with
a UTF-8 locale.

The "C" locale depends on a lot of things; I don't think the
standard actually says what it should be in this case. And of
course, most programs will have done a std::locale::global(
std::locale( "" ) ) as the first thing in main; under Unix, at
least, this sets the locale to a value determined by environment
variables.

Off hand, from what little I know of Windows, I would expect the
default to use UTF-16LE, not UTF-8. In which case, you're likely
to get some very strange results: letters from strange
alphabets, or illegal characters.

Quote:
> std::wcout << wchar_t(ifs.peek());
> ifs.tellg();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << wchar_t(ifs.peek()); ifs.get();
> std::wcout << std::endl;
>
> Prints out 00246, when I would expect 00123. Remove the
> tellg() (or move it to after a get) and it prints exactly
> that.

I'm not sure what effect the tellg() has -- that part seems
strange. But for the rest, I'd say that you're playing with
undefined, or poorly defined behavior. (Not necessarily
undefined behavior in the sense of the standard, but in the
sense that there really aren't any requirements as to what the
results should be.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


wizofaus@hotmail.com
Posted: Sat Oct 01, 2005 1:51 am    Post subject: Re: peek() and tellg()


kanze wrote:
Quote:
> wizofaus (AT) hotmail (DOT) com wrote:
>> Is there any reason according to the standard that calling
>> tellg() on an std::ifstream after a call to peek() could place
>> the filebuf in an inconsistent state?
>>
>> I think it's a bug in the VC7 dinkumware implementation (and
>> I've reported to them as such), but the following code
>>
>> std::ofstream ofs("test.txt");
>> ofs << "0123456789";
>> ofs.close();
>> std::wifstream ifs("test.txt");
>
> Careful. You're reading a file written with narrow characters
> as if it contained wide characters. Any results will depend on
> the locale; whether they're useful or sensible is almost pure
> luck.

Yes, I should have provided the example using std::wofstream (and
L"0123456789"). Exactly the same problem occurs (but not when using
narrow streams for both input and output).

Quote:
> Off hand, from what little I know of Windows, I would expect the
> default to use UTF-16LE, not UTF-8. In which case, you're likely
> to get some very strange results: letters from strange
> alphabets, or illegal characters.

The Dinkumware implementation, and indeed I think all the others I've
used, default to simply converting each wchar_t to a char (thus any
value over 255 cannot be stored).


P.J. Plauger
Posted: Sun Oct 02, 2005 1:33 am    Post subject: Re: peek() and tellg()

<wizofaus (AT) hotmail (DOT) com> wrote


Quote:
>> Off hand, from what little I know of Windows, I would expect the
>> default to use UTF-16LE, not UTF-8. In which case, you're likely
>> to get some very strange results: letters from strange
>> alphabets, or illegal characters.
>
> The Dinkumware implementation, and indeed I think all the others I've
> used default to simply converting wchar_t's to char's (thus any values
> over 255 cannot be stored).

Yes, we use the same conversions as the Standard C library by default.
But we also supply just about every conversion you can imagine for
C++ with our codecvt library in our CoreX product.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


