C++Talk.NET C++ language newsgroups
Daniel Krügler (ne Spangenberg)
Posted: Sat Sep 25, 2004 5:38 am    Post subject: eof issues - or: where is the end?
Hello,
Sometimes it seems that even simple things are not as simple as they look.
Because calling eof() on a given (basic_)istream is not a reliable method
to determine whether the stream is at its end position, I tried to write
a portable function with the interface
// Precondition: is.fail() != true, and the stream must be repositionable,
// e.g. a stringstream or an fstream. A counterexample, for which this
// does not work, is std::cin:
bool IsEof(std::istream& is);
using the positioning facilities of the stream family:
bool IsEof(std::istream& is)
{
    if (is) {
        std::streambuf* buf = is.rdbuf();
        // Get the current position:
        const std::ios::pos_type curr = buf->pubseekoff(0,
            std::ios_base::cur, std::ios_base::in);
        if (curr == std::ios::pos_type(-1)) {
            // Error handling
        }
        // Jump to the end of the file:
        buf->pubseekoff(0, std::ios_base::end, std::ios_base::in);
        const std::ios::pos_type atEnd = buf->pubseekoff(0,
            std::ios_base::cur, std::ios_base::in);
        if (atEnd == std::ios::pos_type(-1)) {
            // Error handling
        }
        const bool result = (curr == atEnd);
        if (!result) {
            // Reset to the original stream position:
            const std::ios::pos_type check = buf->pubseekpos(curr,
                std::ios_base::in);
            if (check == std::ios::pos_type(-1)) {
                // Error handling
            }
        }
        return result;
    }
    else {
        // Error handling; return a value so that no path falls off the end:
        return false;
    }
}
and wrote some unit tests for empty and non-empty files as well as for
stringstreams with empty and non-empty contents. Interestingly, these tests
failed for the empty stringstream case on my Dinkumware implementation of
the standard library (VC 7.1): the first pubseekoff call returned fpos(-1).
Having read the corresponding section on basic_stringbuf (27.7.1.3), I am
no longer sure whether the observed (mis)behaviour is intended/conforming
or not. Paragraph 7 says:
"For a sequence to be positioned, if its next pointer (either gptr() or
pptr()) is a null pointer, the positioning
operation fails."
So it seems that the behaviour is conforming, although I don't understand
why. Consider an analogous iterator example:
typedef std::vector<int> cont_type; // any container with random-access iterators
cont_type cont;
cont_type::iterator it = cont.end();
it += 0;
Shouldn't this be portable code (for random-access iterators)? (Note that
I neither dereference the end iterator, as in the famous &(*end()), nor
step past the end.)
Back to the stream: obviously I would have to check gptr() here, but that
is not possible, since it is a protected member of the stream buffer. So
what is left?
- Is the observed behaviour conforming? (I guess so.) Is there any way to
implement IsEof() for repositionable streams in a portable manner?
- Is my understanding of "repositionable" wrong, in that I assume it
depends only on the stream type (whereas it seems to depend on the
stream state as well)?
- Last but not least: can I rely on the proposed implementation working
portably for basic_ifstreams (or at least for (i)fstreams)?
- Can you suggest a better solution? Which one?
Thanks for any enlightening ideas,
Daniel Krügler
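A later note for readers on current libraries: as far as I know, C++11 reworded the paragraph quoted above so that positioning fails only when the next pointer is null and the new offset is nonzero, which makes a zero-offset seek on an empty stringstream valid. The following probe is a minimal sketch under that assumption (a C++11-or-later standard library); the helper tell_via_rdbuf is a hypothetical name. On an older library like the VC 7.1 one described above, the empty case may still report pos_type(-1).

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: performs the same zero-offset, relative seek as the
// first pubseekoff() call in IsEof() and returns the reported position.
// A result of pos_type(-1) means the positioning operation failed.
std::streampos tell_via_rdbuf(const std::string& contents)
{
    std::stringstream ss(contents);
    return ss.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::in);
}
```

On a library implementing the newer wording, both tell_via_rdbuf("abc") and tell_via_rdbuf("") should report position 0.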
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
----------------------------------------
Frank Birbacher
Posted: Mon Sep 27, 2004 10:30 am    Post subject: Re: eof issues - or: where is the end?
Hi!
Daniel Krügler (ne Spangenberg) wrote:
> sometimes it seems that even simple things are not that simple as they look:
> Due to the fact, that the call of eof() on a given (basic_)istream is no
> reliable method
How do you know that?
Might it be that you are not using streams correctly?
An example of usage of streams:
#include <iostream>
#include <string>
#include <list>
struct not_ready {};
struct io_error {};
void extract_lines(
    std::istream& is,
    std::list<std::string> &lines)
{
    if(!is) throw not_ready();
    lines.clear();
    //extract line by line
    std::string tmp;
    while(std::getline(is, tmp))
        lines.push_back(tmp);
    //check why getline failed
    if(!is.eof())
        throw io_error();
}
Notice that eof is not the loop condition.
Frank
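To see why eof() is better not used as the loop condition, here is a small sketch contrasting the idiom above with the common while (!is.eof()) loop. The function names, and the use of std::vector instead of std::list, are illustrative choices that are not part of the post.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Anti-pattern (hypothetical name): eof() is tested before reading, so when
// the input ends with '\n' the final getline() fails, yet its (empty)
// result is still stored.
std::vector<std::string> lines_eof_loop(std::istream& is)
{
    std::vector<std::string> lines;
    std::string tmp;
    while (!is.eof()) {
        std::getline(is, tmp);
        lines.push_back(tmp);
    }
    return lines;
}

// The idiom from the post: the extraction itself is the loop condition,
// so nothing is stored after a failed getline().
std::vector<std::string> lines_getline_loop(std::istream& is)
{
    std::vector<std::string> lines;
    std::string tmp;
    while (std::getline(is, tmp))
        lines.push_back(tmp);
    return lines;
}
```

For the input "a\nb\n", lines_eof_loop() returns three elements (the last one empty, stored after the failed read), while lines_getline_loop() returns just "a" and "b".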
----------------------------------------
kanze@gabi-soft.fr
Posted: Mon Sep 27, 2004 10:42 am    Post subject: Re: eof issues - or: where is the end?
"Daniel Krügler (ne Spangenberg)" <dsp (AT) bdal (DOT) de> wrote
> sometimes it seems that even simple things are not that simple as they
> look:
Welcome to C++.
> Due to the fact, that the call of eof() on a given (basic_)istream is
> no reliable method to determine whether the stream is at its eof
> position I tried to write a portable function with the interface
What's wrong with simply: input.peek() == EOF ?
The real problem, of course, is to define "reliable" in this case. If
your next input is >> to an int, the fact that input.peek() doesn't
return EOF doesn't guarantee that you won't encounter end of file. The
problem is that extractors read an undefined number of characters
(skipping blanks) before trying to read the value. And that the istream
cannot know in advance whether the next input will be a formatting
extractor (operator >>) or a non-formatting extractor (getline(), get(),
etc.). If only blanks remain in the stream, a predictive eof should
return true if the next input is a >>, and false if it is getline(),
get() or something like that.
If you are doing strictly unformatted input, the solution with peek()
above should be sufficient.
Considering your code, even supposing it worked, it will give a false
result if all of the remaining characters are white space, and the next
input operation is a >>.
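A minimal sketch of the point above, assuming the remaining input consists only of blanks: peek() reports a character and an unformatted getline() succeeds, but a formatted operator>> skips the blanks, reaches end of file and fails. The helper names are made up for this illustration, and each helper works on a fresh std::istringstream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helpers, each working on a fresh stream holding 'rest'.

// The simple predictive test: does peek() see end of file?
bool peek_sees_eof(const std::string& rest)
{
    std::istringstream in(rest);
    typedef std::istream::traits_type traits;
    return traits::eq_int_type(in.peek(), traits::eof());
}

// Formatted input: operator>> skips leading blanks before converting.
bool int_extraction_succeeds(const std::string& rest)
{
    std::istringstream in(rest);
    int n = 0;
    in >> n;
    return !in.fail();
}

// Unformatted input: getline() treats blanks as ordinary characters.
bool getline_succeeds(const std::string& rest)
{
    std::istringstream in(rest);
    std::string line;
    std::getline(in, line);
    return !in.fail();
}
```

For rest = "   ", peek_sees_eof() and int_extraction_succeeds() both return false while getline_succeeds() returns true, so no single predictive answer fits both kinds of input.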
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
----------------------------------------
Daniel Krügler (ne Spangenberg)
Posted: Mon Sep 27, 2004 8:15 pm    Post subject: Re: eof issues - or: where is the end?
Hello Frank Birbacher,
Frank Birbacher wrote:
> How do you know that?
> Might it be that you are not using streams correctly?
Possibly valid reasons, but not in this case; see below.
> An example of usage of streams:
> #include <iostream>
> #include <string>
> #include <list>
> struct not_ready {};
> struct io_error {};
> void extract_lines(
>     std::istream& is,
>     std::list<std::string> &lines)
> {
>     if(!is) throw not_ready();
>     lines.clear();
>     //extract line by line
>     std::string tmp;
>     while(std::getline(is, tmp))
>         lines.push_back(tmp);
>     //check why getline failed
>     if(!is.eof())
>         throw io_error();
> }
This function **attempts** to read something from the stream **before**
checking whether the stream is at eof. This is the only valid use of eof().
**This** is not the use case for the proposed IsEof(), which should return
a reasonable answer regardless of whether any attempt has yet been made to
read something from the stream.
Greetings from Bremen,
Daniel Krügler
----------------------------------------
Daniel Krügler (ne Spangenberg)
Posted: Mon Sep 27, 2004 9:01 pm    Post subject: Re: eof issues - or: where is the end?
Hello James Kanze,
kanze (AT) gabi-soft (DOT) fr wrote:
> What's wrong with simply: input.peek() == EOF ?
Yes, I also tested this approach, but ran into problems because of the
specification of peek():
"After constructing a sentry object, reads but does not extract the
current input character."
The construction of the sentry failed in cases where the input was
**already** at eof. Inspired again by your proposal, it seems that the
following should be right:
input.eof() || input.peek() == EOF
To prevent unwanted side effects of the test function itself, I changed it
as follows:
"
if (!input) do_some_error_handling;
if (input.eof()) return true;
const bool result = (input.peek() == EOF);
if (result) input.clear(input.rdstate() & ~std::ios_base::eofbit);
return result;
"
The alternative code would use **only** the stream buffer and thus prevent
the state changes in the first place:
" if (input) {
std::streambuf* buf = input.rdbuf();
const bool result = (buf->sgetc() == EOF);
return result;
}
else {
do_some_error_handling;
}
"
What do you think?
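For reference, here is the peek()-based variant above written out as a compilable function, together with the buffer-only alternative. This is a sketch: do_some_error_handling is replaced by throwing std::logic_error (an assumption), traits_type::eof() is used instead of the EOF macro, and the name IsEofBuf for the second variant is made up.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <stdexcept>

typedef std::istream::traits_type traits;

// peek()-based test. On current libraries peek() sets eofbit when it sees
// end of file; the bit is cleared again so that a later call does not start
// with good() == false (which would make the sentry set failbit).
bool IsEof(std::istream& input)
{
    if (!input)
        throw std::logic_error("IsEof: stream is not in a usable state");
    if (input.eof())
        return true;
    const bool result = traits::eq_int_type(input.peek(), traits::eof());
    if (result)
        input.clear(input.rdstate() & ~std::ios_base::eofbit);
    return result;
}

// Buffer-only alternative (hypothetical name): sgetc() is a stream buffer
// call and cannot touch the stream's state flags, so there is nothing to undo.
bool IsEofBuf(std::istream& input)
{
    if (!input)
        throw std::logic_error("IsEofBuf: stream is not in a usable state");
    return traits::eq_int_type(input.rdbuf()->sgetc(), traits::eof());
}
```

Calling IsEof() twice on an empty stream returns true both times and leaves neither eofbit nor failbit set, which is what the unit test discussed in this thread expects.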
> The real problem, of course, is to define "reliable" in this case. If
> your next input is >> to an int, the fact that input.peek() doesn't
> return EOF doesn't guarantee that you won't encounter end of file. The
> problem is that extractors read an undefined number of characters
> (skipping blanks) before trying to read the value. And that the istream
> cannot know in advance whether the next input will be a formatting
> extractor (operator >>) or a non-formatting extractor (getline(), get(),
> etc.). If only blanks remain in the stream, a predictive eof should
> return true if the next input is a >>, and false if it is getline(),
> get() or something like that.
> If you are doing strictly unformatted input, the solution with peek()
> above should be sufficient.
> Considering your code, even supposing it worked, it will give a false
> result if all of the remaining characters are white space, and the next
> input operation is a >>.
The behaviour in the cases you mention is OK.
Thank you very much for your input! Although I had also tried your proposed
solution in an earlier development cycle, I was discouraged too early by a
minor side effect, which actually seems easy to fix (I hope). Your proposal
brought me back to that early attempt and showed me two things:
- Look at the code again and fix the **actual** error.
- Be aware of behaviour anomalies which users might run into with
formatted IO and whitespace.
Apart from that (hopefully "correct") solution, my question concerning the
current behaviour of the "positioning" functions of the stream buffer
classes remains: why is a request to reposition an arbitrary stream buffer
by a relative step of 0 (independent of the origin) not well-defined in
every case? (See the problem with the empty stringstream.)
Greetings from Bremen,
Daniel
----------------------------------------
kanze@gabi-soft.fr
Posted: Tue Sep 28, 2004 10:36 am    Post subject: Re: eof issues - or: where is the end?
Frank Birbacher <bloodymir.crap (AT) gmx (DOT) net> wrote
> Daniel Krügler (ne Spangenberg) wrote:
> > sometimes it seems that even simple things are not that simple as
> > they look:
> > Due to the fact, that the call of eof() on a given (basic_)istream
> > is no reliable method
> How do you know that?
Because it is a well known fact.
> Might it be that you are not using streams correctly?
It might be because he wants a function whose name is eof() to be usable
to test whether the stream is at eof() or not.
> An example of usage of streams:
I think you chose a bad example:-).
> #include <iostream>
> #include <string>
> #include <list>
> struct not_ready {};
> struct io_error {};
> void extract_lines(
>     std::istream& is,
>     std::list<std::string> &lines)
> {
>     if(!is) throw not_ready();
>     lines.clear();
>     //extract line by line
>     std::string tmp;
>     while(std::getline(is, tmp))
>         lines.push_back(tmp);
>     //check why getline failed
>     if(!is.eof())
>         throw io_error();
And when will !is.eof() be true? Except in exceptional cases, the only
possible error on a getline is when the last line in the file doesn't
end with a '\n'. But in that case, is.eof() is true -- there is, in
fact, no way of detecting an error on getline.
A similar case exists when reading floating point values, and the stream
ends with something like "2.3e+" (and no trailing '\n'). There is a
format error, but eof() will still return true.
It is, in fact, impossible to reliably distinguish between an error and
a real end of file with the standard streams.
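A small sketch of both situations, assuming in-memory input via std::istringstream (the helper names are illustrative): after a getline() loop, eof() is true whether or not the last line ends with '\n', and a truncated floating-point value such as "2.3e+" sets failbit and eofbit together.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Runs the usual getline() loop and reports eof() afterwards.
bool eof_after_getline_loop(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
    }
    return in.eof();
}

// Tries to extract a double; reports fail() and eof() afterwards.
void read_double(const std::string& text, bool& failed, bool& at_eof)
{
    std::istringstream in(text);
    double d = 0.0;
    in >> d;
    failed = in.fail();
    at_eof = in.eof();
}
```

Both "a\nb\n" and "a\nb" leave eof() set after the loop, and "2.3e+" reports failed and at_eof together, so eof() alone cannot separate a clean end of input from a format error.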
> }
> Notice that eof is not the loop condition.
Notice that you are using a function with the name of eof() to test
whether there was an error or not. I doubt that Daniel would have made
his comment if the name of the function had been no_error_seen().
----------------------------------------
kanze@gabi-soft.fr
Posted: Tue Sep 28, 2004 10:36 am    Post subject: Re: eof issues - or: where is the end?
"Daniel Krügler (ne Spangenberg)" <dsp (AT) bdal (DOT) de> wrote
> kanze (AT) gabi-soft (DOT) fr wrote:
> > What's wrong with simply: input.peek() == EOF ?
> Yes, I also tested this approach, but ran into problems because of the
> specification of peek():
> "After constructing a sentry object, reads but does not extract the
> current input character."
> The construction of the sentry failed in cases where the input was
> **already** at eof.
But if the construction of the sentry object fails, peek() is specified
to return EOF.
> Inspired again by your proposal, it seems that the following
> should be right:
> input.eof() || input.peek() == EOF
> To prevent unwanted side effects of the test function itself, I changed it
> as follows:
Speaking of unwanted side effects, there is one problem with peek(): it
doesn't set eofbit when rdbuf()->sgetc() returns EOF. And I have used
actual implementations where rdbuf()->sgetc() was not reproducible --
it could return EOF on one call, and something else on another. IMHO,
this is an error in the implementation, but the fact remains that some
real implementations behave like this. (The only case I'm familiar with
concerned input from a keyboard under Unix. In that case, entering
control-D at the start of a line caused rdbuf()->sgetc() to return EOF
exactly once; the next invocation tried to read the keyboard again.)
> "
> if (!input) do_some_error_handling;
> if (input.eof()) return true;
> const bool result = (input.peek() == EOF);
> if (result) input.clear(input.rdstate() & ~std::ios_base::eofbit);
> return result;
> "
Very good. You avoid the multiple read problem.
> The alternative code would use **only** the stream buffer and thus
> prevent the state changes in the first place:
> " if (input) {
> std::streambuf* buf = input.rdbuf();
> const bool result = (buf->sgetc() == EOF);
> return result;
> }
> else {
> do_some_error_handling;
> }
> "
> What do you think?
I would go with the first. But I'm not sure what state change you are
talking about. Constructing the sentry object with the second argument
true (as does peek()) should never cause a state change in the stream.
[...]
> Thank you very much for your input! Although I had also tried your
> proposed solution in an earlier development cycle, I was discouraged too
> early by a minor side effect, which actually seems easy to fix (I hope).
I'm curious as to what the side effect was. I use this idiom a lot
myself, and I've never had any problem when reading from a file. The
only problem is keyboard input, IMHO due to an error in the
implementation, and you've fixed that by using a function which sets
eofbit when you see eof.
[...]
> Apart from that (hopefully "correct") solution, my question concerning the
> current behaviour of the "positioning" functions of the stream buffer
> classes remains: why is a request to reposition an arbitrary stream buffer
> by a relative step of 0 (independent of the origin) not well-defined in
> every case? (See the problem with the empty stringstream.)
I'm not sure, but you do have to pay attention. If you initialize a
stringstream with an empty string, you are at eof. The standard does
not require an implementation to set eofbit until the first attempt
to read the stream, but it doesn't forbid it from setting it earlier.
And of course, if eofbit is set, good() is false, the sentry object will
be false, and seek will return an error without even trying.
IMHO, this is an obvious defect in the standard, an accidental result of
putting seekg() in the section on unformatted input. IMHO, the intent
is clear from the fact that the Effects clause of seekg() starts with
"If fail() != true". The obvious intent is to test only fail() (which
doesn't include the eofbit), and not good() (which does, and which is
what the sentry object tests).
Of course, regardless of what the standard says, or what an eventual TC
might say, you still have to deal with real implementations, and if some
get it wrong... If you really want to get it working (say, for personal
satisfaction), then clearing eofbit before seeking might do the trick.
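A sketch of Daniel's seek-based test rewritten in terms of tellg()/seekg(), clearing eofbit first as suggested above. Assumptions: the stream is repositionable (stringstream, fstream, not std::cin), errors are reported by throwing, the original state flags are restored afterwards, and the name IsEofBySeek is hypothetical. As far as I know, C++11 later made seekg() clear eofbit itself, but tellg() still constructs a sentry, so clearing eofbit before the first call is needed either way.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical seek-based end-of-stream test for repositionable streams.
bool IsEofBySeek(std::istream& is)
{
    if (is.fail())
        throw std::logic_error("IsEofBySeek: stream is in the fail state");

    const std::ios_base::iostate saved = is.rdstate();
    // With eofbit set, good() is false and the sentry inside tellg()/seekg()
    // would refuse to position the stream, so clear it first.
    is.clear(saved & ~std::ios_base::eofbit);

    const std::istream::pos_type invalid(-1);
    const std::istream::pos_type curr = is.tellg();
    is.seekg(0, std::ios_base::end);
    const std::istream::pos_type atEnd = is.tellg();
    is.seekg(curr);  // go back to where we started
    if (curr == invalid || atEnd == invalid || is.fail())
        throw std::runtime_error("IsEofBySeek: stream is not repositionable");

    is.clear(saved);  // restore the original flags, eofbit included
    return curr == atEnd;
}
```

Restoring the saved flags keeps the call free of visible side effects: a stream that had already hit eof through a read still reports eof() afterwards, and the read position is unchanged.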
----------------------------------------
Hyman Rosen
Posted: Wed Sep 29, 2004 9:37 am    Post subject: Re: eof issues - or: where is the end?
kanze (AT) gabi-soft (DOT) fr wrote:
> It might be because he wants a function whose name is eof() to be
> usable to test whether the stream is at eof() or not.
But that is an impossible wish. Just like a hot table in craps,
you can only know whether eof() was, not whether eof() will be.
Whether for terminals or for files, nothing prevents more input
from appearing in the future. That is, suppose I ask the system
for the length of a file I'm reading, and keep track of how many
characters I have already read. If the two amounts match, I may
wish an anticipatory eof() to return true. But in between that
call and the next read, some other process may have come along
and added more characters to the file.
----------------------------------------
Daniel Krügler (ne Spangenberg)
Posted: Wed Sep 29, 2004 10:10 am    Post subject: Re: eof issues - or: where is the end?
Hello James Kanze,
kanze (AT) gabi-soft (DOT) fr wrote:
> "Daniel Krügler (ne Spangenberg)" <dsp (AT) bdal (DOT) de> wrote in message
> news:<41580be6$0$289$4d4ebb8e (AT) businessnews (DOT) de.uu.net>...
> > kanze (AT) gabi-soft (DOT) fr wrote:
> > > What's wrong with simply: input.peek() == EOF ?
> > [snip]
> > "After constructing a sentry object, reads but does not extract the
> > current input character."
> > The construction of the sentry failed in cases where the input was
> > **already** at eof.
> But if the construction of the sentry object fails, peek() is specified
> to return EOF.
The fact that peek() returns EOF in both cases was not the problem. The
problem occurred in my test case, which is (also) described by the
following code snippet:
std::fstream file;
file.open("empty.txt"); // nomen est omen...
CPPUNIT_ASSERT(!file.fail());
CPPUNIT_ASSERT(IsEof(file));
CPPUNIT_ASSERT(IsEof(file)); // (a)
CPPUNIT_ASSERT(!file.fail()); // Here the test failed!
Consider the case where peek() is called for a stream whose state is
already eof (a). According to the standard (and to my Dinkumware lib),
the istream::sentry object is constructed (27.6.1.1.2/p. 2) ...
"[..] If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(),
the function calls setstate(failbit | eofbit) (which may throw
ios_base::failure)."
... and adds the failbit flag to the stream state, which causes the final
test to fail.
Actually, I am not sure whether this attempt (to call sgetc) conforms to
the standard, because the same paragraph begins with the sentence:
"If is.good() is true, prepares for formatted or unformatted input.[..]"
What can we conclude from this? Does it mean that no further actions
should take place if good() is false? Which part of the remaining clause
applies to cases where the initial state of the stream was not good? The
Dinkumware lib does indeed add the failbit flag whenever good() returns
false.
The clause seems rather fuzzy, and paragraph 5 does not really enlighten
me further when it says:
"[..] During preparation, the constructor may call setstate(failbit)[..]"
But regardless of the theory (the Standard), in practice the second
attempt to get the eof information from the stream (in my original
version) put the stream into the fail state; that is the side effect I
mentioned.
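The side effect can be reproduced with a few lines. This sketch assumes a library that follows the sentry behaviour described above (set failbit whenever good() is false on entry) and whose peek() sets eofbit on end of file, as, to my knowledge, current libraries and later revisions of the standard do; the function and struct names are made up.

```cpp
#include <cassert>
#include <sstream>

// State flags observed after each of two peek() calls on an empty stream.
struct PeekStates {
    bool eof1, fail1;  // after the first peek()
    bool eof2, fail2;  // after the second peek()
};

PeekStates peek_twice_on_empty()
{
    std::istringstream empty("");
    PeekStates s;
    empty.peek();  // sentry succeeds; sgetc() returns EOF, so eofbit is set
    s.eof1 = empty.eof();
    s.fail1 = empty.fail();
    empty.peek();  // good() is now false, so the sentry sets failbit
    s.eof2 = empty.eof();
    s.fail2 = empty.fail();
    return s;
}
```

The first peek() leaves only eofbit set; the second one adds failbit, which is exactly what made the final CPPUNIT_ASSERT(!file.fail()) fail.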
> Speaking of unwanted side effects, there is one problem with peek(): it
> doesn't set eofbit when rdbuf()->sgetc() returns EOF. And I have used
> actual implementations where rdbuf()->sgetc() was not reproducible --
> it could return EOF on one call, and something else on another.
Is that observation the reason for your advice to stay away from the
implementation that acts on the buffer instead of the whole stream?
> > "
> > if (!input) do_some_error_handling;
> > if (input.eof()) return true;
> > const bool result = (input.peek() == EOF);
> > if (result) input.clear(input.rdstate() & ~std::ios_base::eofbit);
> > return result;
> > "
> Very good. You avoid the multiple read problem.
OK, I will use that. Thanks again!
> I would go with the first. But I'm not sure what state change you are
> talking about. Constructing the sentry object with the second argument
> true (as does peek()) should never cause a state change in the stream.
Interesting. Actually, my observations were contrary to that opinion. Do
you think that this points to an erroneous implementation in the
Dinkumware lib (as described above)?
> I'm curious as to what the side effect was.
See above.
> [...]
> > Apart from that (hopefully "correct") solution, my question concerning
> > the current behaviour of the "positioning" functions of the stream
> > buffer classes remains: why is a request to reposition an arbitrary
> > stream buffer by a relative step of 0 (independent of the origin) not
> > well-defined in every case? (See the problem with the empty stringstream.)
> I'm not sure, but you do have to pay attention. If you initialize a
> stringstream with an empty string, you are at eof.
D'accord!
> The standard does
> not require an implementation to set eofbit until the first attempt
> to read the stream, but it doesn't forbid it from setting it earlier.
> And of course, if eofbit is set, good() is false, the sentry object will
> be false, and seek will return an error without even trying.
See the side effect above. So that supports the view that clearing the
eofbit flag in the proposed way is necessary, right?
> IMHO, this is an obvious defect in the standard, an accidental result of
> putting seekg() in the section on unformatted input. IMHO, the intent
> is clear from the fact that the Effects clause of seekg() starts with
> "If fail() != true". The obvious intent is to test only fail() (which
> doesn't include the eofbit), and not good() (which does, and which is
> what the sentry object tests).
I think so, too.
> Of course, regardless of what the standard says, or what an eventual TC
> might say, you still have to deal with real implementations, and if some
> get it wrong... If you really want to get it working (say, for personal
> satisfaction), then clearing eofbit before seeking might do the trick.
I support that point of view. But now, reaching the end of your posting,
I notice that your own argument seems to support the view that clearing
the eofbit flag is necessary for a general implementation.
I thank you very much for your very thorough answers,
Daniel
----------------------------------------
Daniel Krügler (ne Spangenberg)
Posted: Thu Sep 30, 2004 10:38 am    Post subject: Re: eof issues - or: where is the end?
Hello Hyman Rosen,
Hyman Rosen wrote:
> kanze (AT) gabi-soft (DOT) fr wrote:
> > It might be because he wants a function whose name is eof() to be
> > usable to test whether the stream is at eof() or not.
> But that is an impossible wish. Just like a hot table in craps,
> you can only know whether eof() was, not whether eof() will be.
> Whether for terminals or for files, nothing prevents more input
> from appearing in the future. That is, suppose I ask the system
> for the length of a file I'm reading, and keep track of how many
> characters I have already read. If the two amounts match, I may
> wish an anticipatory eof() to return true. But in between that
> call and the next read, some other process may have come along
> and added more characters to the file.
While your argument is true, the proposed IsEof() should still return a
reliable result if the tested file is modified **after** the (first)
test. Just repeat the test and you will see that (provided the external
writer flushes properly) IsEof() now returns false.
There was no requirement that IsEof() return the same result for two
consecutive calls in the volatile world of general streams:
const bool result1 = IsEof(is);
const bool result2 = IsEof(is);
assert(result1 == result2); // Not guaranteed in general!
My requirement was that, under the (controlled) conditions of my test,
where I know that the test stream is not modified by any external event
(apart from uncontrollable ones like hardware crashes), the postcondition
tested above should hold.
I think this requirement is quite reasonable. Take a look at the
boost::filesystem library, which also provides functions like exists()
with the proviso that, under uncontrolled conditions, the assertion in the
following code sequence
boost::filesystem::path path("Myfile.txt");
const bool result1 = exists(path);
const bool result2 = exists(path);
assert(result1 == result2); // Not guaranteed in general!
might fail.
Greetings from Bremen,
Daniel Krügler
----------------------------------------
kanze@gabi-soft.fr
Posted: Fri Oct 01, 2004 5:06 pm    Post subject: Re: eof issues - or: where is the end?
Hyman Rosen <hyrosen (AT) mail (DOT) com> wrote
> kanze (AT) gabi-soft (DOT) fr wrote:
> > It might be because he wants a function whose name is eof() to be
> > usable to test whether the stream is at eof() or not.
> But that is an impossible wish.
It depends somewhat on your definition of being at eof(). I did point
out some of the restrictions as to what is possible.
> Just like a hot table in craps, you can only know whether eof() was,
> not whether eof() will be.
What I think he is asking for is the ability to tell whether eof() is.
That is: at this very moment, when I call the function, am I positioned
behind the last character in the stream or not?
> Whether for terminals or for files, nothing prevents more input from
> appearing in the future. That is, suppose I ask the system for the
> length of a file I'm reading, and keep track of how many characters I
> have already read. If the two amounts match, I may wish an
> anticipatory eof() to return true. But in between that call and the
> next read, some other process may have come along and added more
> characters to the file.
What you say is true in the absolute, but there are a lot of cases where
the approximation is sufficient. Even if you actually have a read that
fails, you can't be sure that it is definitive -- at least one program
under Unix (tail -f) waits, and tries again later after a failure.
There are a lot of programs, however, that can realistically ignore this
possibility. When the C++ compiler has finished reading your source
file, it has finished; it doesn't take into account the fact that you
could append more text to the file (even if the file is missing a
closing }). Ditto when I read a configuration file, and ditto for most
of my smaller tools. When doing low-level (character) IO, constructs
using a predictive eof() are a perfectly valid solution. (That said,
for such programs, I've never had any problem with simply using
in.peek() == EOF.)
----------------------------------------
kanze@gabi-soft.fr
Posted: Fri Oct 01, 2004 5:07 pm    Post subject: Re: eof issues - or: where is the end?
"Daniel Krügler (ne Spangenberg)" <dsp (AT) bdal (DOT) de> wrote
> Hello James Kanze,
> kanze (AT) gabi-soft (DOT) fr wrote:
> > "Daniel Krügler (ne Spangenberg)" <dsp (AT) bdal (DOT) de> wrote in message
> > news:<41580be6$0$289$4d4ebb8e (AT) businessnews (DOT) de.uu.net>...
> > > kanze (AT) gabi-soft (DOT) fr wrote:
> > > > What's wrong with simply: input.peek() == EOF ?
> > > [snip]
> > > "After constructing a sentry object, reads but does not extract
> > > the current input character."
> > > The construction of the sentry failed in cases where the input was
> > > **already** at eof.
> > But if the construction of the sentry object fails, peek() is
> > specified to return EOF.
> The fact that peek() returns EOF in both cases was not the
> problem. The problem occurred in my test case, which is (also)
> described by the following code snippet:
> std::fstream file;
> file.open("empty.txt"); // nomen est omen...
> CPPUNIT_ASSERT(!file.fail());
> CPPUNIT_ASSERT(IsEof(file));
> CPPUNIT_ASSERT(IsEof(file)); // (a)
> CPPUNIT_ASSERT(!file.fail()); // Here the test failed!
> Consider the case where peek() is called for a stream whose state is
> already eof (a). According to the standard (and to my Dinkumware lib),
> the istream::sentry object is constructed (27.6.1.1.2/p. 2) ...
> "[..] If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns
> traits::eof(), the function calls setstate(failbit | eofbit) (which
> may throw ios_base::failure)."
I can't find this in my copy of the standard, but my copy is still the
1999 version; the specification of sentry is so bad there that I imagine
that it has been changed. Normally, I wouldn't have expected sentry to
change the state of the local stream unless it was skipping blanks. But
I don't know what the current version of the standard says about this.
(sentry shouldn't call is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() if
its second parameter is true, but the Dinkumware site says that
failbit will be set "if, after any such preparation, istr.good() is
false", i.e. it does the check systematically, and not as a result of
trying to input anything.) Given who Dinkumware is, I think it is safe
to assume that this implementation is conforming, at the very least, and
probably required.
In practice, I've always just tested in.peek() == EOF, and let it go.
Some compilers (e.g. Sun CC) do get this wrong, in that having seen EOF
once, the next call to peek() may see something else, but in practice,
I've not encountered this phenomenon other than with keyboard input, and
the fact that my code might require two or more control-D's to terminate
keyboard input never really bothered me.
Anyway, I did some tests with the compilers available to me (Sun CC 5.1,
g++ 2.95.2, g++ 3.4.0 and VC++ 6.0). In general, there are two aspects
to consider: if peek() == EOF is guaranteed repeatable, you don't need
to set eofbit. So we have two potential versions of isEof, one which
sets eofbit (for compilers with broken filebuf), and one which doesn't
(for compilers which set failbit in sentry if eofbit is set). That
means a compiler-dependent solution, and it will fail if you encounter a
compiler where filebuf is broken, but where sentry sets failbit. Of the
compilers mentioned above:
                filebuf broken    sets failbit in sentry
  g++ 2.95.2 :       no                   no
  g++ 3.4.0  :       no                   yes
  Sun CC 5.1 :       yes                  no
  VC++ 6.0   :       no                   yes
At least among the compilers I have access to, one solution or the other
always works. And in fact, the only compiler where simply using peek()
(without setting eofbit) doesn't work is a very old version of Sun CC.
(On the other hand, the division doesn't follow the line between classic
iostream and standard iostream, so I need yet another break-down in my
implementation dependencies.)
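The two potential versions of isEof described above might look roughly like this. Both names are hypothetical, and neither is claimed to be the code Kanze actually uses; they only illustrate which bit each variant manages.

```cpp
#include <istream>
#include <sstream>

// Variant 1 (hypothetical): for libraries whose filebuf may not repeat EOF.
// Record the EOF in eofbit so that later calls return early and never go
// back to the streambuf.
bool isEofSettingEofbit(std::istream& is)
{
    if (is.eof())
        return true;
    if (is.peek() == std::istream::traits_type::eof()) {
        is.setstate(std::ios_base::eofbit);   // no-op if peek() already set it
        return true;
    }
    return false;
}

// Variant 2 (hypothetical): for libraries whose sentry sets failbit when
// eofbit is already set. Remove eofbit again so that a later peek() or
// read does not fail merely because EOF was seen once.
bool isEofClearingEofbit(std::istream& is)
{
    if (is.peek() == std::istream::traits_type::eof()) {
        is.clear(is.rdstate() & ~std::ios_base::eofbit);
        return true;
    }
    return false;
}
```

Calling either variant twice on an empty stream returns true both times; the first leaves eofbit set, the second leaves the stream good().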
| Quote: | .. and adds the failbit flag to the stream state, which causes the
final test to fail.
Actually I am not sure whether this attempt (to sgetc) is in accordance
with the standard, because the same paragraph begins with the sentence:
"If is.good() is true, prepares for formatted or unformatted
input.[..]"
What can we conclude from this? Does it mean that no further actions
should take place if good() is false? Which part of the remaining
clause applies to cases where the initial state of the stream was not
good? The Dinkumware lib does indeed add the failbit flag in every case
of a stream where good() returns false.
The clause seems rather fuzzy,
|
You don't say. Implicitly, if I read the C++99 standard rigorously, if
!good(), sentry does absolutely nothing (thus, does not set failbit),
but evaluates false. And if sentry evaluates false, the extractors do
absolutely nothing (and thus, do not set failbit). This is obviously
not right, since it means that if eofbit is true (and nothing else), no
input takes place, but failbit doesn't get set either.
There is actually an open issue (419) about this in the library defects
list. The proposed resolution says in part that "We believe that the
sentry's constructor should always set failbit when ok is false, and we
also think the standard already says that." If they are referring to
C++99, I really don't see how they can say that the standard already
says that; the only mention of failbit with regards to the constructor
of sentry says "During preparation, the constructor may call
setstate(failbit) (which may throw ios_base::failure)". Since
preparation doesn't occur if eofbit is set before construction of the
sentry object, this not only doesn't require failbit to be set if eofbit
is set on entry, it practically forbids it.
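As far as I know, issue 419 was eventually resolved along the lines of the proposed resolution: C++11 specifies that the sentry constructor calls setstate(failbit) whenever is.good() is false on entry. The probe below (hypothetical name `sentrySetsFailbitWhenEofbitSet`) is a sketch of how to check that on a given library; its expected result assumes a C++11-conforming implementation.

```cpp
#include <istream>
#include <sstream>

// Hypothetical probe for the issue 419 question: the stream still holds
// data, but eofbit is already set. Under the C++11 wording the sentry
// constructor calls setstate(failbit) because good() is false, so the
// sentry evaluates to false and failbit is set, although nothing was read.
bool sentrySetsFailbitWhenEofbitSet()
{
    std::istringstream in("x");
    in.setstate(std::ios_base::eofbit);
    std::istream::sentry ok(in, true);   // noskipws == true, as peek() uses
    return !ok && in.fail();
}
```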
There is also an interesting citation of comments from Jerry Schwarz
(who is responsible for the original design of the classic iostreams),
concerning defect 195, which suggests that the "broken" filebufs I
mention above aren't really broken according to his original intention,
and that peek() should definitely set eofbit if it sees an eof (and of
course, no istream function should ever go to the streambuf if eofbit is
set).
I find the two a bit contradictory. IMHO, peek() should never set
failbit, regardless of what it reads. (But this is just an opinion.)
Perhaps more significantly, whether it sets failbit or not must be
coherent; it certainly shouldn't set failbit sometimes, and not others,
when it returns EOF.
I think the best solution would be to specify that peek() never sets
failbit. A possibly acceptable alternative would be that it always sets
failbit if it returns EOF. Having it set failbit sometimes, and not
others, is IMHO not acceptable.
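The "sometimes, and not others" behavior can be reproduced with two consecutive calls to peek() on an empty stream. The helper type `PeekResult` and the function `peekAndReport` are hypothetical; the expected results assume a library where peek() sets eofbit on EOF and the sentry sets failbit when good() is false, as libstdc++ does.

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper type: what one call to peek() returned, and whether
// failbit is set afterwards.
struct PeekResult {
    bool returnedEof;
    bool failAfter;
};

PeekResult peekAndReport(std::istream& is)
{
    const bool eof = is.peek() == std::istream::traits_type::eof();
    return PeekResult{eof, is.fail()};
}
```

On such a library, the first call on an empty `std::istringstream` returns EOF with failbit clear, and the second returns EOF with failbit set: the same answer, with a different side effect.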
But the problem is in the hands of the committee. The issue is still
open.
In the meantime, and regardless of what the committee does, you have to
deal with existing compilers. My first reaction would be to simply
specify that if a call to isEof() returns true, it is unspecified
whether failbit is set on the stream or not. Does this actually cause
any problems? Failing that, it would be easy to ensure that it is
always set. And finally, if you absolutely must not have it set,
nothing prevents you from resetting it in isEof. Something along the
lines of:
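#include <cstdio>      // EOF
#include <istream>

bool
isEof( std::istream& is )
{
    bool result = ! is.good() ;
    if ( ! result && is.peek() == EOF ) {
        result = true ;
        // Leave eofbit alone set; any failbit peek() set is dropped.
        is.clear( std::ios::eofbit ) ;
    }
    return result ;
}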
| Quote: | and paragraph 5 does not really enlighten me further, when it says:
"[..] During preparation, the constructor may call setstate(failbit)[..]"
But independent of the theory (the Standard), the fact remains that
the second attempt to get the eof information of the stream (in my
original attempt) caused the stream to go into the fail state, which
is the side effect I mentioned.
Speaking of unwanted side effects, there is one problem with peek():
it doesn't set eofbit when rdbuf()->sgetc() returns EOF. And I have
used actual implementations where rdbuf()->sgetc() was not
reproducible -- it could return EOF on one call, and something else
on another.
Is that observation the reason for your advice to stay away from the
implementation, which acts on the buffer instead of the whole stream?
|
Yes.
I've always considered this a bug, but the comments by Jerry Schwarz,
above, seem to indicate that this was originally the intended behavior.
| Quote: | "
if (!input) do_some_error_handling;
if (input.eof()) return true;
const bool result = (input.peek() == EOF);
if (result) input.clear(input.rdstate() & ~std::ios_base::eofbit);
return result;
"
Very good. You avoid the multiple read problem.
OK, I will use that. Thanks again!
|
Of course, if some day someone decides that peek() should set failbit
any time it returns EOF...:-)
It's very hard to write robust code when you have to use poorly
specified interfaces. That's one of the problems a standard is supposed
to fix. Looks like it failed here, though.
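One way to guard against peek() changing its side effects some day is not to rely on them at all: note that the stream was good() on entry, and put it back that way after peeking. This is a sketch under that assumption; `isEofRestoringState` is a hypothetical name, and it deliberately differs from the isEof() earlier in the thread in that it never leaves eofbit set.

```cpp
#include <istream>
#include <sstream>

// Hypothetical hardened variant: whatever bits peek() sets as a side
// effect (eofbit on current libraries, perhaps failbit on another), the
// stream is returned to goodbit, the state it had on entry.
bool isEofRestoringState(std::istream& is)
{
    if (!is.good())
        return true;                 // already not good: treat as finished
    const bool atEof = is.peek() == std::istream::traits_type::eof();
    is.clear();                      // back to goodbit
    return atEof;
}
```

The trade-off: because the state is always restored, the stream never remembers that EOF was seen, so every call goes back to the streambuf. With a filebuf whose EOF is not repeatable (the keyboard case mentioned earlier in the thread), that reintroduces the multiple-read problem.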
| Quote: | I would go with the first. But I'm not sure what state change you
are talking about. Constructing the sentry object with the second
argument true (as does peek()) should never cause a state change in
the stream.
Interesting. Actually my observations were contrary to that
opinion. Do you think that this points to an erroneous implementation
in the Dinkumware lib (as described above)?
|
Currently, the wording in §27.6.1.1.2 is so vague (and parts of what is
clear are so obviously wrong) that I don't think any implementation
could be erroneous. The authors of the Dinkumware library may have
implemented a different interpretation than what I read, but in the end,
it is certainly a possible interpretation. Given that at least one of
the authors is very active in the library working group of the standards
committee, their interpretation is probably more in line with the general
direction that the library group is taking than ours is.
[...]
| Quote: | I support that point of view. But now, reaching the end of your
posting, your own argument seems to support the interpretation
that clearing the eofbit flag is necessary for a general
implementation.
|
Now that you've pointed out the problem you were having, I'd say that at
least testing it before trying anything else is necessary. While I
think that peek() == EOF alone should work, it most obviously has random
and implementation-dependent side effects. If these side effects cause
a problem, then you have to work around them somehow. Regardless of
what the standard says or what I think is right. (I am, of course,
supposing that you are in a situation similar to mine, in which you are
paid to produce working code, and not excuses.)
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]