C++Talk.NET C++ language newsgroups
Sherrie Laraurens Guest
|
Posted: Tue Jun 20, 2006 2:54 pm Post subject: Trouble with eof() |
|
|
Hi all,
I'm having some difficulty understanding the difference in
interpretation of the fstream's eof() method.
I basically try to read a file in and count the number of
characters I read. Using the code in the main routine, I
always get one more (count1) than the second count (count2),
which I get from the code in loadfile.
Even when the file is empty, the !file.eof() test passes on the
first loop round, which I believe is somewhat incorrect.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

void loadfile(const std::string& file_name, std::string& buffer)
{
    std::ifstream file(file_name.c_str(), std::ios::binary);
    if (!file) return;
    buffer.assign(std::istreambuf_iterator<char>(file),
                  std::istreambuf_iterator<char>());
    file.close();
}

void main()
{
    std::string file_name = "data.dat";
    std::ifstream file(file_name.c_str(), std::ios::binary);
    if (!file) return;
    unsigned int count = 0;
    while (!file.eof())
    {
        file.get();
        count++;
    }
    file.close();
    std::string buffer;
    loadfile(file_name, buffer);
    std::cout << "count1: " << count << std::endl;
    std::cout << "count2: " << buffer.length() << std::endl;
}
The only solution I can think of at this time is to check eof()
once I've called get(), and to break if it is true, like so:
while (true)
{
    file.get();
    if (file.eof()) break;
    count++;
}
I was just hoping someone on this list could explain to me
why eof() doesn't work the way I think it should,
and whether there is any place in the standard that describes how
eof() should work: when it should return true and when it should
return false.
Any help would be very much appreciated.
Regards,
Sherrie
PS: I've tested it with gcc 3.4 and VS 2003; they both seem to
give the same results.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ] |
Hyman Rosen Guest
|
Posted: Tue Jun 20, 2006 7:07 pm Post subject: Re: Trouble with eof() |
|
|
Sherrie Laraurens wrote:
| Quote: | I'm having some difficulty understanding the difference in
interpretation of the fstream's eof() method.
|
Eof() on a stream is like a hot table at craps. You never know
whether either is true until after the fact.
OK, in less picturesque terms, eof() doesn't tell you whether
your next read will fail for lack of input, it tells you whether
your last read failed for lack of input. This makes more sense
from an implementation's point of view, because often it is not
possible to tell whether a read will fail without actually trying
it, and trying to read from some sources will block unless special
pains are taken.
This is the way it's always been in C and UNIX - the canonical cat
program is
int c;
while ((c = getchar()) != EOF)
    putchar(c);
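For what it's worth, the same read-then-test shape carries over to iostreams. The following is only a sketch: count_chars and demo_count_chars are hypothetical names, not anything from the original post. It tests the result of get() itself, so the failed read at end of input is never counted.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: counts the characters that can be read from `in`.
std::size_t count_chars(std::istream& in)
{
    std::size_t count = 0;
    char c;
    // get(c) returns the stream; it converts to false once a read fails,
    // so the counter only advances for characters actually delivered.
    while (in.get(c)) {
        ++count;
    }
    return count;
}

// Usage sketch with in-memory streams standing in for files.
void demo_count_chars()
{
    std::istringstream empty("");
    assert(count_chars(empty) == 0);

    std::istringstream three("abc");
    assert(count_chars(three) == 3);
    assert(three.eof());   // set by the final, failed get()
}
```

With a real file, the same function should work on an std::ifstream opened in binary mode.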
Guest
|
Posted: Wed Jun 21, 2006 3:09 pm Post subject: Re: Trouble with eof() |
|
|
Sherrie Laraurens wrote:
| Quote: | I basically try and read a file in and count the number of
characters I read in, using the code in the main routine I
always get one more (count1) than the second count I get
from using the code in load file.
Even when the file is empty, the eof() for the first loop
round returns true, which i believe is somewhat incorrect.
[snip]
The only solution i can think up of at this time is to check eof once
i've
called get. if eof is true then to break, like so:
[snip]
I was just hoping someone on this list could explain to me
why eof doesn't work the way i'm thinking it should be working,
and if there is anyplace in the standards that describes how
eof should work, when it should return true and when it should
return false.
|
This is a case where a lot of people's intuition leads them astray.
Think about it this way: if it worked the way you expected, then
'feof' would be a prediction that a future read would fail. However,
the future read could still succeed, for example, if some outside
arrangement delayed the read until another process had a chance to
write more data to the file.
So 'eof' is acting like an error flag indicating that you overran the
end of the file. It is not a prescience flag indicating that your next
read will overrun the end of the file, which it of course might not,
because the file might be larger by then.
How could the system know that your next read will encounter an
end-of-file until it knows when you will perform that read and how many
bytes you will try to read?
DS
kanze Guest
|
Posted: Wed Jun 21, 2006 3:15 pm Post subject: Re: Trouble with eof() |
|
|
Hyman Rosen wrote:
| Quote: | Sherrie Laraurens wrote:
I'm having some difficulty understanding the difference in
interpretation of the fstream's eof() method.
Eof() on a stream is like a hot table at craps. You never know
whether either is true until after the fact.
OK, in less picturesque terms, eof() doesn't tell you whether
your next read will fail for lack of input, it tells you
whether your last read failed for lack of input.
|
Not really. If eof() returns true, your next read will fail
(but if it returns false, it doesn't mean that your next read
will succeed). If fail() is true, eof() might be a clue as to
the reason---but it's very possible to construct cases where
fail() is set because of a format error, but eof() still returns
true.
| Quote: | This makes more sense from an implementation's point of view,
because often it is not possible to tell whether a read will
fail without actually trying it, and trying to read from some
sources will block unless special pains are taken.
|
It's almost necessary internally; once streambuf::sgetc() has
returned EOF, you're on shaky ground if you access the
streambuf again. Basically, every access to the streambuf
should be wrapped with something like:
if ( ! stream.eof() ) {
    ch = stream.rdbuf()->sgetc() ;
    if ( ch == EOF ) {
        stream.setstate( std::ios::eofbit ) ;
    }
}
But such code is normally only found in operator<< and
operator>> functions. And not always there: most user-defined
operator<< and operator>> simply decompose the operation into
lower-level << and >>.
In practice, outside of operator<< and operator>>, about the
only place eof() might get called is after a failed input, e.g.:
while ( source >> someType ) {
    // process read data...
}
if ( source.bad() ) {
    // Hard read error...
} else if ( ! source.eof() ) {
    // Format error...
} else {
    // Maybe end of file, maybe format error in the last
    // record of the file...
}
Note that in some cases, the check for eof() here is useless.
If you're reading lines, for example (with "getline( source,
line )" in the while), eof() will be set even if the last
line is missing the '\n' (a format error).
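One possible way around the getline case, offered as a sketch rather than as anything from the post: look at eof() after each successful getline instead of after the loop. If the read succeeded but eofbit is already set, the line ran into end of input before a '\n' was found. The names last_line_terminated and demo_last_line_terminated are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads every line from `in`, stores the number of
// lines read in `lines`, and reports whether the final line ended in '\n'.
bool last_line_terminated(std::istream& in, std::size_t& lines)
{
    lines = 0;
    bool terminated = true;   // empty input has no unterminated line
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
        // A successful getline with eofbit set means the line was ended
        // by end of input, not by the delimiter.
        terminated = !in.eof();
    }
    return terminated;
}

// Usage sketch with in-memory streams standing in for files.
void demo_last_line_terminated()
{
    std::size_t n = 0;

    std::istringstream complete("a\nb\n");
    assert(last_line_terminated(complete, n) && n == 2);

    std::istringstream truncated("a\nb");
    assert(!last_line_terminated(truncated, n) && n == 2);
}
```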
| Quote: | This is the way it's always been in C and UNIX - the canonical
cat program is
int c;
while ((c = getchar()) != EOF)
putchar(c);
|
Yes, but the real reason EOF has the semantics it has is that
the stream idiom mainly addresses formatted IO, and is designed
to support interactive devices as well. If you're reading
character by character from a file, implementing the predictive
eof() of Pascal is easy. If you're reading integers (where
leading white space is insignificant), it's slightly harder
(since you have to skip whitespace in order to know if you're at
end of file), and if you support mixing the two, it becomes
impossible: the next request to getchar() might legitimately
return ' ' or '\n', whereas the next request to scanf("%d")
would result in failure due to end of file. And of course, any
predictive eof() implies reading ahead, which can lead to
problems when reading from an interactive device.
If you're doing pure character input from a file (or from an
interactive device if the read ahead won't be a problem), you
can use something like:
while ( source.peek() != EOF ) {
    // a single character can now be read and processed...
}
I often use something like this for simple parsing jobs; I
suspect that it would have performance problems for larger jobs,
however, and will usually use some sort of higher abstraction.
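A minimal sketch of that peek() loop applied to the character-counting problem from the start of the thread. The names count_with_peek and demo_count_with_peek are hypothetical, and std::char_traits<char>::eof() is used here in place of the EOF macro; for char streams the two compare equal.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts characters by peeking before each read.
std::size_t count_with_peek(std::istream& source)
{
    std::size_t count = 0;
    while (source.peek() != std::char_traits<char>::eof()) {
        source.get();   // a character is known to be available here
        ++count;
    }
    return count;
}

// Usage sketch with in-memory streams standing in for files.
void demo_count_with_peek()
{
    std::istringstream empty("");
    assert(count_with_peek(empty) == 0);

    std::istringstream text("ab\n");
    assert(count_with_peek(text) == 3);
}
```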
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze Guest
|
Posted: Fri Jun 23, 2006 2:59 pm Post subject: Re: Trouble with eof() |
|
|
Ulrich Eckhardt wrote:
| Quote: | Sherrie Laraurens wrote:
I'm having some difficulty understanding the difference in
interpretation of the fstream's eof() method.
I basically try and read a file in and count the number of
characters I read in, using the code in the main routine I
always get one more (count1) than the second count I get
from using the code in load file.
Even when the file is empty, the eof() for the first loop
round returns true, which i believe is somewhat incorrect.
void loadfile(const std::string& file_name, std::string& buffer)
{
    std::ifstream file(file_name.c_str(), std::ios::binary);
    if (!file) return;
    buffer.assign(std::istreambuf_iterator <char>(file),
                  std::istreambuf_iterator <char>());
    file.close();
}
Correct code, though there's no need for close()ing the
fstream
|
And how do you handle errors on the close? (Admittedly, for
input, there's not much chance of a new error occurring in the
close, so presumably, you could check before.)
| Quote: | and it lacks error-signalling. Other than that, take a look at
std::distance if you just need the length.
void main()
Please read the FAQ.
unsigned int count = 0;
while (!file.eof())
{
    file.get();
    count++;
}
Now, this code is broken. The point is that eof() only returns
true if (during some operation) the end of the file was
encountered, so you need to check between calling get() and
incrementing the counter. Alternatively, you could check
whether the return value of get() matches the eof value defined
in the char_traits (IIRC, need to look it up).
Idiomatic IOStream input then looks like this in C++:
while(true) {
    in >> record;
    if(in) {
        handle(record);
        continue;
    }
|
Idiomatically, I'd say that as much as I dislike it in general,
this is one case where the update and the test are always
embedded:
while ( in >> record ) {
    handle( record ) ;
}
| Quote: | if(in.eof())
// okay, reached end of file
|
Maybe. You don't know that for sure, however. The input could
just as easily have failed because of a format error.
| Quote: | break;
// not okay, parsing error
throw std::runtime_error("invalid stream content");
|
Or a hard IO error (disk read error, etc.).
| Quote: | }
The short form, using the implicit conversion of a stream to a
void* (which then can be used as boolean) is this:
while( in >> record)
    handle(record);
if(!in.eof())
    // not okay, parsing error before end of file
    throw std::runtime_error("invalid stream content");
I was just hoping someone on this list could explain to me
why eof doesn't work the way i'm thinking it should be
working, and if there is anyplace in the standards that
describes how eof should work, when it should return true
and when it should return false.
Well, I dare say that any better documentation says that this
member function returns true when an input operation failed
(note the past tense) due to reaching EOF. It's probably just
that it doesn't do what you would expect it to do.
|
The names of the functions aren't necessarily well chosen:
good(), for example, is not the inverse of bad() (nor of
fail()). But the real problem is that some functions, like
eof(), are present principally for the use of operator<< and
operator>>, whereas others (e.g. fail()) are designed to give
useful information to the user. In the case of EOF, there is
also the problem that the streams don't distinguish between
internal eof (which means that the >> or << operator should no
longer try to read from the streambuf), and external eof()
(which means that the last input failed because there was no
more data to be read, and cannot occur on output).
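To make the naming point concrete, here is a small sketch that prints which state functions report true. The names state_string and demo_state_string are hypothetical. After a successful read that happens to reach the end of the input, good() is false even though neither fail() nor bad() is true.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: lists the state functions that return true.
// Note that fail() is true whenever failbit or badbit is set.
std::string state_string(const std::ios& s)
{
    if (s.good()) return "good";
    std::string out;
    if (s.bad())  out += "bad ";
    if (s.fail()) out += "fail ";
    if (s.eof())  out += "eof ";
    out.erase(out.size() - 1);   // drop the trailing space
    return out;
}

// Usage sketch with an in-memory stream.
void demo_state_string()
{
    std::istringstream s("123");
    assert(state_string(s) == "good");

    int i = 0;
    s >> i;                                  // succeeds, but reaches end of input
    assert(i == 123);
    assert(state_string(s) == "eof");        // not good(), yet not fail() or bad()

    s >> i;                                  // nothing left to read
    assert(state_string(s) == "fail eof");
}
```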
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze Guest
|
Posted: Fri Jun 23, 2006 3:01 pm Post subject: Re: Trouble with eof() |
|
|
davids (AT) webmaster (DOT) com wrote:
| Quote: | Sherrie Laraurens wrote:
I basically try and read a file in and count the number of
characters I read in, using the code in the main routine I
always get one more (count1) than the second count I get
from using the code in load file.
Even when the file is empty, the eof() for the first loop
round returns true, which i believe is somewhat incorrect.
[snip]
The only solution i can think up of at this time is to check
eof once i've called get. if eof is true then to break, like
so:
[snip]
I was just hoping someone on this list could explain to me
why eof doesn't work the way i'm thinking it should be
working, and if there is anyplace in the standards that
describes how eof should work, when it should return true
and when it should return false.
This is a case where a lot of people's intuition leads them
astray. Think about it this way, if it worked the way you
expected, then 'feof' would be a prediction that a future read
would fail.
|
Which is more or less what the standard requires. The problem
is the opposite: ! eof() guarantees in no way that a future read
will succeed.
| Quote: | However, the future read could still succeed, for example, if
some outside arrangement delayed the read until another
process had a chance to write more data to the file.
|
Once eof() returns true, any future read is guaranteed to fail
until you clear the condition.
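A sketch of that guarantee, using an in-memory stream whose contents are replaced after end of input has been seen. The names read_after_eof and demo_read_after_eof are hypothetical, and replacing the buffer with str() merely stands in for "another process wrote more data".

```cpp
#include <cassert>
#include <sstream>

// Hypothetical helper: reads to end of input, makes new data available,
// then attempts another read, optionally calling clear() first.
// Returns true if that final read succeeded.
bool read_after_eof(bool clear_first)
{
    std::istringstream s("1");
    int i = 0;
    s >> i;              // succeeds; eofbit is now set
    s.str("2");          // new data in the buffer; state flags are untouched
    if (clear_first) {
        s.clear();       // reset eofbit so the stream is good() again
    }
    s >> i;
    return !s.fail();
}

// Usage sketch.
void demo_read_after_eof()
{
    assert(!read_after_eof(false));   // eofbit still set: the read fails
    assert(read_after_eof(true));     // cleared first: the read succeeds
}
```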
The problem is more one of the difference between formatted and
unformatted reads. Suppose that the file contains just a single
'\n' character after the last read. Should a predictive eof()
return true or false? If the next read is formatted, it will
fail (because formatted reads always start by skipping white
space); if it is unformatted, it will succeed, and return '\n'
(or "\n"... or "", if the function in question is getline).
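The difference can be seen directly with a sketch like the one below, which tries both kinds of read on the same remaining input. The names TailReads, read_tail and demo_read_tail are hypothetical, and an int is assumed as the formatted type.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical result type: what each kind of read made of the input.
struct TailReads {
    bool formatted_ok;      // did `in >> int` succeed?
    bool unformatted_ok;    // did `in.get(c)` succeed?
    char unformatted_char;  // character delivered by get(), if any
};

// Hypothetical helper: applies one formatted and one unformatted read
// to separate streams holding the same `remaining` text.
TailReads read_tail(const std::string& remaining)
{
    TailReads r = { false, false, '\0' };

    std::istringstream a(remaining);
    int value = 0;
    a >> value;                 // skips white space first, then parses
    r.formatted_ok = !a.fail();

    std::istringstream b(remaining);
    char c = '\0';
    if (b.get(c)) {             // takes the next character as it is
        r.unformatted_ok = true;
        r.unformatted_char = c;
    }
    return r;
}

// Usage sketch: only a newline is left in the input.
void demo_read_tail()
{
    TailReads r = read_tail("\n");
    assert(!r.formatted_ok);
    assert(r.unformatted_ok && r.unformatted_char == '\n');
}
```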
| Quote: | So 'eof' is acting like an error flag indicating that you
overran the end of the file.
|
Not quite. It is an internal signal to the stream that the end
of file has been seen. It may be because the stream was trying
to read user data, but it may also be because the stream was
reading ahead internally, in order to determine where the
currently requested data ends.
Consider something like the following:
std::istringstream s( "123" ) ;
int i ;
s >> i ;
// At this point eof() is almost certainly true,
// although the last read succeeded.
Or worse:
std::istringstream s( "1.32e-" ) ;
double d ;
s >> d ;
// Failure, because the format of the double is incorrect,
// but eof() is almost certainly true, because the stream
// had to try to read before it could detect the error.
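The two fragments above can be checked with a small sketch along these lines. The names ParseOutcome, try_parse and demo_try_parse are hypothetical; the asserted results are what I would expect from the common standard library implementations.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical result type: the two flags of interest after one read.
struct ParseOutcome {
    bool failed;   // fail() after the read
    bool at_eof;   // eof() after the read
};

// Hypothetical helper: reads a single T from `text` and reports the flags.
template <typename T>
ParseOutcome try_parse(const std::string& text)
{
    std::istringstream s(text);
    T value = T();
    s >> value;
    ParseOutcome out = { s.fail(), s.eof() };
    return out;
}

// Usage sketch.
void demo_try_parse()
{
    // Successful read that ran into the end of the input.
    ParseOutcome a = try_parse<int>("123");
    assert(!a.failed && a.at_eof);

    // Format error discovered only after reaching the end of the input.
    ParseOutcome b = try_parse<double>("1.32e-");
    assert(b.failed && b.at_eof);
}
```

With a trailing space after the number, as in "123 ", the stream can stop at the space and eof() stays false.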
| Quote: | Not a prescience flag indicating that your next read will
overrun the end of the file, which it of course might not
because the file might be larger by then.
How could the system know that your next read will encounter
an end-of-file until it knows when you will perform that read
and how many bytes you will try to read?
|
The real key is: until it knows whether you want to skip white
space or not.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34