C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 

help needed using ifstream::seekg with windows text file

 
wtnt
Guest





Posted: Fri Jan 30, 2004 12:25 am    Post subject: help needed using ifstream::seekg with windows text file



Hello.
I've searched all over and haven't seen another thread with this
problem. Please bear with me as I try to explain. thanks. :)

I have some programs that need to be cross-platform compatible (Unix
and Windows XP). The first program parses a text file and records where
each snippet is, in terms of where it begins (char offset from the
beginning of the file) and its length (number of chars).

One can almost use "byte" and "char" interchangeably here, given that
sizeof(char) is 1; however, it doesn't quite work that way.

The second program tries to get things from this text file with the
information collected from program1. program2 tries to use seekg and
read, something like:

char* snippet = new char[length + 1];
ifstream read(file); // OR ifstream read(file, ios::binary);
read.seekg(offset, ios::beg);
read.read(snippet, length);
read.close();
snippet[length] = '\0'; // null-terminate the buffer

The problem is that when the code is as above in text mode, read
actually reads in the number of characters, but seekg seeks to the
number of bytes. So, if I seek to 0, I read in exactly what I need. If
I need to seek to any offset > 0, it doesn't seek to my next offset
correctly, and it appears that the number of characters it falls short
by is correlated with the number of newlines before that point in the
file. (read still reads in the correct number of characters from that
point.)

I know that windows and unix treat newlines differently. But I can't
quite understand the behavior on windows to get it to do what I want.
It behaves as if a newline is 1 char with sizeof 2 bytes. (although if
I go through it character by character and print out sizeof(*c) I
never get anything that is 2, it's always 1.)
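
For what it's worth, a Windows-style line ending is two separate one-byte characters, '\r' followed by '\n', and text mode on Windows typically hands the pair to the program as a single '\n'. That would explain why sizeof never reports 2. A minimal sketch that counts raw bytes and CR LF pairs is below; count_newlines and NewlineStats are hypothetical names, and the stream passed in is assumed to be untranslated (for a real file, an ifstream opened with ios::binary).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper type: totals gathered from an untranslated stream.
struct NewlineStats {
    std::size_t bytes = 0;       // every byte seen, including '\r' and '\n'
    std::size_t crlf_pairs = 0;  // number of '\r' immediately followed by '\n'
};

// Reads the stream one byte at a time and counts CR LF pairs.
// Assumes no newline translation happens underneath (binary mode).
NewlineStats count_newlines(std::istream& in) {
    NewlineStats stats;
    char prev = '\0';
    char c;
    while (in.get(c)) {
        ++stats.bytes;
        if (prev == '\r' && c == '\n') {
            ++stats.crlf_pairs;
        }
        prev = c;
    }
    return stats;
}
```

If the file really has Windows line endings, bytes minus crlf_pairs should match the character count that a text-mode read on Windows reports.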

If I set it to binary mode, both seekg and read work in terms of
bytes. Successive seeks read in snippets one right after the other as
they appear in the text file with no overlap (actually, it drops 1
character in between for which I have no explanation). And each
snippet has fewer visible number of characters than its length.

This problem does not occur on unix and does not occur on windows if I
transfer a text file from unix in "binary" mode so that the \n's don't
get replaced by windows. In these instances, program2 behaves as
expected regardless of whether it's in binary or text mode.

Is there any way to get a file pointer to seek to a place in a text
file according to the number of characters, like the way read behaves
in text mode? This would be the simplest solution.

I guess one could have program1 store the snippet information in terms
of bytes rather than number of characters, but I'm not sure how to do
that. What it does is count characters, and on Windows the newline
still counts as 1 character. I guess I could add an extra byte every
time there is a Windows-style newline, but I'm not even sure my
assessment that a Windows newline = 1 char of 2 bytes is actually
true.
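
The "add an extra byte per Windows newline" idea can be sketched roughly as follows. This is only an illustration under some assumptions: to_byte_offset is a hypothetical name, raw is assumed to hold the file contents read in binary mode, and the only text-mode translation accounted for is CR LF becoming a single '\n' (other text-mode behaviour, such as special handling of Ctrl-Z on Windows, is ignored).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps a character offset, as counted in text mode,
// to the corresponding byte offset in the raw (binary) file contents.
std::size_t to_byte_offset(const std::string& raw, std::size_t char_offset) {
    std::size_t byte_pos = 0;
    std::size_t chars_seen = 0;
    while (byte_pos < raw.size() && chars_seen < char_offset) {
        // A CR LF pair occupies two bytes on disk but is assumed to count
        // as one character when the file is read in text mode.
        if (raw[byte_pos] == '\r' && byte_pos + 1 < raw.size() &&
            raw[byte_pos + 1] == '\n') {
            byte_pos += 2;
        } else {
            byte_pos += 1;
        }
        ++chars_seen;
    }
    return byte_pos;
}
```

A byte length for a snippet would then be to_byte_offset(raw, offset + length) - to_byte_offset(raw, offset). On a file with unix line endings the function returns the character offset unchanged, so the same code path could serve both platforms.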

What would be a more elegant solution that would work on both
platforms?

Thank you for your help.
Jonathan Turkanis
Guest





Posted: Fri Jan 30, 2004 6:15 am    Post subject: Re: help needed using ifstream::seekg with windows text file



"wtnt" <wtnt (AT) konzoo (DOT) com> wrote

Quote:
Hello.
I've searched all over and haven't seen another thread with this
problem. Please bear with me as I try to explain. thanks. :)

I have some programs that need to be cross-platform compatible (unix
and windowsXP). The first program parses a text file and records where
snippets are in terms of where it begins (char offset from begin of
the file) and length (number of chars).

One can almost use "byte" and "char" interchangeably here, given that
sizeof(char) is 1, however it doesn't quite work that way.

You simply can't store offsets into files portably in text mode. In
binary mode you can, if you remember that there may be an arbitrary
number of null characters added at the end of the file. See the
Dinkumware online documentation for an explanation.
(http://www.dinkumware.com/refxcpp.html. Go to the C++ table of
contents, and look under "Files and Streams"). Alternatively, see
P.J.Plauger's book on the C standard library.

Jonathan
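
A minimal sketch of the binary-mode approach suggested above, assuming the offsets and lengths were recorded in bytes. read_snippet is a hypothetical helper, not something from the thread; it takes any std::istream, and for a real file the caller would be expected to open an ifstream with ios::binary in both programs.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads up to `length` bytes starting at byte `offset`.
// Returns fewer bytes if the stream ends early, or an empty string if the
// seek itself fails.
std::string read_snippet(std::istream& in, std::streamoff offset,
                         std::size_t length) {
    std::string snippet(length, '\0');
    in.clear();                       // reset eof/fail bits from earlier reads
    in.seekg(offset, std::ios::beg);  // byte offset from the start
    if (!in) {
        return std::string();
    }
    in.read(&snippet[0], static_cast<std::streamsize>(length));
    // Keep only the bytes that were actually read.
    snippet.resize(static_cast<std::size_t>(in.gcount()));
    return snippet;
}
```

Using std::string instead of new char[length + 1] avoids the manual null terminator and the matching delete[]. Note that a snippet read this way from a Windows-style file still contains the '\r' bytes, so the caller may want to strip them before display.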



All times are GMT