C++Talk.NET Forum Index
C++ language newsgroups
 

Parsing large files

 
C++Talk.NET Forum Index -> C++ language (comp.lang.c++)
aditya.raghunath@gmail.co
Posted: Wed Sep 13, 2006 9:10 am    Post subject: Parsing large files

Hi,

I'm trying to read text files and then parse them. Some of these files
are several hundred MBytes, or in some cases GBytes. Reading them with
the getline function slows my program down a lot; it takes more than
15-20 minutes to read them. I'd like to know more efficient ways to read
these files. Any ideas?

TIA
Aditya
Phlip
Posted: Wed Sep 13, 2006 9:10 am    Post subject: Re: Parsing large files

aditya.raghunath wrote:

Quote:
I'm trying to read text files and then parse them. Some of these files
are of several 100 Mbytes or in some cases GBytes. Reading them using
the getline function slows down my program a lot, takes more than 15-20
min to read them. I want to know efficient ways to read these files.
Any ideas??

getline() reads each line into a string, copying the characters as it
goes. That costs time twice: once allocating a variably sized block of
memory, and again copying the data with the CPU. A hard drive has a DMA
channel that its driver can exploit, but the copy into a string probably
can't use it.

On top of that, your OS and possibly your C++ library buffer the file
before it ever reaches the string. This is partly because the read-write
head keeps flying over the file, so the drive buffer might as well take it
in, and partly because some Standard Library implementations also buffer
the file.

One way to fix this is to not use getline() and not copy the string.
Stream each byte of your file into your program, and have your program
use a state table to parse each byte and decide what to do with it.
This technique makes better use of the read-ahead buffers, and it ought to
lead to a better design.
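
To make the byte-streaming idea concrete, here is a minimal sketch of a table-driven scanner. It is only an illustration under assumptions: the names Counts, scan, and scan_string are invented for this example, and counting words and lines stands in for whatever parsing the real file format needs. It pulls bytes straight from the stream's buffer with sbumpc() instead of building a string per line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical result type, invented for this sketch.
struct Counts {
    std::size_t lines = 0;
    std::size_t words = 0;
};

enum State { Between = 0, InWord = 1 };   // parser states
enum CharClass { Space = 0, Other = 1 };  // classes of input byte

// State table: kNext[current state][class of the incoming byte].
static const State kNext[2][2] = {
    /* Between */ { Between, InWord },
    /* InWord  */ { Between, InWord },
};

static CharClass classify(char c) {
    return (c == ' ' || c == '\t' || c == '\n' || c == '\r') ? Space : Other;
}

// Walks the stream one byte at a time; no per-line string is created.
Counts scan(std::istream& in) {
    std::streambuf* buf = in.rdbuf();
    assert(buf != nullptr);  // precondition: the stream has a buffer
    typedef std::streambuf::traits_type traits;

    Counts result;
    State state = Between;
    for (traits::int_type ch = buf->sbumpc();
         !traits::eq_int_type(ch, traits::eof());
         ch = buf->sbumpc()) {
        const char byte = traits::to_char_type(ch);
        if (byte == '\n') ++result.lines;
        const State next = kNext[state][classify(byte)];
        if (state == Between && next == InWord) ++result.words;  // a word starts
        state = next;
    }
    return result;
}

// Convenience wrapper for trying the scanner on an in-memory string.
Counts scan_string(const std::string& text) {
    std::istringstream in(text);
    return scan(in);
}
```

For a real file you would pass a std::ifstream to scan(). A real format would need more states and character classes, but the loop itself would stay the same shape.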

Another way is to use OS-specific functions (which are off-topic here) to
map the file into memory. Then you can point into the file with a real C++
pointer. If you can run this pointer from one end of the file to the
other, you should make good use of the DMA channel between the hard drive
and memory. If your pointer instead skips around, you will at least use
only the OS's virtual-memory paging mechanism to read and write the actual
file, with no intervening OS or C++ buffers.
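
As one possible illustration of the memory-mapping route, the sketch below assumes a POSIX system (open, fstat, mmap, munmap); Windows offers CreateFileMapping and MapViewOfFile for the same purpose. The function name count_newlines_mapped is made up for this example, and counting newlines is only a stand-in for real parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: returns the number of '\n' bytes in the file,
// or -1 if the file cannot be opened or mapped. POSIX only.
long count_newlines_mapped(const char* path) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // mmap rejects length 0

    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* mem = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED) return -1;

    // Run a plain pointer from one end of the file to the other.
    const char* p = static_cast<const char*>(mem);
    const char* const end = p + size;
    long count = 0;
    for (; p != end; ++p) {
        if (*p == '\n') ++count;
    }

    munmap(mem, size);
    return count;
}
```

One caveat: the whole file has to fit in the process's address space, so multi-gigabyte files generally call for a 64-bit build or for mapping the file one window at a time.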

The next way is to use OS-specific functions that batch together many
commands to the driver of your hard drive. Obviously only an OS-specific
newsgroup can advise you about these situations.

--
Phlip
http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
Jerry Coffin
Posted: Wed Sep 13, 2006 9:10 am    Post subject: Re: Parsing large files

In article <1158123721.689135.27420 (AT) h48g2000cwc (DOT) googlegroups.com>,
aditya.raghunath (AT) gmail (DOT) com says...
Quote:
hi,

I'm trying to read text files and then parse them. Some of these files
are of several 100 Mbytes or in some cases GBytes. Reading them using
the getline function slows down my program a lot, takes more than 15-20
min to read them. I want to know efficient ways to read these files.

You might try opening them with fopen and reading them with fgets
instead. There are quite a few standard libraries for which that
provides a substantial speed improvement.
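
A minimal sketch of the fopen/fgets variant, for comparison with a getline loop. The function name count_lines_fgets and the 4096-byte buffer are arbitrary choices for this example, and counting lines stands in for the real parsing. Whether it is actually faster depends on the standard library in use, so it is worth timing both versions on your own files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns the number of lines in a text file,
// or -1 if the file cannot be opened. Assumes no embedded NUL bytes.
long count_lines_fgets(const char* path) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "r");
    if (!f) return -1;

    char buf[4096];        // fixed buffer, reused for every read
    long lines = 0;
    bool partial = false;  // true while inside a line with no '\n' seen yet

    while (std::fgets(buf, sizeof buf, f)) {
        const std::size_t len = std::strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            ++lines;         // a complete line, or the tail of a long one
            partial = false;
        } else {
            partial = true;  // line longer than buf, or last line unterminated
        }
    }
    if (partial) ++lines;    // count a final line that lacks a trailing '\n'

    std::fclose(f);
    return lines;
}
```

Because fgets fills a caller-supplied array, no memory is allocated per line; lines longer than the buffer simply arrive in several pieces.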

There are also quite a few platform-dependent optimizations. For
example, on Windows you can often gain a substantial amount of speed by
opening files in binary (untranslated) mode, but doing the same on UNIX
or anything very similar normally won't make any difference at all.
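
A minimal sketch of the binary-mode suggestion, using std::ifstream with std::ios::binary and large block reads. The function name count_newlines_binary and the 1 MB default chunk size are assumptions made for this example, not tuned values.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: returns the number of '\n' bytes in the file,
// or -1 if the file cannot be opened.
long count_newlines_binary(const char* path, std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);
    // Binary mode: the library performs no end-of-line translation.
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return -1;

    std::vector<char> chunk(chunk_size);
    long count = 0;
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();  // may be short on the last read
        for (std::streamsize i = 0; i < got; ++i) {
            if (chunk[static_cast<std::size_t>(i)] == '\n') ++count;
        }
    }
    return count;
}
```

Note that in binary mode on Windows each line of a typical text file still ends in "\r\n", so the parser has to deal with the '\r' itself.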

--
Later,
Jerry.

The universe is a figment of its own imagination.
All times are GMT