C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 

YARD : Generic regular expression parser

 
C++Talk.NET Forum Index -> C++ language (comp.lang.c++)
christopher diggins
Guest





Posted: Fri Dec 10, 2004 11:04 pm    Post subject: YARD : Generic regular expression parser



There seem to be a gazillion regular expression libraries. Most of them
only work on text, but I wanted something that also works on arbitrary
sequences of data (this is useful, for instance, in building parse trees
from token lists). This is possible, I think, using the Spirit library from
Boost, but its syntax and complexity are too much for me. I have almost
finished YARD (Yet Another Recursive Descent parser), a really
lightweight, truly generic regex parser (and it runs like a bat out of hell).

You define rules as follows:

typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> > Ident_parser;
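
The post does not show how the rule templates themselves are written. As a rough sketch only, assuming each rule is a type with a static Accept function that advances a cursor on success and leaves it untouched on failure, they might look like the following. The Stream struct and the MatchLength helper are hypothetical names for this illustration, not part of YARD.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical cursor over a character buffer (not YARD's actual stream class).
struct Stream {
    const char* p;
    const char* end;
    bool AtEnd() const { return p == end; }
};

// Matches exactly the character C.
template<char C>
struct Char_parser {
    static bool Accept(Stream& s) {
        if (s.AtEnd() || *s.p != C) return false;
        ++s.p;
        return true;
    }
};

// Matches one character in the inclusive range [Lo, Hi].
template<char Lo, char Hi>
struct CharRange_parser {
    static bool Accept(Stream& s) {
        if (s.AtEnd() || *s.p < Lo || *s.p > Hi) return false;
        ++s.p;
        return true;
    }
};

// Ordered choice: try A, and if it fails, try B from the same position.
template<typename A, typename B>
struct re_or {
    static bool Accept(Stream& s) {
        const char* save = s.p;
        if (A::Accept(s)) return true;
        s.p = save;
        return B::Accept(s);
    }
};

// Sequence: A followed by B; the position is restored if either part fails.
template<typename A, typename B>
struct re_and {
    static bool Accept(Stream& s) {
        const char* save = s.p;
        if (A::Accept(s) && B::Accept(s)) return true;
        s.p = save;
        return false;
    }
};

// Zero or more repetitions of A; always succeeds.
template<typename A>
struct re_star {
    static bool Accept(Stream& s) {
        for (;;) {
            const char* before = s.p;
            // Stop when A fails or matches without consuming anything.
            if (!A::Accept(s) || s.p == before) break;
        }
        return true;
    }
};

// One or more repetitions of A.
template<typename A>
struct re_plus {
    static bool Accept(Stream& s) { return re_and<A, re_star<A> >::Accept(s); }
};

// The rules from the post, built on the sketch above.
typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> > Ident_parser;

// Hypothetical helper: how many leading characters of text does P consume
// (0 if P does not match at the start)?
template<typename P>
std::size_t MatchLength(const char* text) {
    Stream s = { text, text + std::strlen(text) };
    return P::Accept(s) ? static_cast<std::size_t>(s.p - text) : 0;
}
```

Under these assumptions, MatchLength<Ident_parser>("foo_1 bar") consumes the five characters of foo_1, and the whole grammar is resolved at compile time because every rule is a type rather than an object.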

Then you hand them to a tokenizer as follows:

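#include <cstdio>   // getchar
#include <cstdlib>  // calloc, free
#include <fstream>  // ifstream
using namespace std;

int main()
{
    // Placeholder file name; the post does not show where sFileName is set.
    const char* sFileName = "input.txt";
    // GetFileSize is a helper that is not shown in the post.
    int nBufSize = GetFileSize(sFileName);
    char* pBuf = static_cast<char*>(calloc(nBufSize, 1));
    ifstream f;
    f.open(sFileName);
    f.read(pBuf, nBufSize);
    f.close();
    // Collect every match of Word_parser in the buffer.
    Tokenizer<Word_parser> tknzr;
    tknzr.Parse(pBuf, nBufSize);
    OutputTokens(tknzr.Begin(), tknzr.End());
    free(pBuf);
    getchar(); // keep the console window open
    return 0;
}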

A tokenizer in this case is really simple:

template<typename Parser_T>
struct Tokenizer {
    // Scans the buffer and records the [begin, end) index of every match.
    void Parse(char* pText, int nSize)
    {
        ParseInputStream stream(pText, nSize);
        while (!stream.AtEnd()) {
            int index = stream.GetIndex();
            // Parser_T is the rule type passed as the template argument.
            if (Parser_T::Accept(stream)) {
                mTkns.push_back(Token(index, stream.GetIndex()));
            }
            stream.GotoNext();
        }
    }
    TokenIter Begin() { return mTkns.begin(); }
    TokenIter End() { return mTkns.end(); }
private:
    TokenList mTkns;
};
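
ParseInputStream, Token and TokenList are not shown in the post, so here is a self-contained sketch of the same loop with hypothetical stand-ins for them. Letters_parser is a hand-written rule used only to keep the example short, and Tokenize and TokenTexts are illustrative names. One deliberate difference from the code above: this version skips a character only when the rule fails, so a match that ends exactly at the end of the buffer never steps past it.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the post's ParseInputStream.
class ParseInputStream {
public:
    ParseInputStream(const char* text, std::size_t size)
        : mText(text), mSize(size), mIndex(0) {}
    bool AtEnd() const { return mIndex >= mSize; }
    std::size_t GetIndex() const { return mIndex; }
    char Current() const { return mText[mIndex]; }
    void GotoNext() { if (!AtEnd()) ++mIndex; }
private:
    const char* mText;
    std::size_t mSize;
    std::size_t mIndex;
};

// Hand-written rule for the example: one or more ASCII letters.
struct Letters_parser {
    static bool Accept(ParseInputStream& s) {
        std::size_t start = s.GetIndex();
        while (!s.AtEnd() && std::isalpha(static_cast<unsigned char>(s.Current())))
            s.GotoNext();
        return s.GetIndex() > start;
    }
};

// A token is a half-open range [first, second) of offsets into the buffer.
typedef std::pair<std::size_t, std::size_t> Token;

// Same loop as the Tokenizer in the post, written as a free function.
template<typename Parser_T>
std::vector<Token> Tokenize(const char* text, std::size_t size) {
    std::vector<Token> tokens;
    ParseInputStream stream(text, size);
    while (!stream.AtEnd()) {
        std::size_t index = stream.GetIndex();
        if (Parser_T::Accept(stream)) {
            tokens.push_back(Token(index, stream.GetIndex()));
        } else {
            stream.GotoNext(); // skip one character that starts no token
        }
    }
    return tokens;
}

// Convenience for inspecting results: copies each token's text out of the input.
std::vector<std::string> TokenTexts(const std::string& input) {
    std::vector<Token> tokens = Tokenize<Letters_parser>(input.data(), input.size());
    std::vector<std::string> out;
    for (std::size_t i = 0; i < tokens.size(); ++i)
        out.push_back(input.substr(tokens[i].first, tokens[i].second - tokens[i].first));
    return out;
}
```

With this sketch, TokenTexts("hello, generic world 42") yields the three words hello, generic and world; the punctuation, spaces and digits are skipped one character at a time.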

What I want to know is: is it obvious to programmers how this works and how
to use it? Is the verbosity acceptable? Also, would it interest people more
if I showed some benchmarks comparing it to other libraries?

TIA

--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com


Markus Elfring
Guest





Posted: Wed Jan 05, 2005 7:34 pm    Post subject: Re: YARD : Generic regular expression parser



Can the definitions that are described in the section "7 Regular expressions
[tr.re]" of the document "(Draft) Technical Report on Standard Library
Extensions" be changed with other template parameters to match your
suggested use cases?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1687.pdf


christopher diggins
Guest





Posted: Wed Jan 05, 2005 8:05 pm    Post subject: Re: YARD : Generic regular expression parser




"Markus Elfring" <Markus.Elfring (AT) web (DOT) de> wrote

Quote:
Can the definitions that are described in the section "7 Regular expressions
[tr.re]" of the document "(Draft) Technical Report on Standard Library
Extensions" be changed with other template parameters to match your
suggested use cases?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1687.pdf


Sorry, but I don't quite understand the question (nor the document). Could
you explain more?

--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com



Markus Elfring
Guest





Posted: Sun Jan 09, 2005 6:53 pm    Post subject: Re: YARD : Generic regular expression parser

Quote:
Sorry but I don't quite understand the question ( nor the document ), could
you explain more?

What don't you understand from the referenced document?
Would you like to reuse anything from this template library for regular
expressions, which is still in development?

When do you want a regexp to be evaluated:
at compile time (Boost.Spirit / Phoenix) or at run time?
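
For contrast with the compile-time typedef rules, here is a minimal sketch of the run-time style. It uses std::regex from C++11, which is the standardized descendant of the TR1 regular expression proposal referenced above; the FirstIdentifier function is a hypothetical name for this illustration. The pattern is an ordinary string that is compiled when the program runs, and the library's ready-made traits cover only char and wchar_t, so matching other element types would require supplying a custom traits class.

```cpp
#include <regex>
#include <string>

// Returns the first identifier-like substring of text, or an empty string
// if there is none. The pattern is built from a string at run time.
std::string FirstIdentifier(const std::string& text) {
    static const std::regex ident("[A-Za-z_][A-Za-z_0-9]*");
    std::smatch m;
    return std::regex_search(text, m, ident) ? m.str() : std::string();
}
```

Calling FirstIdentifier("  42 foo_1 = bar") returns foo_1, the same text an identifier rule would accept, but the grammar here could just as well have been read from a file or typed in by a user.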

Regards,
Markus



All times are GMT