C++Talk.NET C++ language newsgroups
christopher diggins Guest
Posted: Fri Dec 10, 2004 11:04 pm
Post subject: YARD : Generic regular expression parser
There seem to be a gazillion regular expression libraries. Most of them
only work on text, but I wanted something that also works on arbitrary
sequences of data (this is useful, for instance, when building parse trees
from token lists). I think this is possible with the Spirit library from
Boost, but its syntax and complexity are too much for me. I have almost
finished YARD (Yet Another Recursive Descent parser), a really
lightweight, truly generic regex parser (and it runs like a bat out of hell).
You define rules as follows:
typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> > Ident_parser;
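The post does not show how the rule templates themselves are written. The following is only a minimal sketch of one way such combinators could work, assuming every rule exposes a static Accept that consumes input on success and leaves the position untouched on failure. The Cursor type, the MatchedPrefix helper and the CheckRules function are hypothetical names introduced for illustration and are not part of YARD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical input cursor; YARD's real stream class is not shown in the post.
struct Cursor {
    const char* p;    // current position
    const char* end;  // one past the last character
    bool AtEnd() const { return p == end; }
};

// Matches and consumes one character in the inclusive range [Lo, Hi].
template<char Lo, char Hi>
struct CharRange_parser {
    static bool Accept(Cursor& c) {
        if (c.AtEnd() || *c.p < Lo || *c.p > Hi) return false;
        ++c.p;
        return true;
    }
};

// Matches exactly one character.
template<char Ch>
struct Char_parser : CharRange_parser<Ch, Ch> {};

// Ordered choice: try A first, then B. A failed rule leaves the cursor unchanged.
template<typename A, typename B>
struct re_or {
    static bool Accept(Cursor& c) { return A::Accept(c) || B::Accept(c); }
};

// Sequence: A followed by B; the cursor is restored if either part fails.
template<typename A, typename B>
struct re_and {
    static bool Accept(Cursor& c) {
        const char* start = c.p;
        if (A::Accept(c) && B::Accept(c)) return true;
        c.p = start;
        return false;
    }
};

// Zero or more repetitions; stops if A succeeds without consuming anything.
template<typename A>
struct re_star {
    static bool Accept(Cursor& c) {
        for (;;) {
            const char* before = c.p;
            if (!A::Accept(c) || c.p == before) break;
        }
        return true;
    }
};

// One or more repetitions.
template<typename A>
struct re_plus {
    static bool Accept(Cursor& c) { return A::Accept(c) && re_star<A>::Accept(c); }
};

// The rules from the post, unchanged.
typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> > Ident_parser;

// Hypothetical helper: number of leading characters of s that Parser consumes
// (0 if the rule does not match at the start).
template<typename Parser>
std::size_t MatchedPrefix(const std::string& s) {
    Cursor c = { s.data(), s.data() + s.size() };
    return Parser::Accept(c) ? static_cast<std::size_t>(c.p - s.data()) : 0;
}

// Small usage check for the sketch.
inline void CheckRules() {
    assert(MatchedPrefix<Ident_parser>("foo_1 bar") == 5);  // "foo_1"
    assert(MatchedPrefix<Ident_parser>("9x") == 0);         // cannot start with a digit
    assert(MatchedPrefix<Word_parser>("don't stop") == 5);  // "don't"
}
```

Because each rule is a type with a static function, the whole grammar is resolved at compile time and calls can be inlined, which may be part of why the approach is fast. Replacing const char* in the cursor with an iterator template parameter would be one way to extend the same idea to sequences other than text.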
Then you hand them to a tokenizer as follows:
int main ()
{
    // sFileName, GetFileSize and OutputTokens are defined elsewhere.
    int nBufSize = GetFileSize(sFileName);
    char* pBuf = static_cast<char*>(calloc(nBufSize, 1));
    ifstream f;
    f.open(sFileName);
    f.read(pBuf, nBufSize);
    f.close();
    Tokenizer<Word_parser> tknzr;
    tknzr.Parse(pBuf, nBufSize);
    OutputTokens(tknzr.Begin(), tknzr.End());
    free(pBuf);
    getchar(); // wait for a key press before exiting
    return 0;
}
A tokenizer in this case is really simple:
template<typename Parser_T>
struct Tokenizer {
    void Parse(char* pText, int nSize)
    {
        ParseInputStream stream(pText, nSize);
        while (!stream.AtEnd()) {
            // Remember where a potential token starts.
            int index = stream.GetIndex();
            if (Parser_T::Accept(stream)) {
                mTkns.push_back(Token(index, stream.GetIndex()));
            }
            stream.GotoNext();
        }
    }
    TokenIter Begin() { return mTkns.begin(); }
    TokenIter End() { return mTkns.end(); }
private:
    TokenList mTkns;
};
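The tokenizer above relies on ParseInputStream, Token, TokenList and TokenIter, none of which are shown in the post. As a rough sketch of what they might look like, the block below fills them in with assumed definitions so that the loop can be compiled and run on its own. Letters_parser is a placeholder rule and TokenTexts is a hypothetical helper; neither comes from YARD.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed stand-in for the post's ParseInputStream: a cursor over a char buffer.
class ParseInputStream {
public:
    ParseInputStream(const char* text, int size) : mText(text), mSize(size), mPos(0) {
        assert(size >= 0);
    }
    bool AtEnd() const { return mPos >= mSize; }
    int GetIndex() const { return mPos; }
    char Current() const { return mText[mPos]; }  // caller must check AtEnd() first
    void GotoNext() { if (!AtEnd()) ++mPos; }     // safe to call at the end
private:
    const char* mText;
    int mSize;
    int mPos;
};

// Assumed Token layout: a half-open index range [begin, end) into the buffer.
struct Token {
    int begin;
    int end;
    Token(int b, int e) : begin(b), end(e) {}
};
typedef std::vector<Token> TokenList;
typedef TokenList::const_iterator TokenIter;

// Placeholder rule: one or more alphabetic characters.
struct Letters_parser {
    static bool Accept(ParseInputStream& s) {
        int start = s.GetIndex();
        while (!s.AtEnd() && std::isalpha(static_cast<unsigned char>(s.Current()))) {
            s.GotoNext();
        }
        return s.GetIndex() > start;
    }
};

// Same loop as in the post, written against the assumed types above.
template<typename Parser_T>
struct Tokenizer {
    void Parse(const char* pText, int nSize) {
        ParseInputStream stream(pText, nSize);
        while (!stream.AtEnd()) {
            int index = stream.GetIndex();
            if (Parser_T::Accept(stream)) {
                mTkns.push_back(Token(index, stream.GetIndex()));
            }
            stream.GotoNext();  // skips one element after a match or a mismatch
        }
    }
    TokenIter Begin() const { return mTkns.begin(); }
    TokenIter End() const { return mTkns.end(); }
private:
    TokenList mTkns;
};

// Hypothetical helper: returns the text of every token found in the input.
inline std::vector<std::string> TokenTexts(const std::string& text) {
    Tokenizer<Letters_parser> tknzr;
    tknzr.Parse(text.data(), static_cast<int>(text.size()));
    std::vector<std::string> out;
    for (TokenIter it = tknzr.Begin(); it != tknzr.End(); ++it) {
        out.push_back(text.substr(static_cast<std::size_t>(it->begin),
                                  static_cast<std::size_t>(it->end - it->begin)));
    }
    return out;
}
```

With these definitions, TokenTexts("ab cd") yields the two tokens "ab" and "cd". Note that in this sketch the unconditional GotoNext also skips the element that follows each match, so the loop implicitly assumes that tokens are separated by at least one non-token element.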
What I want to know is: is it obvious to programmers how this works and how to
use it? Is the verbosity acceptable? Also, would it interest people more if
I showed some benchmarks comparing it to other libraries?
TIA
--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
Markus Elfring Guest
Posted: Wed Jan 05, 2005 7:34 pm
Post subject: Re: YARD : Generic regular expression parser
Could the definitions described in section "7 Regular expressions
[tr.re]" of the document "(Draft) Technical Report on Standard Library
Extensions" be used with other template parameters to cover your
suggested use cases?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1687.pdf
christopher diggins Guest
Posted: Wed Jan 05, 2005 8:05 pm
Post subject: Re: YARD : Generic regular expression parser
"Markus Elfring" <Markus.Elfring (AT) web (DOT) de> wrote
Sorry but I don't quite understand the question ( nor the document ), could
you explain more?
--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
Markus Elfring Guest
Posted: Sun Jan 09, 2005 6:53 pm
Post subject: Re: YARD : Generic regular expression parser
Quote: "Sorry but I don't quite understand the question ( nor the document ), could
you explain more?"
What don't you understand in the referenced document?
Would you like to reuse anything from this regular expression template
library, which is still in development?
When do you want a regexp to be evaluated:
at compile time (Boost::Spirit / Phoenix) or at run time?
Regards,
Markus
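To make the compile-time versus run-time distinction concrete: the regular expression facility proposed in the referenced Technical Report was later standardized as std::regex in C++11, where the pattern is a string that is compiled when the program runs. The sketch below is only an illustration of that run-time style for the identifier rule discussed in this thread. The function names IsIdentifier and IsWideIdentifier are made up for this example, and it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Run-time evaluation: the pattern is parsed when the regex object is constructed.
inline bool IsIdentifier(const std::string& s) {
    static const std::regex ident("[A-Za-z_][A-Za-z_0-9]*");
    return std::regex_match(s, ident);  // true only if the whole string matches
}

// std::basic_regex is parameterized on the character type (and a traits class),
// so the same pattern can be applied to wide strings.
inline bool IsWideIdentifier(const std::wstring& s) {
    static const std::basic_regex<wchar_t> ident(L"[A-Za-z_][A-Za-z_0-9]*");
    return std::regex_match(s, ident);
}
```

The template parameters here select a character type, so this style stays tied to character sequences. Matching over other element types such as token lists, as proposed at the start of the thread, would presumably need a custom traits class or a different design, such as the type-based rules shown in the first post.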