2010/04/27

Boost.Spirit in_state with Wide Characters

I am writing a grammar that works on wide characters. The following code is Spirit's example3, using wchar_t instead of char (example.hpp is expanded inline here). With wchar_t, in_state does not seem to work.

With this code, the in_state expression causes compile errors.

The in_state_skipper class template takes the parameters template <typename Skipper, typename String = char const*>, so String presumably has to be a wide-character type instead of the single-byte default. But is the Skipper argument correct as it stands?
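The role of the defaulted String parameter can be sketched without Spirit. The types below (state_skipper_sketch, dummy_skipper) are hypothetical stand-ins that only mirror the template signature quoted above. They are not Spirit's classes, and the sketch makes no claim about what else Spirit needs in order to accept a wide state name.

```cpp
#include <string>

// Hypothetical stand-in that mirrors only the signature
//   template <typename Skipper, typename String = char const*>
// It is not Spirit's in_state_skipper.
template <typename Skipper, typename String = char const*>
struct state_skipper_sketch
{
    explicit state_skipper_sketch(String s) : state(s) {}
    String state;   // name of the lexer state, e.g. "WS"
};

struct dummy_skipper {};    // placeholder for the real skipper type

// With the default, the state name is stored as a narrow string.
typedef state_skipper_sketch<dummy_skipper> narrow_sketch;

// Naming the second parameter explicitly stores a wide string instead.
typedef state_skipper_sketch<dummy_skipper, wchar_t const*> wide_sketch;

// Returns true when the wide variant keeps the wide state name.
inline bool wide_state_is_kept()
{
    wide_sketch w(L"WS");
    // narrow_sketch n(L"WS");  // would not compile:
    //                          // wchar_t const* does not convert to char const*
    return std::wstring(w.state) == L"WS";
}
```

In this sketch only the second parameter has to change. That suggests String, rather than Skipper, is the parameter to look at, but it has not been verified against Spirit itself.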

//  Copyright (c) 2001-2010 Hartmut Kaiser
// 
//  Distributed under the Boost Software License, Version 1.0. (See accompanying 
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

//  This example shows how to create a simple lexer recognizing a couple of 
//  different tokens and how to use this with a grammar. This example has a 
//  heavily backtracking grammar which makes it a candidate for lexer based 
//  parsing (all tokens are scanned and generated only once, even if 
//  backtracking is required) which speeds up the overall parsing process 
//  considerably, out-weighting the overhead needed for setting up the lexer.
//
//  Additionally, this example demonstrates, how to define a token set usable 
//  as the skip parser during parsing, allowing to define several tokens to be 
//  ignored.
//
//  This example recognizes couplets, which are sequences of numbers enclosed 
//  in matching pairs of parentheses. See the comments below for details
//  and examples.

// #define BOOST_SPIRIT_LEXERTL_DEBUG
// #define BOOST_SPIRIT_DEBUG

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>

#include <cstdlib>      // exit()
#include <iostream>
#include <fstream>
#include <string>

//#include "example.hpp"
//  Copyright (c) 2001-2010 Hartmut Kaiser
//  Copyright (c) 2001-2007 Joel de Guzman
// 
//  Distributed under the Boost Software License, Version 1.0. (See accompanying 
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#include <iostream>
#include <fstream>
#include <string>

#if 1
typedef wchar_t  Ch;
#define C(s)  L##s
#define Err   wcerr
#else
typedef char  Ch;
#define C(s)  s
#define Err   cerr
#endif

///////////////////////////////////////////////////////////////////////////////
//  Helper function reading a file into a string
///////////////////////////////////////////////////////////////////////////////
inline std::basic_string<Ch>
read_from_file(char const* infile)
{
    std::basic_ifstream<Ch> instream(infile);
    if (!instream.is_open()) {
        std::Err << C("Couldn't open file: ") << infile << std::endl;
        exit(-1);
    }
    instream.unsetf(std::ios::skipws);      // No white space skipping!
    return std::basic_string<Ch>(std::istreambuf_iterator<Ch>(instream.rdbuf()),
                       std::istreambuf_iterator<Ch>());
}



using namespace boost::spirit;

///////////////////////////////////////////////////////////////////////////////
//  Token definition
///////////////////////////////////////////////////////////////////////////////
template <typename Lexer>
struct example3_tokens : lex::lexer<Lexer>
{
    example3_tokens()
    {
        // define the tokens to match
        ellipses = C("\\.\\.\\.");
        number = C("[0-9]+");

        // associate the tokens and the token set with the lexer
        this->self = ellipses | C('(') | C(')') | number;

        // define the whitespace to ignore (spaces, tabs, newlines and C-style 
        // comments)
        this->self(C("WS")) 
            =   lex::token_def<lex::unused_type, Ch>(C("[ \\t\\n]+"))   // whitespace
            |   C("\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/")   // C style comments
            ;
    }

    // these tokens expose the iterator_range of the matched input sequence
    lex::token_def<lex::unused_type, Ch> ellipses, identifier, number;
};

///////////////////////////////////////////////////////////////////////////////
//  Grammar definition
///////////////////////////////////////////////////////////////////////////////
template <typename Iterator, typename Lexer>
struct example3_grammar 
  : qi::grammar<Iterator, qi::in_state_skipper<Lexer> >
{
    template <typename TokenDef>
    example3_grammar(TokenDef const& tok)
      : example3_grammar::base_type(start)
    {
        start
            =  +(couplet | tok.ellipses)
            ;

        //  A couplet matches nested left and right parenthesis.
        //  For example:
        //    (1) (1 2) (1 2 3) ...
        //    ((1)) ((1 2)(3 4)) (((1) (2 3) (1 2 (3) 4))) ...
        //    (((1))) ...
        couplet
            =   tok.number
            |   C('(') >> +couplet >> C(')')
            ;

        BOOST_SPIRIT_DEBUG_NODE(start);
        BOOST_SPIRIT_DEBUG_NODE(couplet);
    }

    qi::rule<Iterator, qi::in_state_skipper<Lexer> > start, couplet;
};

///////////////////////////////////////////////////////////////////////////////
int main()
{
    // iterator type used to expose the underlying input stream
    typedef std::basic_string<Ch>::iterator base_iterator_type;

    // This is the token type to return from the lexer iterator
    typedef lex::lexertl::token<base_iterator_type> token_type;

    // This is the lexer type to use to tokenize the input.
    // Here we use the lexertl based lexer engine.
    typedef lex::lexertl::lexer<token_type> lexer_type;

    // This is the token definition type (derived from the given lexer type).
    typedef example3_tokens<lexer_type> example3_tokens;

    // this is the iterator type exposed by the lexer 
    typedef example3_tokens::iterator_type iterator_type;

    // this is the type of the grammar to parse
    typedef example3_grammar<iterator_type, example3_tokens::lexer_def> example3_grammar;

    // now we use the types defined above to create the lexer and grammar
    // object instances needed to invoke the parsing process
    example3_tokens tokens;                         // Our lexer
    example3_grammar calc(tokens);                  // Our parser

    std::basic_string<Ch> str (read_from_file("example3.input"));

    // At this point we generate the iterator pair used to expose the
    // tokenized input stream.
    std::basic_string<Ch>::iterator it = str.begin();
    iterator_type iter = tokens.begin(it, str.end());
    iterator_type end = tokens.end();

    // Name of the lexer state that holds the tokens to skip.
    std::basic_string<Ch> ws = C("WS");

    // Parsing is done based on the token stream, not the character 
    // stream read from the input.
    // Note how we use the lexer defined above as the skip parser.
    bool r = qi::phrase_parse(iter, end, calc,
        // qi::in_state<Ch const*>(C("WS"))[tokens.self]);
        // qi::in_state(C("WS"))[tokens.self]);
        qi::in_state(ws)[tokens.self]);

    if (r && iter == end)
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "-------------------------\n";
    }
    else
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "-------------------------\n";
    }

    std::cout << "Bye... :-) \n\n";
    return 0;
}

VC9 reports the following:

error C2440 at lex/qi/state_switcher.hpp
'conversion' : cannot convert from 'const wchar_t *' to 'const char *const'.
at /qi/nonterminal/rule.hpp
'boost::spirit::qi::state_switcher_context<Subject,State>::state_switcher_context<const std::basic_string<_Elem,_Traits,_Ax>>(const boost::spirit::qi::state_switcher_context<Subject,const std::basic_string<_Elem,_Traits,_Ax>> &)' with Subject=boost::spirit::lex::reference<const boost::spirit::lex::detail::lexer_def_<boost::spirit::lex::lexer>>, State=const char *const , Elem=wchar_t, Traits=std::char_traits<wchar_t>, _Ax=std::allocator<wchar_t> ]
at qi/reference.hpp
'bool boost::spirit::qi::rule<Iterator,T1,T2,T3>::parse<Context,Skipper,Attribute>(Iterator &,const Iterator &,Context &,const Skipper &,Attribute &) const'
....
error C2439 at lex/qi/state_switcher.hpp
member could not be initialized
'boost::spirit::qi::state_switcher_context<Subject,State>::state' :
with
[
Subject=boost::spirit::lex::reference<const boost::spirit::lex::detail::lexer_def_<boost::spirit::lex::lexer>>, State=const char *const
at lex/qi/state_switcher.hpp
'boost::spirit::qi::state_switcher_context<Subject,State>::state'
with
Subject=boost::spirit::lex::reference<const boost::spirit::lex::detail::lexer_def_<boost::spirit::lex::lexer<lexer_type>>>, State=const char *const

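So the State of state_switcher_context is const char *const, that is, still a narrow character type. I then tried passing explicit template parameters: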

qi::in_state<example3_tokens::lexer_def, Ch>(ws)[tokens.self] );
  • error C2275: 'example3_tokens' : illegal use of this type as an expression
  • error C2679: binary '[' : binary 'operator' : no operator defined which takes a right-hand operand of type 'boost::spirit::lex::detail::lexer_def_<LexerDef>' (or there is no acceptable conversion)
qi::in_state<example3_tokens::lexer_def(), Ch>(ws)[tokens.self] );
  • error C2512: 'boost::spirit::lex::detail::lexer_def_<LexerDef>::lexer_def_' : no appropriate default constructor available
    with
    [
    LexerDef=boost::spirit::lex::lexer
    ]
  • error C2679: ...

So explicit template parameters apparently cannot be used here. I also tried the following on the single-byte character version, which compiles:
std::Err << typeid(qi::in_state(ws)).name() << std::endl << typeid(qi::in_state(ws)[tokens.self]).name() << std::endl;
The result is this:

struct boost::proto::exprns_::expr<struct boost::proto::tag::terminal,struct boost::proto::argsns_::term<struct boost::spirit::terminal_ex<struct boost::spirit::tag::in_state,struct boost::fusion::vector1<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > >,0>
struct boost::proto::exprns_::expr<struct boost::proto::tag::subscript,struct boost::proto::argsns_::list2<struct boost::proto::exprns_::expr<struct boost::proto::tag::terminal,struct boost::proto::argsns_::term,class std::allocator<char> > > > >,0> &,struct boost::spirit::lex::detail::lexer_def_<class boost::spirit::lex::lexer,class std::allocator<char> >,struct boost::mpl::vector0<struct boost::mpl::na>,struct boost::mpl::bool_<1> >,class std::_String_iterator<char,struct std::char_traits<char>,class std::allocator<char> >,class boost::spirit::lex::lexertl::functor<struct boost::spirit::lex::lexertl::token<class std::_String_iterator<char,struct std::char_traits<char>,class std::allocator<char> >,struct boost::mpl::vector0<struct boost::mpl::na>,struct boost::mpl::bool_<1> >,struct boost::spirit::lex::lexertl::detail::data,class std::_String_iterator<char,struct std::char_traits<char>,class std::allocator<char> >,struct boost::mpl::bool_<0>,struct boost::mpl::bool_<1> > > > > &>,2>
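The same typeid check can be tried on a small scale to see how the character type of an argument ends up in the deduced type. The names wrapper and wrap below are hypothetical and unrelated to Spirit. The printed names are implementation-defined, so no output is shown.

```cpp
#include <iostream>
#include <string>
#include <typeinfo>

// Hypothetical wrapper: remembers the exact type of the value it is given.
template <typename T>
struct wrapper
{
    explicit wrapper(T const& v) : value(v) {}
    T value;
};

// Deduces T from the argument, the way a generator function typically does.
template <typename T>
wrapper<T> wrap(T const& v)
{
    return wrapper<T>(v);
}

// True when narrow and wide arguments produce different wrapper types.
inline bool narrow_and_wide_differ()
{
    return typeid(wrap(std::string("WS"))) != typeid(wrap(std::wstring(L"WS")));
}

// Prints the implementation-defined type names for inspection.
inline void print_deduced_types()
{
    std::cout << typeid(wrap(std::string("WS"))).name() << std::endl
              << typeid(wrap(std::wstring(L"WS"))).name() << std::endl;
}
```

Here the character type is carried by the argument alone, which matches the dump above, where basic_string<char,...> appears inside the terminal's vector1.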

