PXML
****

An XML-like markup language for ParGen definitions.

.. contents::
    :local:

Format
======

* File extension: ``.pxml``

* The first line of a pxml file should be ````

* The root element should be ```` in every pxml file, even if the file is included rather than used directly.

* Tags can be closed inline by using ``/>``

* Supported attribute value types:
    - boolean (true | false)
    - string
    - double

* Comments are enclosed in ``{%`` and ``%}`` and can be nested

* The character ``<`` should be escaped as ``\<`` in text

Example:

.. code-block:: pxml

    {% comment %}

    {% header file definitions %}
    #include \
    #include \
    #include \
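
The nesting rule for ``{%`` ``%}`` comments can be pictured with a depth counter. The following is a minimal C++ sketch under stated assumptions, not ParGen's actual implementation: the function name ``stripComments`` is hypothetical, and the sketch ignores the ``\<`` escape rule.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ParGen): remove {% ... %} comments,
// tracking the nesting depth so an inner comment does not end the outer one.
std::string stripComments(const std::string& text) {
    std::string out;
    int depth = 0;          // number of comments currently open
    std::size_t i = 0;
    while (i < text.size()) {
        if (text.compare(i, 2, "{%") == 0) {
            ++depth;        // opening delimiter, possibly nested
            i += 2;
        } else if (depth > 0 && text.compare(i, 2, "%}") == 0) {
            --depth;        // closing delimiter of the innermost comment
            i += 2;
        } else {
            if (depth == 0) {
                out += text[i];   // keep only text outside all comments
            }
            ++i;
        }
    }
    return out;
}
```

With this sketch, an input such as ``a{% x {% y %} z %}b`` keeps only the text outside the outermost comment.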


======

The root element of a PXML document

**Attributes**

* namespace  : The outermost namespace for the generated lexer/parser
    - default : ``"Pargen"``

**Children**

Any tags and text except for ````


=========

Include another PXML document in this document

**Attributes**

* src  : ``[Required]`` Path of the included document


========

Define tokens

**Attributes**

* class  : Token class name
    - default : ``"Token"``

* namespace  : Namespace of token types
    - default : ``"Tokens"``

* headerFile  : Path of output token header file
    - default : ``"Token.hpp"``

* sourceFile  : Path of output token source file
    - default : ``"Token.cpp"``

**Children**

````, ````, ````, ````, ````, ````


=======

Define a token

**Attributes**

* name  : ``[Required]`` Token name

**Children**

````, ````, ````, ````


========

C++ code appended to the header file

**Attributes**

* position <"top"|"bottom"> : Appending position
    - default : ``"top"``

* indent  : Code indentation
    - default : ``4``
    - ``0`` : no indentation
    - negative value : keep the same indentation as in the pxml file (like HTML preformatted text)

**Children**

C++ code


========

C++ code appended to the source file

**Attributes**

* position  : Appending position, ``"top"`` or ``"bottom"``
    - default : ``"top"``

* indent  : Code indentation
    - default : ``4``
    - ``0`` : no indentation
    - negative value : keep the same indentation as in the pxml file (like HTML preformatted text)
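
The three cases of the ``indent`` attribute can be illustrated with a small C++ sketch. This is one possible reading, not ParGen's confirmed behavior: the helper name ``applyIndent`` is hypothetical, and the sketch assumes that a non-negative value first removes the indentation common to all lines and then prefixes that many spaces, while a negative value leaves the text untouched. Only spaces are considered; tabs are not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not ParGen's API):
// re-indent a code fragment according to an `indent` attribute value.
std::string applyIndent(const std::string& code, int indent) {
    if (indent < 0) {
        return code;  // negative: keep the text exactly as written
    }

    // Split the fragment into lines.
    std::vector<std::string> lines;
    std::istringstream in(code);
    for (std::string line; std::getline(in, line);) {
        lines.push_back(line);
    }

    // Find the smallest leading-space count among non-blank lines.
    std::size_t common = std::string::npos;
    for (const std::string& line : lines) {
        const std::size_t first = line.find_first_not_of(' ');
        if (first != std::string::npos) {
            common = std::min(common, first);
        }
    }
    if (common == std::string::npos) {
        common = 0;  // every line is blank
    }

    // Drop the common indentation, then prefix `indent` spaces.
    std::string out;
    for (const std::string& line : lines) {
        if (line.find_first_not_of(' ') != std::string::npos) {
            out += std::string(static_cast<std::size_t>(indent), ' ');
            out += line.substr(common);
        }
        out += '\n';  // blank lines are emitted empty
    }
    return out;
}
```

Under these assumptions, relative indentation inside the fragment is preserved for non-negative values; only the base offset changes.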

**Children**

C++ code


========

Definition of class member

**Attributes**

* indent  : Code indentation
    - default : ``4``
    - ``0`` : no indentation
    - negative value : keep the same indentation as in the pxml file (like HTML preformatted text)

**Children**

C++ class member definition


==========

Definition of class member function

**Attributes**

* indent  : Code indentation
    - default : ``4``
    - ``0`` : no indentation
    - negative value : keep the same indentation as in the pxml file (like HTML preformatted text)

**Children**

C++ function definition


======

Indicate a C++ type

**Children**

C++ type


=======

Define lexer

A special rule without any attributes can specify a custom end-of-file rule.

Only one end-of-file rule is allowed in a lexer.

**Attributes**

* class  : Lexer class name
    - default : ``"Lexer"``

* headerFile  : Path of output lexer header file
    - default : ``"Lexer.hpp"``

* sourceFile  : Path of output lexer source file
    - default : ``"Lexer.cpp"``

* newLine  : Define substring as new-line
    - default : ``\"``

* return  : Return type of get()
    - default : ``void`` if ```` is not present, otherwise the same as the ```` class

**Children**

````, ````, ````, ````, ````, ````, ````


======

Define a rule in the lexer

**Attributes**

* id  : A unique id for the rule; may contain only letters (lower or upper case), digits and ``_``

* pattern  : Token match pattern. The following grammar is supported:

    Characters:

    - alphabetic: a-z, A-Z
    - underscore: _
    - space
    - punctuators: ``~``, `````, ``!``, ``@``, ``#``, ``%``, ``&``, ``=``, ``:``, ``"``, ``'``, ``<``, ``>``, ``/``
    - digits: 0-9
    - escape characters:
        + ``\t`` : horizontal tab
        + ``\r`` : carriage return
        + ``\v`` : vertical tab
        + ``\f`` : form feed
        + ``\n`` : new line
        + hexadecimal character : like ``\x0a``, should be 2 digits
        + ``\\``, ``\?``, ``\^``, ``\$``, ``\(``, ``\)``, ``\*``, ``\+``, ``\-``, ``\{``, ``\}``, ``\|``, ``\.``, ``\,`` : punctuators
    - character class:
        + ``\d`` : [0-9]
        + ``\D`` : NOT [0-9]
        + ``\w`` : [0-9a-zA-Z]
        + ``\W`` : NOT [0-9a-zA-Z]
        + ``\s`` : [ \\t\\r\\v\\f\\n]
        + ``\S`` : NOT [ \\t\\r\\v\\f\\n]
        + ``\a`` : [a-zA-Z]
        + ``\A`` : NOT [a-zA-Z]

    Ranges:

    - range: like ``[0-9]``

    OR operation:

    Example: ``(lhs|rhs)``: ``lhs`` or ``rhs``

    Group:

    Example: ``(lhs)?``: one or zero ``lhs``

    Repeat:

    - ``?``: one or zero times
    - ``+``: one or more times
    - ``*``: zero or more times
    - ``{N}``: ``N`` times (``N`` is an integer)
    - ``{N,}``: ``N`` or more times (``N`` is an integer)
    - ``{N, M}``: ``N`` to ``M`` times (``N``, ``M`` are integers)

    Wildcard:

    - ``.``: any supported character
    - ``$``: end-of-file

* push  : State name to push onto the stack

* pop  : Pop the current group from the stack

* more  : Consume the matched text for further $$

    If both push and pop are specified, the stack pops the current group and then pushes the new group.

* indent  : Code indentation
    - default : ``4``
    - ``0`` : no indentation
    - negative value : keep the same indentation as in the pxml file (like HTML preformatted text)
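
The character classes of the pattern grammar can be written out as plain predicates. The C++ sketch below only mirrors the sets listed for ``\d``, ``\w``, ``\s`` and ``\a`` and their negations; the function names are hypothetical and this is not how ParGen itself is known to implement matching. Note that ``\w`` as documented does not include the underscore.

```cpp
// Hypothetical helpers mirroring the documented character sets.
bool isDigitChar(char c) { return c >= '0' && c <= '9'; }

bool isAlphaChar(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool isSpaceChar(char c) {
    return c == ' ' || c == '\t' || c == '\r' ||
           c == '\v' || c == '\f' || c == '\n';
}

// Returns whether `c` belongs to the class written as a backslash
// followed by `cls`. Unknown class letters match nothing in this sketch.
bool matchesClass(char cls, char c) {
    switch (cls) {
        case 'd': return isDigitChar(c);
        case 'D': return !isDigitChar(c);
        case 'w': return isDigitChar(c) || isAlphaChar(c);  // no underscore
        case 'W': return !(isDigitChar(c) || isAlphaChar(c));
        case 's': return isSpaceChar(c);
        case 'S': return !isSpaceChar(c);
        case 'a': return isAlphaChar(c);
        case 'A': return !isAlphaChar(c);
        default:  return false;
    }
}
```

The negated classes here accept any ``char`` outside the set; whether ParGen restricts them to its supported characters is not stated above.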

**Children**

C++ code that may return a token.

The following replacement variables can be used in the code:

* _text : The matched text, from current pattern and previous ``more``

* _pos : The location of matched text




=======

Define a group in lexer

**Attributes**

* name  : ``[Required]`` Group name

**Children**

````, ````, ````
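
The ``push`` and ``pop`` attributes of a lexer rule operate on a stack of groups, and when both are given the pop happens first. A minimal C++ model of that ordering is sketched below; the ``GroupStack`` type and its ``apply`` method are hypothetical names for illustration, and popping an empty stack is silently ignored here as an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the lexer's group stack (not ParGen's API).
struct GroupStack {
    std::vector<std::string> groups;  // back() is the current group

    // Apply the effect of one matched rule.
    // `pop`  : whether the rule has the pop attribute
    // `push` : value of the push attribute, empty if absent
    void apply(bool pop, const std::string& push) {
        if (pop && !groups.empty()) {
            groups.pop_back();        // pop the current group first
        }
        if (!push.empty()) {
            groups.push_back(push);   // then push the new group
        }
    }
};
```

A rule with both attributes therefore replaces the current group instead of nesting inside it.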


=====

Use a rule in lexer

**Attributes**

* id  : ``[Required]`` The rule name to use


========

Define parser

**Attributes**

* class  : Parser class name
    - default : ``"Parser"``

* headerFile  : Path of output parser header file
    - default : ``"Parser.hpp"``

* sourceFile  : Path of output parser source file
    - default : ``"Parser.cpp"``

* start  : Start grammar
    - default : the first ````

* mode  : Mode of parser table (Auto \| LALR \| GLR)
    - default : Auto

    In ``Auto`` mode, ParGen first tries LALR, then switches to GLR if there are conflicts

* return  : Return type of parse()
    - default : ``void``
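
The selection rule for the ``mode`` attribute can be summarized in a few lines of C++. This is only a sketch of the decision described above: the enum and function names are hypothetical, and what happens when ``LALR`` is requested explicitly for a conflicting grammar is not modeled.

```cpp
// Hypothetical names; only the three mode values come from the text above.
enum class ParserMode { Auto, LALR, GLR };
enum class TableKind { LALR, GLR };

// Decide which table kind to build.
// `lalrHasConflicts` says whether the LALR construction reported conflicts.
TableKind chooseTable(ParserMode mode, bool lalrHasConflicts) {
    switch (mode) {
        case ParserMode::LALR: return TableKind::LALR;  // explicit request
        case ParserMode::GLR:  return TableKind::GLR;   // explicit request
        case ParserMode::Auto: break;                   // decided below
    }
    // Auto: prefer LALR, fall back to GLR only when conflicts exist.
    return lalrHasConflicts ? TableKind::GLR : TableKind::LALR;
}
```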

**Children**

````, ````, ````, ````, ````, ````


========

Define a group of grammars that generate the same non-terminal as target

**Attributes**

* name  : ``[Required]`` Target name

* type  : C++ type of the generated target object

**Children**

````


=========

Define a grammar in the parser

A grammar without a pattern can be used to specify an empty generation.

**Attributes**

* pattern  : Grammar generation pattern, as a space-separated sequence of tokens or targets

    Example: "Token1 target1 Token2"

    A special token, ``EOF``, is used for end-of-file

* indent  : Code indentation
    - default : ``4``
    - ``0`` : no indentation
    - negative value : keep the same indentation as in the pxml file (like HTML preformatted text)
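
Because a grammar pattern is a space-separated sequence, it can be split into symbols with ordinary whitespace tokenization. The C++ sketch below illustrates this; ``splitPattern`` is a hypothetical name, not a ParGen function, and treating any run of whitespace as a single separator is an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a grammar pattern into its symbols
// (token names or target names), in order.
std::vector<std::string> splitPattern(const std::string& pattern) {
    std::vector<std::string> symbols;
    std::istringstream in(pattern);
    for (std::string symbol; in >> symbol;) {
        symbols.push_back(symbol);  // operator>> skips runs of whitespace
    }
    return symbols;
}
```

For the example pattern ``"Token1 target1 Token2"`` this yields three symbols, and an empty pattern yields none, which corresponds to an empty generation.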

**Children**

C++ code that may return a generated object.

The following replacement variables can be used in the code:

* _this : Reference to the parser.

* _op : The operands of the grammar. Use ``_opN`` to access the operand at index ``N``.

* _pos : The locations of the matched text, as an array. Use ``_pos[N]`` to access the position at index ``N``.