Alexander Morou
Well, I'm finally at the syntactical analysis phase, and let me tell you, it's no joke.
On top of requiring a good understanding of the language's overall interconnected structure, you have to know other things, like where and how the data dependencies lie. Right now I'm focusing on the token look-ahead exploration for each of the rules, as well as the individual rule data exports. The data exports are basically the information the rules provide to whoever uses the parser, and I've decided on two formats: Verbatim and Declared. Declared is a bare-bones version of the original, containing information on a need-to-know basis; unnamed elements are discarded. Verbatim is as it sounds: a data set in the order in which the rule was defined and in which the data was encountered. This way, if you're interested in stepping through the data structure, you can do so in the order in which it was parsed. Whether I keep the Verbatim format depends on whether I run into any issues with it.
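To make the difference between the two export formats concrete, here is a minimal Python sketch. It is only an illustration, not the generator's actual output: the `Element` class, the two export functions, and the example rule are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """One matched piece of a rule; name is None for unnamed elements."""
    name: Optional[str]
    text: str


def verbatim_export(matched):
    # Every element, named or not, in the order it was encountered.
    return [(e.name, e.text) for e in matched]


def declared_export(matched):
    # Need-to-know only: unnamed elements are discarded.
    return {e.name: e.text for e in matched if e.name is not None}


# Hypothetical match of a rule such as:
#   'var' Name:identifier '=' Value:number ';'
matched = [
    Element(None, "var"),
    Element("Name", "x"),
    Element(None, "="),
    Element("Value", "42"),
    Element(None, ";"),
]

verbatim = verbatim_export(matched)
declared = declared_export(matched)
```

In this sketch the Verbatim form keeps all five elements in parse order, while the Declared form keeps only `Name` and `Value`.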
First things first: look-ahead exploration. So far I've been able to do the simplest exploration, so I know, working from the start rule, what's available in each individual rule's starting context. The next step involves taking what I have and comparing it to what comes next; what comes next will determine how far the look-ahead goes. On top of all this, I need to build a relational map, token->token. This map will tell me at a glance what overlaps where: if a keyword and an identifier are in the same context, it knows they're equivalent to one another, or, if two keyword-like tokens are encountered together, it knows where they overlap. That way, only the cases where the overlap is pertinent are explored.
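A rough Python sketch of both pieces, under stated assumptions: the starting context is computed as a plain first-token set with a fixed-point loop (assuming no rule has an empty alternative), and the token->token map is built by testing keyword literals against token patterns. The grammar, token names, and function names are all hypothetical.

```python
import re

# Hypothetical grammar: rule -> alternatives, each a list of symbols.
# Any symbol that is not a key of GRAMMAR is treated as a token.
GRAMMAR = {
    "statement": [["declaration"], ["assignment"]],
    "declaration": [["VAR", "IDENTIFIER", "SEMI"]],
    "assignment": [["IDENTIFIER", "EQUALS", "expression", "SEMI"]],
    "expression": [["IDENTIFIER"], ["NUMBER"]],
}

# Hypothetical token definitions: keywords are literals, the rest are patterns.
KEYWORDS = {"VAR": "var"}
PATTERNS = {"IDENTIFIER": r"[A-Za-z_]\w*", "NUMBER": r"\d+"}


def first_sets(grammar):
    """Tokens that can begin each rule (assumes no empty alternatives)."""
    first = {rule: set() for rule in grammar}
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for rule, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[rule]:
                    first[rule] |= new
                    changed = True
    return first


def overlap_map(keywords, patterns):
    """Token -> tokens that can match the same text."""
    overlaps = {name: set() for name in list(keywords) + list(patterns)}
    for kw, literal in keywords.items():
        for pat, regex in patterns.items():
            if re.fullmatch(regex, literal):
                overlaps[kw].add(pat)
                overlaps[pat].add(kw)
    return overlaps


def pertinent_overlaps(context, overlaps):
    """Overlapping token pairs that actually share a starting context."""
    return {
        frozenset((a, b))
        for a in context
        for b in overlaps.get(a, ())
        if b in context
    }


first = first_sets(GRAMMAR)
overlaps = overlap_map(KEYWORDS, PATTERNS)
```

Here `VAR` and `IDENTIFIER` overlap and both can start `statement`, so that pair needs exploring there; in `expression` the keyword never appears, so the overlap can be ignored.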
I'm thinking each parse method will have a 'parseState' parameter. Since a lot of the initial look-ahead analysis will overlap, it'd be pointless to keep re-checking the look-ahead to make the same decisions over and over. The parseState will indicate what level of look-ahead has already been performed, which will determine where in the code it jumps to.
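One way the parseState idea might look, sketched in Python. This is an assumption about the shape of the design, not the real generated code: Python has no jump-into-the-middle, so the sketch simply skips checks the caller has already made. `ParseState`, `expect`, and the three parse functions are hypothetical, and tokens are assumed to be `(kind, text)` tuples.

```python
from enum import IntEnum


class ParseState(IntEnum):
    NONE = 0         # no look-ahead performed yet
    FIRST_TOKEN = 1  # the caller has already checked the first token


def expect(tokens, pos, kind):
    """Return the text of the token at pos, or fail if its kind differs."""
    if pos >= len(tokens) or tokens[pos][0] != kind:
        raise SyntaxError(f"expected {kind} at position {pos}")
    return tokens[pos][1]


def parse_declaration(tokens, pos, parse_state=ParseState.NONE):
    # Only check the first token if nobody upstream has done so.
    if parse_state < ParseState.FIRST_TOKEN:
        expect(tokens, pos, "VAR")
    name = expect(tokens, pos + 1, "IDENTIFIER")
    expect(tokens, pos + 2, "SEMI")
    return {"rule": "declaration", "name": name}, pos + 3


def parse_assignment(tokens, pos, parse_state=ParseState.NONE):
    if parse_state < ParseState.FIRST_TOKEN:
        expect(tokens, pos, "IDENTIFIER")
    target = tokens[pos][1]
    expect(tokens, pos + 1, "EQUALS")
    value = expect(tokens, pos + 2, "NUMBER")
    expect(tokens, pos + 3, "SEMI")
    return {"rule": "assignment", "target": target, "value": value}, pos + 4


def parse_statement(tokens, pos=0):
    # One look-ahead decision here, handed down so it is not repeated.
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind = tokens[pos][0]
    if kind == "VAR":
        return parse_declaration(tokens, pos, ParseState.FIRST_TOKEN)
    if kind == "IDENTIFIER":
        return parse_assignment(tokens, pos, ParseState.FIRST_TOKEN)
    raise SyntaxError(f"unexpected {kind} at position {pos}")
```

For example, `parse_statement([("VAR", "var"), ("IDENTIFIER", "x"), ("SEMI", ";")])` decides on `VAR` once and `parse_declaration` starts directly at the identifier. Deeper look-ahead levels would add further members to `ParseState`.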
This doesn't even begin to cover ambiguity cases. I'll have to figure those out when I hit them, which I'm guessing will be shortly. If anyone here has any experience in this, please don't hold back.