pouët.net

+ ? * in bnf specification

category: code [glöplog]
 
Dear lazyweb, could someone tell me what the * + ? means in this BNF? I don't think they're tokens
http://www.verilog.com/VerilogBNF.html
added on the 2013-05-16 02:48:22 by sigflup sigflup
It says in the definition table what they mean. * is 0 or more. + is 1 or more, and ? means it may or may not exist.
thanks
added on the 2013-05-16 02:56:14 by sigflup sigflup
Verilog .. NOOOOOOOOOOOooooooooooooo.............
added on the 2013-05-16 10:48:47 by trc_wm trc_wm
Ok, another thing. How do I deal with operator matching when the operator tokens overlap? Look at BINARY_OPERATOR and UNARY_OPERATOR. They both have '+' and '-', for instance. My tokenizer isn't aware of the parser so I have to match it with zero-knowledge.
added on the 2013-05-17 00:25:54 by sigflup sigflup
You can introduce a third token kind.

UNARY_OR_BINARY is one of the following tokens: + -
<unary_operator>
::= UNARY_OPERATOR
||= UNARY_OR_BINARY

<binary_operator>
::= BINARY_OPERATOR
||= UNARY_OR_BINARY


But I usually have a different token for each operator (PLUS, MINUS, TIMES...) and let the parser deal with it. It's useful in case all the binary operators don't have all the same precedence. (which happens almost all the time).
added on the 2013-05-17 10:07:15 by LLB LLB
Binary and unary operators won't necessarily occur in the same context, so if your parser is LL(x) the fact that they overlap may not be blocking.
added on the 2013-05-17 10:11:08 by ponce ponce
I'm very curious what you are making, sigflup. Please tell us more.
added on the 2013-05-17 10:34:30 by trc_wm trc_wm
LLB> yeah that's what I was thinking about doing.

ponce> but my tokenizer (lex) is not aware of context. I could write it into the parser grammar (yacc), if it's not consumed by a different token rule, which I don't think it is. Thank you

trc_wm> slowly a schematic editor for verilog. Did I mention slowly?
added on the 2013-05-17 16:53:05 by sigflup sigflup
Hmm.. OUTPUT_SYMBOL and LEVEL_SYMBOL overlap too. This can't be written into the parser grammar because they could very well be tokenized and IDENTIFIERS.

I think there is something fundamental that I'm not getting here.
added on the 2013-05-17 17:48:24 by sigflup sigflup
Yes, it's the same: Handle it in the grammar.

<level_symbol> ::= <OUTPUT_SYMBOL> | ? | b | B

Lexers are dumb, do the clever thing in the parser (btw, I never liked the distinction between lexers and parsers - I try to get rid of the lexer when I can).
added on the 2013-05-17 20:32:17 by LLB LLB
LLB> you'd like dparser then. It very much blurs the lexer and the parser. I finally ended up settling on it because there was an already written grammar.
added on the 2013-05-21 19:35:54 by sigflup sigflup
This is about one of my favourite subjects of computer science: formal languages.

There is a hierarchy of formal languages named after the famous linguist Noam Chomsky. The top of the hierarchy is the class of regular languages. Such languages can be specified by regular expressions, that is expressions that consist of literals as well as the operators + (allows the previous literal to be appended once or not at all) and * (allows the previous literal to be appended as many times as desired or not at all). Moreover, it is possible to use brackets in order to signify that several literals together form one kind of "super-literal" to which the operators + and * can be applied as well.

BNF is for context-free languages, which is the next class of languages below regular languages. Since any regular language is also context-free, regular expressions may be used in BNF as well. So this explains why these symbols appear in your code.

The next class of languages would be context-sensitive languages, followed by general languages. Moreover, theoretical computer science knows recursive languages, which are a proper superset of context-sensitive languages, and recursively enumerable languages, which are a proper superset of recursive languages. Furthermore, there are co-recursively enumerable languages, which are a proper superset of recursive languages as well. The class of recursive languages is the intersection of the classes of recursively enumerable and co-recursively enumerable languages. For recursive languages it is possible to specify abstract machines known as Turing deciders which accept a word if it is an element of the language and reject it if it is not. For recursively enumerable languages it is possible to specify Turing machines which accept a word if it is an element of the language and either reject it or enter an endless loop if it is not. For co-recursively enumerable languages it is possible to specify Turing machines which either accept a word or enter an endless loop if it is an element of the language and reject it if it is not.

A decision problem is a problem that is about deciding whether a given word is part of a given language. If the language is recursive, the problem is called decidable. If the language is recursively enumerable, the problem is called semi-decidable. This is the link between formal languages and computability theory.
added on the 2013-05-23 11:28:50 by Adok Adok

login