sph 5 hours ago [-]
Beautiful work! I'm not even gonna wonder if any of it was AI-generated, because the code is clearly crafted meticulously by an experienced C engineer, very readable, and shorter than I expected.
RossBencina 2 hours ago [-]
Now all it needs is a parser in 'examples/' that parses EBNF grammars and emits a parser in terms of these combinators.
zombot 2 hours ago [-]
So many parser combinators operate on bytes assuming ASCII input only. I'd be more interested in a parser combinator lib that has UTF-8 decoding already abstracted away, operating on `wchar_t`, or even polymorphic input stream element types.
Joker_vD 22 minutes ago [-]
I'd rather not. Most of the time you don't need it, and when you do, it's for a very small part of the input. And `wchar_t` is an abomination (it's UTF-32 on Linux, UTF-16 on Windows, and the standard allows both); you probably really want `char32_t`, and again, not for the whole of the input. Streaming such data a single rune/codepoint at a time is probably fine as well for most uses.
On the other hand, if your parser combinators process char-by-char, then maintaining a small "is this valid UTF-8 so far" context on the side should be pretty simple, so providing it would be a useful option. But actually decoding? Please don't.
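For what it's worth, here is a rough sketch of what such a side context could look like. It only checks well-formedness byte by byte and never builds a code point. All names (`utf8_check`, `utf8_check_feed`, and so on) are made up for illustration and are not from the library being discussed; the byte ranges follow the well-formed sequence table in the Unicode Standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side context; names are illustrative, not from any library. */
typedef struct {
    uint8_t need;   /* continuation bytes still expected */
    uint8_t lo, hi; /* allowed range for the next continuation byte */
} utf8_check;

void utf8_check_init(utf8_check *c)
{
    c->need = 0; c->lo = 0x80; c->hi = 0xBF;
}

/* Start a multi-byte sequence. Only the first continuation byte can have a
 * narrowed range (rejects overlong forms, surrogates, values > U+10FFFF). */
static bool utf8_expect(utf8_check *c, uint8_t need, uint8_t lo, uint8_t hi)
{
    c->need = need; c->lo = lo; c->hi = hi;
    return true;
}

/* Feed one byte; returns false as soon as the stream is known to be invalid. */
bool utf8_check_feed(utf8_check *c, uint8_t b)
{
    if (c->need == 0) {
        if (b <= 0x7F)              return true; /* ASCII */
        if (b >= 0xC2 && b <= 0xDF) return utf8_expect(c, 1, 0x80, 0xBF);
        if (b == 0xE0)              return utf8_expect(c, 2, 0xA0, 0xBF);
        if (b >= 0xE1 && b <= 0xEC) return utf8_expect(c, 2, 0x80, 0xBF);
        if (b == 0xED)              return utf8_expect(c, 2, 0x80, 0x9F);
        if (b == 0xEE || b == 0xEF) return utf8_expect(c, 2, 0x80, 0xBF);
        if (b == 0xF0)              return utf8_expect(c, 3, 0x90, 0xBF);
        if (b >= 0xF1 && b <= 0xF3) return utf8_expect(c, 3, 0x80, 0xBF);
        if (b == 0xF4)              return utf8_expect(c, 3, 0x80, 0x8F);
        return false; /* stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF */
    }
    if (b < c->lo || b > c->hi) return false;
    c->need--;
    c->lo = 0x80; c->hi = 0xBF; /* later continuation bytes use the full range */
    return true;
}

/* True if the input seen so far does not end in the middle of a sequence. */
bool utf8_check_done(const utf8_check *c) { return c->need == 0; }

/* Convenience wrapper over a whole buffer. */
bool utf8_valid(const char *s, size_t n)
{
    utf8_check c;
    utf8_check_init(&c);
    for (size_t i = 0; i < n; i++)
        if (!utf8_check_feed(&c, (uint8_t)s[i])) return false;
    return utf8_check_done(&c);
}
```

The state is three bytes, so a char-by-char combinator could carry it next to its input position and call `utf8_check_feed` on every byte it consumes, without changing what the parsers themselves see.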
lokeg 2 hours ago [-]
Isn't working with the UTF-8 stream sufficient? Especially if you only have ASCII keywords/operators/brackets, I feel an ASCII parser should work with UTF-8 out of the box.
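A small sketch of why this tends to work: every byte of a multi-byte UTF-8 sequence has its high bit set, so it can never compare equal to an ASCII delimiter. The scanner below is hypothetical (the name `scan_quoted` and its contract are assumptions for illustration, and escape sequences are deliberately not handled); it looks only at bytes and still finds the right closing quote around non-ASCII content.

```c
#include <stddef.h>

/* Hypothetical byte-level scanner, not taken from the library under discussion.
 * If s[0..n) starts with a double-quoted string, returns its length in bytes
 * including both quotes; otherwise returns 0. Escapes are not handled. */
size_t scan_quoted(const char *s, size_t n)
{
    if (n == 0 || s[0] != '"') return 0;
    for (size_t i = 1; i < n; i++) {
        /* Bytes >= 0x80 belong to multi-byte sequences and never equal '"'. */
        if (s[i] == '"') return i + 1;
    }
    return 0; /* unterminated string */
}
```

For the input `"h\xC3\xA9llo" rest` (13 bytes, with `é` encoded as two bytes) the function returns 8, the byte length of the quoted part. The caveat is that lengths and offsets are in bytes, not characters.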
t-3 38 minutes ago [-]
Yeah, a parser has no need to understand what a string or glyph is, let alone ASCII or UTF-8. The point is to take a stream of arbitrary data and process it into something that can be reasoned about. Unless you know your input stream is regular in some way, processing it at the finest level of granularity (usually bytes) is probably the only thing to do.