Work in progress: Writing a replacement parser for Tridactyl's command-line and rcfiles, in the spirit of Vi's command-line / Vimscript / ex.
Documentation: https://excmd.js.org/
Notable entry points for spelunking:
- Complete JavaScript module index: https://excmd.js.org/globals.html
- Complete OCaml module index: https://excmd.js.org/excmd/Excmd/index.html
- A note on the resolution of ambiguous shellwords: https://excmd.js.org/excmd/Excmd/Expression/index.html#reso
- Parsing entry-point functions: https://excmd.js.org/excmd/Excmd/Parser/index.html#parsing-entry-points
Under development. Probably.
For now (read: until I, or somebody else, publishes a packaged copy of Menhir to npm!), a local OCaml development-environment, matching the version of BuckleScript's fork of OCaml, is required.
Here's a quick, up-to-date bootstrapping process for ~spring 2021:
```sh
git clone https://github.com/ELLIOTTCABLE/excmd.js.git
cd excmd.js

sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)
# (... or install opam using your platform's package-manager)

opam switch create ./packages/bs-excmd --deps-only --locked --ignore-constraints-on=ocaml
eval $(opam env --switch=./packages/bs-excmd --set-switch)

# Finally, install JavaScript dependencies, BuckleScript, and kick off the initial build
npm run bootstrap
```
Thereafter, when returning to the project, and before running `lerna run build` or any other OCaml-dependent commands, you have to remember to run `eval $(opam env)` to add the OCaml binaries to your shell's `$PATH`:

```sh
# After e.g. `cd ~/Code/excmd.js`
eval $(opam env --switch=./packages/bs-excmd --set-switch)
```
Finally, after all of the above, you can let Lerna kick off the rest of the build, orchestrated by `bsb`, `tsc`, and Ninja, variously:

```sh
lerna run build
```
There are two packages comprising this project, to be published separately to npm:
- `packages/bs-excmd/`: the lexer and parser themselves; written in OCaml using Sedlex and Menhir, and compiled to JavaScript using the ReScript compiler (née BuckleScript), published to npm as `bs-excmd` ...
- `packages/excmd/`: ... and a thin TypeScript wrapper providing idiomatic JavaScript interfaces to the parser modules, published to npm as the primary package, `excmd`.
Lerna, a JavaScript-ecosystem monorepo/multi-package management tool, orchestrates the building of these two interdependent subpackages.
If you're hacking on this (or writing something other than JavaScript), it's useful to know that the project has a hybrid build-system, and can be built from the OCaml side (via Dune) or the JavaScript side (via ReScript.)
Handily, Dune supports dynamically building an OCaml interactive toplevel with any/all OCaml modules included:
```sh
cd packages/bs-excmd/
dune utop src
```
For my own expediency when iterating (sry not sry), the actual parser tests (as opposed to tests for the JavaScript interface, the lexing, or the string-handling minutiae) are also written in native OCaml, and evaluated by Dune:
```sh
cd packages/bs-excmd/
dune runtest

# After making changes, and verifying that the output is as-expected,
dune promote
```
Finally, the test-executable can interrogate arbitrary input, dumping the result in the same JSON-format as the tests use:
```sh
cd packages/bs-excmd/
dune exec test/parser_test.exe expression "hello"
dune exec test/parser_test.exe script "hello; there; friend"
```
- To debug the parser, these Menhir flags are particularly useful: `--log-automaton 1 --log-code 1 --log-grammar 1 --trace`. I've added those to an alternative `"generator"` in `packages/bs-excmd/bsconfig.json`; simply swap the `"name": "menhir"` generator belonging to the `"parserAutomaton.ml"` edge with the `"menhir-with-logging"` one:

  ```diff
  --- i/packages/bs-excmd/bsconfig.json
  +++ w/packages/bs-excmd/bsconfig.json
  @@ -9,7 +9,7 @@
   { "name": "prepend-uax31", "edge": ["lexer.ml", ":", "uAX31.ml", "lexer.body.ml"] },
   { "name": "menhir-tokens", "edge": ["tokens.ml", "tokens.mli", ":", "parserAutomaton.mly", "tokens.tail.ml", "tokens.tail.mli"] },
   { "name": "menhir-lib", "edge": ["menhirLib.ml", "menhirLib.mli", ":", "parserAutomaton.mly"] },
  -{ "name": "menhir", "edge": ["parserAutomaton.ml", ":", "parserAutomaton.mly", "parserUtils.mly", "tokens.ml"] }
  +{ "name": "menhir-with-logging", "edge": ["parserAutomaton.ml", ":", "parserAutomaton.mly", "parserUtils.mly", "tokens.ml"] }
   ]
   }
   ],
  ```

  ... then re-build all libraries with `lerna run prepare`.
- To debug OCaml implementation-code, it's useful to know that ReScript has a debugging mode that vastly improves the inspector output for data-structures. One thing those docs do not mention, however, is that you only need to add `[%%debugger.chrome]` to a single ML file in the current code-path; this is useful information when debugging a JavaScript interface like ours. (e.g. add the `[%%debugger.chrome]` expression to `src/parser.ml`, even if you're debugging something like `src/interface.ts` that imports `parser.bs.js`.)
- To debug OCaml implementation-code, it's useful to know that ReScript has a debugging mode that vastly improves the inspector output for data-structures. This can be enabled by passing `-bs-g` to `bsc`, most easily by adding it to the `"bsc-flags"` in `bsconfig.json`:

  ```diff
  --- i/packages/bs-excmd/bsconfig.json
  +++ w/packages/bs-excmd/bsconfig.json
  @@ -95,4 +95,5 @@
   "suffix": ".bs.js",
   "bsc-flags": [
  +"-bs-g",
   "-bs-super-errors",
   "-bs-no-version-header",
  ```
I'm going to be broadly following Unicode 11's UAX #31 “Unicode Identifier And Pattern Syntax”; speaking formally, this implementation is planned to conform to requirements ...
- R1, Default Identifiers: Identifiers begin with `XID_Start`, continue with `XID_Continue += [U+200C-U+200D]` (subject to the restrictions below), allowing for medial (non-repeated, non-terminating) instances of the following characters:
  - `U+002D`: `-` HYPHEN-MINUS,
  - `U+002E`: `.` FULL STOP,
  - `U+00B7`: `·` MIDDLE DOT,

  ... and excluding characters belonging to a script listed in “Candidate Characters for Exclusion from Identifiers” (UAX 31, Table 4).
- R1a, Restricted Format Characters: `U+200C` & `U+200D`, that is, the zero-width non-joiner and joiner, shall only be parsed in a context necessary to handling the appropriate Farsi, Malayalam, etc. phrases: when breaking a cursive connection (context A1), and in a conjunct (context B.) (NYI!)
- R3, `Pattern_White_Space` and `Pattern_Syntax` Characters: Arguments and flags (unique to Tridactyl, and not occurring in the original Vimscript) are separated with whitespace, which is exactly the Unicode `Pattern_White_Space` property.
- R4, Equivalent Normalized Identifiers: The parser yields both display-form (what you typed) and normalized-form (what you meant) output. Where possible, your input should be displayed as-typed, but it should be normalized before being used in comparisons and references.
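
The requirements above can be roughly approximated in TypeScript with the Unicode property escapes built into JavaScript regular expressions. This is only a sketch, not excmd's actual lexer (that lives in OCaml, on top of Sedlex): the names `isIdentifier`, `splitWords`, and `normalizeIdentifier` are hypothetical, the choice of NFKC for R4 is an assumption, the Table 4 script exclusions and the R1a context checks are not attempted, quoting is ignored when splitting, and the Unicode version applied is whatever the JavaScript engine ships rather than Unicode 11 specifically.

```typescript
// Hypothetical helpers for illustration only; none of these names exist in excmd.

// R1: XID_Continue plus ZWNJ/ZWJ. MIDDLE DOT is itself XID_Continue, so it is
// carved out here and only re-admitted as a medial character below.
const CONTINUE = "(?:(?!\\u00B7)[\\p{XID_Continue}\\u200C\\u200D])";
// Medial characters: FULL STOP, MIDDLE DOT, HYPHEN-MINUS.
const MEDIAL = "[.\\u00B7-]";

// XID_Start, then any run of continue characters, then zero or more groups of
// exactly one medial character followed by at least one continue character.
// That shape rejects repeated ("a--b") and terminating ("a-") medials.
const IDENTIFIER = new RegExp(
  `^\\p{XID_Start}${CONTINUE}*(?:${MEDIAL}${CONTINUE}+)*$`,
  "u",
);

function isIdentifier(word: string): boolean {
  return IDENTIFIER.test(word);
}

// R3: split on runs of Pattern_White_Space. Quoting and escaping, which a real
// command-line parser has to handle, are deliberately ignored here.
const WHITE_SPACE = new RegExp("\\p{Pattern_White_Space}+", "u");

function splitWords(line: string): string[] {
  return line.split(WHITE_SPACE).filter((word) => word.length > 0);
}

// R4: keep what was typed for display, and a normalized form for comparison.
// NFKC is an assumption; the text above does not name a normalization form.
interface NormalizedWord {
  display: string;
  normalized: string;
}

function normalizeIdentifier(typed: string): NormalizedWord {
  return { display: typed, normalized: typed.normalize("NFKC") };
}
```

With these definitions, `isIdentifier("foo-bar")` holds while `isIdentifier("foo--bar")` and `isIdentifier("foo-")` do not, and two spellings of the same word (say, precomposed `é` versus `e` followed by a combining acute accent) compare equal through their `normalized` fields while each keeps its own `display` form.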