www

Unnamed repository; edit this file 'description' to name the repository.
Log | Files | Refs | Submodules | README | LICENSE

template.scrbl (21932B)


      1 #lang scribble/manual
      2 
      3 @(require (for-label typed/racket/base
      4                      syntax/parse
      5                      ;"template.rkt"
      6                      ))
      7 
      8 @(define ellipses (racket ...))
      9 
     10 @title[#:tag "template-lib"]{Versatile parser and template library}
     11 
     12 Keywords: grammar, parser, template.
     13 
     14 @defform[(parse expr [pattern body …] …)]{
     15  Analogous to @racket[syntax-parse], except it isn't
     16  specialized for syntax, but rather works for arbitrary
     17  s-expressions, including syntax ones (denoted by 
     18  @racket[#'(…)] in the pattern).}
     19 
     20 @defform[#:literals (: :: ... else struct)
     21          (tmpl template)
     22          #:grammar
     23          [(template variable
     24                     [variable : type] ;; (ann variable type)
     25                     ;; cons-template
     26                     (template . template)
     27                     (template :: template)
     28                     
     29                     ;; list
     30                     (template**)
     31                     ;; list*
     32                     template**-dotted
     33                     
     34                     ;; vector
     35                     #(template**)
     36                     (vector . template**-dotted)
     37                     
     38                     ;; hash-template: template** must expand to a list of pairs.
     39                     (hash . template**-dotted)   ;; TODO: how to distinguish
     40                     (hasheq . template**-dotted) ;; mutable and immutable?
     41                     (hasheqv . template**-dotted)
     42                     #hash([template . template])
     43                     #hasheq([template . template])
     44                     #hasheqv([template . template])
     45                     
     46                     ;; struct-template
     47                     (struct-id template …)
     48                     (struct struct-id template …)
     49                     #s(prefab-id template …)
     50                     #s(template template …) ;; Only allowed in untyped racket
     51 
     52                     ;; box
     53                     #&template
     54                     
     55                     ;; call-template
     56                     (~identifier args …) ;; calls (identifier args …)
     57                     (~ expr args …)      ;; calls (expr args …)
     58                     
     59                     ;; unquote-template
     60                     ,expr
     61                     ,@(list expr)
     62                     ,@(list* expr) ;; must appear in last position.
     63                     
     64                     
     65                     ;; template-expander
     66                     template-expander-id
     67                     (template-expander-id args …)
     68                     
     69                     ;; maybe-template (should all be template expanders
     70                     ;; which means the system is extensible enough to express
     71                     ;; these special cases).
     72                     (?? alt-template …)
     73                     (?@ . template**-dotted)
     74                     (??@ . template**-dotted)
     75                     (?if condition template template)
     76                     (|@if| condition template template)
     77                     (if@ condition template template)
     78                     (|@cond| [condition template] …)
     79                     (|@cond| [condition template] … [else template])
     80                     (cond@ condition template template)
     81                     
     82                     ;; like #,@(with-syntax ([meta-var #'template])
     83                     ;;           #'(template**))
     84                     (~let ([meta-var+args template])
     85                           . template**-dotted)
     86                     
     87                     (~sort key template ooo)
     88                     (~loc stxloc . template)
     89                     ;; Like (template . template), but discards the first and
     90                     ;; keeps just the second. If the first contains pattern
     91                     ;; variables which are repeated, this has the effect of
     92                     ;; repeating the second as many times as the first. Example:
     93                     ;; #'(vector (~each some-pattern-var '()))
     94                     ;; => (vector '() '() '() '() '())
     95                     (~each template template)
     96                     
     97                     ;; escaped
     98                     (ddd escaped)
     99                     
    100                     ;; 
    101                     
    102                     ;; literal
    103                     #t
    104                     #f
    105                     string
    106                     bytes
    107                     number
    108                     char
    109                     keyword
    110                     regexp
    111                     pregexp)
    112 
    113           (meta-var+args meta-var
    114                          (meta-var meta-arg …))
    115           
    116           (tail-template template)
    117           
    118           ;; specialize mid-sequence in repetition (diagonal-matrix-style)
    119           
    120           (variable identifier)
    121 
    122           (template**-dotted (template* … . template)
    123                              template**)
    124           (template** (code:line template* …)
    125                       (code:line template* … :: template)
    126                       (code:line template* … (~rest . template)))
    127           (template* template
    128                      (code:line template ooo)
    129                      special-cased-template)
    130           (special-cased-template (code:line template vardd)
    131                                   (code:line template ddvar))
    132           ;; Where var is an iterated variable.
    133           (vardd var..   ;; exclude the current iteration
    134                  var...) ;; include the current iteration
    135           (ddvar ..var   ;; exclude the current iteration
    136                  ...var) ;; include the current iteration
    137           
    138           (ooo #,ellipses ;; TODO: make it a hyperlink
    139                ___
    140                ..k ;; k positive integer
    141                __k ;; k positive integer
    142                (code:line .. expr)  ;; expr must return a positive integer
    143                (code:line __ expr)) ;; expr must return a positive integer
    144           (ddd #,ellipses)
    145           ]]{
    146  TODO: implement the versatile template library. 
    147  @racket[...]
    148  
    149  TODO: support for typed/racket.
    150 
    151  TODO: optimization feature: would it be useful if the
    152  expanded code could be optimized? For example, when looking
    153  at the output of syntax-parse, the code is far from being
    154  concise.
    155  
    156  The patterns for @racket[parse] should all have a way to
    157  create a symmetric counterpart for @racket[tmpl], which
    158  produces the original value. This symmetry is important
    159  because it allows lens-like macros, which operate on only a
    160  part of the data structure, leaving everything else
    161  intact.
    162  
    163  @racket[??] works like @racket[??] from 
    164  @racket[syntax/parse/experimental/template], except it
    165  allows any number of alternatives (including 0, to avoid
    166  special-casing in macros). It is more or less equivalent to
    167  @racket[(?? a (?? b (?? c …)))], following syntax/parse's
    168  semantics.
    169  
    170  @racket[?@] has the same meaning as in syntax/parse.
    171  
    172  @racket[(??@ t* …)] is a shortcut for 
    173  @racket[(?? (?@ t* …))]
    174  
    175  For better compatibility with at-exp, @racket[|@if|] can be
    176  written @racket[if@], and the same goes for 
    177  @racket[|@cond|] etc.
    178  
    179  TODO: what's the difference between @racket[~], 
    180  @racket[template-expander] and @racket[unquote]? 
    181  @racket[template-expander] runs at compile-time and should
    182  treat its arguments as syntax.
    183  
    184  Concerning unquoting, unlike @racket[racket]'s default
    185  behaviour in @RACKET[#'([x #,(y …)] …)], unquoting should
    186  not break the nesting of ellipses. How should we express
    187  voluntary variation of the level of nesting? @racket[~let]
    188  already allows expanding part of the template at some level
    189  and inserting it verbatim somewhere below, but it's not a
    190  silver bullet. One case which comes to mind is when some of
    191  the nested data should be mixed with less-nested data, for
    192  example going from 
    193  @racket[([10 1 2 3] [100 4 5] [1000 6])] to 
    194  @racket[([10 20 30] [400 500] [6000])] should be relatively
    195  easy to express. Maybe @racket[~let] with parameters can be
    196  a suitable generalized solution: 
    197  @RACKET[({~let ([(addx v) #,(+ x v)]) [(addx y) …]} …)]
    198  
    199  The special-cased template syntax should allow special
    200  treatment of the @racket[i]-th iteration in a doubly-nested
    201  loop: matching @racket[x] on @racket[(1 2 3 4 5)], and
    202  using the template @racket[(0 x.. ,(* x x) ..x 1) …] will
    203  produce @racket[(1 1 1 1 1)
    204                  (0 4 1 1 1)
    205                  (0 0 9 1 1)
    206                  (0 0 0 16 1)
    207                  (0 0 0 0 24)]. The pattern before 
    208  @racket[x..] and the pattern after @racket[..x] can expand
    209  to multiple items which will be spliced in by wrapping it
    210  with @racket[?@].}
    211 
    212 @section{Ideas for implementation}
    213 
    214 @subsection{Extensibility (expanders)}
    215 
    216 Allow normal, inline-prefix, inline-postfix and inline-infix
    217 expanders, which can bind using regular expressions. This
    218 allows implementing exotic syntax like @racket[var..]
    219 (postfix, operates on the pattern preceeding it), 
    220 @racket[..var] (postfix, operates on the pattern after it),
    221 @racket[(… escaped-pattern)] (normal, operates on the
    222 containing s-exp)
    223 
    224 @subsection{Customization}
    225 
    226 For things that are likely to be customized by the user in
    227 the whole file scope, define a grammar/custom module, used
    228 as follows:
    229 
    230 @racketblock[(require grammar/custom)
    231              (grammar/custom option …)]
    232 
    233 The @racket[grammar/custom] macro expands to 
    234 @racket[(require grammar/core)] followed by a bunch of 
    235 @racket[define-syntax] which wrap the core macros, providing
    236 them the custom options:
    237 
    238 @racketblock[(require grammar/core)
    239              (define-syntax-rule (parse . rest)
    240                (parse/core #:global-options (option …) . rest))
    241              (define-syntax-rule (tmpl . rest)
    242                (parse/core #:global-options (option …) . rest))]
    243 
    244 This can also be used to rename the @racket[parse] and 
    245 @racket[tmpl] macros, if desired (for example, 
    246 @racket[tmpl] could be renamed to @racket[quasisyntax], or
    247 something similar).
    248 
    249 Otherwise, @racket[grammar/custom] could just @racket[set!]
    250 some for-syntax variable which stores the options. A second
    251 boolean for-syntax variable could be used to check if 
    252 @racket[grammar/custom] was called twice, and throw an error
    253 in that case.
    254 
    255 Or maybe we should just use units? Can they be customized in
    256 a similar way?
    257 
    258 The idea is to avoid having to wrap the whole file in a 
    259 @racket[(parameterize …)], and be able to easily 
    260 @racket[provide] a customized variation of this library:
    261 
    262 @racketblock[(provide (customized-out grammar/custom))]
    263 
    264 @subsection{Unsorted ideas}
    265 
    266 @subsubsection{Global pattern constraints}
    267 
    268 For patterns, have global constraints: @racket[(~global-or id)] binds 
    269 @racket[id] to true if the enclosing pattern was matched at least once, and
    270 false otherwise. Multiple occurrences of the same @racket[(~global-or id)] make
    271 the @racket[id] true if any of the containing clauses was matched at least
    272 once.
    273 
    274 Inside a @racket[{~no-order}], it should be possible to impose some partial
    275 order constraints, so that we can say:
    276 
    277 @racketblock[
    278  {~no-order
    279   {~optional pat-a}
    280   {~optional pat-b}
    281   pat-c
    282   {~optional {~constrain pat-d {~after pat-a}}}}]
    283 
    284 The above code means that @racket[pat-a], @racket[pat-b] and @racket[pat-d] are
    285 optional (but not @racket[pat-c]), the four patterns can appear in any order,
    286 but if @racket[pat-a] and @racket[pat-d] are both present, then @racket[pat-d]
    287 must appear after @racket[pat-a].
    288 
    289 Scopes: the global constraints apply within a scope. By default, there is an
    290 implicit top-level scope, and some forms might implicitly introduce a catch-all
    291 scope unless otherwise specified, like the implicit @racket[~demimit-cut] for 
    292 @racket[define-syntax-class] from @racket[syntax/parse]. There could be two
    293 kinds of scopes: unhygienic catch-all scopes which scope all "global"
    294 constraints within, and naming scopes, which explicitly say which identifiers
    295 they scope.
    296 
    297 @racketblock[
    298  {~scope {a}
    299   {~vector
    300    {~scope {b} {~no-order {~once a} {~optional b}}}
    301    {~scope {b} {~no-order {~once a} {~optional b}}}}}]
    302 
    303 The code above matches against a vector of two @racket[~no-order] lists. The 
    304 @racket[a] pattern must appear exactly once, either in the first list or in the
    305 second, but not in both. On the other hand, the @racket[b] pattern may appear
    306 zero or one time in the first list, zero or one time in the second list, and may
    307 appear in both since its constraint is scoped for each list. Although it is less
    308 clear, the following code is semantically identical:
    309 
    310 @racketblock[
    311  {~scope {a b}
    312   {~vector
    313    {~no-order {~once a} {~optional b}}
    314    {~scope {b} {~no-order {~once a} {~optional b}}}}}]
    315 
    316 Since the @racket[b] in the @racket{~no-order} is bound to the enclosing 
    317 @racket[{~scope {b} …}], it does not interact in any way with the outer scope.
    318 The @racket[~optional] constraint on the @racket[b] in the first 
    319 @racket[~no-order] therefore does not interact withe the @racket[~optional]
    320 constraint in the second @racket[~no-order].
    321 
    322 @subsubsection{Generalization of pattern/template kinds}
    323 
    324 Nearly all patterns and templates should work equally well for regular lists and
    325 syntax objects. It should be possible and easy enough to create new "kinds" of
    326 data, which modify how patterns and templates work all the way through the
    327 pattern or template tree, until it switches to a new kind. As an example, the
    328 following pattern starts as a normal s-expr pattern, and switches to syntax in
    329 two nodes:
    330 
    331 @racketblock[
    332  {~s-expr 1 2 (buckle {~optional my} shoe)
    333   3 4 {~syntax (knock {~optional at the} door)}
    334   5 6 (pick {~optional-wrap (up _) (sticks)})
    335   7 8 {~syntax (lay {~optional-wrap (them _) (straight)})}}]
    336 
    337 That pattern should match the following value:
    338 
    339 @racketblock[
    340  `(1 2 (buckle shoe)
    341      3 4 ,#'(knock door)
    342      5 6 (pick (up (sticks)))
    343      7 8 ,#'(lay (them (straight))))]
    344 
    345 The @racket[~syntax] indicates that the whole subtree should start matching (or
    346 producing) syntax objects, instead of regular s-expressions. It is worht noting
    347 that syntax objects have extra information (source location, syntax properties)
    348 that regular s-expressions lack. One way of implementing this would be to make
    349 the pattern directives operate on "enhanced" s-expressions. Enhanced
    350 s-expressions are s-expressions with arbitrary kind-specific data attached to
    351 them. The @racket[~s-expr] simply translates s-expressions into enhanced
    352 s-expressions with an empty data attached, while @racket[~syntax] is a sort of
    353 pre-processor which turns syntax objects into enhanced s-expressions with source
    354 location and syntax properties attached. These "kind" pre-processors run before
    355 the normal pattern directives are applied. Some kind-specific pattern directives
    356 can access those properties (if they are used in within the scope of the
    357 appropriate @racket[~kind]), so that a @racket[(~loc srcloc . pattern)] matches
    358 @racket[pattern] and saves its source location into the variable 
    359 @racket[srcloc].
    360 
    361 Kinds should also be able to alter how the pattern variables are bound: 
    362 @racket[~s-expr] simply binds (in patterns) and uses (in templates) normal
    363 Racket variables. On the other hand, @racket[~syntax] binds and uses syntax
    364 pattern variables, so that the bound variables are used as @racket[#'var]
    365 instead of @racket[var].
    366 
    367 Different pattern and template forms can specify a default kind (possibly by
    368 simply wrapping their pattern or tempalte with the appropriate @racket[~kind]).
    369 For example, a @racket[define/match] form would use @racket[~s-expr] by default,
    370 whereas a @racket[define-syntax/match] would use @racket[~syntax]. The same
    371 would apply for re-implementations of Racket's @racket[match] and 
    372 @racket[syntax-parse].
    373 
    374 Do the "kinds" form some sort of monad? TODO: Think about this, and try to see
    375 if there are some monads which can be translated to pattern/template kinds
    376 usefully.
    377 
    378 @subsubsection{Lenses}
    379 
    380 It should be possible to describe lenses using the patterns: you can work on
    381 the focused part of the match, possibly access (read-only) other parts, and
    382 return a new value. What should happen when the focused part is under an
    383 ellipsis and has more than one match ? Implicitly execute the code n times, like
    384 a sort of @racket[for/list]?
    385 
    386 @subsubsection{Backtracking}
    387 
    388 Since the parser may need to backtrack, we need to expose the backtracking
    389 mechanism to the user in some way, so that the user can:
    390 @itemlist[
    391  @item{Cut the current branch}
    392  @item{Perform some side-effects and undo them when backtracking (dangerous)}
    393  @item{Record a side-effectful lambda which is executed when the match succeeds
    394   or when the current branch is @racket[~commit]ted.}
    395  @item{Querry information about the previously failed branches}
    396  @item{Maybe affect the order in which non-deterministic branches are taken.
    397   This feature would mainly be used by optimizers.
    398 
    399   As a toy "just because we can" example, the backtracking mechanism should be
    400   configurable enough that some CSP algorithm like AC2003 can be expressed by
    401   the user, turning the pattern library into a CSP solver (where the CSP problem
    402   is expressed as a pattern over an empty object). Another toy "just because we
    403   can" example would be a datalog implementation built upon this library, where
    404   the deduction rules are expressed as patterns.
    405 
    406   The goal is that the parser's backtracking mechanism should be modular enough
    407   to allow us to implement a dead-simple unoptimized backtracker, and allow
    408   optimizers to be written as plug-ins. For example, an optimiazer could
    409   statically detect branches that can be cut due to a prior failure (e.g. if the
    410   two-element-list pattern @racket[(foo:id bar:number)] failed because the first
    411   element was not an @racket[identifier?], there's no point in trying 
    412   @racket[(baz:id quux:string fuzz:number)] on the same term.
    413 
    414   Extensive configurability of the backtracking mechanism and optimization
    415   features may interact badly with partial application and partial compilation,
    416   see below. Think it through before giving too much or too little expressivity
    417   to the user.}]
    418 
    419 @subsubsection{Partial application}
    420 
    421 It should be possible to give a partial input with holes to a pattern or
    422 template form, and, for optimization purposes, request that the pattern or
    423 template processes the input as much as it can (for the parser, it would
    424 potentially open a bounded number of backtracking branches, ready to switch to
    425 the next one if one fails), leaving an efficient "continuation".
    426 
    427 @subsubsection{Partial compilation}
    428 
    429 One of the drawbacks of @racketmodname[syntax/parse] is that compiling a 
    430 @racket[syntax-parse] form takes some non-negligible time. This means that if a
    431 macro generates another macro, and the generated macro code uses syntax-parse,
    432 each call to the "generator" macro will be expensive. A complex macro generating
    433 syntax which contains hundreds of uses of syntax-case will be reasonnably fast.
    434 The same code using syntax-parse will be much slower. Since the generated uses
    435 of @racket[syntax-parse] will all have the same "shape" with a few identifiers
    436 etc. changing, it would be nice to be able to partially pre-expand a use of 
    437 @racket[syntax-parse], leaving only the "holes" to be expanded. With a bottom-up
    438 expansion mechanism there's not much to do, so we have to try hard to make the
    439 pattern / template expander top-down as much as possible, and/or use a lazy
    440 language (for which most things can be evaluated, leaving a continuation for the
    441 few things that actually depend on the holes).
    442 
    443 Although partial compilation sounds like a very interesting academic project,
    444 it might be too difficult to get something useful out of it in practice. An
    445 alternative, which would procude the sought performance benefits for macros
    446 generating code which uses the pattern/template library, would be to make as
    447 many of the concepts first-class, so that they can easily be supplied as a
    448 parameter. Note that firs-class in this case does not necessarily mean "run-time
    449 first-class", but possibly "compile-time first-class": we only need to be able
    450 to pre-declare parametric templates, then use them in the code generated by a
    451 macro. As long as the parametric templates support a form of "separate
    452 compilation" and optimization, filling in the parameters can be handled by a
    453 fast macro.
    454 
    455 Some of the optimization plug-ins may however rely on a closed-world assumption
    456 (i.e. they want to have the whole, final pattern or template, in order to
    457 optimize it). If such an optimization plug-in is used, we may have to fall back
    458 to the idea of using partial compilation, or simply accept that macros which
    459 generate such code will take a while to expand.
    460 
    461 @subsubsection{QuickCheck test generation}
    462 
    463 It should be possible to generate random data that matches (and does not match,
    464 too, that's a distinct problem) a pattern (unless there's a user-provided
    465 predicate that is opaque to the library, in which case we can just ignore it and
    466 generate instances at random, hoping that some will match and some won't).
    467 
    468 Combined with the fact that pattern directives should be reversible into
    469 template directives, and vica versa, it means that each directive should also
    470 express its set of accepted values in terms of its contents. Of course, we don't
    471 expect to be able to uniformly sample random instances, nor do we expect to be
    472 able to support in a useful way complex patterns with lots of opaque predicates.
    473 
    474 @subsubsection{Error messages}
    475 
    476 @racketmodname[syntax/parse] generates good error messages, but it does not
    477 work as well when the patterns become complex. Think this through, so that the
    478 annotation burden is minimal, and so that users don't have to think too hard
    479 about where to put a @racket[~describe] (I frequently had the problem with 
    480 @racket[syntax/parse] where I wrote a @racket[~describe], but it wasn't taken
    481 into account.
    482 
    483 @subsection{Things to look at}
    484 
    485 @itemlist[
    486  @item{@racket[math/arry], for @racket[::] and array
    487   broadcasting.}
    488  @item{Quasipatterns in @racket[match].}
    489  @item{The @racket[lens] library}
    490  @item{@url{https://github.com/racket/racket/issues/1304}
    491   non-linear matching (with repeated binding variables, for
    492   example, that should be eq? or equal?)}]