template.scrbl (21932B)
1 #lang scribble/manual 2 3 @(require (for-label typed/racket/base 4 syntax/parse 5 ;"template.rkt" 6 )) 7 8 @(define ellipses (racket ...)) 9 10 @title[#:tag "template-lib"]{Versatile parser and template library} 11 12 Keywords: grammar, parser, template. 13 14 @defform[(parse expr [pattern body …] …)]{ 15 Analogous to @racket[syntax-parse], except it isn't 16 specialized for syntax, but rather works for arbitrary 17 s-expressions, including syntax ones (denoted by 18 @racket[#'(…)] in the pattern).} 19 20 @defform[#:literals (: :: ... else struct) 21 (tmpl template) 22 #:grammar 23 [(template variable 24 [variable : type] ;; (ann variable type) 25 ;; cons-template 26 (template . template) 27 (template :: template) 28 29 ;; list 30 (template**) 31 ;; list* 32 template**-dotted 33 34 ;; vector 35 #(template**) 36 (vector . template**-dotted) 37 38 ;; hash-template: template** must expand to a list of pairs. 39 (hash . template**-dotted) ;; TODO: how to distinguish 40 (hasheq . template**-dotted) ;; mutable and immutable? 41 (hasheqv . template**-dotted) 42 #hash([template . template]) 43 #hasheq([template . template]) 44 #hasheqv([template . template]) 45 46 ;; struct-template 47 (struct-id template …) 48 (struct struct-id template …) 49 #s(prefab-id template …) 50 #s(template template …) ;; Only allowed in untyped racket 51 52 ;; box 53 #&template 54 55 ;; call-template 56 (~identifier args …) ;; calls (identifier args …) 57 (~ expr args …) ;; calls (expr args …) 58 59 ;; unquote-template 60 ,expr 61 ,@(list expr) 62 ,@(list* expr) ;; must appear in last position. 63 64 65 ;; template-expander 66 template-expander-id 67 (template-expander-id args …) 68 69 ;; maybe-template (should all be template expanders 70 ;; which means the system is extensible enough to express 71 ;; these special cases). 72 (?? alt-template …) 73 (?@ . template**-dotted) 74 (??@ . template**-dotted) 75 (?if condition template template) 76 (|@if| condition template template) 77 (if@ condition template template) 78 (|@cond| [condition template] …) 79 (|@cond| [condition template] … [else template]) 80 (cond@ condition template template) 81 82 ;; like #,@(with-syntax ([meta-var #'template]) 83 ;; #'(template**)) 84 (~let ([meta-var+args template]) 85 . template**-dotted) 86 87 (~sort key template ooo) 88 (~loc stxloc . template) 89 ;; Like (template . template), but discards the first and 90 ;; keeps just the second. If the first contains pattern 91 ;; variables which are repeated, this has the effect of 92 ;; repeating the second as many times as the first. Example: 93 ;; #'(vector (~each some-pattern-var '())) 94 ;; => (vector '() '() '() '() '()) 95 (~each template template) 96 97 ;; escaped 98 (ddd escaped) 99 100 ;; 101 102 ;; literal 103 #t 104 #f 105 string 106 bytes 107 number 108 char 109 keyword 110 regexp 111 pregexp) 112 113 (meta-var+args meta-var 114 (meta-var meta-arg …)) 115 116 (tail-template template) 117 118 ;; specialize mid-sequence in repetition (diagonal-matrix-style) 119 120 (variable identifier) 121 122 (template**-dotted (template* … . template) 123 template**) 124 (template** (code:line template* …) 125 (code:line template* … :: template) 126 (code:line template* … (~rest . template))) 127 (template* template 128 (code:line template ooo) 129 special-cased-template) 130 (special-cased-template (code:line template vardd) 131 (code:line template ddvar)) 132 ;; Where var is an iterated variable. 133 (vardd var.. ;; exclude the current iteration 134 var...) ;; include the current iteration 135 (ddvar ..var ;; exclude the current iteration 136 ...var) ;; include the current iteration 137 138 (ooo #,ellipses ;; TODO: make it a hyperlink 139 ___ 140 ..k ;; k positive integer 141 __k ;; k positive integer 142 (code:line .. expr) ;; expr must return a positive integer 143 (code:line __ expr)) ;; expr must return a positive integer 144 (ddd #,ellipses) 145 ]]{ 146 TODO: implement the versatile template library. 147 @racket[...] 148 149 TODO: support for typed/racket. 150 151 TODO: optimization feature: would it be useful if the 152 expanded code could be optimized? For example, when looking 153 at the output of syntax-parse, the code is far from being 154 concise. 155 156 The patterns for @racket[parse] should all have a way to 157 create a symmetric counterpart for @racket[tmpl], which 158 produces the original value. This symmetry is important 159 because it allows lens-like macros, which operate on only a 160 part of the data structure, leaving everything else 161 intact. 162 163 @racket[??] works like @racket[??] from 164 @racket[syntax/parse/experimental/template], except it 165 allows any number of alternatives (including 0, to avoid 166 special-casing in macros). It is more or less equivalent to 167 @racket[(?? a (?? b (?? c …)))], following syntax/parse's 168 semantics. 169 170 @racket[?@] has the same meaning as in syntax/parse. 171 172 @racket[(??@ t* …)] is a shortcut for 173 @racket[(?? (?@ t* …))] 174 175 For better compatibility with at-exp, @racket[|@if|] can be 176 written @racket[if@], and the same goes for 177 @racket[|@cond|] etc. 178 179 TODO: what's the difference between @racket[~], 180 @racket[template-expander] and @racket[unquote]? 181 @racket[template-expander] runs at compile-time and should 182 treat its arguments as syntax. 183 184 Concerning unquoting, unlike @racket[racket]'s default 185 behaviour in @RACKET[#'([x #,(y …)] …)], unquoting should 186 not break the nesting of ellipses. How should we express 187 voluntary variation of the level of nesting? @racket[~let] 188 already allows expanding part of the template at some level 189 and inserting it verbatim somewhere below, but it's not a 190 silver bullet. One case which comes to mind is when some of 191 the nested data should be mixed with less-nested data, for 192 example going from 193 @racket[([10 1 2 3] [100 4 5] [1000 6])] to 194 @racket[([10 20 30] [400 500] [6000])] should be relatively 195 easy to express. Maybe @racket[~let] with parameters can be 196 a suitable generalized solution: 197 @RACKET[({~let ([(addx v) #,(+ x v)]) [(addx y) …]} …)] 198 199 The special-cased template syntax should allow special 200 treatment of the @racket[i]-th iteration in a doubly-nested 201 loop: matching @racket[x] on @racket[(1 2 3 4 5)], and 202 using the template @racket[(0 x.. ,(* x x) ..x 1) …] will 203 produce @racket[(1 1 1 1 1) 204 (0 4 1 1 1) 205 (0 0 9 1 1) 206 (0 0 0 16 1) 207 (0 0 0 0 24)]. The pattern before 208 @racket[x..] and the pattern after @racket[..x] can expand 209 to multiple items which will be spliced in by wrapping it 210 with @racket[?@].} 211 212 @section{Ideas for implementation} 213 214 @subsection{Extensibility (expanders)} 215 216 Allow normal, inline-prefix, inline-postfix and inline-infix 217 expanders, which can bind using regular expressions. This 218 allows implementing exotic syntax like @racket[var..] 219 (postfix, operates on the pattern preceeding it), 220 @racket[..var] (postfix, operates on the pattern after it), 221 @racket[(… escaped-pattern)] (normal, operates on the 222 containing s-exp) 223 224 @subsection{Customization} 225 226 For things that are likely to be customized by the user in 227 the whole file scope, define a grammar/custom module, used 228 as follows: 229 230 @racketblock[(require grammar/custom) 231 (grammar/custom option …)] 232 233 The @racket[grammar/custom] macro expands to 234 @racket[(require grammar/core)] followed by a bunch of 235 @racket[define-syntax] which wrap the core macros, providing 236 them the custom options: 237 238 @racketblock[(require grammar/core) 239 (define-syntax-rule (parse . rest) 240 (parse/core #:global-options (option …) . rest)) 241 (define-syntax-rule (tmpl . rest) 242 (parse/core #:global-options (option …) . rest))] 243 244 This can also be used to rename the @racket[parse] and 245 @racket[tmpl] macros, if desired (for example, 246 @racket[tmpl] could be renamed to @racket[quasisyntax], or 247 something similar). 248 249 Otherwise, @racket[grammar/custom] could just @racket[set!] 250 some for-syntax variable which stores the options. A second 251 boolean for-syntax variable could be used to check if 252 @racket[grammar/custom] was called twice, and throw an error 253 in that case. 254 255 Or maybe we should just use units? Can they be customized in 256 a similar way? 257 258 The idea is to avoid having to wrap the whole file in a 259 @racket[(parameterize …)], and be able to easily 260 @racket[provide] a customized variation of this library: 261 262 @racketblock[(provide (customized-out grammar/custom))] 263 264 @subsection{Unsorted ideas} 265 266 @subsubsection{Global pattern constraints} 267 268 For patterns, have global constraints: @racket[(~global-or id)] binds 269 @racket[id] to true if the enclosing pattern was matched at least once, and 270 false otherwise. Multiple occurrences of the same @racket[(~global-or id)] make 271 the @racket[id] true if any of the containing clauses was matched at least 272 once. 273 274 Inside a @racket[{~no-order}], it should be possible to impose some partial 275 order constraints, so that we can say: 276 277 @racketblock[ 278 {~no-order 279 {~optional pat-a} 280 {~optional pat-b} 281 pat-c 282 {~optional {~constrain pat-d {~after pat-a}}}}] 283 284 The above code means that @racket[pat-a], @racket[pat-b] and @racket[pat-d] are 285 optional (but not @racket[pat-c]), the four patterns can appear in any order, 286 but if @racket[pat-a] and @racket[pat-d] are both present, then @racket[pat-d] 287 must appear after @racket[pat-a]. 288 289 Scopes: the global constraints apply within a scope. By default, there is an 290 implicit top-level scope, and some forms might implicitly introduce a catch-all 291 scope unless otherwise specified, like the implicit @racket[~demimit-cut] for 292 @racket[define-syntax-class] from @racket[syntax/parse]. There could be two 293 kinds of scopes: unhygienic catch-all scopes which scope all "global" 294 constraints within, and naming scopes, which explicitly say which identifiers 295 they scope. 296 297 @racketblock[ 298 {~scope {a} 299 {~vector 300 {~scope {b} {~no-order {~once a} {~optional b}}} 301 {~scope {b} {~no-order {~once a} {~optional b}}}}}] 302 303 The code above matches against a vector of two @racket[~no-order] lists. The 304 @racket[a] pattern must appear exactly once, either in the first list or in the 305 second, but not in both. On the other hand, the @racket[b] pattern may appear 306 zero or one time in the first list, zero or one time in the second list, and may 307 appear in both since its constraint is scoped for each list. Although it is less 308 clear, the following code is semantically identical: 309 310 @racketblock[ 311 {~scope {a b} 312 {~vector 313 {~no-order {~once a} {~optional b}} 314 {~scope {b} {~no-order {~once a} {~optional b}}}}}] 315 316 Since the @racket[b] in the @racket{~no-order} is bound to the enclosing 317 @racket[{~scope {b} …}], it does not interact in any way with the outer scope. 318 The @racket[~optional] constraint on the @racket[b] in the first 319 @racket[~no-order] therefore does not interact withe the @racket[~optional] 320 constraint in the second @racket[~no-order]. 321 322 @subsubsection{Generalization of pattern/template kinds} 323 324 Nearly all patterns and templates should work equally well for regular lists and 325 syntax objects. It should be possible and easy enough to create new "kinds" of 326 data, which modify how patterns and templates work all the way through the 327 pattern or template tree, until it switches to a new kind. As an example, the 328 following pattern starts as a normal s-expr pattern, and switches to syntax in 329 two nodes: 330 331 @racketblock[ 332 {~s-expr 1 2 (buckle {~optional my} shoe) 333 3 4 {~syntax (knock {~optional at the} door)} 334 5 6 (pick {~optional-wrap (up _) (sticks)}) 335 7 8 {~syntax (lay {~optional-wrap (them _) (straight)})}}] 336 337 That pattern should match the following value: 338 339 @racketblock[ 340 `(1 2 (buckle shoe) 341 3 4 ,#'(knock door) 342 5 6 (pick (up (sticks))) 343 7 8 ,#'(lay (them (straight))))] 344 345 The @racket[~syntax] indicates that the whole subtree should start matching (or 346 producing) syntax objects, instead of regular s-expressions. It is worht noting 347 that syntax objects have extra information (source location, syntax properties) 348 that regular s-expressions lack. One way of implementing this would be to make 349 the pattern directives operate on "enhanced" s-expressions. Enhanced 350 s-expressions are s-expressions with arbitrary kind-specific data attached to 351 them. The @racket[~s-expr] simply translates s-expressions into enhanced 352 s-expressions with an empty data attached, while @racket[~syntax] is a sort of 353 pre-processor which turns syntax objects into enhanced s-expressions with source 354 location and syntax properties attached. These "kind" pre-processors run before 355 the normal pattern directives are applied. Some kind-specific pattern directives 356 can access those properties (if they are used in within the scope of the 357 appropriate @racket[~kind]), so that a @racket[(~loc srcloc . pattern)] matches 358 @racket[pattern] and saves its source location into the variable 359 @racket[srcloc]. 360 361 Kinds should also be able to alter how the pattern variables are bound: 362 @racket[~s-expr] simply binds (in patterns) and uses (in templates) normal 363 Racket variables. On the other hand, @racket[~syntax] binds and uses syntax 364 pattern variables, so that the bound variables are used as @racket[#'var] 365 instead of @racket[var]. 366 367 Different pattern and template forms can specify a default kind (possibly by 368 simply wrapping their pattern or tempalte with the appropriate @racket[~kind]). 369 For example, a @racket[define/match] form would use @racket[~s-expr] by default, 370 whereas a @racket[define-syntax/match] would use @racket[~syntax]. The same 371 would apply for re-implementations of Racket's @racket[match] and 372 @racket[syntax-parse]. 373 374 Do the "kinds" form some sort of monad? TODO: Think about this, and try to see 375 if there are some monads which can be translated to pattern/template kinds 376 usefully. 377 378 @subsubsection{Lenses} 379 380 It should be possible to describe lenses using the patterns: you can work on 381 the focused part of the match, possibly access (read-only) other parts, and 382 return a new value. What should happen when the focused part is under an 383 ellipsis and has more than one match ? Implicitly execute the code n times, like 384 a sort of @racket[for/list]? 385 386 @subsubsection{Backtracking} 387 388 Since the parser may need to backtrack, we need to expose the backtracking 389 mechanism to the user in some way, so that the user can: 390 @itemlist[ 391 @item{Cut the current branch} 392 @item{Perform some side-effects and undo them when backtracking (dangerous)} 393 @item{Record a side-effectful lambda which is executed when the match succeeds 394 or when the current branch is @racket[~commit]ted.} 395 @item{Querry information about the previously failed branches} 396 @item{Maybe affect the order in which non-deterministic branches are taken. 397 This feature would mainly be used by optimizers. 398 399 As a toy "just because we can" example, the backtracking mechanism should be 400 configurable enough that some CSP algorithm like AC2003 can be expressed by 401 the user, turning the pattern library into a CSP solver (where the CSP problem 402 is expressed as a pattern over an empty object). Another toy "just because we 403 can" example would be a datalog implementation built upon this library, where 404 the deduction rules are expressed as patterns. 405 406 The goal is that the parser's backtracking mechanism should be modular enough 407 to allow us to implement a dead-simple unoptimized backtracker, and allow 408 optimizers to be written as plug-ins. For example, an optimiazer could 409 statically detect branches that can be cut due to a prior failure (e.g. if the 410 two-element-list pattern @racket[(foo:id bar:number)] failed because the first 411 element was not an @racket[identifier?], there's no point in trying 412 @racket[(baz:id quux:string fuzz:number)] on the same term. 413 414 Extensive configurability of the backtracking mechanism and optimization 415 features may interact badly with partial application and partial compilation, 416 see below. Think it through before giving too much or too little expressivity 417 to the user.}] 418 419 @subsubsection{Partial application} 420 421 It should be possible to give a partial input with holes to a pattern or 422 template form, and, for optimization purposes, request that the pattern or 423 template processes the input as much as it can (for the parser, it would 424 potentially open a bounded number of backtracking branches, ready to switch to 425 the next one if one fails), leaving an efficient "continuation". 426 427 @subsubsection{Partial compilation} 428 429 One of the drawbacks of @racketmodname[syntax/parse] is that compiling a 430 @racket[syntax-parse] form takes some non-negligible time. This means that if a 431 macro generates another macro, and the generated macro code uses syntax-parse, 432 each call to the "generator" macro will be expensive. A complex macro generating 433 syntax which contains hundreds of uses of syntax-case will be reasonnably fast. 434 The same code using syntax-parse will be much slower. Since the generated uses 435 of @racket[syntax-parse] will all have the same "shape" with a few identifiers 436 etc. changing, it would be nice to be able to partially pre-expand a use of 437 @racket[syntax-parse], leaving only the "holes" to be expanded. With a bottom-up 438 expansion mechanism there's not much to do, so we have to try hard to make the 439 pattern / template expander top-down as much as possible, and/or use a lazy 440 language (for which most things can be evaluated, leaving a continuation for the 441 few things that actually depend on the holes). 442 443 Although partial compilation sounds like a very interesting academic project, 444 it might be too difficult to get something useful out of it in practice. An 445 alternative, which would procude the sought performance benefits for macros 446 generating code which uses the pattern/template library, would be to make as 447 many of the concepts first-class, so that they can easily be supplied as a 448 parameter. Note that firs-class in this case does not necessarily mean "run-time 449 first-class", but possibly "compile-time first-class": we only need to be able 450 to pre-declare parametric templates, then use them in the code generated by a 451 macro. As long as the parametric templates support a form of "separate 452 compilation" and optimization, filling in the parameters can be handled by a 453 fast macro. 454 455 Some of the optimization plug-ins may however rely on a closed-world assumption 456 (i.e. they want to have the whole, final pattern or template, in order to 457 optimize it). If such an optimization plug-in is used, we may have to fall back 458 to the idea of using partial compilation, or simply accept that macros which 459 generate such code will take a while to expand. 460 461 @subsubsection{QuickCheck test generation} 462 463 It should be possible to generate random data that matches (and does not match, 464 too, that's a distinct problem) a pattern (unless there's a user-provided 465 predicate that is opaque to the library, in which case we can just ignore it and 466 generate instances at random, hoping that some will match and some won't). 467 468 Combined with the fact that pattern directives should be reversible into 469 template directives, and vica versa, it means that each directive should also 470 express its set of accepted values in terms of its contents. Of course, we don't 471 expect to be able to uniformly sample random instances, nor do we expect to be 472 able to support in a useful way complex patterns with lots of opaque predicates. 473 474 @subsubsection{Error messages} 475 476 @racketmodname[syntax/parse] generates good error messages, but it does not 477 work as well when the patterns become complex. Think this through, so that the 478 annotation burden is minimal, and so that users don't have to think too hard 479 about where to put a @racket[~describe] (I frequently had the problem with 480 @racket[syntax/parse] where I wrote a @racket[~describe], but it wasn't taken 481 into account. 482 483 @subsection{Things to look at} 484 485 @itemlist[ 486 @item{@racket[math/arry], for @racket[::] and array 487 broadcasting.} 488 @item{Quasipatterns in @racket[match].} 489 @item{The @racket[lens] library} 490 @item{@url{https://github.com/racket/racket/issues/1304} 491 non-linear matching (with repeated binding variables, for 492 example, that should be eq? or equal?)}]