LPEG: captures


Александр Машин
>> After applying it (re.match) to the example line "Perhaps,
>> [[Peter|Simon]], or [[Paul]], so they say", I got:
>>
>> table { 1 = Perhaps, [[Peter|Simon]], or [[Paul]], so they say 2 =
>> table { 1 = Peter 2 = Paul separator = , } }

> If you give me sample input, pattern, desired output, and actual
> output, I might be able to help.

Mr Parke, I would be grateful.

This is the input:
Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])

This is the desired output:

table {
  full = Perhaps, [[Name::Peter|Simon]], or [[Name::Paul]], so they say (see [[:Apocrypha]])
  items = table {
    1 = Peter,
    2 = Paul
  }
  separator = ,
}

Alexander Mashin

Re: LPEG: captures

Parke
On Thu, May 28, 2015 at 12:52 AM, Alexander Mashin
<[hidden email]> wrote:

> This is the input:
> Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])
>
> This is the desired output:
>
> table {
> full = Perhaps, [[Name::Peter|Simon]], or [[Name::Paul]], so they say (see
> [[:Apocrypha]])
> items = table {
> 1 = Peter,
> 2 = Paul
> }
> separator = ,
> }

The above would be very tricky to produce directly from a single
pattern.  "Peter" and "Paul" would each have to be captured inside two
different named captures (full and items).  Additionally, Peter and
Paul would have to be appended to the same named capture (items) even
though they occur at different locations in the input.

Will the following work for you?


grammar  =  [==[


wikitext  <-  {|  ( link / separator / text )*  |}


link  <-  {|
  {:t:''->'link':}
  {'[['}
  !':'
  ''->'Name::'
  {       (  !']]'  !'|'  .  )+  }
  {  '|'  (  !']]'        .  )*  /  }
  {']]'}
  |}


separator  <-  {|
  {:t:''->'separator':}
  { [,;*#] }  { %s* }  |}


text  <-  {|
  {:t:''->'text':}
  {  (  !link  !separator  .  )+  }  |}


]==]


s  =  'Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])'

parser  =  require ( 're' ).compile ( grammar )

t  =  parser : match ( s )

print ( s )

print ()
-- ipairs walks the array part in order; pairs does not guarantee ordering
for k,v in ipairs ( t ) do
  print ( string.format ( '%d  %-20s  %s',  k,  v.t, table.concat ( v ) ) )
end

print ()
-- for a link, v[3] is the target and v[4] is the optional '|label' part
for k,v in ipairs ( t ) do
  if v.t == 'link' then
    print ( string.format ( '%d  %s  %d  %-5s  %s',
                            k, 'link', #v, v[3], v[4] ) )
  end
end


---

The above will output:

Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])

1  text                  Perhaps
2  separator             ,
3  link                  [[Name::Peter|Simon]]
4  separator             ,
5  text                  or
6  link                  [[Name::Paul]]
7  separator             ,
8  text                  so they say (see [[:Apocrypha]])

3  link  5  Peter  |Simon
6  link  5  Paul
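
---

If the table shape from the original question (full, items, separator) is still wanted, one option is to fold the flat token list into it after parsing rather than inside the pattern. The following is only a minimal sketch under some assumptions: `summarize` is a hypothetical helper name, the `tokens` table is written out by hand to mirror the parser output listed above (so the snippet runs without LPeg installed), and "separator" is taken to mean the first separator character seen.

```lua
-- Hypothetical helper (not part of LPeg or re): fold the token list
-- into the shape { full = ..., items = { ... }, separator = ... }.
function summarize ( tokens )
  local pieces  =  {}
  local result  =  { items = {} }
  for _, v in ipairs ( tokens ) do
    -- every token contributes its captured pieces to the full text
    pieces [ #pieces + 1 ]  =  table.concat ( v )
    if v.t == 'link' then
      -- v[3] is the link target, as in the second loop above
      result.items [ #result.items + 1 ]  =  v[3]
    elseif v.t == 'separator' and not result.separator then
      -- assumption: keep only the first separator character
      result.separator  =  v[1]
    end
  end
  result.full  =  table.concat ( pieces )
  return result
end

-- Hand-written sample mirroring the token list printed above;
-- with LPeg available, pass the result of parser : match ( s ) instead.
tokens  =  {
  { t = 'text',       'Perhaps' },
  { t = 'separator',  ',', ' ' },
  { t = 'link',       '[[', 'Name::', 'Peter', '|Simon', ']]' },
  { t = 'separator',  ',', ' ' },
  { t = 'text',       'or ' },
  { t = 'link',       '[[', 'Name::', 'Paul', '', ']]' },
  { t = 'separator',  ',', ' ' },
  { t = 'text',       'so they say (see [[:Apocrypha]])' },
}

r  =  summarize ( tokens )
print ( r.full )
print ( table.concat ( r.items, ', ' ) )
print ( r.separator )
```

With the sample tokens, r.full is the input line with Name:: inserted into the two links, r.items holds Peter and Paul, and r.separator is the comma. Doing this step in plain Lua keeps the grammar simple, since no single capture has to collect text from several places in the input.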