Replace the last item in a string of 'words'

Russell Haley
Hi,

Given a file with lines like the ones below, what would be the best way to replace the last item on each line (e.g. V6 in the last line)?

twa01.dat 16 2000 16 0 12 10980 0 I
twa01.dat 16 2000 16 0 14 -9048 0 II
twa01.dat 16 2000 16 0 1 -25727 0 III
twa01.dat 16 2000 16 0 -14 29120 0 aVR
twa01.dat 16 2000 16 0 5 15064 0 aVL
twa01.dat 16 2000 16 0 8 17036 0 aVF
twa01.dat 16 2000 16 0 2 19694 0 V1
twa01.dat 16 2000 16 0 12 26289 0 V2
twa01.dat 16 2000 16 0 21 -23938 0 V3
twa01.dat 16 2000 16 0 18 11347 0 V4
twa01.dat 16 2000 16 0 6 27591 0 V5
twa01.dat 16 2000 16 0 5 -29501 0 V6

I'm not even having much luck with simply capturing the separate items. I've tried a bunch of variations on this:

for line in io.lines(file_name) do
    if line:match('.dat') then
        local t = {}
        for v in line:gmatch('%s(.)') do
            t[#t+1] = v
        end
        for i,v in pairs(t) do
            print(i,v)
        end
    end
end

Any advice would be appreciated.

Thanks!
Russ
Re: Replace the last item in a string of 'words'

Paul K-2
Hi Russ,

> Given a file of lines in a file like below, what would be the best way to replace the last item on each line (e.g. V6 in the last line)?
> twa01.dat 16 2000 16 0 5 -29501 0 V6

Why not `:gsub("%S+$", "newval")`?

Paul.
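
A minimal sketch of this suggestion, assuming each line ends directly after the last word. The helper name `replace_last_word` and the sample values are made up for illustration. `%S+` matches a run of non-whitespace characters and `$` anchors the match at the end of the string. `gsub` returns two values, the new string and the number of substitutions, so the count is a cheap way to notice that nothing matched.

```lua
-- Hypothetical helper: replace the last whitespace-delimited word of a line.
local function replace_last_word(line, newval)
  -- gsub returns the new string and the number of substitutions made
  local result, count = line:gsub("%S+$", newval)
  return result, count
end

local line = "twa01.dat 16 2000 16 0 5 -29501 0 V6"
-- prints the modified line followed by the substitution count
print(replace_last_word(line, "newval"))
```

A count of 0 means the pattern did not match at all, which is worth checking before trusting the output.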

Re: Replace the last item in a string of 'words'

Gabriel Bertilson
With the whole file, if it doesn't contain any null characters, I
would use the frontier pattern: file_contents:gsub('%S+%f[\n\0]',
something) in Lua 5.3 or 5.2 and file_contents:gsub('%S+%f[\n%z]',
something) in Lua 5.1. With a single line, line:gsub('%S+$',
something) as mentioned by Paul.

— Gabriel
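
A small sketch of the whole-file variant, assuming LF line endings and no NUL bytes in the text. The helper name `replace_last_words` is hypothetical. The frontier `%f[\n\0]` succeeds where the next character is a newline or the end of the string, so every line's last word is replaced in a single `gsub` call. With CRLF endings the pattern as written finds nothing, because the `\r` that follows the word is itself whitespace and not in the frontier set.

```lua
-- Hypothetical helper: replace the last word of every line in one gsub call.
-- Assumes LF line endings and no NUL bytes in the text.
local function replace_last_words(text, newval)
  -- Lua 5.1 (and LuaJIT) spell the NUL class %z; 5.2+ accept "\0" in patterns.
  local pattern = (_VERSION == "Lua 5.1") and "%S+%f[\n%z]" or "%S+%f[\n\0]"
  -- outer parentheses drop gsub's second result (the match count)
  return (text:gsub(pattern, newval))
end

local text = "a 1 I\nb 2 II\nc 3 III"
print(replace_last_words(text, "X"))
```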

Re: Replace the last item in a string of 'words'

Russell Haley
On Wed, Jun 26, 2019 at 10:49 AM Paul K <[hidden email]> wrote:
> Why not `:gsub("%S+$", "newval")`?

Hi Paul. To answer your question: because I have no idea what I'm doing. I really suck at string manipulation in Lua and I'm equally poor in regex in any other language. I tried it as you've written it with no matches. The Lua reference manual doesn't seem to indicate that capital %S does anything (but I could be missing something). I changed it to a lowercase %s and got closer to what I need:

local i = 0
for line in io.lines(file_name) do
    if line:match('%.dat') then  -- '%.' matches a literal dot
        i = i + 1
        local l = line:gsub("%s+$", "newval"..i)
        print(l)
    end
end

results in:
russellh@canary-dev:~/physionet/get_stats$ ./modify_headers.lua --rec twa01
twa01.dat 16 2000 16 0 12 10980 0 Inewval1
twa01.dat 16 2000 16 0 14 -9048 0 IInewval2
twa01.dat 16 2000 16 0 1 -25727 0 IIInewval3
twa01.dat 16 2000 16 0 -14 29120 0 aVRnewval4
twa01.dat 16 2000 16 0 5 15064 0 aVLnewval5
twa01.dat 16 2000 16 0 8 17036 0 aVFnewval6
twa01.dat 16 2000 16 0 2 19694 0 V1newval7
twa01.dat 16 2000 16 0 12 26289 0 V2newval8
twa01.dat 16 2000 16 0 21 -23938 0 V3newval9
twa01.dat 16 2000 16 0 18 11347 0 V4newval10
twa01.dat 16 2000 16 0 6 27591 0 V5newval11
twa01.dat 16 2000 16 0 5 -29501 0 V6newval12

So, it's close but not removing the original value?

Thanks for your help (And thank you to Gabriel as well!),
Russ
 
Re: Replace the last item in a string of 'words'

Paul K-2
Hi Russ,

> local l = line:gsub("%s+$", "newval"..i)

It should be uppercase S, as `%S+$` will match any non-whitespace
characters at the end of the string. From PiL
(https://www.lua.org/pil/20.2.html): "An upper case version of any of
those classes represents the complement of the class. For instance,
'%A' represents all non-letter characters."

I'm not sure why the %S+ match didn't work for you, but I tested on
the exact lines you used and it worked as expected.

Paul.

Re: Replace the last item in a string of 'words'

v
`%S` stands for "non-space char" and `%s` stands for "space char".
Quoting the reference manual:

> For all classes represented by single letters (%a, %c, etc.), the
> corresponding uppercase letter represents the complement of the class.
> For instance, `%S` represents all non-space characters.

Looks like your strings have some space characters at the end. If
so, this should probably work:

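local i = 0
for line in io.lines(file_name) do
    if line:match('%.dat') then  -- '%.' matches a literal dot
        i = i + 1
        -- extra parentheses drop gsub's second result (the match count)
        print((line:gsub("%S+%s*$", "newval"..i)))
    end
end

Or, if you must keep space characters at the end of lines, capture them and put them back with %1:

...
        print((line:gsub("%S+(%s*)$", "newval"..i.."%1")))
...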

--
v <[hidden email]>
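
To make the difference between the two classes concrete, here is a small sketch that runs three patterns on one sample line. The sample assumes the line ends in a carriage return, which is what `io.lines` leaves behind on Linux when the file has Windows line endings. That is an assumption about the input, not something the output above proves on its own.

```lua
-- A line as io.lines would return it on Linux for a CRLF file:
-- the "\n" is stripped but the "\r" stays.
local line = "twa01.dat 16 2000 16 0 5 -29501 0 V6\r"

-- %s+$ only matches the trailing "\r", so the old word survives.
local a, na = line:gsub("%s+$", "newval")
-- %S+$ finds nothing, because the string ends in whitespace.
local b, nb = line:gsub("%S+$", "newval")
-- %S+%s*$ takes the last word together with whatever whitespace follows it.
local c, nc = line:gsub("%S+%s*$", "newval")

-- substitution counts for the three attempts
print(na, nb, nc)
```

The first attempt reproduces the "V6newval" symptom, the second the "no matches" symptom.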


Re: Replace the last item in a string of 'words'

Russell Haley
On Wed, Jun 26, 2019 at 11:27 AM Paul K <[hidden email]> wrote:
> It should be uppercase S, as `%S+$` will match any non-whitespace
> characters at the end of the string.
>
> I'm not sure why the %S+ match didn't work for you, but I tested on
> the exact lines you used and it worked as expected.

Ah, thanks for the RTM. I missed the line below the character class table in the 5.4 reference.

%S didn't work because the file uses Windows line endings and I'm working in Ubuntu (thank you for the hint v).  As soon as I converted the line endings it works as expected. So I am now using:

local l = line:gsub("%S+\r$", "newval"..i)

Which works well. 

Thanks everyone!
Russ
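
For anyone hitting the same symptom, a hedged sketch of how the stray carriage return can be spotted and removed before matching. The helper name `chomp_cr` is made up; it is one possible alternative to putting `\r` into the pattern itself.

```lua
-- Hypothetical helper: drop a single trailing carriage return, if present.
local function chomp_cr(line)
  -- outer parentheses keep only the string, not the match count
  return (line:gsub("\r$", ""))
end

local raw = "twa01.dat 16 2000 16 0 5 -29501 0 V6\r"
-- the byte value of the last character; 13 is "\r"
print(raw:byte(-1))

local cleaned = chomp_cr(raw)
print((cleaned:gsub("%S+$", "newval")))
```

Stripping the `\r` changes the line endings of the output, so this only fits when LF output is acceptable.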

Re: Replace the last item in a string of 'words'

Philippe Verdy
On Wed, Jun 26, 2019 at 20:50, Russell Haley <[hidden email]> wrote:

> %S didn't work because the file uses Windows line endings and I'm working in Ubuntu (thank you for the hint v).  As soon as I converted the line endings it works as expected. So I am now using:
>
> local l = line:gsub("%S+\r$", "newval"..i)

Things would have worked as expected if you had used line:gsub('%S+(%s*)$', 'newval' .. i .. '%1'),
because that would have preserved the line ends (and any other optional whitespace at the end of lines).
So you would not even have to convert the line ends between ISO/MIME/DOS/legacy Windows and Linux.
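
A short sketch of the whitespace-preserving replacement, with a hypothetical helper name `replace_keep_eol`. In a `gsub` replacement string Lua refers to captures as `%1`, `%2` and so on, and a literal percent sign has to be written `%%`. Passing a function as the replacement sidesteps that escaping, since its return value is used as is.

```lua
-- Hypothetical helper: replace the last word but keep trailing whitespace
-- (including a "\r" left over from CRLF line endings).
local function replace_keep_eol(line, newval)
  -- A function replacement receives the capture and its return value is
  -- used verbatim, so a "%" inside newval needs no escaping.
  return (line:gsub("%S+(%s*)$", function(trailing)
    return newval .. trailing
  end))
end

print(replace_keep_eol("0 V6\r", "newval12") == "0 newval12\r")
print(replace_keep_eol("0 V6", "50%"))
```

When the replacement text is known to contain no `%`, the plain string form `'newval' .. i .. '%1'` does the same job.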

But if you want performance when processing a whole file, your code should avoid splitting lines individually and should use large buffers instead. Split each buffer just before the last occurrence of '[\r\n]' and keep the remainder in a small cache that is prepended to the next block you read, until you reach the end of the file. The file may not always be terminated by a newline, but its end will always match the "$".

In that case, you can use patterns that work at line-end boundaries or at the end of the buffer (which occurs only at the end of the file). For the second case, if the buffer to process (including the prepended rest of the previous block) does not end with a [\r\n], just append one "virtually" (i.e. only for the source of the substitution, and remove it from the result before writing it to the new file).

Your code will then be much faster: it should be able to process buffers of about 256KB (plus the prepended data from the previous block that was not terminated by a newline) without problem, with less memory overhead and at a speed very close to the I/O limits of the disk or of many networks (RAID disks typically reach their maximum read/write speed with block sizes of about 64KB).

Maybe even 256KB is excessive if you have memory constraints; then use 64KB. It will still be much faster and more memory-efficient than processing large files line by line with many temporary short strings that put pressure on the Lua garbage collector. Experiment in your environment to find the fastest size, then look at the Lua memory overhead in the garbage collector statistics: the lower the block size, the more memory overhead you'll have and the slower your code will be.
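
The block-based approach could look roughly like the sketch below. The names `process_blocks`, `read_block` and `write` are assumptions made for illustration, and this variant splits just after the last "\n" rather than before it. Whether it actually beats line-by-line processing for a given workload is something to measure, not something the sketch guarantees.

```lua
-- Hypothetical sketch: read_block() returns the next chunk of text or nil,
-- write(s) receives processed output. Both names are assumptions.
local function process_blocks(read_block, write, newval)
  local function repl(trailing) return newval .. trailing end
  local carry = ""                       -- incomplete last line of the previous block
  for block in read_block do
    local buf = carry .. block
    local last_nl = buf:match(".*()\n")  -- position of the last "\n", if any
    if last_nl then
      carry = buf:sub(last_nl + 1)
      -- only complete lines are substituted here
      write((buf:sub(1, last_nl):gsub("%S+([ \t\r]*\n)", repl)))
    else
      carry = buf                        -- no complete line yet, keep reading
    end
  end
  if carry ~= "" then                    -- final line without a newline
    write((carry:gsub("%S+(%s*)$", repl)))
  end
end

-- Usage with a tiny in-memory "file" read in 5-byte blocks.
local input, pos, out = "a b\r\nc d\nlast word", 1, {}
process_blocks(function()
  local s = input:sub(pos, pos + 4)
  pos = pos + 5
  return s ~= "" and s or nil
end, function(s) out[#out + 1] = s end, "X")
print(table.concat(out))
```

With a real file, `read_block` could be `function() return f:read(65536) end` for a handle `f` opened in binary mode, and `write` could forward to the output handle.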

----

Note that the expression '%S+(%s*)$' is greedy in both '%S+' and '%s*': it can collect arbitrarily long "words" (non-spaces), which could take significant space in memory and would produce nonsense output from what was actually incorrect input (not the expected text format).

You may want to set a reasonable maximum size for the final word, and raise an error if some file happens to have overly long "garbage" at the end of a line. The same is true of (%s*), which may be arbitrarily long. Lua patterns have no counted repetition such as {128}, so the limits have to be checked on the lengths of the captures. You may want to detect such files (most probably garbage) by

- detecting files that cannot be valid UTF-8 text and are most probably binary, i.e. files that contain '[\0\245-\255]' (you may extend this set to other undesired ASCII controls, such as '[\0-\8\14-\31\127\245-\255]'), then
- rejecting lines whose trailing whitespace, as captured by '(%s*)$', is longer than 128 bytes, then
- rejecting lines whose last word, as captured by '(%S+)%s*$', is longer than 128 bytes, before
- doing the actual substitution with '%S+(%s*)$'

(replace 128 above with whatever limits you consider reasonable for whitespace at the end of lines and for the last word to replace)
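
Lua's pattern language has no counted repetition, so one way to enforce limits like these is to capture the pieces and compare their lengths. The sketch below does that for a single line; the helper name `replace_checked`, the error messages and the two constants are illustrative assumptions.

```lua
local MAX_WORD, MAX_SPACE = 128, 128  -- illustrative limits, adjust as needed

-- Hypothetical helper: returns the new line, or nil plus an error message.
local function replace_checked(line, newval)
  -- Split into: text before the last word, the last word, trailing whitespace.
  local head, word, trailing = line:match("^(.-)(%S+)(%s*)$")
  if not word then
    return nil, "no word found"
  elseif #word > MAX_WORD then
    return nil, "last word too long"
  elseif #trailing > MAX_SPACE then
    return nil, "too much trailing whitespace"
  end
  -- plain concatenation, so newval needs no pattern escaping
  return head .. newval .. trailing
end

print(replace_checked("twa01.dat 16 2000 16 0 5 -29501 0 V6\r", "newval"))
```

The lazy `(.-)` makes this quadratic on very long lines, which is acceptable for a sketch but worth revisiting for large inputs.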

Re: Replace the last item in a string of 'words'

Russell Haley


On Wed, Jun 26, 2019 at 1:25 PM Philippe Verdy <[hidden email]> wrote:
> So you would not even have to convert the line ends between ISO/MIME/DOS/LegacyWindows and Linux.

Thanks Phillip, this did work as well. I've already re-processed the 90 files and sent them back to the client, but I'll know to check my email for this next time!
Russ

Re: Replace the last item in a string of 'words'

Philippe Verdy
Also, instead of using '%S+' (equivalent to '[^\9-\13\32]+'), which matches all the non-spaces that you will replace, you could restrict it to the set of characters that can make up a valid replaceable "word", such as "[.'%-_%d%a\128-\244]+" (the subset [\128-\244] covers the byte values used in UTF-8 to represent non-ASCII characters; valid sequences actually form more restrictive patterns depending on the value of the first byte).

If you add this, you can also detect files that do not match the expected format when processing large buffers. Lua patterns have no counted repetition such as {1,128}, and captures are written %1 rather than $1 in the replacement, so the length limits are checked on the captures. In order, with W standing for the class [.'%-_%d%a\128-\244]:
- detect '[\0-\8\14-\31\127\245-\255]' as invalid at any position (not UTF-8 text)
- detect more than 128 whitespace characters before a line end '[\10-\13]' as invalid (too much whitespace at end of line)
- detect a run of more than 128 W characters before the optional whitespace and the line end as invalid (last word on line too long)
- detect a line that consists only of the last word, i.e. 'W+%s*[\10-\13]' starting at the beginning of the line, as invalid (no space separator before the last word, which is isolated on its line)
- detect a last word that is directly preceded by a non-space character outside W as invalid (no space separator before the last word)
- replace '(%s)W+(%s*[\10-\13])' by '%1'..newval..'%2'

Make sure your program first outputs to a temporary local file before returning it to the caller with its final requested name (only if there's no failure detected in the middle)
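
A rough sketch of the write-to-a-temporary-file-first idea. The helper name `rewrite_file`, the `.tmp` suffix and the `transform` callback are assumptions for illustration. Note that `os.rename` replaces an existing target on POSIX systems but may fail on Windows when the target exists, and that this version always ends the last line with "\n".

```lua
-- Hypothetical helper: transform(line) returns the new line, or nil and a message.
local function rewrite_file(path, transform)
  local tmp_path = path .. ".tmp"        -- assumed naming scheme for the temporary file
  local src = assert(io.open(path, "rb"))
  local dst = assert(io.open(tmp_path, "wb"))
  for line in src:lines() do
    local new_line, err = transform(line)
    if not new_line then                 -- validation failed: discard the partial output
      src:close()
      dst:close()
      os.remove(tmp_path)
      return nil, err
    end
    dst:write(new_line, "\n")
  end
  src:close()
  dst:close()
  -- only now does the result take the final requested name
  return os.rename(tmp_path, path)
end
```

A caller would pass a function that performs the substitution and the format checks for one line, so that any failure leaves the original file untouched.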

Re: Replace the last item in a string of 'words'

Russell Haley
In reply to this post by Russell Haley


On Wed, Jun 26, 2019 at 1:55 PM Russell Haley <[hidden email]> wrote:
> On Wed, Jun 26, 2019 at 1:25 PM Philippe Verdy <[hidden email]> wrote:
>> On Wed, Jun 26, 2019 at 20:50, Russell Haley <[hidden email]> wrote:
>>> %S didn't work because the file uses Windows line endings and I'm working in Ubuntu (thank you for the hint v). As soon as I converted the line endings it worked as expected. So I am now using:
>>>
>>> local l = line:gsub("%S+\r$", "newval"..i)
>>
>> Things would have worked as expected if you had used line:gsub('%S+(%s*)$', 'newval' .. i .. '%1'),
>> because it would have preserved the line ends (and any other optional whitespace at the end of lines).
>> So you would not even have to convert the line ends between ISO/MIME/DOS/legacy Windows and Linux.
>
> Thanks Phillip, this did work as well. I've already re-processed the 90 files and sent them back to the client, but I'll know to check my email for this next time!
> Russ

Sorry... Philippe
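
For reference, the behaviour discussed above can be checked on a single CRLF line like this. The sample line and the counter value are only illustrative; note that a Lua replacement string refers to the first capture as `%1`.

```lua
-- One CRLF-terminated line as read on Linux: the "\r" stays at the end.
local line = "twa01.dat 16 2000 16 0 5 -29501 0 V6\r"
local i = 12

-- "%S+$" alone finds nothing here: "\r" counts as whitespace,
-- so no run of non-space characters touches the end of the string.
local unchanged, n = line:gsub("%S+$", "newval" .. i)

-- Capturing the trailing whitespace and putting it back keeps the "\r".
local fixed = line:gsub("%S+(%s*)$", "newval" .. i .. "%1")

print(n, (fixed:gsub("\r", "\\r")))
```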
 


Re: Replace the last item in a string of 'words'

Philippe Verdy
In reply to this post by Russell Haley
You did not need a Lua program then: Notepad++ could have done that directly by loading the 90 files and using its regexp search and replace, or you could have used "sed" on Linux.


Re: Replace the last item in a string of 'words'

Russell Haley


On Wed, Jun 26, 2019 at 1:58 PM Philippe Verdy <[hidden email]> wrote:
> You did not need a Lua program then: Notepad++ could have done that directly by loading the 90 files and using its regexp search and replace, or you could have used "sed" on Linux.

I considered Geany and sed but Lua is my preferred hammer. I really should learn sed...

:-)
Russ


Re: Replace the last item in a string of 'words'

Philippe Verdy
Learning sed is essentially learning the more common "regexps" (used in so many tools for Unix and Linux, and in many text editors, including vi(m), emacs, and many other visual editors for X-based graphical desktops).

Many people (and I think all programmers) can't "live" without the more conventional (and more powerful) regexps, but most can live without the (severely limited) Lua patterns!

With open-source versions of sed, and all versions for Linux, sed can use Perl-style regexps or several other regexp dialects, including the older BSD style and the newer PCRE. These are very similar and in fact equivalent for your goal, as the differences are only in advanced greedy options or variable substitution. They also support shell-style file patterns; if Lua sticks around long enough, sed/ed/vi/vim may integrate the Lua pattern style.

Note however that basic sed syntax does not let you change the value of a field in the replacement string with a computed variable (this requires some minimal scripting support for a custom computing function, such as a simple counter, or a custom text transform that is not just a basic one-to-one remapping of letter case).
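
In Lua, by contrast, the replacement argument of `gsub` can be a function, which is where a counter or a custom transform fits. A small sketch with made-up sample data and a made-up naming scheme (lower-cased old value plus a running number):

```lua
local lines = {
  "twa01.dat 16 2000 16 0 6 27591 0 V5",
  "twa01.dat 16 2000 16 0 5 -29501 0 V6",
}

local counter = 0
for idx, line in ipairs(lines) do
  -- The function receives the two captures: the last word and the trailing whitespace.
  lines[idx] = (line:gsub("(%S+)(%s*)$", function(word, tail)
    counter = counter + 1
    return word:lower() .. "_" .. counter .. tail
  end))
end

for _, line in ipairs(lines) do
  print(line)
end
```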

But you've got that scripting support in Bash, which also supports the same regexps (so much so that sed has been inlined into Bash or Busybox)!




Re: Replace the last item in a string of 'words'

Jim-2
In reply to this post by v
26.06.2019, 20:27, "v" <[hidden email]>:
                                         ^^^^^^^^^^^^

> v <[hidden email]>
          ^^^^^^^^^^^^

is that your date of birth ?



Re: Replace the last item in a string of 'words'

Jim-2
In reply to this post by Philippe Verdy
26.06.2019, 23:32, "Philippe Verdy" <[hidden email]>:
> Many people (and I think all programmers) can't "live" without
> the more conventional (and more powerful) regexps, but most
> can live without the (severely limited) Lua patterns !

they are indeed too limited, even for an occasional regexp user like me.
even very basic functionality like pattern alternatives is lacking.
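
A common workaround for the missing alternation in plain Lua, sketched below, is to try several patterns in turn. `match_any` is a hypothetical helper, not part of the standard library, and the lead names are only sample data taken from the original question.

```lua
-- Return the first match among a list of patterns, plus the pattern that matched.
local function match_any(s, patterns)
  for _, p in ipairs(patterns) do
    local m = s:match(p)
    if m then
      return m, p
    end
  end
  return nil
end

-- Roughly what the regex ^(aV[RLF]|I+|V[1-6])$ would express.
local leads = { "^aV[RLF]$", "^I+$", "^V[1-6]$" }

for _, name in ipairs({ "aVR", "III", "V6", "V7" }) do
  print(name, match_any(name, leads))
end
```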

> But you've got that shell scripting support in Bash which also
> supports the same regexps

since when does bash support regex ?
zsh has them though, they use PCRE or just plain posix re i guess.

> (so much that sed has been inlined into Bash or Busybox) !

not into bash but into BusyBox (should also be part of ToyBox
since sed is required by POSIX).

i wonder how Lua became Russ' "preferred hammer" for regex pattern
matching tasks ?? :-/

Philippe:

1.2 FREEMAIL_FROM
Sender email is commonly abused enduser mail provider (verdyp[at]gmail.com)

what have you done to the list blockwarts again ? :D



Re: Replace the last item in a string of 'words'

Russell Haley


On Thu, Jun 27, 2019 at 7:16 AM Jim <[hidden email]> wrote:
> i wonder how Lua became Russ' "preferred hammer" for regex pattern
> matching tasks ?? :-/

You missed my message where I said "I wasn't very good at this"? Ha ha.

Joking aside, I like writing Lua so I usually attempt to solve a problem in Lua first. I keep threatening to learn LPEG but the tools in standard Lua suit my needs just fine. Some time ago there was a post here about a PCRE library wrapper, but again, I haven't needed it.

Russ



Re: Replace the last item in a string of 'words'

Philippe Verdy
In reply to this post by Jim-2
No, it's a replacement alias for my real Gmail address, and it has nothing to do with a birth date or subscription date.
But I've not posted using this address, which I don't know (that's the first time I've seen it). It may be an alias created by this mailing list (to protect the privacy of subscribers), or possibly by Google (or created by this list on Gmail with a redirect specific to this list, possibly following the instructions Gmail gave to list maintainers, or because this Gmail address accepts messages redirected from other mail accounts I own), but I was not aware of it. I don't know if this alias is valid and will finally reach me; I've not tested it.
Even my real name (before the mail address) has been aliased to a single letter.


Re: Replace the last item in a string of 'words'

Philippe Verdy
I've tested it, and evidently it goes to someone else and does not reach me. So someone listening to this list created this alias on Gmail and configured it to redirect to you some message I posted with my real address. Or maybe you used some third-party agent that changes the real poster's address and name. I received your query only via the regular Lua mailing list on my regular Gmail address.

If you have doubts, look at the tracking MIME headers to see whether they are legitimate and were not fabricated.

Check the IP; it's probably not even in my area (I checked the Gmail account and nothing is said about this supposed "alias"). If this was created by the Lua mailing list owner, I was never informed of it; they would control that address, and the IP should match the IP used by the Lua mailing list agent. Maybe that IP belongs to you or to the third-party agent you use.

My opinion is that this "v" is another user who subscribed to this list with his own Gmail account (and Gmail gives it another avatar, a drawn cat, striped in yellow and red, holding socks, over a sunny yellow background). And Google tracks it with this info:

Received: from v-home ([46.61.242.47]) by smtp.googlemail.com with ESMTPSA id y12sm2955767lfy.36.2019.06.26.11.27.02 for <[hidden email]> (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Wed, 26 Jun 2019 11:27:03 -0700 (PDT)

That "v" user is in Russia according to IP-Whois, and posted via an origin SMTP server in the "rt.ru" domain (Rostelecom). I've got nothing in Russia.



Re: Replace the last item in a string of 'words'

Jim-2
In reply to this post by Philippe Verdy
27.06.2019, 20:26, "Philippe Verdy" <[hidden email]>:
> No, it's a replaced alias for my real gmail, and it has nothing
> to do with a birth date or subscription date.

so it was you who sent those mails. you successfully fooled us.

> But I've not posted using this address I don't know
> (that's the first time I see it).

good to know you are in full control of your mail tool. :D

> It may be an alias created by this mailing list
> (to protect the privacy of subscribers), or possibly by Google

very possible it is the latter.
why are you posting thru their servers ?

> (or created by this list on Gmail with a redirect specific
> for this list, possibly following the instructions Gmail gave
> to list maintainers,

this can be definitely ruled out since this mailing list mercifully
has nothing to do with gmail.com.

> or because this Gmail address accepts messages redirected
> from other mail accounts I own)

have you created some redirect loops ?

> but I was not aware of this fact.
> I don't know if this alias is valid and will finally reach me,
> I've not tested it.

good to know you are in full control of your mail setup. :D

> Even my true name (before the mail address) has been
> aliased to a single letter.

what have they done to you !!!

0.2 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
 mail domains are different

*********************************************************************************************

1.2 FREEMAIL_FROM          Sender email is commonly abused enduser mail
 provider (verdyp[at]gmail.com)

*********************************************************************************************

0.3 HTML_MESSAGE           BODY: HTML included in message

see ? everybody hates HTML included in messages.

0.2 PPF_NUMERIC_ENTITY     RAW: Body contains numeric HTML entities

0.2 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and
 EnvelopeFrom freemail headers are different

as always: thanks for keeping the silly season interesting,
we missed you for a while.

i am very interested in the puzzle's solution.
what about posting directly via the gmail.com "basic" webinterface ?

