split method?

split method?

Dave Collins

I’ve used this split (string to table) method I found here: http://stackoverflow.com/questions/1426954/split-string-in-lua.

 

I’ve got it working, but I don’t understand it.

 

function string:split(delimiter)
      local result = { }
      local from = 1
      local delim_from, delim_to = string.find( self, delimiter, from )
      while delim_from do
            table.insert( result, string.sub( self, from, delim_from-1 ) )
            from = delim_to + 1
            delim_from, delim_to = string.find( self, delimiter, from )
      end
      table.insert( result, string.sub( self, from ) )
      return result
end

 

This is how I’m calling it:

 

local myDateString = "2011-06-21"
local myDateTbl = myDateString.split(myDateString,"-")
dump_table(myDateTbl)

1=2011
2=06
3=21

 

Two things I don’t understand:

1] When I call it, I reference it twice: myDateString.split(myDateString,"-"). Why? Or should I just do this: string.split(myDateString,"-")? That works, so I guess the answer is yes.

 

2] The function takes one parameter (delimiter), but I must pass it two (myDateString, "-"). If I only pass it the delimiter string like it seems to want, it breaks. Why does it take two when it looks like it takes one?

 

Are these both occurring because I’m not really writing a function, I’m really modifying the string prototype?

 

 

Dave Collins

Front-End Engineer

Mercatus Technologies Inc.
60 Adelaide Street East, Suite 700
Toronto ON M5C 3E4
T  416 603 3406 x 298
F  416 603 1790

 

[hidden email]

www.mercatustechnologies.com

 


print non-kb characters

Dave Collins

I think this is so basic that Google searches are second-guessing I want a more advanced answer.

 

How do I print non-kb characters? I just want to print an ‘en dash’!

 

 

Dave Collins



Re: print non-kb characters

KHMan
On 6/28/2011 1:00 AM, Dave Collins wrote:
> I think this is so basic that Google searches are second-guessing
> I want a more advanced answer.
>
> How do I print non-kb characters? I just want to print an ‘en dash’!

Modern *nix terminals should be using UTF-8 already. On WinXP
console, you need to switch to UTF-8 (code page 65001), then just
write out the UTF-8 bytes. Something like that, perhaps?

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia


Re: split method?

Lorenzo Donati-2
In reply to this post by Dave Collins
Please, switch your mail client to plain text.
HTML posting is considered bad netiquette in this list.

On 27/06/2011 18.37, Dave Collins wrote:
> I’ve used this split (string to table) method I found here:
> http://stackoverflow.com/questions/1426954/split-string-in-lua.
>
> I’ve got it working, but I don’t understand it.
>

[snip]

First of all, you must understand that ":" is Lua's way to call a method
on an object, but under the hood it is just syntactic sugar.

The *definition*

function string:split(delimiter)
...
end

is equivalent to

string.split = function( self, delimiter )
...
end

i.e. you are assigning to the field named "split" of table "string" a
new function whose first argument (automatically inserted by Lua under
the hood) is called "self" (it is like "this" in C++ or Java - it
references the object on which the method is called).
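
As a small illustration of that equivalence (only a sketch; the table "greeter" and its two functions are made-up names, not something from this thread):

```lua
-- Hypothetical table used only for this illustration.
local greeter = { name = "Lua" }

-- Colon definition: Lua adds a hidden first parameter named "self".
function greeter:hello(greeting)
  return greeting .. ", " .. self.name
end

-- Dot definition with "self" written out explicitly.
greeter.hello2 = function(self, greeting)
  return greeting .. ", " .. self.name
end

print(greeter:hello("Hi"))            -- Hi, Lua
print(greeter.hello2(greeter, "Hi"))  -- Hi, Lua
```

Both functions take two parameters; the colon form just hides the first one in the definition.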


>
> This is how I’m calling it:
>
> local myDateString = "2011-06-21"
>
> local myDateTbl = myDateString.split(myDateString,"-")


When defining a new method this way, it is customary to call it using
OOP syntax, i.e.:

myDateString:split("-")

but even here, at the call site, ":" is syntactic sugar for

myDateString.split(myDateString,"-")

which means:

1. get the field "split" from myDateString
2. call it, passing it two arguments: myDateString and "-"

Regarding 1: myDateString is a string, and by default all strings share
a metatable whose __index field is the table "string" [1]. So when a
field ("split" in this case) is requested from a string, it is looked up
in the "string" table. In this case the previously defined function
(string.split) is returned.

Regarding 2: since that function takes two parameters, self and
delimiter, they are assigned the arguments from the call; therefore
self gets myDateString and delimiter gets "-" as values.
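
The lookup and the call can be checked with built-in functions only. This is a sketch that assumes a stock interpreter where nobody has replaced the string metatable (an embedding host could have changed it):

```lua
local s = "2011-06-21"

-- All strings share one metatable; its __index field is the "string" table.
local mt = getmetatable(s)
print(mt.__index == string)      -- true

-- So indexing a string value finds the functions stored in "string".
print(s.upper == string.upper)   -- true

-- Three spellings of the same call.
print(s:sub(1, 4))               -- 2011
print(s.sub(s, 1, 4))            -- 2011
print(string.sub(s, 1, 4))       -- 2011
```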

>
> dump_table(myDateTbl)
>
> 1=2011
>
> 2=06
>
> 3=21
>
> Two things I don’t understand:
>
> 1] When I call it, I reference it twice:
> myDateString.split(myDateString,"-") Why? Or should I just do this:
> string.split(myDateString,"-") Which works, so I guess the answer is yes.

Both are correct, if you followed the explanation above. As I said, it
is less verbose and more OOP-like to write:

myDateString:split("-")


>
> 2] The function takes one parameter (delimiter), but I must pass it two
> (myDateString,"-"). If I only pass it the delimiter string like it seems
> to want, it breaks. Why does it take two when it looks like it takes one?
>

This happens because you define the function using the OOP style, so the
parameter self is hidden.

Lua doesn't have "classes" and so knows almost nothing about "methods",
it only has syntactic facilities to help with OOP style.

A method is just a regular function whose first parameter is the object.
Even the name "self" is conventional. Lua uses this name when
defining a function with the OO style, but if you want to be explicit
you could also use other names. E.g.:

string.split = function( obj, delimiter )
...
-- replace any "self" with "obj" and it will work the same
end



> Are these both occurring because I’m not really writing a function, I’m
> really modifying the string prototype?

Well, Lua has no prototypes, it has metatables. But in a way, yes, you
are adding custom behaviour to the string type.

Note that modifying or adding items to the default libraries
(such as the "string" table) is considered bad practice by some,
especially in large-scale apps.
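
One way to avoid touching the "string" table is a plain local function. The sketch below reuses the body from the original post unchanged (so the delimiter is still treated as a pattern); the name "split" and the parameter name "str" are just choices made for this example:

```lua
-- Standalone helper; nothing is added to the standard "string" table.
local function split(str, delimiter)
  local result = {}
  local from = 1
  local delim_from, delim_to = string.find(str, delimiter, from)
  while delim_from do
    table.insert(result, string.sub(str, from, delim_from - 1))
    from = delim_to + 1
    delim_from, delim_to = string.find(str, delimiter, from)
  end
  table.insert(result, string.sub(str, from))
  return result
end

local parts = split("2011-06-21", "-")
print(table.concat(parts, ","))   -- 2011,06,21
```

The price is that the method-call syntax is gone: it is split(myDateString, "-") rather than myDateString:split("-").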


>
> *Dave Collins*

Cheers
-- Lorenzo


[1] http://www.lua.org/manual/5.1/manual.html#2.8



Re: split method?

Norbert Kiesel
In reply to this post by Dave Collins
On Mon, 2011-06-27 at 12:37 -0400, Dave Collins wrote:
> I’ve used this split (string to table) method I found here:
> http://stackoverflow.com/questions/1426954/split-string-in-lua.
>
>  
>
> I’ve got it working, but I don’t understand it.

Read the manual again, especially the part about Function calls (2.5.8).

>
>  
>
> function string:split(delimiter)
>
>       local result = { }  
>
>       local from  = 1  
>
>       local delim_from, delim_to = string.find( self, delimiter, from
> )  
>
>       while delim_from do    
>
>             table.insert( result, string.sub( self, from ,
> delim_from-1 ) )    
>
>             from  = delim_to + 1    
>
>             delim_from, delim_to = string.find( self, delimiter, from
> )  
>
>       end  
>
>       table.insert( result, string.sub( self, from  ) )  
>
>       return result
>
> end
>
>  
>
> This is how I’m calling it:
>
>  
>
> local myDateString = "2011-06-21"
>
> local myDateTbl = myDateString.split(myDateString,"-")

You could (should?) call it like that:

local myDateTbl = myDateString:split("-")

Function definitions using : add an implicit first parameter called
"self".  Again, read section 2.5.8 of the manual (or the relevant
sections from the wonderful Lua Wiki).

</nk>






Re: split method?

Wesley Smith
In reply to this post by Dave Collins
> I’ve got it working, but I don’t understand it.
>
>
>
> function string:split(delimiter)
>
>       local result = { }
>
>       local from  = 1
>
>       local delim_from, delim_to = string.find( self, delimiter, from  )
>
>       while delim_from do
>
>             table.insert( result, string.sub( self, from , delim_from-1 )
> )
>
>             from  = delim_to + 1
>
>             delim_from, delim_to = string.find( self, delimiter, from  )
>
>       end
>
>       table.insert( result, string.sub( self, from  ) )
>
>       return result
>
> end
>
>
>
> This is how I’m calling it:
>
>
>
> local myDateString = "2011-06-21"
>
> local myDateTbl = myDateString.split(myDateString,"-")
>
> dump_table(myDateTbl)
>


I think this function could be a lot simpler.  string.gmatch is more
suited to this type of problem than string.find IMHO:

local myDateString = "2011-06-21"

function string:split(delim)
        local res = {}
        for v in self:gmatch(string.format("([^%s]+)%s?", delim, delim)) do
                res[#res+1] = v
        end
        return res
end

local myDateTbl = myDateString.split(myDateString,"-")
print(unpack(myDateTbl))
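
For comparison with the string.find version, here is the same gmatch approach as a local function, run on a few inputs. It is a sketch; the name "gsplit" is made up for this example. Note that the delimiter ends up inside a character class, so this is best suited to single-character delimiters, and empty fields are not reported:

```lua
-- Same pattern as in the post above, wrapped in a local function.
local function gsplit(str, delim)
  local res = {}
  -- For delim "-" the pattern becomes "([^-]+)-?":
  -- capture a run of non-delimiter characters, then skip one optional delimiter.
  for v in str:gmatch(string.format("([^%s]+)%s?", delim, delim)) do
    res[#res + 1] = v
  end
  return res
end

print(table.concat(gsplit("2011-06-21", "-"), ","))  -- 2011,06,21
print(table.concat(gsplit("foo.bar", "."), ","))     -- foo,bar
-- Consecutive delimiters: the empty field between them is skipped.
print(table.concat(gsplit("a--b", "-"), ","))        -- a,b
```

The string.find loop would return three fields for "a--b" (the middle one empty), so the two functions are not interchangeable when empty fields matter.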


RE: split method?

Dave Collins
In reply to this post by Lorenzo Donati-2
Lorenzo:
> Please, switch your mail client to plain text.

Sorry 'bout that.

> You could (should?) call it like that:
> local myDateTbl = myDateString:split("-")

Oh. Now I see why it didn’t work when I first tried it. I was trying to do myDateString.split("-"). I didn't see the :.

I'm noo(b).


Wesley:
> I think this function could be a lot simpler.  string.gmatch is more suited to this type of problem than string.find IMHO:
> ... gmatch(string.format("([^%s]+)%s?", delim, delim)) ...

I try to avoid regex's. Just because a block of code is shorter doesn't mean it’s simpler. I find them largely opaque to easy comprehension.


Thanks all. You guys are a big help to a Lua noob.


Dave


RE: print non-kb characters

Dave Collins
In reply to this post by KHMan
> Modern *nix terminals should be using UTF-8 already. On WinXP
> console, you need to switch to UTF-8 (code page 65001), then just
> write out the UTF-8 bytes. Something like that, perhaps?

Let's pretend we're waaaay overthinking this.

Let's try this:

print ("-" .. " is a hyphen")
print ( ?? .. " is an n-dash")

Dave


Re: print non-kb characters

Rob Kendrick-2
On Mon, Jun 27, 2011 at 02:40:53PM -0400, Dave Collins wrote:

> > Modern *nix terminals should be using UTF-8 already. On WinXP
> > console, you need to switch to UTF-8 (code page 65001), then just
> > write out the UTF-8 bytes. Something like that, perhaps?
>
> Let's pretend we're waaaay overthinking this.
>
> Let's try this:
>
> print ("-" .. " is a hyphen")
> print ( ?? .. " is an n-dash")

But we're not.  The value you need to put in place of ?? is dependent on
your terminal and which character encoding/code page it uses.  If it's
UTF-8, then your life's pretty easy.

B.


Re: print non-kb characters

KHMan
In reply to this post by Dave Collins
On 6/28/2011 2:40 AM, Dave Collins wrote:

>> Modern *nix terminals should be using UTF-8 already. On WinXP
>> console, you need to switch to UTF-8 (code page 65001), then just
>> write out the UTF-8 bytes. Something like that, perhaps?
>
> Let's pretend we're waaaay overthinking this.
>
> Let's try this:
>
> print ("-" .. " is a hyphen")
> print ( ?? .. " is an n-dash")

Without more information, we can only guess at what platform or
software you are using.

You need a UTF-8 aware editor, for starters, or you can use
escaped character codes in strings.

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia


Re: split method?

Wesley Smith
In reply to this post by Dave Collins
> Wesley:
>> I think this function could be a lot simpler.  string.gmatch is more suited to this type of problem than string.find IMHO:
>> ... gmatch(string.format("([^%s]+)%s?", delim, delim)) ...
>
> I try to avoid regex's. Just because a block of code is shorter doesn't mean it’s simpler. I find them largely opaque to easy comprehension.
>

To each his own.  For "-" it ends up being "([^-]+)-?".  I too used to
be afraid of string patterns, but once you get over that, they're
quite pleasant to work with.  FWIW, Lua's patterns are a much reduced
set of ops compared to full blown regexs, making it much easier to
understand them.  Also, for the record I didn't say it was simpler
because it was shorter.  I said it's simpler because string.gmatch
maps to your problem better.  The shortness comes as a consequence.

wes


RE: print non-kb characters

Dave Collins
In reply to this post by KHMan
>> Let's pretend we're waaaay overthinking this.
>>
>> Let's try this:
>>
>> print ("-" .. " is a hyphen")
>> print ( ?? .. " is an n-dash")
>
>Without more information, we can only guess at what platform or
>software you are using.
>
>You need a UTF-8 aware editor, for starters, or you can use
>escaped character codes in strings.

Well, the target platform is Win CE; I'm developing on XP.

If I've got to worry about character sets then I'm just going to tell the design team they can't have en-dashes - they're getting hyphens. >:(


Dave



Re: print non-kb characters

KHMan
On 6/28/2011 3:02 AM, Dave Collins wrote:

>>> Let's pretend we're waaaay overthinking this.
>>>
>>> Let's try this:
>>>
>>> print ("-" .. " is a hyphen")
>>> print ( ?? .. " is an n-dash")
>>
>> Without more information, we can only guess at what platform or
>> software you are using.
>>
>> You need a UTF-8 aware editor, for starters, or you can use
>> escaped character codes in strings.
>
> Well, the target platform is Win CE; I'm developing on XP.
>
> If I've got to worry about character sets then I'm just going to tell the design team they can't have en-dashes - they're getting hyphens.>:(

I suppose it has UTF-16, right? If you can output UTF-16 glyphs in
whatever window interface you have, then you only need to put in
the appropriate (double) bytes in a Lua string to pass onto
whatever you are using to display text on your app. That's one
scenario of the many possible ones.

I described the WinXP console as one usage scenario given the lack
of information on your problem. Switching console code page is
simply an easy way to see Unicode filenames on XP.

If you are printing ASCII or single-byte-single-character strings
using Lua, then your API may be using 8-bit code page Win32 calls.
So, it is possible on Win32 that your API calls may hamper display
of Unicode characters, dunno if it's the same on Win CE.

So it all depends... good luck...

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia


RE: print non-kb characters

Dave Collins

> I suppose it has UTF-16, right? If you can output UTF-16 glyphs in
> whatever window interface you have, then you only need to put in
> the appropriate (double) bytes in a Lua string to pass onto
> whatever you are using to display text on your app. That's one
> scenario of the many possible ones.

Well, I have yet to find a description of how to actually output a code, or a list of what the codes are.

If this were HTML/JavaScript I would simply write: document.write("\u2013"). The character codes are listed like this (http://www.ascii.cl/htmlcodes.htm), and I simply write out the string.

If this were Visual Basic, I would simply write: print chr(150). The ASCII codes are listed like this (http://yorktown.cbe.wwu.edu/sandvig/docs/ASCIICodes.aspx), and I simply use the chr() function to print them.


My book tells me I can escape characters like this: \99, which outputs "c". Since "c" is 99 and [en dash] is 150, presumably I could simply type \150. But it does not work.

Dave
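
A note on the \150 attempt, with a sketch using only built-in functions. A decimal escape "\ddd" inserts the single byte with that value. ASCII stops at 127; byte 150 (0x96) is the en dash only in the Windows-1252 code page, so whether it shows up as a dash depends on how whatever displays the text interprets that byte:

```lua
-- "\99" is the byte 99, which is "c" in ASCII.
print("\99")                   -- c

-- "\150" is one byte with value 150; it is not an ASCII character.
print(#"\150")                 -- 1
print(string.byte("\150"))     -- 150

-- In UTF-8 the en dash (U+2013) is three bytes instead.
local en_dash = "\226\128\147"
print(#en_dash)                -- 3
```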



Re: print non-kb characters

KHMan
On 6/28/2011 3:50 AM, Dave Collins wrote:
>
>> I suppose it has UTF-16, right? If you can output UTF-16 glyphs in
>> whatever window interface you have, then you only need to put in
>> the appropriate (double) bytes in a Lua string to pass onto
>> whatever you are using to display text on your app. That's one
>> scenario of the many possible ones.
>
> Well, I have yet to find a description of how to actually output a code, or a list of what the codes are.

Whether your app can or cannot display Unicode text does not
depend on Lua; it depends on what C library call or Win32 API or
similar calls you are using to output text, based on how Lua is
integrated. print() to console? Win32 calls in WM_PAINT? Do you
know what UTF-16 or UTF-8 is? Or what is the level of Unicode
support in Win CE etc?

IMHO only you can know or find out the capabilities of the
specific, particular setup you are using. That's about all I can
say... perhaps others on the list can offer something more.

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia


Re: print non-kb characters

David Walker
I did this on my Mac OS-X lua. It is UTF-8.

> print("\226\128\147")

>

FWIW, you can use the following calculator to get the UTF-8 equivalents.

http://amolip.de/Projects/UniCalc/UniCalc.html
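
The three bytes can also be computed in Lua itself. The following is a sketch written in plain arithmetic so it runs on Lua 5.1 (no bit library); the function name "utf8_encode" is made up for this example, and it does no validation (surrogates and out-of-range values are not rejected):

```lua
-- Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
local function utf8_encode(cp)
  if cp < 0x80 then
    -- 1 byte: 0xxxxxxx
    return string.char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return string.char(0xC0 + math.floor(cp / 0x40),
                       0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- U+2013 is the en dash.
print(utf8_encode(0x2013) == "\226\128\147")   -- true
```

On a UTF-8 terminal, print(utf8_encode(0x2013) .. " is an en dash") should then display the dash; on other setups the bytes are the same but the display may differ.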

On Mon, Jun 27, 2011 at 1:11 PM, KHMan <[hidden email]> wrote:

>
> On 6/28/2011 3:50 AM, Dave Collins wrote:
>>
>>> I suppose it has UTF-16, right? If you can output UTF-16 glyphs in
>>> whatever window interface you have, then you only need to put in
>>> the appropriate (double) bytes in a Lua string to pass onto
>>> whatever you are using to display text on your app. That's one
>>> scenario of the many possible ones.
>>
>> Well, I have yet to find a description of how to actually output a code, or a list of what the codes are.
>
> Whether your app can or cannot display Unicode text does not depend on Lua; it depends on what C library call or Win32 API or similar calls you are using to output text, based on how Lua is integrated. print() to console? Win32 calls in WM_PAINT? Do you know what UTF-16 or UTF-8 is? Or what is the level of Unicode support in Win CE etc?
>
> IMHO only you can know or find out what the capabilities of the specific, particular setup you are using. That's about all I can say... perhaps others on the list can offer something more.
>
> --
> Cheers,
> Kein-Hong Man (esq.)
> Kuala Lumpur, Malaysia
>


Re: print non-kb characters

Philippe Lhoste
In reply to this post by Dave Collins
On 27/06/2011 21:50, Dave Collins wrote:
> If this were HTML/JavaScript I would simply write: document.write("\u2013"). The
> character codes are listed like this (http://www.ascii.cl/htmlcodes.htm), and I simply
> write out the string.

Made a quick test:

if io.output("Test.txt") then
   -- UTF-16 big-endian: BOM (0xFE 0xFF), then U+2013 as bytes 0x20 0x13
   io.write("\254\255\32\19")
   io.close()
end

I get a UTF-16 file (with BOM) containing an en dash.
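
The two data bytes in that test are just the code point split into a high and a low byte. A sketch of the arithmetic (the helper name "utf16be" is made up here, and it only covers code points below 0x10000, i.e. no surrogate pairs):

```lua
-- Hypothetical helper: big-endian UTF-16 bytes for a code point below 0x10000.
local function utf16be(cp)
  return string.char(math.floor(cp / 256), cp % 256)
end

-- Byte order mark for big-endian UTF-16 (0xFE 0xFF).
local BOM = "\254\255"

-- U+2013 (en dash) becomes the bytes 32 and 19, as used in the test above.
print(string.byte(utf16be(0x2013), 1, 2))
print(BOM .. utf16be(0x2013) == "\254\255\32\19")   -- true
```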

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --



Re: split method?

Philippe Lhoste
In reply to this post by Dave Collins
On 27/06/2011 20:37, Dave Collins wrote:
> I try to avoid regex's. Just because a block of code is shorter doesn't mean it’s
> simpler. I find them largely opaque to easy comprehension.

They are opaque only for the uninitiated, or when they get too large... which is mostly
avoided in Lua because its patterns lack alternation (the pipe).
Documenting/commenting the regexes you create can help too... :-)

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --



RE: split method?

Dave Collins
In reply to this post by Norbert Kiesel
Looks like my split technique doesn’t work on "."


function string:split(delimiter)
        local result = { }  
        local from  = 1  
        local delim_from, delim_to = string.find( self, delimiter, from  )  
        while delim_from do    
                table.insert( result, string.sub( self, from , delim_from-1 ) )    
                from  = delim_to + 1    
                delim_from, delim_to = string.find( self, delimiter, from  )  
        end  
        table.insert( result, string.sub( self, from  ) )  
        return result
end


local foo = "foo-bar"
dump_table(foo:split("-"))

1=foo
2=bar

local foo = "foo~bar"
dump_table(foo:split("~"))

1=foo
2=bar

local foo = "foo.bar"
dump_table(foo:split("."))

1=
2=
3=
4=
5=
6=
7=
8=

Sigh. Back to the drawing board...
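
What is going on here: string.find treats its second argument as a Lua pattern, and in a pattern "." matches any character. Every one of the 7 characters of "foo.bar" is therefore taken as a delimiter, which gives 7 empty fields plus the empty remainder. string.find accepts a fourth argument, plain, that switches pattern matching off. The sketch below applies it to the same loop; the name "split_plain" and the empty-delimiter check are additions made for this example:

```lua
local s = "foo.bar"

-- As a pattern, "." matches any character, so the first match is at position 1.
print(string.find(s, "."))            -- 1  1
-- With plain = true the delimiter is searched for literally.
print(string.find(s, ".", 1, true))   -- 4  4
-- Escaping the magic character with % also works.
print(string.find(s, "%."))           -- 4  4

-- Same loop as above, but with a plain (non-pattern) search.
local function split_plain(str, delimiter)
  -- An empty delimiter would match everywhere and never advance.
  assert(delimiter ~= "", "delimiter must not be empty")
  local result, from = {}, 1
  local delim_from, delim_to = string.find(str, delimiter, from, true)
  while delim_from do
    result[#result + 1] = string.sub(str, from, delim_from - 1)
    from = delim_to + 1
    delim_from, delim_to = string.find(str, delimiter, from, true)
  end
  result[#result + 1] = string.sub(str, from)
  return result
end

print(table.concat(split_plain("foo.bar", "."), ","))   -- foo,bar
```

The other magic characters in Lua patterns, such as "%", "+", "*", "?", "(" and "[", cause similar surprises as delimiters, and the plain search avoids all of them.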



 
Dave Collins


RE: split method?

Dave Collins
In reply to this post by Wesley Smith
Thanks Wesley, I'm going to use yours after all. Despite being a "black box", it does work.

Dave

>Wesley Smith:
>I think this function could be a lot simpler.  string.gmatch is more
>suited to this type of problem than string.find IMHO:
>
>local myDateString = "2011-06-21"
>
>function string:split(delim)
> local res = {}
> for v in self:gmatch(string.format("([^%s]+)%s?", delim, delim)) do
> res[#res+1] = v
> end
> return res
>end
>
>local myDateTbl = myDateString.split(myDateString,"-")
>print(unpack(myDateTbl))

