check the ratio of numeric/alphanumeric words in given string

DouOlivia

Hi there,

I should have searched the list first, but I’m afraid I cannot come up with a good query. Here’s the question: is there a library that can calculate the ratio of numeric or alphanumeric words in a given string? (The actual scenario uses a document’s content, so the string will be very long.) E.g. for the string “34 aaa A-7SXD bbb ccc”, the ratio is 0.4.

 

Thanks.

 

Re: check the ratio of numeric/alphanumeric words in given string

Rena
On Wed, Jan 21, 2015 at 2:33 AM, DouOlivia <[hidden email]> wrote:

> Hi there,
>
> I should have searched the list first, but I’m afraid I cannot come up
> with a good query. Here’s the question: is there a library that can
> calculate the ratio of numeric or alphanumeric words in a given string?
> (The actual scenario uses a document’s content, so the string will be
> very long.) E.g. for the string “34 aaa A-7SXD bbb ccc”, the ratio is 0.4.
>
> Thanks.

Seems like a simple task for Lua's pattern matching:

function countMatches(str, pat)
    local c = 0
    for word in str:gmatch(pat) do c = c + 1 end
    return c
end

testStr = "34 aaa A-7SXD bbb ccc"
local ratio = countMatches(testStr, '%d+') / countMatches(testStr, '[%a%p]+')
assert(ratio == 0.4)

--
Sent from my Game Boy.


Re: check the ratio of numeric/alphanumeric words in given string

Matthew Wild
On 21 January 2015 at 10:29, Rena <[hidden email]> wrote:

> Seems like a simple task for Lua's pattern matching:
>
> function countMatches(str, pat)
>     local c = 0
>     for word in str:gmatch(pat) do c = c + 1 end
>     return c
> end

Couldn't resist... this version of countMatches() is shorter and
faster (on my machine at least) in long documents:

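function countMatches(str, pat)
    -- gsub returns the new string and the number of substitutions;
    -- replacing each match with itself ("%0") leaves the text unchanged.
    return select(2, str:gsub(pat, "%0"))
end

function ratio(text)
    return countMatches(text, '%d+') / countMatches(text, '[%a%p]+')
end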

Regards,
Matthew


Re: check the ratio of numeric/alphanumeric words in given string

David Favro
On 01/21/2015 09:37 PM, Matthew Wild wrote:

> Couldn't resist... this version of countMatches() is shorter and
> faster (on my machine at least) in long documents:
>
> function countMatches(str, pat)
>      return select(2, str:gsub(pat, "%0"))
> end
>
> function ratio(text)
>          return countMatches(text, '%d+') / countMatches(text, '[%a%p]+')
> end

I don't think that's the result the OP was looking for.  What if the
test string is
"34 aaa A-7S8X9D bbb ccc"?
Both of your implementations return 0.57 (4 digit runs divided by 7
letter-or-punctuation runs), but I think the OP still wanted 0.4,
although the question doesn't define it very precisely
(e.g. how should the string "% 4" evaluate?).

Maybe something like:

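local function ratio(str)
    local tot, num = 0, 0
    local sfind = string.find
    -- Treat every run of non-whitespace characters as one word.
    for word in str:gmatch("%S+") do
        tot = tot + 1
        -- A word counts as numeric/alphanumeric if it contains any digit.
        if sfind(word, "%d") then num = num + 1 end
    end
    -- Note: an input with no words gives 0/0 here.
    return num / tot
end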

You could do a similar approach with gsub() also.
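
For reference, a gsub()-based variant of the same word-by-word idea might look like the sketch below. It is only an illustration, not something posted in the thread: the name digitWordRatio is made up, "word" is assumed to mean any run of non-whitespace characters, and returning 0 for an input with no words is an arbitrary choice to avoid 0/0.

```lua
-- Sketch only: digitWordRatio is a hypothetical name, and a "word" is
-- assumed to be any run of non-whitespace characters.
local function digitWordRatio(str)
    local total, withDigit = 0, 0
    -- gsub calls the function once per match. The function returns
    -- nothing, so gsub keeps each match as it is.
    str:gsub("%S+", function(word)
        total = total + 1
        if word:find("%d") then withDigit = withDigit + 1 end
    end)
    -- Assumption: an input with no words yields 0 instead of 0/0.
    if total == 0 then return 0 end
    return withDigit / total
end

print(digitWordRatio("34 aaa A-7SXD bbb ccc"))    -- 0.4
print(digitWordRatio("34 aaa A-7S8X9D bbb ccc"))  -- 0.4
print(digitWordRatio("% 4"))                      -- 0.5
```

Under these assumptions "% 4" evaluates to 0.5, because "%" counts as a word; whether punctuation-only tokens should count toward the total is something the original question leaves open.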