Parsing binary data from an RS232 connection

Russell Haley
Hi,

I am writing a little Lua program to parse binary data received via serial communications. I'm wondering if someone on the mailing list has a novel approach to parsing the data? The communications protocol is something like this: 

<HEADER> - 4 bytes
<MSG_TYPE> - 1 byte
<SEQUENCE> - 2 bytes
<PAYLOAD_LENGTH> - 2 bytes
<PAYLOAD> - X bytes (variable based on message type)
<CHECKSUM> - 4 bytes

I see two options for parsing messages received via serial:
1) Process each character.  Lots of examples here: https://stackoverflow.com/questions/829063/how-to-iterate-individual-characters-in-lua-string. Once the header is found, just count until I find the payload length (+ checksum) and then count that number of characters
2) Using pattern matching and captures to search for the <HEADER> + 5 bytes. Then parse out the payload length (+ checksum) and take that many characters. This seems potentially very fast, but creates a great deal of complexity if I don't get the entire message in one serial port read. Something like this (where 'ABCD' represents the header): 
   local m = input:match('ABCD(.....)')  -- capture the 5 bytes after the header
   if m then
       local typ, seq, pl_len = string.unpack('>B>H>H', m)
       local msg_len = pl_len + 4  -- payload plus the 4-byte checksum
       if pl_len > 0 then
           ...
       end
   end
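
One way to tame the partial-read problem in option 2 is to keep a running buffer and only consume a frame once all of its bytes have arrived. The following is a minimal sketch under assumptions, not a tested driver: it assumes Lua 5.3 or later, big-endian fields, and the placeholder header 'ABCD'; the names feed, FIXED and CHECKSUM are made up for illustration, and the checksum is extracted but not verified.

```lua
-- Sketch: accumulate serial reads and extract complete frames.
-- HEADER and the big-endian field layout are assumptions for illustration.
local HEADER = 'ABCD'
local FIXED = 9        -- header(4) + type(1) + sequence(2) + payload length(2)
local CHECKSUM = 4

local buffer = ''

-- Append a chunk from the serial port and return every complete frame found.
local function feed(chunk)
    buffer = buffer .. chunk
    local frames = {}
    while true do
        local start = buffer:find(HEADER, 1, true)   -- plain search, no patterns
        if not start then
            -- Keep a possible partial header at the tail, drop the rest.
            buffer = buffer:sub(-(#HEADER - 1))
            break
        end
        buffer = buffer:sub(start)                   -- discard bytes before header
        if #buffer < FIXED then break end            -- wait for more data
        local typ, seq, pl_len = string.unpack('>I1 I2 I2', buffer, 5)
        local total = FIXED + pl_len + CHECKSUM
        if #buffer < total then break end            -- wait for more data
        local payload = buffer:sub(FIXED + 1, FIXED + pl_len)
        local checksum = string.unpack('>I4', buffer, FIXED + pl_len + 1)
        frames[#frames + 1] = { type = typ, seq = seq,
                                payload = payload, checksum = checksum }
        buffer = buffer:sub(total + 1)               -- consume the frame
    end
    return frames
end
```

Each serial read is passed to feed(), which returns zero or more complete frames, so a message split across several reads needs no special handling by the caller.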

Does anyone have a suggestion for parsing the data or even an existing project? Any and all input is welcome.

Thanks!
Russell

Re: Parsing binary data from an RS232 connection

Sean Conner
It was thus said that the Great Russell Haley once stated:

> Does anyone have a suggestion for parsing the data or even an existing
> project? Any and all input is welcome.

  How about:

        data = serial:read(9)
        header,type,seq,len = string.unpack(">I4 I1 I2 I2", data)
        data = serial:read(len)
        crc  = string.unpack(">I4",serial:read(4))

  Seems straightforward to me

  -spc (Just add some error checking ... )
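
For what it is worth, the "some error checking" could look roughly like the sketch below. It assumes Lua 5.3 or later and a serial object whose read(n) may return fewer than n bytes, or nil on error or timeout; that behaviour, along with the names read_exact, read_frame and MAX_PAYLOAD, is an assumption for illustration rather than any particular serial library's API.

```lua
-- Sketch only: 'serial' is any object with a read(n) method that may return
-- fewer than n bytes, or nil on error/timeout (an assumption made here).
local HEADER = 0x41424344   -- 'ABCD' read as a big-endian 32-bit value (assumed)
local MAX_PAYLOAD = 1024    -- illustrative upper bound on the payload length

-- Keep reading until exactly n bytes have arrived.
local function read_exact(serial, n)
    local parts, got = {}, 0
    while got < n do
        local chunk = serial:read(n - got)
        if not chunk or #chunk == 0 then
            return nil, 'read failed or timed out'
        end
        parts[#parts + 1] = chunk
        got = got + #chunk
    end
    return table.concat(parts)
end

-- Read one frame; returns a table, or nil plus a reason.
local function read_frame(serial)
    local data, err = read_exact(serial, 9)
    if not data then return nil, err end
    local header, typ, seq, len = string.unpack('>I4 I1 I2 I2', data)
    if header ~= HEADER then return nil, 'bad header' end
    if len > MAX_PAYLOAD then return nil, 'payload too long' end
    local payload, tail
    payload, err = read_exact(serial, len)
    if not payload then return nil, err end
    tail, err = read_exact(serial, 4)
    if not tail then return nil, err end
    local crc = string.unpack('>I4', tail)
    return { type = typ, seq = seq, payload = payload, crc = crc }
end
```

The sketch only reports a bad header; resynchronising after one, and verifying the checksum, are left out.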



RE: Parsing binary data from an RS232 connection

Tim McCracken
In reply to this post by Russell Haley

 

> Does anyone have a suggestion for parsing the data or even an existing project? Any and all input is welcome.
>
> Thanks!
> Russell

Russell,

I have been working on a “smart buffer” for just this issue, although it is not complete yet. However, it might give you some ideas on how to proceed. It is NOT pure Lua, but rather consists of two userdata objects – the “smart buffer” and the serial port driver that takes the smart buffer as an argument.

The serial port driver simply reads and writes binary data to and from the smart buffer. Then in Lua, the application can read and write various sized variables from the smart buffer. For example, the library has the following functions/methods:

Sequential Access: (pushes to the TX buffer, pops from the RX buffer)
get_byte(), get_word(), put_word(value)

Random Access:
get_byte(offset), get_word(offset), put_word(offset, value)
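
To give a feel for the interface, here is a rough pure-Lua approximation. It is not the userdata library described above, only a sketch that borrows its method names. It assumes Lua 5.3 or later, 16-bit big-endian words, 0-based offsets, and a single buffer standing in for both the TX and RX sides; new_buffer is a made-up constructor.

```lua
-- Pure-Lua sketch of a buffer with sequential and random access.
-- Word size (16-bit), byte order (big-endian) and 0-based offsets are assumptions.
local Buffer = {}
Buffer.__index = Buffer

local function new_buffer(data)
    return setmetatable({ data = data or '', pos = 1 }, Buffer)
end

-- get_byte() pops the next byte; get_byte(offset) reads at a fixed offset.
function Buffer:get_byte(offset)
    if offset then
        return (string.unpack('>I1', self.data, offset + 1))
    end
    local value, next_pos = string.unpack('>I1', self.data, self.pos)
    self.pos = next_pos
    return value
end

-- get_word() pops the next 16-bit word; get_word(offset) reads at an offset.
function Buffer:get_word(offset)
    if offset then
        return (string.unpack('>I2', self.data, offset + 1))
    end
    local value, next_pos = string.unpack('>I2', self.data, self.pos)
    self.pos = next_pos
    return value
end

-- put_word(value) appends a word; put_word(offset, value) overwrites in place.
function Buffer:put_word(a, b)
    if b == nil then
        self.data = self.data .. string.pack('>I2', a)
    else
        self.data = self.data:sub(1, a) .. string.pack('>I2', b)
                    .. self.data:sub(a + 3)
    end
end
```

Reading past the end of the data raises an error from string.unpack, so real code would check lengths first or wrap the calls in pcall.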

 

Of course, if you are writing a userdata library, it may be just as easy to fully decode the serial protocol in C and use callbacks to Lua to process the data. In my experience, this may actually be a better approach for “structured” protocols such as those used for industrial automation (Modbus) or SCADA (DNP3).

Tim


Re: Parsing binary data from an RS232 connection

Russell Haley
In reply to this post by Sean Conner


On Fri, Dec 6, 2019 at 6:14 PM Sean Conner <[hidden email]> wrote:

>   How about:
>
>         data = serial:read(9)
>         header,type,seq,len = string.unpack(">I4 I1 I2 I2", data)
>         data = serial:read(len)
>         crc  = string.unpack(">I4",serial:read(4))
>
>   Seems straightforward to me
>
>   -spc (Just add some error checking ... )

Thanks Sean, I couldn't see the forest for the trees.

Russ
 

Re: Parsing binary data from an RS232 connection

Marc Balmer
In reply to this post by Tim McCracken
We have a module 'csp' (Character Sequence Parser) for this, where you can define sequences to be detected in data streams and call a function when a sequence pattern is detected:

local p = csp.new(defaultHandler)

p:sequence('\027*', reset)
p:sequence('\027!', softReset)
p:sequence('\027B0', b0)
p:sequence('\027B1', b1)
p:sequence('\013', function () print('newline') end)
p:sequence('\029A0', b0)
p:sequence('\029A3', b1)
p:sequence('\027B%c-', b)
p:sequence('\029B%c', b)
p:sequence('\027=%c%c', pos)
p:sequence('\x05%S\x0d', function (s) print('string', s) end)

-- decimals

p:sequence('\027[%d', decimal)
p:sequence('\027p%d;', decimal)

p:dump()

p:parse('\027[45x\n')
p:parse('\027p42;abc\n')

We used it to decode ESC/POS printer commands.

Not sure, though, whether this would match your use case.

Re: Parsing binary data from an RS232 connection

Cedric Mauclair
On December 7, 2019 10:00:46 AM UTC, Marc Balmer <[hidden email]> wrote:

>We have a module 'csp' (Character Sequence Parser) for this, where you
>can define sequences to be detected in data streams and call a function
>if the sequence pattern gets detected:
>
>[...]

Hi everyone,

Seems like a very interesting module. Is it sharable?
--
CM.


Re: Parsing binary data from an RS232 connection

sur-behoffski
In reply to this post by Russell Haley
G'day,

Apologies in advance for such a long message, but the protocol format,
as described, is buggy, and could cause heartache.  These protocols are
quite tricky to get right; I have had great fortune to peek over the
shoulders of more than one experienced practitioner, as a part of
taking over incomplete, demonstration, or under-performing code, and
working to bring the code up to commercial standards.

[The under-performing code was because the stack pointer was misaligned
for a Z80 system (0xffff), so every push and pop operation took over
twice as long as documented in the manual!]

-- sur-behoffski (Brenton Hoff)
programmer, Grouse Software



On 2019-12-07 08:39, Russell Haley wrote:

> I am writing a little Lua program to parse binary data received via serial
> communications. I'm wondering if someone on the mailing list has a novel
> approach to parsing the data? The communications protocol is something like
> this:
>
> <HEADER> - 4 bytes
> <MSG_TYPE> - 1 byte
> <SEQUENCE> - 2 bytes
> <PAYLOAD_LENGTH> - 2 bytes
> <PAYLOAD> - X bytes (variable based on message type)
> <CHECKSUM> - 4 bytes
>
> [...]


You are making contradictory assumptions here:  Is the channel noisy --
is there any chance of message corruption, perhaps by:

     - A single-bit error in one byte;
     - A burst error, smearing a series of bits, possibly crossing byte
       boundaries;
     - Any chance of dropping a character (are you polling the UART?  If so,
       can you guarantee that the polling task, perhaps competing with other
       tasks for CPU/scheduler priority, will inspect the UART sufficiently
       often that the UART's hardware queue will never overflow?);
     - Maybe you've got the UART working with a DMA controller -- this is
       more common, and mostly gets rid of the task scheduling hazards, plus
       CPU overhead nuisance of polling -- but exactly how do you set up the
       DMA controller to issue an interrupt at the right time for the client
       receiver/queue/stream/whatever?  Do you have to have a "header-DMA"
       transfer, followed by a "packet body plus checksum" transfer?
     - On the flip side of dropping a character, is there any chance that
       some noise or glitch may be seen as a start-of-character sequence,
       leading to an extra character being inserted during an idle time?

As you can see, things can get fairly hairy fairly quickly.  However, your
packet format has a contradiction in it:

     1. Either the channel is completely correct at every single hardware
        and software operation level (unlikely, but...), in which case, why
        bother with a checksum?   OR

     2. The channel is noisy/fallible, so the variable-length specifier can
        be POISONED by noise, and yet it is trusted/worshipped by the protocol
        code as if it's perfect!  What if a frame came in which was intended
        to have a <PAYLOAD_LENGTH> of "06", but noise corrupted it to "32"?
        The receiver would need to have some way of detecting this
        corruption... for a two-way protocol, one end might hang, waiting for
        the end of the frame, and the other end might hang, waiting for an
        ACK.

-------

One way of possibly patching this is by:

     1. Adding a checksum after the end of the fixed-length header -- including
        the specification of the variable-length body that is to follow; and
     2. Demand that the start-of-packet <HEADER> *never* appears anywhere else
        in the frame -- nowhere in the variable-length data, nowhere in the
        fixed-length header, except as the first four bytes.
     3. Adding limits to frame sizes, timers to limit frame transmission
        period, and a mechanism for a receiver to respond with a *sequenced*
        NAK.  Without an increasing sequence number on messages from both ends,
        frames and/or ACK/NAK messages may end up getting duplicated, leading
        to confusion that could result in a frame being dropped or maybe
        duplicated.
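
Points 1 and 3 above can be sketched as follows. This is only an illustration under assumptions: Lua 5.3 or later, an 11-byte header made of the original 9 bytes plus a 2-byte header check, and a limit of 256 payload bytes. The names sum16, pack_header, check_header and MAX_PAYLOAD are hypothetical, and the 16-bit byte sum is a weak placeholder where a real protocol would use a CRC.

```lua
-- Sketch: protect the fixed-length header with its own check value, and
-- refuse lengths above a limit, before trusting <PAYLOAD_LENGTH>.
local MAX_PAYLOAD = 256   -- illustrative limit

-- Placeholder check: 16-bit sum of the bytes. A real protocol would use a CRC.
local function sum16(s)
    local total = 0
    for i = 1, #s do total = (total + s:byte(i)) & 0xFFFF end
    return total
end

-- Build header(4) + type(1) + sequence(2) + length(2) + header check(2).
local function pack_header(typ, seq, len)
    local fixed = 'ABCD' .. string.pack('>I1 I2 I2', typ, seq, len)
    return fixed .. string.pack('>I2', sum16(fixed))
end

-- Returns type, sequence and length, or nil plus a reason.
local function check_header(hdr)
    if #hdr ~= 11 then return nil, 'short header' end
    local fixed = hdr:sub(1, 9)
    if string.unpack('>I2', hdr, 10) ~= sum16(fixed) then
        return nil, 'header checksum mismatch'
    end
    local typ, seq, len = string.unpack('>I1 I2 I2', fixed, 5)
    if len > MAX_PAYLOAD then return nil, 'length exceeds limit' end
    return typ, seq, len
end
```

With this in place, a length of 6 corrupted to 32 fails the header check straight away, so the receiver can discard the frame instead of waiting for bytes that will never come.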

--

Okay, some more on techniques at the bit level (bit stuffing), and at the
byte level (byte stuffing) for ensuring that some "sentinel" header
sequence never appears in an arbitrary data stream, except as a by-product
of corruption, e.g. noise.  (If corruption does occur, the protocol depends on
the strength of the checksum; for large frames, 32-bit CRCs are favoured;
the CCITT CRC32 has been the traditional favourite, but in the last 15 years or
so, the Castagnoli CRC32 has been gaining traction as a worthy successor).
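
For reference, both checksums can be computed with the same bit-at-a-time loop; only the polynomial differs. The sketch below assumes Lua 5.3 or later and that the "traditional" CRC32 meant here is the common one used by Ethernet and zlib (reflected polynomial 0xEDB88320); the Castagnoli polynomial in reflected form is 0x82F63B78. The function name crc32 is just illustrative, and a table-driven version would be faster.

```lua
-- Bitwise, reflected CRC-32. Slow but short; a table-driven version is the
-- usual optimisation. Needs Lua 5.3+ integer bit operators.
local CRC32_IEEE = 0xEDB88320   -- common CRC-32 (Ethernet, zlib), reflected
local CRC32C     = 0x82F63B78   -- Castagnoli CRC-32C, reflected

local function crc32(data, poly)
    local crc = 0xFFFFFFFF
    for i = 1, #data do
        crc = crc ~ data:byte(i)
        for _ = 1, 8 do
            if (crc & 1) == 1 then
                crc = (crc >> 1) ~ poly
            else
                crc = crc >> 1
            end
        end
    end
    return crc ~ 0xFFFFFFFF
end

-- Standard check values for the ASCII string "123456789":
print(string.format('%08X', crc32('123456789', CRC32_IEEE)))  -- CBF43926
print(string.format('%08X', crc32('123456789', CRC32C)))      -- E3069283
```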

--

I've written a number of low-level microcontroller networking systems, in the
late 1980s/early 1990s, before TCP/IP became widely established.  These
networks were based on SDLC/HDLC:

         https://en.m.wikipedia.org/wiki/High-Level_Data_Link_Control

Usually, the UART would be programmed for NRZI (non-return-to-zero-inverted)
bit-blitting protocol.  The key thing to notice in the article is in the
header sequence:

       Protocol Structure -> Frame Format -> Flag

The technique of *bit stuffing* is used to ensure that the flag does not appear
in the data, so, when a flag arrives at end-of-frame, the receiver knows that
the checksum is the previous 16 bits.

The flag is 0x7e: A zero bit, six 1 bits, and a final zero bit.  Bit stuffing,
at the transmitter end, watches the in-frame data as it passes, and, whenever
it sees five consecutive 1 bits, unconditionally inserts a zero bit, before
resuming the frame data.  Likewise, the receiver watches the data stream, and if
it sees five 1-bits followed by a zero bit, it discards the zero bit.
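
The stuffing rule is easy to try out on strings of '0' and '1' characters. The sketch below is only a model of the idea, since real hardware works on the bit stream itself; the names stuff and unstuff are made up for illustration, and unstuff assumes well-formed input, so it does not detect a flag or an abort (six or more 1 bits in a row).

```lua
-- Model of HDLC-style bit stuffing on strings of '0'/'1' characters.

-- Insert a '0' after every run of five consecutive '1' bits.
local function stuff(bits)
    local out, ones = {}, 0
    for i = 1, #bits do
        local b = bits:sub(i, i)
        out[#out + 1] = b
        if b == '1' then
            ones = ones + 1
            if ones == 5 then
                out[#out + 1] = '0'
                ones = 0
            end
        else
            ones = 0
        end
    end
    return table.concat(out)
end

-- Drop the bit that follows every run of five consecutive '1' bits.
local function unstuff(bits)
    local out, ones, i = {}, 0, 1
    while i <= #bits do
        local b = bits:sub(i, i)
        out[#out + 1] = b
        if b == '1' then
            ones = ones + 1
            if ones == 5 then
                i = i + 1      -- skip the stuffed zero that follows
                ones = 0
            end
        else
            ones = 0
        end
        i = i + 1
    end
    return table.concat(out)
end

print(stuff('01111110'))   -- 011111010: the flag pattern can no longer appear
```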

--

If you are working purely in software, then byte-stuffing can be used instead of
bit stuffing, for the variable-length data in the frame.  This is done by:

     1. Designating one value as the Flag: 0x7e;
     2. Designating another byte as an Escape, typically: 0x7d; and
     3. Using a simple, reversible sequence to do the stuffing:
         3a. If a byte is not a Flag or an Escape, send it as-is;
         3b. For either of a Flag byte or an Escape byte:
             - Send the Escape byte; and
             - Send the original byte, XORed with 0x20.

So, a Flag (0x7e) in the data gets sent as "0x7d 0x5e"; and
an  Escape (0x7d) in the data gets sent as "0x7d 0x5d".
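
In Lua the byte-stuffing rules above fit in a few lines. This sketch assumes Lua 5.3 or later (for the ~ XOR operator and \x escapes); the function names escape and unescape are made up for illustration, and unescape assumes the input is well formed, i.e. every Escape byte is followed by another byte.

```lua
-- Byte stuffing with Flag 0x7e and Escape 0x7d, as described above.
local ESC = 0x7d

-- Replace each Flag or Escape byte with Escape followed by (byte XOR 0x20).
local function escape(data)
    return (data:gsub('[\x7d\x7e]', function(c)
        return string.char(ESC, c:byte() ~ 0x20)
    end))
end

-- Reverse the transformation: drop each Escape and XOR the next byte with 0x20.
local function unescape(data)
    return (data:gsub('\x7d(.)', function(c)
        return string.char(c:byte() ~ 0x20)
    end))
end
```

After escaping, the payload never contains 0x7e, so the receiver can treat any 0x7e it sees as a frame boundary.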

-----------

Okay,


Re: Parsing binary data from an RS232 connection

Marc Balmer
In reply to this post by Cedric Mauclair


> On 07.12.2019 at 15:24, Cedric Mauclair <[hidden email]> wrote:
>
> Seems like a very interesting module. Is it sharable?

We have not published it under an open source license, if that is what you mean.  We thought it was too specialised (we used it to implement printer simulators).




Re: Parsing binary data from an RS232 connection

Cedric Mauclair
On December 12, 2019 12:23:29 PM UTC, Marc Balmer <[hidden email]> wrote:

>> Seems like a very interesting module. Is it sharable?
>
>We have not published it under an open source license, if that is
>what you mean.  We thought it was too specialised (we used it to
>implement printer simulators).

Yes, that was indeed the meaning of the question. Thanks anyway.
--
CM.