Parse 12 bit numbers from binary file

Parse 12 bit numbers from binary file

Russell Haley
Hi,
I'm trying to parse 12 bit numbers out of a binary file. From https://www.physionet.org/physiotools/wag/signal-5.htm:

Format 212

Each sample is represented by a 12-bit two’s complement amplitude. The first sample is obtained from the 12 least significant bits of the first byte pair (stored least significant byte first). The second sample is formed from the 4 remaining bits of the first byte pair (which are the 4 high bits of the 12-bit sample) and the next byte (which contains the remaining 8 bits of the second sample). The process is repeated for each successive pair of samples. Most of the signal files in PhysioBank are written in format 212.

The file I am trying to read is here:

https://physionet.org/pn3/twadb/twa00.dat

When I use the existing tool to read the data out I get the following:

  0    -298     127
  1    -295     132
  2    -292     137
  3    -293     141
  4    -295     145
  5    -295     149
  6    -293     153
  7    -290     160
  8    -286     167
  9    -283     176

My code is here:

local filename = '/home/russellh/physionet/twadb_mod/twa00.dat'
local f = assert(io.open(filename), 'Failed to open file')
local str = f:read('*all')
f:close()

sz = 30
local index = 1
local c1,c2, samp1, samp2
local count = 0
while index < sz do
    c1, c2, index = string.unpack('<h<b', str, index)
    samp1 = c1 >> 0x04
    samp2 = c1 >> 0x0c & c2
    print(count, samp1, samp2)
    count = count + 1
end

russellh@canary-dev:~/physionet/get_stats$ lua parse-nums.lua
0 1152921504606846957 127
1 1152921504606846352 4503599627370492
2 8 0
3 1152921504606845087 0
4 1152921504606846957 4503599627370381
5 1152921504606846352 4503599627370492
6 9 0
7 1152921504606845279 0
8 1152921504606846957 4503599627370393
9 1152921504606846432 4503599627370492

My (likely erroneous) assumption is that I need to convert the values to signed 12-bit numbers. I've tried variations of `local t1 = (samp1 - (samp1*2)) + 4096`, all with no luck.
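
For what it is worth, the huge values in the output can be reproduced without the file. The following is a minimal sketch, assuming (as the replies below conclude) that the first four bytes of twa00.dat are the little-endian 16-bit pair -298, 127 reported by the existing tool, and that Lua integers are 64 bits wide. The name `bytes` is only illustrative. In Lua 5.3, `>>` is a logical shift that fills the vacant bits with zeros, so shifting a negative integer right yields a large positive number.

```lua
-- Rebuild the first frame from the tool's output (-298, 127), assuming
-- two little-endian signed 16-bit samples per frame.
local bytes = string.pack('<i2i2', -298, 127)

-- Same unpack call as in the script above: a signed short, then a signed byte.
local c1, c2 = string.unpack('<h<b', bytes, 1)
print(c1, c2)                 -- -298   127

-- Lua 5.3's >> is a logical shift: zeros come in from the left, so a
-- negative 64-bit integer turns into a huge positive one.
print(c1 >> 0x04)             -- 1152921504606846957
print(c1 >> 0x0c & c2)        -- 127 (>> binds tighter than &)
```

These two numbers match row 0 of the output above, which suggests the problem lies in reading signed values and then shifting them, not in a missing 12-bit conversion alone.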

Thoughts?

Russ

Re: Parse 12 bit numbers from binary file

Andrew Gierth
>>>>> "Russell" == Russell Haley <[hidden email]> writes:

 Russell> Hi,
 Russell> I'm trying to parse 12 bit numbers out of a binary file. From
 Russell> https://www.physionet.org/physiotools/wag/signal-5.htm:
 Russell> Format 212

 Russell> Each sample is represented by a 12-bit two’s complement
 Russell> amplitude. The first sample is obtained from the 12 least
 Russell> significant bits of the first byte pair (stored least
 Russell> significant byte first). The second sample is formed from the
 Russell> 4 remaining bits of the first byte pair (which are the 4 high
 Russell> bits of the 12-bit sample) and the next byte (which contains
 Russell> the remaining 8 bits of the second sample). The process is
 Russell> repeated for each successive pair of samples. Most of the
 Russell> signal files in PhysioBank are written in format 212

 Russell> The file I am trying to read is here:

 Russell> https://physionet.org/pn3/twadb/twa00.dat

That file is clearly not in the format described. For one thing, the
data in it shows a clear 4-byte (not 3-byte) stride; for another, the
corresponding header file says it's in format 16, not 212.

 Russell>     c1, c2, index = string.unpack('<h<b', str, index)
 Russell>     samp1 = c1 >> 0x04
 Russell>     samp2 = c1 >> 0x0c & c2

That & there is clearly wrong, and you're putting the bits in the wrong
place too. And yes, the signedness is wrong - you should probably read
only unsigned values and convert to signed after messing with the bits.
But first you'd need to find some data in this format to test against...
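
The advice above might look like this in practice. This is only a sketch written from the format 212 description quoted earlier and checked against hand-built bytes, not against a real record. `to_signed12` and `decode212` are hypothetical names, not part of any library.

```lua
-- Convert a 12-bit two's-complement field (0..0xFFF) to a Lua integer.
local function to_signed12(v)
    if v >= 0x800 then return v - 0x1000 end
    return v
end

-- Decode one 3-byte group at position pos; returns sample 1, sample 2,
-- and the position of the next group.
local function decode212(str, pos)
    -- Read unsigned values only: a little-endian 16-bit pair, then one byte.
    local pair, b3, nextpos = string.unpack('<I2B', str, pos)
    local s1 = pair & 0xFFF                  -- low 12 bits of the pair
    local s2 = ((pair >> 12) << 8) | b3      -- top nibble becomes bits 8..11
    return to_signed12(s1), to_signed12(s2), nextpos
end

-- Hand-built group for the samples -298 and 127:
-- -298 & 0xFFF == 0xED6, 127 == 0x07F  ->  bytes D6 0E 7F
local group = string.char(0xD6, 0x0E, 0x7F)
print(decode212(group, 1))   -- -298   127   4
```

Reading with `I2` and `B` instead of `h` and `b` keeps every intermediate value non-negative, so the shifts and masks behave as expected and the sign is applied once at the end.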

--
Andrew.

Re: Parse 12 bit numbers from binary file

Russell Haley


On Thu, Jul 4, 2019 at 9:24 AM Andrew Gierth <[hidden email]> wrote:
That file is clearly not in the format described. For one thing, the
data in it shows a clear 4-byte (not 3-byte) stride; for another, the
corresponding header file says it's in format 16, not 212.

How embarrassing; thank you, Andrew. I had started this last night at home with an MIT-BIH file from here:
https://physionet.org/physiobank/database/mitdb/. I didn't have that dataset handy here at work, and in a rush to get the email out I took the line "Most of the signal files in PhysioBank are written in format 212" at face value.

This is a side project that is only partly related to work, so it will have to wait until tonight.
Thanks again,
Russ



Re: Parse 12 bit numbers from binary file

Andrew Gierth
>>>>> "Russell" == Russell Haley <[hidden email]> writes:

 Russell> c1, c2, index = string.unpack('<h<b', str, index)
 Russell> samp1 = c1 >> 0x04
 Russell> samp2 = c1 >> 0x0c & c2

 >> That & there is clearly wrong, and you're putting the bits in the
 >> wrong place too. And yes, the signedness is wrong - you should
 >> probably read only unsigned values and convert to signed after
 >> messing with the bits. But first you'd need to find some data in
 >> this format to test against...

My solution (not really tested, because I have nothing to compare with)
would be along these lines:

local f = assert(io.open(arg[1], "rb"))  -- binary mode, so no newline translation on Windows
local dat = f:read("*all")
f:close()

-- Given a 12-bit two's-complement integer value, sign-extend it to the
-- full width of a Lua integer, whatever that is. This works because
-- toggling the sign bit moves the negative values into order before the
-- positive ones, such that one can then just subtract an offset to get
-- the correct values.

local function sext12(v)
    return ((v & 0xFFF) ~ 0x800) - 0x800
end

-- Precompute lookup tables for byte 2 (much faster than doing the
-- bitops each time).

local t1, t2 = {}, {}
for b = 0,255 do
    t1[b] = sext12((b << 8) & 0xF00)
    t2[b] = sext12((b << 4) & 0xF00)
end

local sbyte = string.byte

for idx = 0, (#dat // 3) - 1 do
    local b1,b2,b3 = sbyte(dat, idx*3 + 1, idx*3 + 3)
    local v1 = t1[b2] | b1
    local v2 = t2[b2] | b3
    print(idx, v1, v2)
end
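
Since there was no reference data to compare against, one way to gain some confidence is a round trip through a small encoder. The sketch below is a self-check under assumptions, not a test against a real PhysioBank record: `pack212` is a hypothetical helper written from the format 212 description quoted earlier, and `unpack212` is a hypothetical wrapper around the same lookup-table decoding used in the code above.

```lua
local function sext12(v)
    return ((v & 0xFFF) ~ 0x800) - 0x800
end

-- Hypothetical encoder: pack two samples in -2048..2047 into three bytes.
local function pack212(s1, s2)
    local u1, u2 = s1 & 0xFFF, s2 & 0xFFF
    return string.char(u1 & 0xFF, (u1 >> 8) | ((u2 >> 8) << 4), u2 & 0xFF)
end

-- Lookup tables for the shared middle byte, as in the code above.
local t1, t2 = {}, {}
for b = 0, 255 do
    t1[b] = sext12((b << 8) & 0xF00)
    t2[b] = sext12((b << 4) & 0xF00)
end

local function unpack212(dat, idx)
    local b1, b2, b3 = string.byte(dat, idx * 3 + 1, idx * 3 + 3)
    return t1[b2] | b1, t2[b2] | b3
end

-- Every 12-bit value should survive the round trip in both positions.
for s = -2048, 2047 do
    local v1, v2 = unpack212(pack212(s, -s - 1), 0)
    assert(v1 == s and v2 == -s - 1)
end
print(sext12(0x7FF), sext12(0x800), sext12(0xFFF))   -- 2047  -2048  -1
```

A round trip only shows that the decoder is consistent with this reading of the format description; comparing against the existing tool on a genuine format 212 record would still be needed.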

--
Andrew.

Re: Parse 12 bit numbers from binary file

Sergey Kovalev
In reply to this post by Russell Haley
Thu, Jul 4, 2019 at 19:02, Russell Haley <[hidden email]> wrote:

> The file I am trying to read is here:
>
> https://physionet.org/pn3/twadb/twa00.dat
>
> When I use the existing tool to read the data out I get the following:
>
>   0    -298     127
>   1    -295     132
>   2    -292     137
>   3    -293     141
>   4    -295     145
>   5    -295     149
>   6    -293     153
>   7    -290     160
>   8    -286     167
>   9    -283     176
>
1) The file must be opened in binary mode ("rb").
2) Your file contains pairs of 16-bit integers.

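local f = assert(io.open("twa00.dat", "rb"))  -- "rb": binary mode
local s = f:read("*all")
f:close()

-- Each 4-byte frame holds two little-endian signed 16-bit samples.
for i = 1, #s, 4 do
    local a, b = string.unpack("<i2i2", s, i)
    print(a, b)
end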
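
A slightly more defensive variant of the loop above, as a sketch: `decode16` is a hypothetical name, and the two-signal, little-endian 16-bit layout is the assumption stated in point 2. The sample bytes are packed from the first two rows of the expected output, not read from twa00.dat.

```lua
-- Decode interleaved little-endian signed 16-bit samples, two per frame.
-- A trailing partial frame (fewer than 4 bytes) is ignored.
local function decode16(s)
    local rows = {}
    for i = 1, #s - 3, 4 do
        local a, b = string.unpack('<i2i2', s, i)
        rows[#rows + 1] = { a, b }
    end
    return rows
end

-- Stand-in data: first two rows of the expected output, plus one stray byte.
local sample = string.pack('<i2i2i2i2', -298, 127, -295, 132) .. '\0'
for n, row in ipairs(decode16(sample)) do
    print(n - 1, row[1], row[2])
end
```

Stopping at `#s - 3` avoids the "data string too short" error that `string.unpack` raises if the file length is not a multiple of four.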

Re: Parse 12 bit numbers from binary file

Russell Haley
In reply to this post by Andrew Gierth


On Thu, Jul 4, 2019 at 1:05 PM Andrew Gierth <[hidden email]> wrote:
My solution (not really tested, because I have nothing to compare with)
would be along these lines:

You never cease to flabbergast me, Mr. Gierth. Thank you.

Russ