[Hdf-forum] HDF lib incompatible with HDF file spec?

Krug, Markus markus.krug at hm.edu
Mon Sep 18 05:03:26 CDT 2017


Dear all,

I would like to come back to my question about the incompatibility between the HDFlib and the HDF file spec concerning the actual physical layout of an HDF file. Can anyone confirm my observation that this can lead to corrupt files if they are first generated by a ‘non HDFlib based’ application that complies with the HDF file spec and are then altered by an ‘HDFlib based’ application such as HDFview?

Best Regards
Markus
From: Krug, Markus
Sent: Wednesday, 6 September 2017 17:56
To: 'HDF Users Discussion List' <hdf-forum at lists.hdfgroup.org>
Subject: RE: [Hdf-forum] HDF lib incompatible with HDF file spec?

Dear Mark,

Completely correct. I wrote some routines that generate HDF files. However, only a small subset of the functionality is used. More or less only compressed compound data types, with a maximum number of 5, will be in the files, and very likely not more than two groups. I follow this paper (http://www.ep.liu.se/ecp/076/050/ecp12076050.pdf) concerning the HDF file layout because I need to write ‘time series’ in my embedded application.

You are right: the HDF file spec is highly complex. Even with my reduced feature set, it took me significantly more time than I had planned to understand it. In the meantime I think I understand what I need for my purpose. However, I’m not saying that the files I can generate so far are 100% correct in the sense of the HDF file spec. But at least HDFview can read them without problems, so they cannot be that wrong.

Best Regards
Markus

From: Hdf-forum [mailto:hdf-forum-bounces at lists.hdfgroup.org] On Behalf Of Miller, Mark C.
Sent: Tuesday, 5 September 2017 19:22
To: HDF Users Discussion List <hdf-forum at lists.hdfgroup.org>
Subject: Re: [Hdf-forum] HDF lib incompatible with HDF file spec?

Hmm. If I understand you, you have written code that you believe produces an HDF5 file according to the 3.0 file version specification (https://support.hdfgroup.org/HDF5/doc/H5.format.html), but that nevertheless does NOT use the HDF5 library to do it. Furthermore, where 'extended padding' is concerned, your implementation does business differently than the HDF5 implementation.

You can prove HDF5 tools will *read* the file ok. But, in a read-modify-write scenario, the file is getting corrupted by HDF5 library due to the difference in how the two implementations handle the extended padding -- a feature that you explain is '...not defined at all -- not even recommended'.

Is that about right?

If so, it does indeed sound like a potential issue in the file format specification for HDF5.

Your scenario sounds like a super useful test case...does a wholly independent implementation produce a file the HDF5 library can "handle"?

I wonder if there are settings in the HDF5 library you may need to set (such as alignment or block-size or something) such that read-modify-write will indeed work ok? I wonder if there is some metadata missing from your file that would inform the HDF5 library what specific settings it must use to properly read and write to the file? I wonder if there is some boot-block information you have neglected to include, so that the HDF5 library is not aware of all the parameters affecting the file's layout.
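
As a first sanity check on the boot block (superblock) of an independently written file, one could at least confirm where the format signature sits and which superblock version byte follows it. The following is a minimal Python sketch, assuming the layout described in the format specification: an 8-byte signature at offset 0, 512, 1024, 2048, ... immediately followed by a one-byte version number. The function name find_superblock is hypothetical and the sketch does not use the HDF5 library.

```python
import io

# Format signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(stream):
    """Return (offset, version) of the superblock, or None if no signature is found.

    `stream` is any seekable binary stream, e.g. open(path, "rb").
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    offset = 0
    while offset + 9 <= size:
        stream.seek(offset)
        header = stream.read(9)
        if header[:8] == HDF5_SIGNATURE:
            # The byte right after the signature is the superblock version.
            return offset, header[8]
        # Candidate offsets are 0, 512, 1024, 2048, ... (doubling each time).
        offset = 512 if offset == 0 else offset * 2
    return None
```

This only locates the superblock; whether the remaining superblock fields match what the library expects would still have to be checked field by field against the specification.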

The only reason for calling into question many aspects of your implementation is that the HDF5 file format is fairly complex. I don't think it is easily duplicated without using the library itself. So, I think it's highly likely you may be overlooking some important features of the format necessary for the HDF5 library to fully handle it.

All that said, I commend your courage for attempting it and hope others can chime in with more detailed thoughts on what to do about it.

Mark



"Hdf-forum on behalf of Krug, Markus" wrote:

Dear all,

I just came across an interesting issue.
I implemented the writing of HDF files on an embedded system. The amount of functionality I implemented is significantly less than what the HDF lib offers; it is tailored to my needs. I implemented everything on the basis of the HDF 3.0 file spec. One goal of my tailoring was to optimize the file size. Therefore, I write every internal block of the HDF file directly after the previous one, byte by byte, or padded to the address alignment where the HDF file specification requires it. The HDF files generated by HDFview or Matlab have plenty of space between the internal blocks, sometimes a few hundred bytes. As far as I can tell from the HDF file specification, this ‘extended padding’ is not defined at all – not even recommended.
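
The difference between tight packing and padding to an alignment boundary can be illustrated with a small Python sketch. The function names, block sizes and the 8-byte alignment are illustrative assumptions only; they are not taken from the attached files or from the HDF lib.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return -(-offset // alignment) * alignment


def pack_blocks(sizes, alignment=1, start=0):
    """Place blocks one after another, padding each start to the alignment.

    Returns (offsets, end_of_file), where offsets[i] is the start of block i.
    """
    offsets = []
    cursor = start
    for size in sizes:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# Three hypothetical blocks of 10, 3 and 5 bytes.
tight = pack_blocks([10, 3, 5])                  # no padding at all
padded = pack_blocks([10, 3, 5], alignment=8)    # each block starts on an 8-byte boundary
```

With alignment=1 the blocks touch each other; any larger alignment introduces gaps, and the file grows accordingly.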
However, this ‘extended padding’ performed by the HDF lib leads to behavior that I would consider an incompatibility with itself. To demonstrate this, I attached two HDF files to this email. The first (sizeoptimized.h5) was generated by my embedded software and is optimized for file size. It contains three compounds, each of which has 2 elements. You should be able to open that file in HDFview or similar tools and read all its contents.
The second file (sizeoptimizedextended.h5) was generated by HDFview by adding a fourth compound after sizeoptimized.h5 was opened in HDFview. You can see that the file is partly corrupted. The reason is that HDFview (and therefore the HDF lib, I guess) does not really take into account the position of the existing internal blocks of a file it is writing to. It seems to me that it has some internal mapping of those blocks. This mapping is applied even if it collides with, and therefore corrupts, the existing blocks.
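
A collision of this kind could be detected by comparing the byte ranges of all blocks in a file. The following Python sketch assumes the block offsets and lengths have already been extracted by some other means; the function name, block names and numbers are made up for illustration, and it is not a description of what the HDF lib does internally.

```python
def find_overlaps(blocks):
    """Return pairs of block names whose byte ranges overlap.

    `blocks` is an iterable of (name, offset, length) tuples.
    """
    ordered = sorted(blocks, key=lambda block: block[1])
    overlaps = []
    for i, (name_a, offset_a, length_a) in enumerate(ordered):
        end_a = offset_a + length_a
        for name_b, offset_b, _ in ordered[i + 1:]:
            if offset_b >= end_a:
                break  # later blocks start even further right
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical layout: "new" was written on top of the tail of "existing".
layout = [("header", 0, 16), ("existing", 16, 8), ("new", 20, 8)]
collisions = find_overlaps(layout)
```

Blocks that merely touch (one ends exactly where the next begins) are not reported, which matches the tightly packed layout described above.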
If my observation is correct, I think either the HDF lib needs a bugfix or the HDF file spec needs a description of how the internal blocks are allowed to be positioned within an HDF file.
I forgot to mention that I tried to take the HDF lib sources and compile them for my system. However, I gave up after a couple of days because the way the sources are written is not at all suitable for adapting them to an embedded system that runs a simplified file system and a real-time operating system, all of which has to fit into a few hundred kilobytes.

Can anyone comment on my observation?


Best Regards
Markus