[Hdf-forum] Avoiding corruption of the HDF5 File

Dana Robinson derobins at hdfgroup.org
Mon Sep 25 16:08:52 CDT 2017

Hi all,

(I'm jumping into the middle of this discussion while still on vacation, so please excuse me if I'm missing something.)

Those documents describe cache flushing. Flush ordering should not affect the file format – any transient 'file corruption' would be due to a second reader inspecting an incomplete file, which techniques like SWMR are designed to address.
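For readers who want to see what the SWMR pattern looks like in practice, here is a minimal sketch in Python using h5py. It assumes an HDF5 build with SWMR support (1.10 or later) and an h5py version that exposes `swmr_mode`, `Dataset.flush()` and `Dataset.refresh()`. The dataset name `samples` and the helper names `swmr_write` and `swmr_read` are made up for illustration and are not from this thread. In real use the reader would run in a separate process while the writer is still appending.

```python
import h5py
import numpy as np


def swmr_write(path, batches):
    """Append 1-D batches to a resizable dataset in SWMR write mode."""
    # libver="latest" is required so the file uses a format SWMR can work with.
    with h5py.File(path, "w", libver="latest") as f:
        # All objects must be created BEFORE switching to SWMR mode.
        dset = f.create_dataset(
            "samples", shape=(0,), maxshape=(None,), dtype="f8", chunks=(1024,)
        )
        f.swmr_mode = True
        for batch in batches:
            batch = np.asarray(batch, dtype="f8")
            old = dset.shape[0]
            dset.resize((old + batch.shape[0],))
            dset[old:] = batch
            # Make the appended data visible to SWMR readers.
            dset.flush()


def swmr_read(path):
    """Open the file as a SWMR reader and return the current contents."""
    with h5py.File(path, "r", libver="latest", swmr=True) as f:
        dset = f["samples"]
        # Pick up any size changes the writer has flushed so far.
        dset.refresh()
        return dset[...]
```

The writer never creates new groups or datasets after `swmr_mode` is set; only appends and flushes happen from that point on, which is the restriction SWMR places on the writing side.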

Incompatibilities between the official HDF5 library and third-party HDF5 libraries should probably be considered bugs in one or the other (or maybe even both!), as long as they are truly holding to the published HDF5 file format.


From: Hdf-forum <hdf-forum-bounces at lists.hdfgroup.org> on behalf of "Miller, Mark C." <miller86 at llnl.gov>
Reply-To: HDF List <hdf-forum at lists.hdfgroup.org>
Date: Monday, September 25, 2017 at 10:52
To: HDF List <hdf-forum at lists.hdfgroup.org>
Subject: Re: [Hdf-forum] Avoiding corruption of the HDF5 File

Hi Quincey,

One question, though: is it possible to produce a bytes-on-disk format from a wholly different code base that is nonetheless compatible with HDF5 proper?

Your answer seems to suggest it is NOT possible without using (some of) the HDF5 code base.

That would be a shame, as it suggests there is no longer a well-defined bytes-on-disk format apart from whatever the HDF5 implementation produces.


"Hdf-forum on behalf of Quincey Koziol" wrote:

Hi Ewan,
There are two things you can do to address file corruption issues:

- For the near term, use the techniques and code for managing the metadata cache described here:  https://support.hdfgroup.org/HDF5/docNewFeatures/FineTuneMDC/RFC%20H5Ocork%20v5%20new%20fxn%20names.pdf

- In the next year or so, we will be finishing the “SWMR” feature, described here:  https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesSwmrDocs.html

The metadata cache techniques are rather unsubtle, but will avoid corrupted files until the “full” SWMR feature is finished.
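As a rough illustration of the "unsubtle" end of this approach, the sketch below flushes the whole file to disk after every few appends, using Python and h5py. This is not the cork/uncork API from the RFC linked above; it is the simpler file-level flush (`File.flush()`, which wraps `H5Fflush`). The dataset name `samples`, the helper name `append_with_flush` and the `flush_every` parameter are assumptions made for this example. A flush narrows the window in which a killed process leaves the file inconsistent; it does not close that window entirely.

```python
import h5py
import numpy as np


def append_with_flush(path, batches, flush_every=1):
    """Append 1-D batches to a resizable dataset, flushing periodically."""
    # Mode "a" opens the file read/write and creates it if it is missing.
    with h5py.File(path, "a") as f:
        if "samples" in f:
            dset = f["samples"]
        else:
            dset = f.create_dataset(
                "samples", shape=(0,), maxshape=(None,), dtype="f8", chunks=(1024,)
            )
        for i, batch in enumerate(batches, start=1):
            batch = np.asarray(batch, dtype="f8")
            old = dset.shape[0]
            dset.resize((old + batch.shape[0],))
            dset[old:] = batch
            if i % flush_every == 0:
                # Push cached data and metadata out to the file.
                f.flush()
```

A larger `flush_every` trades safety for throughput: fewer flushes means faster writes but more unflushed state at risk if the process is stopped.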


On Sep 21, 2017, at 8:33 PM, Ewan Makepeace <makepeace at jawasoft.com> wrote:

Dear Experts,

We are building a data acquisition and processing system on top of an HDF5 file store. Generally we have been very pleased with HDF5 - great flexibility in data structure, performant, small file size, availability of third party data access tools etc.

However, our system needs to run for 36-48 hours at a time, and we are finding that if we (deliberately or accidentally) stop the process while it is running (and writing data), the file is corrupted and we lose all our work.

We are in C# and wrote our access routines on top of HDF5.net (which I understand is deprecated). We tend to keep all active pointer objects open for the duration of the process that reads or writes them (file, group and dataset handles in particular).

1) Is there a full-featured replacement for HDF5.net now that I was unaware of? Previous contenders were found to be missing support for features we depend on. If so, will it address the corruption issue?

2) Should we be opening and closing all the entities on every write? I would have thought that would dramatically slow access but perhaps not. Guidance?

3) Are there any other tips for making the file less susceptible to corruption if writing is abandoned unexpectedly?

Please help - this issue could be serious enough to make us reconsider our storage choice, which would be expensive now.
