Apache Daffodil Improves DFDL Compatibility
Written by Kay Ewbank   
Tuesday, 12 March 2019

Apache Daffodil, an open source implementation of the Data Format Description Language to convert between fixed format data and XML/JSON, has been updated to improve DFDL compatibility.

The Data Format Description Language (DFDL) is a specification that was developed by the Open Grid Forum to create a standard way of describing different data formats, including both textual and binary, scientific and numeric, legacy and modern, commercial record-oriented, and many industry and military standards.



The open-source implementation, Daffodil, is currently an Apache Incubator project. It has Java and Scala APIs, provides Apache NiFi processors for parsing and unparsing NiFi FlowFiles, and has an extension to XML Calabash that declares XProc pipeline steps to parse and unparse input data.

DFDL defines a language that is a subset of W3C XML Schema to describe the logical format of the data, and uses annotations within the schema to describe the physical representation. The Open Grid Forum was created by a merger between the Global Grid Forum and the Enterprise Grid Alliance, and is a group of developers and vendors interested in standardizing grid computing.

Daffodil uses these DFDL schemas to parse fixed format data into an infoset, which is most commonly represented as either XML or JSON, meaning developers can use XML or JSON to consume, inspect, and manipulate fixed format data. Daffodil can also be used in the reverse direction to serialize or “unparse” an XML or JSON infoset back to the original data format.
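
To make the parse and unparse round trip concrete, here is a minimal Python sketch of the idea. It does not use Daffodil or DFDL: the field layout, the field names and the `parse`/`unparse` helpers are all hypothetical stand-ins, with a plain dictionary playing the role of the infoset and JSON as its representation.

```python
import json

# Hypothetical fixed-width layout: (field name, width in characters).
# In DFDL this information would come from the schema and its annotations.
LAYOUT = [("id", 4), ("name", 10), ("qty", 3)]


def parse(record: str) -> dict:
    """Slice a fixed-width record into a dict that stands in for an infoset."""
    expected = sum(width for _, width in LAYOUT)
    if len(record) != expected:
        raise ValueError(f"expected {expected} characters, got {len(record)}")
    infoset = {}
    pos = 0
    for name, width in LAYOUT:
        # Trailing spaces are treated as padding and dropped.
        infoset[name] = record[pos:pos + width].rstrip()
        pos += width
    return infoset


def unparse(infoset: dict) -> str:
    """Serialize the dict back to the fixed-width form, padding with spaces."""
    parts = []
    for name, width in LAYOUT:
        value = infoset[name]
        if len(value) > width:
            raise ValueError(f"value for {name!r} exceeds {width} characters")
        parts.append(value.ljust(width))
    return "".join(parts)


record = "0042Widget    007"
infoset = parse(record)
print(json.dumps(infoset))          # JSON view of the parsed record
assert unparse(infoset) == record   # the round trip restores the original
```

The point of the sketch is only the shape of the workflow: one description of the layout drives both directions, which is the role a DFDL schema plays for Daffodil.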

The updated release has a number of changes and bug fixes made specifically to improve IBM DFDL compatibility, including a TDML runner that now tolerates left-over data for IBM test compatibility.

Test Data Markup Language (TDML) is a way of specifying a DFDL schema, input test data, and an expected result or expected error/diagnostic messages, all self-contained in an XML file. IBM created TDML to capture tests for their own DFDL implementation. Daffodil incorporated the idea and has extended it, though there is now an effort to reconcile TDML dialects so that all implementations can run the same tests.

This release of Daffodil incorporates TDML runner cross validation, meaning it is now possible to use the TDML runner to run tests against different DFDL implementations, including the IBM DFDL implementation. The TDML runner has also added type-aware infoset comparisons, meaning developers can now provide an xsi:type attribute in infoset elements, allowing the TDML runner to determine if two elements are logically the same even if their infoset values differ textually.
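
The idea behind a type-aware comparison can be illustrated with a short Python sketch. This is not the TDML runner's code: the `CONVERTERS` table and the `logically_equal` function are hypothetical, and only a handful of XML Schema type names are handled. It shows why two values such as 1.0 and 1.00 can be treated as the same once a type is known, while a plain string comparison would report a mismatch.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical mapping from xsi:type values to Python converters.
# Only a few simple types are covered, purely for illustration.
CONVERTERS = {
    "xs:int": int,
    "xs:decimal": Decimal,
    "xs:double": float,
    "xs:boolean": lambda text: text.strip() in ("true", "1"),
}


def logically_equal(a: str, b: str, xsi_type: Optional[str] = None) -> bool:
    """Compare two element values, using the declared type when one is given."""
    convert = CONVERTERS.get(xsi_type)
    if convert is None:
        # No known type: fall back to comparing the text exactly.
        return a == b
    return convert(a) == convert(b)


print(logically_equal("1.0", "1.00"))                # plain text comparison
print(logically_equal("1.0", "1.00", "xs:decimal"))  # comparison as decimals
```

With no type the two strings differ, while with `xs:decimal` supplied they compare as equal numbers.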


More Information

Apache Daffodil


