Apache NiFi Adds Python Processor Support
Written by Kay Ewbank
Tuesday, 09 July 2024

Apache NiFi 2, a project for processing and distributing data, has been released with support for Python processors in the MiNiFi framework, and a completely rebuilt user interface.

Apache NiFi is based on the NiagaraFiles software developed by the US National Security Agency (NSA), which was open sourced in 2014; the name NiFi derives from NiagaraFiles. NiFi can be used to automate the flow of data between software systems. It supports ETL (extract, transform, load) workloads, can operate within clusters, and secures communication with TLS encryption.


A common use of NiFi is as the consumer between Kafka and HDFS. NiFi also provides schema validation for event streams, and its flows can modify and republish secured event streams for general use. It can also be used to monitor data flows and identify potential problems, and to secure data flows by encrypting data at rest and in transit.

NiFi executes within a JVM on a host operating system. Its primary components start with a web server that hosts NiFi's HTTP-based command and control API. There's a flow controller that provides threads for extensions to run on, and manages the schedule of when extensions receive resources to execute.

NiFi uses the concept of FlowFiles that represent objects moving through the system. For each FlowFile, NiFi keeps track of a map of key/value pair attribute strings and its associated content of zero or more bytes. The state of active FlowFiles is stored in a FlowFile repository. There's also a content repository that stores the actual content bytes of a given FlowFile, and a provenance repository where all provenance event data is stored.
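
The FlowFile idea can be pictured with a small Python model. This is only a conceptual sketch: the names FlowFileModel, Repositories and store are invented for illustration and are not part of NiFi, which implements FlowFiles and its repositories in Java with on-disk storage.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFileModel:
    """Hypothetical stand-in for a FlowFile: string attributes plus raw content."""
    attributes: dict[str, str] = field(default_factory=dict)
    content: bytes = b""  # zero or more bytes


@dataclass
class Repositories:
    """Toy in-memory versions of the three repositories (assumed layout)."""
    flowfile_repo: dict[str, dict[str, str]] = field(default_factory=dict)
    content_repo: dict[str, bytes] = field(default_factory=dict)
    provenance_repo: list[tuple[str, str]] = field(default_factory=list)

    def store(self, flowfile_id: str, flowfile: FlowFileModel, event: str) -> None:
        # Attributes (the FlowFile's state) and content are kept separately,
        # and every change is recorded as a provenance event.
        self.flowfile_repo[flowfile_id] = dict(flowfile.attributes)
        self.content_repo[flowfile_id] = flowfile.content
        self.provenance_repo.append((event, flowfile_id))


repos = Repositories()
repos.store("ff-1", FlowFileModel({"filename": "a.csv"}, b"x,y\n1,2\n"), "RECEIVE")
```

Keeping attributes and content apart mirrors the split described above: the small attribute map can be tracked cheaply while the potentially large content bytes live elsewhere.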

NiFi is extensible by developers, and the extensions operate and execute within the JVM. 

This release of NiFi has a rebuilt user interface that lets the system or the user select a dark mode. More usefully, it now supports Kafka 3 for both consuming and publishing messages.

The NiFi team says this version can now split binary Packet Capture (PCAP) files with SplitPCAP, and split Microsoft Excel XLSX files into individual sheets with SplitExcel. They say this is:

"a good example of the increasingly common usage of NiFi in the wild to capture and transform unstructured or semi-structured data and deliver it to systems such as databases, vector stores, and more."

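To give a feel for what splitting a PCAP file involves, here is a minimal Python sketch that works on the classic PCAP layout: a 24-byte file header followed by packet records, each with a 16-byte record header. The function name split_pcap and the packets_per_chunk parameter are assumptions made for this example. It is not how SplitPCAP is implemented, and it ignores the nanosecond-resolution and pcapng variants.

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24  # fixed-size file header
PCAP_RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len


def split_pcap(data: bytes, packets_per_chunk: int) -> list[bytes]:
    """Split classic PCAP bytes into smaller, individually valid PCAPs."""
    if packets_per_chunk < 1:
        raise ValueError("packets_per_chunk must be at least 1")
    if len(data) < PCAP_GLOBAL_HEADER_LEN:
        raise ValueError("too short to be a PCAP file")

    header = data[:PCAP_GLOBAL_HEADER_LEN]
    magic = header[:4]
    # The magic number reveals the byte order used by the writer.
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("unsupported PCAP magic number")

    # Walk the records, keeping each packet as record header + payload.
    records = []
    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(data):
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        end = offset + PCAP_RECORD_HEADER_LEN + incl_len
        if end > len(data):
            break  # truncated final record: stop rather than emit bad data
        records.append(data[offset:end])
        offset = end

    # Each chunk repeats the file header so it can be opened on its own.
    return [
        header + b"".join(records[i:i + packets_per_chunk])
        for i in range(0, len(records), packets_per_chunk)
    ]
```

Because every chunk carries its own copy of the file header, each piece can be handed to a downstream tool independently of the others.
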
There's also a new interface for Python extensions supporting components which source new data. Still on the Python front, there's now Python Processor support in the MiNiFi framework.
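
As a rough illustration of what a Python processor looks like, the sketch below follows the FlowFileTransform pattern from the NiFi 2 Python developer guide. The processor name UppercaseText, its description, and the fallback stand-in classes are assumptions made for this example; the stand-ins exist only so the code can be run outside NiFi, where the nifiapi module is not available. It targets NiFi's Python API and makes no claim about how MiNiFi loads Python processors.

```python
try:
    # Available only inside a NiFi 2 Python runtime.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical stand-ins so the sketch can be exercised outside NiFi.
    class FlowFileTransform:
        def __init__(self, **kwargs):
            pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class UppercaseText(FlowFileTransform):
    """Illustrative processor: upper-cases UTF-8 FlowFile content."""

    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'
        description = 'Upper-cases UTF-8 FlowFile content (example only).'

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, change it, and route the result to "success".
        text = flowfile.getContentsAsBytes().decode('utf-8')
        return FlowFileTransformResult(
            relationship='success',
            contents=text.upper(),
            attributes={'uppercased': 'true'},
        )
```

Inside NiFi the framework creates the processor and calls transform for each incoming FlowFile; outside NiFi the method can be called directly with any object that provides getContentsAsBytes.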

NiFi 2 is available now.


More Information

Apache NiFi

Related Articles

Cloudera And StreamNative Open Source NiFi Pulsar Connector

Apache Daffodil Improves DFDL Compatibility


Last Updated ( Tuesday, 09 July 2024 )