dPipes - Pythonic Data Pipelines
About
dPipes is a Python package for creating reusable, modular, and composable data pipelines.
It's a small project that came out of the desire to turn this:
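The "before" pattern being replaced is a pandas-style `.pipe` method chain. A representative sketch, with hypothetical transformation functions standing in for `func_1`/`func_2`/`func_3`:

```python
import pandas as pd

# Hypothetical transformation steps, for illustration only.
def add_one(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(a=df["a"] + 1)

def double(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(a=df["a"] * 2)

def rename_col(df: pd.DataFrame) -> pd.DataFrame:
    return df.rename(columns={"a": "b"})

data = pd.DataFrame({"a": [1, 2, 3]})

# Method chaining: each step pipes into the next.
data = (
    data
    .pipe(add_one)
    .pipe(double)
    .pipe(rename_col)
)
```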
into this:
from dpipes.processor import PipeProcessor

ps = PipeProcessor(
    funcs=[func_1, func_2, func_3]
)

data = ps(data)
Now, arguably, there is not much functional difference between the two implementations. They both accomplish the same task with roughly the same amount of code.
But, what happens if you want to apply the same pipeline of functions to a different data object?
Using the first method, you'd need to re-write (copy/paste) your method-chaining pipeline:
Using the latter method, you'd only need to pass in a different object to the pipeline:
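For illustration, here is that reuse pattern end to end, using a minimal hand-rolled stand-in for `PipeProcessor` (the stand-in and the step functions are assumptions; the real class lives in `dpipes.processor`):

```python
from functools import reduce

# Minimal stand-in for dpipes' PipeProcessor, for illustration only.
class PipeProcessor:
    def __init__(self, funcs):
        self.funcs = funcs

    def __call__(self, data):
        # Thread the data through each function in order.
        return reduce(lambda acc, fn: fn(acc), self.funcs, data)

# Hypothetical pipeline steps.
def add_one(xs):
    return [x + 1 for x in xs]

def double(xs):
    return [x * 2 for x in xs]

ps = PipeProcessor(funcs=[add_one, double])

# One pipeline definition, applied to two different objects:
first = ps([1, 2, 3])
second = ps([10, 20])
```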
Under the Hood
dPipes uses two functions from Python's functools module: reduce and partial. The reduce function enables function composition; the partial function enables use of arbitrary kwargs.
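As a minimal sketch of the idea (not dPipes' actual source), reduce can thread data through a list of functions while partial pre-binds each function's keyword arguments:

```python
from functools import partial, reduce

def compose(funcs):
    """Return a callable that threads data through funcs left to right."""
    return lambda data: reduce(lambda acc, fn: fn(acc), funcs, data)

# Hypothetical steps that take keyword arguments.
def add(x, n=0):
    return x + n

def scale(x, factor=1):
    return x * factor

# partial binds the kwargs ahead of time; reduce does the composition.
pipeline = compose([partial(add, n=3), partial(scale, factor=2)])
result = pipeline(5)  # (5 + 3) * 2
```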
Generalization
Although dPipes initially addressed pd.DataFrame.pipe method-chaining, it's extensible to any API that implements a pandas-like DataFrame.pipe method (e.g. Polars). Further, the dpipes.pipeline module extends this composition to any arbitrary Python function.
That is, this:
or this:
becomes this:
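In plain Python, composing arbitrary functions usually means nested calls or repeated reassignment. A sketch of the before/after, using hypothetical string-processing steps and a hand-rolled composition in place of dpipes' Pipeline (whose exact interface isn't shown here):

```python
from functools import reduce

# Hypothetical processing steps.
def strip_ws(s):
    return s.strip()

def upper(s):
    return s.upper()

def exclaim(s):
    return s + "!"

# Before: nested calls read inside-out...
nested = exclaim(upper(strip_ws("  hello  ")))

# ...or repeated reassignment:
step = strip_ws("  hello  ")
step = upper(step)
step = exclaim(step)

# After: one composed pipeline, read top to bottom.
def pipeline(data):
    return reduce(lambda acc, fn: fn(acc), [strip_ws, upper, exclaim], data)

composed = pipeline("  hello  ")
```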
which is, arguably, more readable and, once again, easier to apply to other objects.
Note
Though the above examples use simple callables, users can pass any arbitrary kwargs to a Pipeline object. The PipelineProcessor objects can also "broadcast" arguments throughout a DataFrame pipeline.
See the tutorials and how-to sections for additional information and examples.