Python fluent filter, map, etc

I am looking now at an answer that strikes closer to the heart of the question:

fluentpy https://pypi.org/project/fluentpy/ :

Here is the kind of method chaining for collections that a streams programmer (in scala, java, others) will appreciate:

import fluentpy as _
(
  _(range(1,50+1))
  .map(_.each * 4)
  .filter(_.each <= 170)
  .filter(lambda each: len(str(each))==2)
  .filter(lambda each: each % 20 == 0)
  .enumerate()
  .map(lambda each: 'Result[%d]=%s' %(each[0],each[1]))
  .join(',')
  .print()
)

And it works fine:

Result[0]=20,Result[1]=40,Result[2]=60,Result[3]=80

I am just now trying this out. It will be a very good day if this works as shown above.
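For comparison, here is a rough sketch of the same pipeline using only built-ins, which makes it easy to check the output above without installing anything. The intermediate variable names are mine and are not part of fluentpy.

```python
# The same steps as the fluentpy chain, written as plain generator expressions.
numbers = (n * 4 for n in range(1, 50 + 1))
small = (n for n in numbers if n <= 170)
two_digit = (n for n in small if len(str(n)) == 2)
multiples_of_20 = (n for n in two_digit if n % 20 == 0)

# enumerate() pairs each surviving value with its position.
result = ','.join('Result[%d]=%s' % (i, n)
                  for i, n in enumerate(multiples_of_20))
print(result)
```

Each step needs a name here, whereas the chained version reads top to bottom without any. That is the main thing the fluent style buys you.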

Update: Look at this: maybe Python can start to be more reasonable for one-line shell scripts:

python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"

Here it is in action on the command line:

$ echo -e "Hello World line1\nLine 2\nLine 3\nGoodbye" \
         | python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"

hello world line1

line 2

line 3

goodbye

There is an extra newline that should be cleaned up - but the gist of it is useful (to me anyways).
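The extra newline presumably appears because readlines() keeps each line's trailing "\n" and print() then adds its own. A minimal plain-Python sketch of the cleanup, using io.StringIO as a stand-in for stdin (the helper name lowercase_lines is hypothetical):

```python
import io

def lowercase_lines(stream):
    """Yield each line of `stream` lowercased, without its trailing newline."""
    for line in stream:
        yield line.rstrip('\n').lower()

# io.StringIO plays the role of sys.stdin so the example is self-contained.
fake_stdin = io.StringIO("Hello World line1\nLine 2\nLine 3\nGoodbye\n")
cleaned = list(lowercase_lines(fake_stdin))
for line in cleaned:
    print(line)  # print() supplies the only newline now
```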

Update: Here's yet another library/option: one that I adapted from a gist and that is available on PyPI as infixpy:

from infixpy import *
a = (Seq(range(1,51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' %(x[0],x[1]))
     .mkstring(' .. '))
print(a)
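To give an idea of how little machinery this style needs, here is a minimal sketch of a fluent wrapper. The class name Chain is hypothetical and this is not infixpy's actual implementation. It only mirrors the four methods used in the example above.

```python
class Chain:
    """Hypothetical minimal fluent wrapper around any iterable."""

    def __init__(self, iterable):
        self._items = iterable

    def map(self, func):
        # Wrap a lazy generator so that calls can keep chaining.
        return Chain(func(item) for item in self._items)

    def filter(self, predicate):
        return Chain(item for item in self._items if predicate(item))

    def enumerate(self):
        return Chain(enumerate(self._items))

    def mkstring(self, separator):
        # Terminal operation: consumes the chain and returns a str.
        return separator.join(str(item) for item in self._items)


a = (Chain(range(1, 51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' % (x[0], x[1]))
     .mkstring(' .. '))
print(a)
```

Every intermediate method returns a new Chain around a generator, so nothing is computed until mkstring consumes the whole thing.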

Comprehensions are the fluent python way of handling filter/map operations.

Your code would be something like:

def evenize(input_list):
    return [x for x in input_list if x % 2 == 0]

Comprehensions don't work well with side effects like console logging, so do that in a separate loop. Chaining function calls isn't really that common an idiom in python. Don't expect that to be your bread and butter here. Python libraries tend to follow the "alter state or return a value, but not both" pattern. Some exceptions exist.
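As a small sketch of that split, the comprehension below only computes a value and the logging happens in a separate loop. The sample input list is an assumption.

```python
def evenize(input_list):
    # Pure: returns a new list and has no side effects.
    return [x for x in input_list if x % 2 == 0]

evens = evenize([1, 2, 3, 4, 5])

# Side effects (console logging) live in their own loop.
for value in evens:
    print(value)
```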

Edit: On the plus side, python provides several flavors of comprehensions, which are awesome:

List comprehension: [x for x in range(3)] == [0, 1, 2]

Set comprehension: {x for x in range(3)} == {0, 1, 2}

Dict comprehension: {x: x**2 for x in range(3)} == {0: 0, 1: 1, 2: 4}

Generator comprehension (or generator expression): (x for x in range(3)) == <generator object <genexpr> at 0x10fc7dfa0>

With the generator comprehension, nothing has been evaluated yet, so it is a great way to prevent blowing up memory usage when pipelining operations on large collections.
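One way to see the laziness is to record when each value is actually computed. In this sketch the helper square and the list evaluated are illustrative names only.

```python
evaluated = []

def square(x):
    evaluated.append(x)  # record that this value was really computed
    return x ** 2

squares = (square(x) for x in range(3))
before = list(evaluated)      # nothing has run yet
first = next(squares)         # computes exactly one value
after_one = list(evaluated)
rest = list(squares)          # forces the remaining values
```

Creating the generator expression runs none of the calls to square. Each next() runs exactly one.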

For instance, if you try to do the following, even with python3 semantics for range:

for number in [x**2 for x in range(10000000000000000)]:
    print(number)

you will get a memory error trying to build the initial list. On the other hand, change the list comprehension into a generator comprehension:

for number in (x**2 for x in range(10**20)):
    print(number)

and there is no memory issue (it just takes forever to run). What happens is that the range object gets built (it only stores the start, stop, and step values: 0, 10**20, and 1), and then the for-loop begins iterating over the genexp object. Effectively, the for-loop calls

GENEXP_ITERATOR = iter(genexp)
number = next(GENEXP_ITERATOR)
# run the loop one time
number = next(GENEXP_ITERATOR)
# run the loop one time
# etc.

(Note the GENEXP_ITERATOR object is not visible at the code level)

next(GENEXP_ITERATOR) tries to pull the first value out of genexp, which then starts iterating on the range object, pulls out one value, squares it, and yields out the value as the first number. The next time the for-loop calls next(GENEXP_ITERATOR), the generator expression pulls out the second value from the range object, squares it and yields it out for the second pass on the for-loop. The first set of numbers are no longer held in memory.
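Spelled out as runnable code, the for-loop protocol described above looks roughly like this. It is a sketch of what the interpreter does for you, including the StopIteration that ends the loop. The small range is only there so that it terminates.

```python
genexp = (x ** 2 for x in range(4))

iterator = iter(genexp)   # for a generator, iter() returns the generator itself
collected = []
while True:
    try:
        number = next(iterator)   # pull exactly one value through the pipeline
    except StopIteration:
        break                     # the for-loop ends silently at this point
    collected.append(number)      # stands in for the loop body
```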

This means that no matter how many items the generator comprehension produces, the memory usage remains constant. You can pass the generator expression to other generator expressions, and create long pipelines that never consume large amounts of memory.

import os
from pathlib import Path  # stands in for the original's path.path

def pipeline(filenames):
    basepath = Path('/usr/share/stories')
    fullpaths = (basepath / fn for fn in filenames)
    realfiles = (fn for fn in fullpaths if os.path.exists(fn))
    openfiles = (open(fn) for fn in realfiles)

    def read_and_close(file):
        # Read only the first 100 characters, then release the file handle.
        output = file.read(100)
        file.close()
        return output

    prefixes = (read_and_close(file) for file in openfiles)
    noncliches = (prefix for prefix in prefixes
                  if not prefix.startswith('It was a dark and stormy night'))
    # The dict comprehension finally pulls every value through the pipeline.
    return {prefix[:32]: prefix for prefix in noncliches}

At any time, if you need a data structure for something, you can pass the generator comprehension to another comprehension type (as in the last line of this example). That forces the generators to evaluate all the data they have left. Until you do that, memory consumption is limited to what happens in a single pass over the generators.
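Here is a file-free sketch of the same idea: a lazy pipeline over an enormous range that is only materialized at the very end. The use of itertools.islice to cap the pipeline at three items is my addition, so that the example finishes.

```python
from itertools import islice

squares = (n ** 2 for n in range(10 ** 20))        # nothing computed yet
even_squares = (s for s in squares if s % 2 == 0)  # still lazy
first_three = islice(even_squares, 3)              # still lazy, but bounded

# The dict comprehension forces evaluation of whatever the pipeline yields.
labelled = {'square=%d' % s: s for s in first_three}

# The generators are consumed as they go, so nothing is left afterwards.
leftover = list(first_three)
```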


Generators, iterators, and itertools give added powers to chaining and filtering actions. But rather than remember (or look up) rarely used things, I gravitate toward helper functions and comprehensions.

For example in this case, take care of the logging with a helper function:

def echo(x):
    print(x)
    return x

Selecting even values is easy with the if clause of a comprehension. And since the final output is a dictionary, use that kind of comprehension:

In [118]: d={echo(x):True for x in s if x%2==0}
2
4

In [119]: d
Out[119]: {2: True, 4: True}

Or, to add these values to an existing dictionary, use update:

new_set.update({echo(x):True for x in s if x%2==0})

Another way to write this is with an intermediate generator:

{y:True for y in (echo(x) for x in s if x%2==0)}

Or combine the echo and filter in one generator:

def even(s):
    for x in s:
        if x%2==0:
            print(x)
            yield x

followed by a dict comp using it:

{y:True for y in even(s)}
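Put together as one self-contained script, the three spellings above build the same dictionary. The sample input s is an assumption, chosen to match the session output shown earlier.

```python
def echo(x):
    print(x)
    return x

def even(s):
    for x in s:
        if x % 2 == 0:
            print(x)
            yield x

s = [1, 2, 3, 4, 5]  # assumed sample input

direct = {echo(x): True for x in s if x % 2 == 0}
via_genexp = {y: True for y in (echo(x) for x in s if x % 2 == 0)}
via_generator = {y: True for y in even(s)}
```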