On demand data in Python, Part 2: The magic of itertools

January 9, 2019


Find a versatile toolkit for working with iterators in the standard library




In Part 1 of this series, you looked at Python iterators and the most straightforward way of creating your own iterators, which is by writing generator functions or expressions. Once you get the basics, you will quickly start to have all sorts of ideas for building on iterators. However, it’s a good idea to make sure you’re not reinventing the wheel.

Python’s standard library is a marvel of valuable support and tools, some of it more buried than the rest. Several modules are useful for manipulating iterators, but first you should look at some facilities available right in the built-ins, which means that, for the most part, you don’t need to import anything to use them.

Using map to build on iterators

With the map function, you can process each item in an iterator, which leaves you with a new iterator of the results.

>>> r = range(10)
>>> m = map(str, r)
>>> next(m)
'0'
>>> list(m)
['1', '2', '3', '4', '5', '6', '7', '8', '9']

The first argument to map is the mapping function, the function to apply to each item.
In this case, I use str to convert each integer from the range into a
string. When I request the first item from the map iterator, I get a string. I then use list to extract everything in the sequence. Notice that the list starts with 1 rather than 0. That’s because I already extracted the first item from the map. Iterators, unlike lists or tuples, do not support random access. You get items one at a time, in a forward direction.
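To make that one-way behavior concrete, here is a small sketch you can paste into a fresh session; it isn’t from the listing above, but it only uses what you have already seen. Once a map object has been consumed, it yields nothing more, and asking for another item raises StopIteration.

>>> m = map(str, range(3))
>>> list(m)
['0', '1', '2']
>>> list(m)     # the map object is already exhausted
[]
>>> next(m)     # asking for more raises StopIteration
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration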

Of course, you can write your own mapping function. It expects one input value from
map. Paste the following into your interpreter session:

def get_letter(i):
    return chr(ord('a')+i)

Here, I use ord to get the numerical code corresponding to the character
'a' and add the function’s argument to derive a new character code. The function chr converts this code back to a string character. Let’s use this function with map.

>>> r = range(10)
>>> list(map(get_letter, r))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Map versus generator expressions

At this point, you probably remember Part 1 and realize this is much like a generator expression. Indeed, I could have done the following.

>>> list( ( get_letter(i) for i in range(10) ) )
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Notice how I put the generator expression right in as an argument to list.
That should help illustrate that you can use generator expressions just like other
Python expressions. In this case, the get_letter function is simple enough that you could even build it right into the generator expression.

>>> list( ( chr(ord('a')+i) for i in range(10) ) )
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

This seems redundant, but map has existed in Python longer than generator expressions, and there are specialized cases where it’s still handy. For the most part though, you would probably use generator expressions for such simple manipulations of iterators.

Making reductions from iterators

Sometimes, you don’t want to turn one iterator into another but rather compute one single value from an iterator.

>>> sum(range(20, 30))
245

The sum function takes an iterator of numbers (think ints or floats) and returns the sum of all items in the iterator, in this case the sum of all integers from 20 up to, but not including, 30. This is called a reduction, where a series of values is reduced to one new value. Python also provides a generalized function for reduction. Start by pasting in the following function:

def sum_of_squares(acc, i):
    return acc + i**2

Use this to reduce as follows. You need the functools module.

>>> import functools
>>> functools.reduce(sum_of_squares, range(10))
285

The first argument is a function that takes two arguments. It takes an accumulated value and then it takes a new value to combine into an updated accumulated value in some way, which is then returned. In this case, the accumulated value is the sum of squares of items so far.
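As a side note, functools.reduce also accepts an optional third argument, an initial value for the accumulator. A minimal sketch with the sum_of_squares function shows why it can matter: without an initializer, the first item is used as the starting accumulated value as-is, without being squared.

>>> functools.reduce(sum_of_squares, range(2, 4))       # 2 becomes the accumulator unsquared: 2 + 3**2
11
>>> functools.reduce(sum_of_squares, range(2, 4), 0)    # start from 0, so both items are squared: 2**2 + 3**2
13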

Just as generator expressions can be more readable than using map, there
are often more readable alternatives to reduce, but it’s useful to have for certain
specialized uses (and it’s a conceptual counterpoint to map).

The idea of map is to take an iterator and generate a derived iterator. The idea of reduce is to take an iterator and return a single value derived from all its items. This combination of concepts turns out to be a powerful way to think about processing large quantities of data.
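To make that combination concrete, here is a small sketch that maps a squaring function over a range and then reduces the results to a single total with functools.reduce and operator.add; the square helper is just a name I chose for the example, not something from the standard library.

>>> import functools
>>> import operator
>>> def square(n):
...     return n**2
... 
>>> functools.reduce(operator.add, map(square, range(10)))
285

Notice that this gives the same 285 as the sum_of_squares reduction above; the squaring has simply moved from the reducing function into the mapping step.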

The str.join method

I want to mention a special method that acts as a reduction.

>>> tenletters = ( chr(ord('a')+i) for i in range(10) )
>>> '!'.join(tenletters)
'a!b!c!d!e!f!g!h!i!j'

The join method on a string takes an iterator and creates a new string by
joining together the iterator item strings with a specified string in between. Playing around a bit more:

>>> '!'.join(tenletters)
''
>>> tenletters = ( chr(ord('a')+i) for i in range(10) )
>>> ''.join(tenletters)
'abcdefghij'
>>> tenletters = ( chr(ord('a')+i) for i in range(10) )
>>> ' and '.join(tenletters)
'a and b and c and d and e and f and g and h and i and j'

The output from the first line is a good reminder that iterators are one-way, and once used up, you won’t get any more out of them. Pay attention, because this is a common cause of bugs in code using iterators. In the next line, I set up a new generator, and you can see that a join on the empty string just concatenates all the strings in the given iterator. You can also join with longer strings, as demonstrated in the final couple of lines.
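If you do need to run over the same values more than once, and they fit comfortably in memory, one simple sketch of a workaround is to materialize the iterator into a list first and then reuse the list as often as you like.

>>> tenletters = ( chr(ord('a')+i) for i in range(10) )
>>> letters = list(tenletters)    # materialize once; the generator is now used up
>>> '!'.join(letters)
'a!b!c!d!e!f!g!h!i!j'
>>> ' and '.join(letters)         # the list, unlike the generator, can be reused
'a and b and c and d and e and f and g and h and i and j'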

Filtering

There is another built-in function you can use to create an iterator containing only some subset of its input. In Part 1, I presented the following example.

>>> import math
>>> notby2or3 = ( n for n in range(1, 20) if math.gcd(n, 2) == 1 and math.gcd(n, 3) == 1 )
>>> list(notby2or3)
[1, 5, 7, 11, 13, 17, 19]

Go back and check Part 1 if you need a refresher on this notby2or3 generator expression. You can implement its equivalent using the filter function, which (again) is just good to be aware of.

>>> import math
>>> def notby2or3(n):
...     return math.gcd(n, 2) == 1 and math.gcd(n, 3) == 1
... 
>>> list(filter(notby2or3, range(1, 20)))
[1, 5, 7, 11, 13, 17, 19]

I could have written this with one or two lines fewer using what’s called a lambda function, but that’s outside the scope of this tutorial series.
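For the curious, a rough sketch of what that lambda version might look like follows; feel free to skip it if lambdas are new to you.

>>> list(filter(lambda n: math.gcd(n, 2) == 1 and math.gcd(n, 3) == 1, range(1, 20)))
[1, 5, 7, 11, 13, 17, 19]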

There are many small patterns for using iterators that come up over and
over again. The itertools module provides high-performance implementations of many of these, which are well worth learning.

Suppose you’d like to create a longer iterator that concatenates a sequence
of other iterators:

>>> import itertools
>>> it = itertools.chain(range(5), range(5, 0, -1), range(5))
>>> list(it)
[0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4]

Pass itertools.chain the iterators as function arguments and
it produces a chain of the outputs of these. If you have a list or tuple sequence of iterators, you can use Python’s special syntax for variable positional function arguments.

>>> list_of_iters = [range(5), range(5, 0, -1), range(5)]
>>> it = itertools.chain(*list_of_iters)
>>> list(it)
[0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4]

You can use any iterator for variable function arguments. It doesn’t have to be a list or tuple. Paste in the following:

def forth_back_forth(n):
    yield range(n)
    yield range(n, 0, -1)
    yield range(n)

You can use this generator to feed itertools.chain its arguments:

>>> it = itertools.chain(*forth_back_forth(3))
>>> list(it)
[0, 1, 2, 3, 2, 1, 0, 1, 2]

Infinite iterators

The combination of forth_back_forth and itertools.chain makes me wonder what it would be like to continue that back/forth/back number pattern even further. Paste in the following:

def zigzag_iters(period):
    while True:
        yield range(period)
        yield range(period, 0, -1)

Then, try it out:

>>> it = zigzag_iters(2)
>>> next(it)
range(0, 2)
>>> next(it)
range(2, 0, -1)
>>> next(it)
range(0, 2)
>>> next(it)
range(2, 0, -1)
>>> next(it)
range(0, 2)

If you continue to enter next(it), it continues to cycle between the two output ranges over and over, because while True implements an infinite loop and there is no code within it to break the loop’s execution. This generator is in a constant state of suspension and resumption until the Python process (the interactive interpreter, in this case) terminates. Or you could use the close() method on any generator object to end it, regardless of where it is in its execution.
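Here is a minimal sketch of close() in action; after closing, any further next() call raises StopIteration.

>>> it = zigzag_iters(2)
>>> next(it)
range(0, 2)
>>> it.close()    # shut the generator down for good
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration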

You might have noticed that I’m not using list to illustrate
the iterator’s contents. If you try to do this, you are basically trying to create an infinite list, and of course, you will run out of memory.

Remember also that this is returning iterators, not items. With forth_back_forth, the idea was to use itertools.chain to get the items from those iterators. You cannot do the same thing with zigzag_iters, because that also tries to create a collection of infinite items, and all the arguments to a function must be known before the function is executed, even in the case of variable arguments. This would mean the Python runtime would be trying to get all items from an infinite iterator, which is not possible.

Infinite iterators are actually a useful concept once you start using generators. But, you have to be aware of operations that attempt to build any collection from the contents of such iterators.

Cascading iterators

Because you can’t use itertools.chain with zigzag_iters, create a more direct version.

def zigzag(period):
    while True:
        for n in range(period):
            yield n
        for n in range(period, 0, -1):
            yield n

Paste this in and see how the iterator continues indefinitely:

>>> it = zigzag(2)
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
1
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
1
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
1

This is another common pattern where an outer iterator is a cascade of one
or more inner iterators. It’s common enough that Python provides a
standard syntax for it, the yield from statement, as you can see from the following rewrite of zigzag that is functionally identical.

def zigzag(period):
    while True:
        yield from range(period)
        yield from range(period, 0, -1)

If you think of an iterator providing the entirety of another iterator,
think yield from. The provided iterator will be completely
consumed in this case. If it is an infinite iterator, the outer iterator will also be infinite and continue on the yield from statement indefinitely.
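As another illustrative sketch, not taken from the examples above, you could write a small generator that delegates to each iterator in a collection in turn; flatten is just a name I chose for it. Paste in the following:

def flatten(iterables):
    # Delegate to each inner iterable in turn, yielding all of its
    # items before moving on to the next one.
    for it in iterables:
        yield from it

Trying it out:

>>> list(flatten([range(3), range(3, 0, -1)]))
[0, 1, 2, 3, 2, 1]

As it happens, the next section introduces itertools.chain.from_iterable, which gives you this behavior ready-made.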

A variant on itertools.chain

One final note, for completeness: Earlier, I said that you can’t use the
zigzag_iters function with itertools.chain because it would attempt to create an infinite collection of function arguments. The minds behind itertools also thought of that and provided a variant that does work, itertools.chain.from_iterable.

>>> it = itertools.chain.from_iterable(zigzag_iters(2))
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
1
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
1
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
1

More from itertools

You’ll find in itertools some handy routines for dealing with
infinite iterators. One is itertools.islice. With this, you
can extract a subset sequence from an iterator, including from an infinite iterator.

>>> it1 = zigzag(5)
>>> it2 = itertools.islice(it1, 0, 20)
>>> list(it2)
[0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
>>> it2 = itertools.islice(it1, 1, 20)
>>> list(it2)
[1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
>>> it2 = itertools.islice(it1, 0, 20, 2)
>>> list(it2)
[0, 2, 4, 4, 2, 0, 2, 4, 4, 2]

The function takes the iterator to excerpt from, the starting index, and
the ending index. You can also optionally provide a step, so in some ways,
it feels similar to range. Be warned though: Unlike with range, a negative step value is not allowed.
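For completeness, a couple of quick sketches: itertools.islice also has a two-argument form that takes just a stop value, and on my interpreter a negative step is rejected immediately with a ValueError along the following lines.

>>> list(itertools.islice(zigzag(5), 8))      # stop-only form: just the first 8 items
[0, 1, 2, 3, 4, 5, 4, 3]
>>> itertools.islice(zigzag(5), 0, 20, -1)    # negative step is rejected up front
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Step for islice() must be a positive integer or None.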

Grouping

One of the most useful bits of itertools is also one of its trickiest, but well worth understanding.

Use itertools.groupby to segment iterators according to some criteria. Here is one that groups hours into quadrants of a twelve-hour period.

>>> hours = range(12)
>>> def divide_by_3(n): return n//3
... 
>>> it = itertools.groupby(hours, divide_by_3)
>>> next(it)
(0, <itertools._grouper object at 0x1075ac9e8>)

You provide itertools.groupby with an input iterator and a grouping function, which is executed on each item in that iterator. Whenever the result of the grouping function changes from one item to the next, a new group is created containing all the items until the next change in grouping function value. The grouping function, divide_by_3, uses the floor division operator, //, to divide and then round the result down to the nearest whole number.

Figure 1 helps visualize the operation of itertools.groupby in this example.

Figure 1. A visualization of itertools.groupby

The key to understanding itertools.groupby is remembering that
the change in the value of the grouping function is what leads to the
grouped items in the output. With some practice, you’ll be able to expertly write grouping functions for all sorts of use cases.

Let’s dig a bit more into our itertools.groupby example.

>>> hours = range(12)
>>> for quadrant, group_it in itertools.groupby(hours, divide_by_3):
...     print('Items in quadrant', quadrant)
...     for i in group_it:
...         print(i)
... 
Items in quadrant 0
0
1
2
Items in quadrant 1
3
4
5
Items in quadrant 2
6
7
8
Items in quadrant 3
9
10
11

Here, the outer loop invokes itertools.groupby, but each item
in the loop is a tuple. The second element in this tuple,
group_it, is in turn an iterator, as you can see in Figure 1. So, the inner loop works on the iterator of each group.
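One more point worth emphasizing, with a small sketch of my own: itertools.groupby only groups consecutive runs of items. If equal grouping values show up in non-adjacent positions, they land in separate groups, so if you want one group per distinct value, you generally sort by the grouping value first.

>>> values = [1, 1, 2, 1, 1]
>>> for key, group_it in itertools.groupby(values):
...     print('Items with key', key, ':', list(group_it))
... 
Items with key 1 : [1, 1]
Items with key 2 : [2]
Items with key 1 : [1, 1]
>>> for key, group_it in itertools.groupby(sorted(values)):
...     print('Items with key', key, ':', list(group_it))
... 
Items with key 1 : [1, 1, 1, 1]
Items with key 2 : [2]

With no grouping function supplied, groupby groups on the item values themselves.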

Grouping with infinite iterators

To pull together a couple of concepts from this tutorial, you can use
grouping with infinite iterators. Start by looking at a feature from itertools, which neatly creates infinite iterators.

>>> inf_it = itertools.cycle(range(12))
>>> print(list(itertools.islice(inf_it, 32)))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7]

In the first line, I use itertools.cycle, which creates an
infinite cycle over a finite iterator you provide. In this case, it just
generates the 0 to 11 sequence over and over again. I use itertools.islice to safely take a subset of it and print that as a list. There is also itertools.repeat, which takes a single value and iterates that one value over and over.
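Here is a minimal sketch of itertools.repeat, again leaning on itertools.islice to take only a finite slice; with an optional second argument, repeat stops on its own after that many repetitions.

>>> rep = itertools.repeat('tick')              # an infinite iterator of the same value
>>> list(itertools.islice(rep, 4))
['tick', 'tick', 'tick', 'tick']
>>> list(itertools.repeat('tick', 3))           # the second argument caps the repetitions
['tick', 'tick', 'tick']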

Combining itertools.groupby and itertools.cycle is the last tool to explore in this tutorial.

>>> inf_it = itertools.cycle(range(12))
>>> hours_32 = itertools.islice(inf_it, 32)
>>> for quadrant, group_it in itertools.groupby(hours_32, divide_by_3):
...     print('Items in quadrant', quadrant, ':', list(group_it))
... 
Items in quadrant 0 : [0, 1, 2]
Items in quadrant 1 : [3, 4, 5]
Items in quadrant 2 : [6, 7, 8]
Items in quadrant 3 : [9, 10, 11]
Items in quadrant 0 : [0, 1, 2]
Items in quadrant 1 : [3, 4, 5]
Items in quadrant 2 : [6, 7, 8]
Items in quadrant 3 : [9, 10, 11]
Items in quadrant 0 : [0, 1, 2]
Items in quadrant 1 : [3, 4, 5]
Items in quadrant 2 : [6, 7]

Study this session carefully to get a solid grasp of how these iterator tools interact. For example, you can see how the cycle of the hour numbers leads to a cycle of the grouping function results.

You’re starting to see how compactly you can pull together basic building blocks to produce interesting patterns of iterators. As you combine such facilities with your own specialized generators, this becomes a powerful paradigm for efficiently breaking down problems that deal with data in series.
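As one last illustrative sketch of composing these building blocks, you can fold a reduction into the same loop, summing each group rather than listing its items.

>>> inf_it = itertools.cycle(range(12))
>>> hours_32 = itertools.islice(inf_it, 32)
>>> for quadrant, group_it in itertools.groupby(hours_32, divide_by_3):
...     print('Quadrant', quadrant, 'sums to', sum(group_it))
... 
Quadrant 0 sums to 3
Quadrant 1 sums to 12
Quadrant 2 sums to 21
Quadrant 3 sums to 30
Quadrant 0 sums to 3
Quadrant 1 sums to 12
Quadrant 2 sums to 21
Quadrant 3 sums to 30
Quadrant 0 sums to 3
Quadrant 1 sums to 12
Quadrant 2 sums to 13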

Go forth and iterate

There is more to itertools than I was able to cover in this tutorial, but you now have a basic idea of some of its offerings, and a good look at one of its trickiest but potentially most valuable facilities, groupby.

As with the entire standard library, you can find comprehensive documentation of the itertools module, but as usual, this is more for a well-versed developer to check on details. It’s not designed as a primer. I recommend that you start with the building blocks in this tutorial and the previous one, and experiment bit by bit with the features of itertools and the other facilities available in the built-ins and the standard library for working with iterators.

Take a trip on the even wilder side in Part 3 of this series. You’ll explore
coroutines, a special sort of generator function. You’ll also learn about another powerful but tricky standard library module: asyncio.

