Block storage rewrites
Summary
In a block storage system, new data is written in blocks. We represent the flash memory as one sequential array. We have a list of block writes coming in as arrays of size 2: writes[i] = [first_block_written, last_block_written].
Each block has a rewrite limit. If rewrites on a block reach a specified threshold, we should run a special diagnostic.
Given blockCount (an integer representing the total number of blocks), writes (the list of block-write arrays of size 2), and threshold, your task is to return the list of disjoint block segments, each consisting of blocks that have reached the rewrite threshold. The list of block segments should be sorted in increasing order by their left ends.
Examples
For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 2, the output should be blockStorageRewrites(blockCount, writes, threshold) = [[2, 5]].
After the first write, blocks 0, 1, 2, 3 and 4 have each been written to once;
After the second write, blocks 0, 1, 2 and 5 have been written to once, and blocks 3 and 4 have reached the rewrite threshold;
After the final write, blocks 2 and 5 have reached the rewrite threshold as well, so the blocks that should be diagnosed are 2, 3, 4 and 5.
Blocks 2, 3, 4 and 5 form one consecutive segment, [2, 5].
For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 3, the output should be blockStorageRewrites(blockCount, writes, threshold) = [[3, 4]].
For blockCount = 10, writes = [[3, 4], [0, 1], [6, 6]], and threshold = 1, the output should be blockStorageRewrites(blockCount, writes, threshold) = [[0, 1], [3, 4], [6, 6]].
Constraints
1 ≤ blockCount ≤ 10**5
0 ≤ writes.length ≤ 10**5
writes[i].length = 2
0 ≤ writes[i][0] ≤ writes[i][1] < blockCount
First try
from itertools import groupby

def group_blocks(num_writes, threshold):
    i = 0
    for key, group in groupby(num_writes, lambda x: x >= threshold):
        # This is faster compared to len(list(group))
        length = sum(1 for _ in group)
        if key:
            yield [i, i + length - 1]
        i += length

def blockStorageRewrites(blockCount, writes, threshold):
    num_writes = [0] * blockCount
    for lower, upper in writes:
        for i in range(lower, upper + 1):
            num_writes[i] += 1
    return list(group_blocks(num_writes, threshold))
Here I just do exactly what is asked: I create an array of size blockCount, loop over the writes, and lastly group the consecutive ranges with itertools.groupby.
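As a quick sanity check, the three worked examples from the problem statement can be run directly against this first implementation (the snippet restates the code above so it is self-contained):

```python
from itertools import groupby

def group_blocks(num_writes, threshold):
    # Yield [start, end] for each maximal run of blocks at or above threshold.
    i = 0
    for key, group in groupby(num_writes, lambda x: x >= threshold):
        length = sum(1 for _ in group)
        if key:
            yield [i, i + length - 1]
        i += length

def blockStorageRewrites(blockCount, writes, threshold):
    # Count writes per block the naive way, then group consecutive runs.
    num_writes = [0] * blockCount
    for lower, upper in writes:
        for i in range(lower, upper + 1):
            num_writes[i] += 1
    return list(group_blocks(num_writes, threshold))

# The three worked examples from the problem statement:
assert blockStorageRewrites(10, [[0, 4], [3, 5], [2, 6]], 2) == [[2, 5]]
assert blockStorageRewrites(10, [[0, 4], [3, 5], [2, 6]], 3) == [[3, 4]]
assert blockStorageRewrites(10, [[3, 4], [0, 1], [6, 6]], 1) == [[0, 1], [3, 4], [6, 6]]
```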
After trying to optimize
I'm not quite sure what the complexity was, but I tried to lessen the load; I still didn't pass the time-limit constraints.
def get_bad_blocks(blockCount, writes, threshold):
    cons = False
    u = l = -1
    for i in range(blockCount):
        count = 0
        for lower, upper in writes:
            if lower <= i <= upper:
                count += 1
                if count >= threshold:
                    if not cons:
                        cons = True
                        l = i
                    u = -1
                    break
        else:
            u = i - 1
            cons = False
        if u != -1 and l != -1:
            yield [l, u]
            l = -1
    if cons:
        yield [l, i]

def blockStorageRewrites(blockCount, writes, threshold):
    return list(get_bad_blocks(blockCount, writes, threshold))
Questions
You can review any and all, but preferably I'm looking for answers that:
- Tell me if my first example is readable
- I'm less concerned about readability in my second try, and more interested in speed
- Please ignore the PEP8 naming violations, as this is an issue with the programming challenge site
python python-3.x programming-challenge time-limit-exceeded
I don't think it's possible to give an "answer" addressing the performance issue — your current algorithm is unsalvageable. Consider what would happen in a realistic scenario where blockCount = 1000000. The very first thing you'd do is loop for i in range(1000000)! That cannot possibly be part of a valid solution. You need to come up with an algorithm that doesn't take O(blockCount) time.
– Quuxplusone
Mar 13 at 15:47
The comment "should be faster compared to len(list(g))" makes me think you should be doing some performance tests :)
– Peilonrayz
Mar 13 at 15:51
@Quuxplusone probably, but I fail to see how this can be done without looping over the entire blockcounts... hence the asking for a review.
– Ludisposed
Mar 13 at 15:52
@Peilonrayz I have tested that locally, it should be faster because it doesn't have to construct the list as far as I can tell
– Ludisposed
Mar 13 at 15:53
I don't have time to check now, but my guess is that you would want to use a (default)dict with only those blocks that are encountered in writes. If it is a starting block, add +1; if it's the end block, subtract 1. Then go over the values with itertools.accumulate to get the numbers of writes between blocks.
– Georgy
Mar 13 at 16:10
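Georgy's idea can be sketched as a difference array over all blocks, a simpler array-based variant of the same prefix-sum trick (a hypothetical sketch, not submitted to the challenge site): each write costs O(1), and one accumulate pass recovers the per-block counts.

```python
from itertools import accumulate, groupby

def blockStorageRewrites(blockCount, writes, threshold):
    # Difference array: +1 where a write starts, -1 just past where it ends.
    diff = [0] * (blockCount + 1)
    for lower, upper in writes:
        diff[lower] += 1
        diff[upper + 1] -= 1
    # A running sum turns the deltas back into per-block write counts.
    num_writes = list(accumulate(diff))[:blockCount]
    # Group maximal runs of blocks at or above the threshold.
    segments, i = [], 0
    for key, group in groupby(num_writes, lambda x: x >= threshold):
        length = sum(1 for _ in group)
        if key:
            segments.append([i, i + length - 1])
        i += length
    return segments
```

This makes the counting step O(blockCount + len(writes)) instead of O(blockCount × total write width).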
asked Mar 13 at 15:33 – Ludisposed
edited Mar 13 at 16:09
1 Answer
First off, when you're optimizing code you need to know what to optimize. At first I thought the problem was not the groupby, but instead the creation of num_writes. And so I changed your code to be able to measure its performance.
import cProfile
import random
from itertools import groupby

def group_blocks(num_writes, threshold):
    i = 0
    for key, group in groupby(num_writes, lambda x: x >= threshold):
        # This is faster compared to len(list(group))
        length = sum(1 for _ in group)
        if key:
            yield [i, i + length - 1]
        i += length

def build_writes(block_count, writes):
    num_writes = [0] * block_count
    for lower, upper in writes:
        for i in range(lower, upper + 1):
            num_writes[i] += 1
    return num_writes

def blockStorageRewrites(blockCount, writes, threshold):
    num_writes = build_writes(blockCount, writes)
    return list(group_blocks(num_writes, threshold))

block_count = 10**5
writes = []
for _ in range(10**4):
    a = random.randrange(0, block_count)
    b = random.randrange(a, block_count)
    writes.append([a, b])

cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')
Resulting in the following profile:
200008 function calls in 25.377 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 25.377 25.377 <string>:1(<module>)
100001 0.019 0.000 0.025 0.000 lus.py:10(<genexpr>)
1 25.342 25.342 25.342 25.342 lus.py:16(build_writes)
1 0.000 0.000 25.375 25.375 lus.py:24(blockStorageRewrites)
1 0.000 0.000 0.033 0.033 lus.py:6(group_blocks)
100000 0.007 0.000 0.007 0.000 lus.py:8(<lambda>)
1 0.000 0.000 25.377 25.377 built-in method builtins.exec
1 0.007 0.007 0.033 0.033 built-in method builtins.sum
1 0.000 0.000 0.000 0.000 method 'disable' of '_lsprof.Profiler' objects
Changing the code as per Georgy's comment to:
from itertools import accumulate

def build_writes(block_count, writes):
    num_writes = dict(enumerate([0] * block_count))
    for lower, upper in writes:
        num_writes[lower] += 1
        num_writes[upper] -= 1
    return list(accumulate(num_writes))
Gets the following profile, which is orders of magnitude faster:
200011 function calls in 0.066 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 0.066 0.066 <string>:1(<module>)
100002 0.021 0.000 0.028 0.000 lus.py:10(<genexpr>)
1 0.025 0.025 0.025 0.025 lus.py:16(build_writes)
1 0.003 0.003 0.064 0.064 lus.py:24(blockStorageRewrites)
2 0.000 0.000 0.036 0.018 lus.py:6(group_blocks)
100000 0.008 0.000 0.008 0.000 lus.py:8(<lambda>)
1 0.000 0.000 0.066 0.066 built-in method builtins.exec
2 0.008 0.004 0.036 0.018 built-in method builtins.sum
1 0.000 0.000 0.000 0.000 method 'disable' of '_lsprof.Profiler' objects
The return list(accumulate(num_writes)) line should be return list(accumulate(num_writes.values())), else I'd be accumulating the keys. Other than that, this is a great teaching moment for me. I should know what the bottlenecks are before optimizing :)
– Ludisposed
Mar 14 at 8:13
And I did have to do (block_count + 1) and num_writes[upper + 1] -= 1 to make the upper bound work correctly.
– Ludisposed
Mar 14 at 8:17
@Ludisposed Ah oops :( My main aim was to show how to go about improving performance, so I didn't test at all.
– Peilonrayz
Mar 14 at 14:06
No problem; the "You should have profiled the code before starting to optimize" point was (for me) the important part of the answer. I do wonder: did you find any other mistakes or simplifications, or did you just quit after profiling?
– Ludisposed
Mar 14 at 14:14
@Ludisposed I quit after profiling, but the first code looks good otherwise. If there wasn't a TLE then I'd say the original code is good and say to stop there. If there's another TLE then I'd look into it more, but I think 25 seconds -> 0.5 would mean there isn't one.
– Peilonrayz
Mar 14 at 14:19
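Putting the two corrections reported in the comments above together (accumulating over .values() rather than the dict keys, and sizing the structure to block_count + 1 so upper + 1 always has a slot), the fixed helper might read:

```python
from itertools import accumulate

def build_writes(block_count, writes):
    # One extra slot so the upper + 1 delta never falls off the end.
    num_writes = dict(enumerate([0] * (block_count + 1)))
    for lower, upper in writes:
        num_writes[lower] += 1
        num_writes[upper + 1] -= 1
    # Accumulate the *values* (insertion-ordered), then trim the sentinel.
    return list(accumulate(num_writes.values()))[:block_count]

# First worked example: per-block counts for writes [[0,4],[3,5],[2,6]]
assert build_writes(10, [[0, 4], [3, 5], [2, 6]]) == [1, 1, 2, 3, 3, 2, 1, 0, 0, 0]
```

(A plain list of length block_count + 1 would do the same job; the dict form just follows the shape of the answer's code.)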
|
show 2 more comments
Your Answer
StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f215355%2fblock-storage-rewrites%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
First off when you're optimizing code you need to know what to optimize. At first I thought the problem code was not the groupby
, but instead the creation of num_writes
. And so I changed your code to be able to find the performance of it.
import cProfile
import random
from itertools import groupby
def group_blocks(num_writes, threshold):
i = 0
for key, group in groupby(num_writes, lambda x : x >= threshold):
# This is faster compared to len(list(g))
length = sum(1 for _ in group)
if key:
yield [i, i + length - 1]
i += length
def build_writes(block_count, writes):
num_writes = [0] * block_count
for lower, upper in writes:
for i in range(lower, upper + 1):
num_writes[i] += 1
return num_writes
def blockStorageRewrites(blockCount, writes, threshold):
num_writes = build_writes(blockCount, writes)
return list(group_blocks(num_writes, threshold))
block_count = 10**5
writes = []
for _ in range(10**4):
a = random.randrange(0, block_count)
b = random.randrange(a, block_count)
writes.append([a, b])
cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')
Resulting in the following profile:
200008 function calls in 25.377 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
$begingroup$
First off when you're optimizing code you need to know what to optimize. At first I thought the problem code was not the groupby
, but instead the creation of num_writes
. And so I changed your code to be able to find the performance of it.
import cProfile
import random
from itertools import groupby

def group_blocks(num_writes, threshold):
    i = 0
    for key, group in groupby(num_writes, lambda x: x >= threshold):
        # This is faster compared to len(list(group))
        length = sum(1 for _ in group)
        if key:
            yield [i, i + length - 1]
        i += length

def build_writes(block_count, writes):
    num_writes = [0] * block_count
    for lower, upper in writes:
        for i in range(lower, upper + 1):
            num_writes[i] += 1
    return num_writes

def blockStorageRewrites(blockCount, writes, threshold):
    num_writes = build_writes(blockCount, writes)
    return list(group_blocks(num_writes, threshold))

block_count = 10**5
writes = []
for _ in range(10**4):
    a = random.randrange(0, block_count)
    b = random.randrange(a, block_count)
    writes.append([a, b])

cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')
Resulting in the following profile:
200008 function calls in 25.377 seconds

Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    0.002    0.002   25.377   25.377  <string>:1(<module>)
 100001    0.019    0.000    0.025    0.000  lus.py:10(<genexpr>)
      1   25.342   25.342   25.342   25.342  lus.py:16(build_writes)
      1    0.000    0.000   25.375   25.375  lus.py:24(blockStorageRewrites)
      1    0.000    0.000    0.033    0.033  lus.py:6(group_blocks)
 100000    0.007    0.000    0.007    0.000  lus.py:8(<lambda>)
      1    0.000    0.000   25.377   25.377  {built-in method builtins.exec}
      1    0.007    0.007    0.033    0.033  {built-in method builtins.sum}
      1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
Changing the code as per Georgy's comment to:
from itertools import accumulate

def build_writes(block_count, writes):
    num_writes = dict(enumerate([0] * block_count))
    for lower, upper in writes:
        num_writes[lower] += 1
        num_writes[upper] -= 1
    return list(accumulate(num_writes))
Gets the following profile, which is orders of magnitude faster:
200011 function calls in 0.066 seconds

Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    0.002    0.002    0.066    0.066  <string>:1(<module>)
 100002    0.021    0.000    0.028    0.000  lus.py:10(<genexpr>)
      1    0.025    0.025    0.025    0.025  lus.py:16(build_writes)
      1    0.003    0.003    0.064    0.064  lus.py:24(blockStorageRewrites)
      2    0.000    0.000    0.036    0.018  lus.py:6(group_blocks)
 100000    0.008    0.000    0.008    0.000  lus.py:8(<lambda>)
      1    0.000    0.000    0.066    0.066  {built-in method builtins.exec}
      2    0.008    0.004    0.036    0.018  {built-in method builtins.sum}
      1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
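For completeness: the comments point out two fixes this snippet still needs (accumulate over the values rather than the dict's keys, and shift the decrement to upper + 1). A corrected sketch of the same difference-array idea, using a plain list instead of a dict, might look like this (verified on a small example, but not the exact code from the thread):

```python
from itertools import accumulate, groupby

def build_writes(block_count, writes):
    # Difference array: +1 where a write starts, -1 just past where it ends.
    # The extra slot means deltas[upper + 1] never falls out of range.
    deltas = [0] * (block_count + 1)
    for lower, upper in writes:
        deltas[lower] += 1
        deltas[upper + 1] -= 1
    # A running total turns the deltas back into per-block write counts.
    return list(accumulate(deltas))[:block_count]

def group_blocks(num_writes, threshold):
    i = 0
    for key, group in groupby(num_writes, lambda x: x >= threshold):
        length = sum(1 for _ in group)
        if key:
            yield [i, i + length - 1]
        i += length

def blockStorageRewrites(block_count, writes, threshold):
    return list(group_blocks(build_writes(block_count, writes), threshold))
```

For example, with writes [[0, 2], [1, 3], [5, 5]] and threshold 2, only blocks 1 and 2 are written at least twice, so this returns [[1, 2]].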
answered Mar 13 at 16:17
– Peilonrayz
The return list(accumulate(num_writes)) line should be return list(accumulate(num_writes.values())), else I'd be accumulating the keys. Other than that, this is a great teaching moment for me. I should know where the bottlenecks are before optimizing :)
– Ludisposed, Mar 14 at 8:13
And I did have to do (block_count + 1) and num_writes[upper + 1] -= 1 to make the upper bound work correctly.
– Ludisposed, Mar 14 at 8:17
@Ludisposed Ah oops :( My main aim was to show how to go about improving performance, so I didn't test at all.
– Peilonrayz, Mar 14 at 14:06
No problem; the "you should have profiled the code before starting to optimize" lesson was (for me) the important part of the answer. I do wonder: did you find any other mistakes or simplifications, or did you just stop after profiling?
– Ludisposed, Mar 14 at 14:14
@Ludisposed I quit after profiling, but the first code looks good otherwise. If there wasn't a TLE then I'd say the original code is good and stop there. If there's another TLE then I'd look into it more, but I think going from 25 seconds to 0.5 means there isn't one.
– Peilonrayz, Mar 14 at 14:19
I don't think it's possible to give an "answer" addressing the performance issue: your current algorithm is unsalvageable. Consider what would happen in a realistic scenario where blockCount = 1000000. The very first thing you'd do is loop for i in range(1000000)! That cannot possibly be part of a valid solution. You need to come up with an algorithm that doesn't take O(blockCount) time.
– Quuxplusone, Mar 13 at 15:47
The comment "should be faster compared to len(list(g))" makes me think you should be doing some performance tests :)
– Peilonrayz, Mar 13 at 15:51
@Quuxplusone Probably, but I fail to see how this can be done without looping over the entire block count... hence asking for a review.
– Ludisposed, Mar 13 at 15:52
@Peilonrayz I have tested that locally; it should be faster because it doesn't have to construct the list, as far as I can tell.
– Ludisposed, Mar 13 at 15:53
I don't have time to check now, but my guess is that you would want to use a (default)dict with only those blocks that are encountered in writes. If it is a starting block, add +1; if it's the end block, subtract 1. Then go over the values with itertools.accumulate to get the numbers of writes between blocks.
– Georgy, Mar 13 at 16:10