Replace characters at a list of index ranges in a string in Python

Instead of string concatenation (which is wasteful because every step creates and then discards an intermediate string), use a list:

coordinates = [[1,5], [10,15], [25, 35]] # sorted

line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'

result = list(line)
# end positions are treated as exclusive: start <= p < end
for start, end in coordinates:
    for p in range(start, end):
        result[p] = 'N'

res = ''.join(result)
print(res)

To get:

ANNNNGTGTGNNNNNACGTACGTGTNNNNNNNNNNGTGKWSGTGAAAAAKCT

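Optimized to use slice assignment, again with an exclusive end:

result = list(line)
for start, end in coordinates:
    # replace the whole range in one step instead of character by character
    result[start:end] = ["N"] * (end - start)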

res = ''.join(result)
print(line)
print(res)

This prints the original line followed by the output you wanted:

ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT
ANNNNGTGTGNNNNNACGTACGTGTNNNNNNNNNNGTGKWSGTGAAAAAKCT
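
If you need this in more than one place, the slice version can be wrapped in a small helper. This is only a sketch: the name mask_ranges and its parameters are my own, and the clamping is an assumption about what you want when a range runs past the end of the string. Without it, a slice assignment such as result[50:60] = ["N"] * 10 on a 52-character sequence would make the result longer than the input.

```python
def mask_ranges(seq, ranges, mask="N"):
    """Return seq with every half-open [start, end) range replaced by mask.

    Hypothetical helper; assumes 0-based starts and exclusive ends.
    """
    chars = list(seq)
    for start, end in ranges:
        # Clamp to the sequence bounds so the list never grows.
        start = max(0, start)
        end = min(len(chars), end)
        if start < end:
            chars[start:end] = [mask] * (end - start)
    return "".join(chars)


line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'
print(mask_ranges(line, [[1, 5], [10, 15], [25, 35]]))
```

With the coordinates from the question this should print the same masked line as above.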

Good question. This should work; it rebuilds the string once per range using slices:

coordinates = [[1,5], [10,15], [25, 35]]
line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'
for L,R in coordinates:
    line = line[:L] + "N"*(R-L) + line[R:]
print(line)

You may need to adjust this depending on how the coordinates are defined, e.g. inclusive ends or 1-based indexing.
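
As an illustration of that adjustment, here is a rough sketch that assumes the coordinates are 1-based with inclusive ends, as many genomics formats use. The helper name to_half_open and the sample input are made up for this example. Only the start has to shift: the inclusive 1-based end is numerically equal to the exclusive 0-based end.

```python
def to_half_open(ranges):
    """Convert 1-based inclusive (start, end) pairs to 0-based half-open pairs."""
    return [(start - 1, end) for start, end in ranges]


line = "ATCACGTGTG"
coords_1based = [(2, 5)]  # hypothetical input: bases 2 through 5, counting from 1

for L, R in to_half_open(coords_1based):
    line = line[:L] + "N" * (R - L) + line[R:]

print(line)  # ANNNNGTGTG
```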

We need more people working with DNA, so great work.

Tags:

Python