Wednesday 7 August 2013

Python: Looping over 2mln lines

I have to loop over a large file with 2 million lines that looks like this:
P61981 1433G_HUMAN
P61982 1433G_MOUSE
Q5RC20 1433G_PONAB
P61983 1433G_RAT
P68253 1433G_SHEEP
Currently I have the following function. It takes every entry in a list
and, if the entry occurs in a line of this large file, collects that line.
It is slow (~10 min), probably because of the nested loops. Can you please
suggest an optimization?
up = "database.txt"

def mplist(somelist):
    newlist = []
    with open(up) as U:
        for row in U:
            for i in somelist:
                if i in row:
                    newlist.append(row)
    return newlist
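
One possible direction, as a minimal sketch rather than a definitive fix: if every entry in the list is expected to equal a whole whitespace-separated field of a line (an assumption, since the function above does substring matching), the inner loop can be replaced by a set lookup. The name mplist_fast and the path parameter are hypothetical and only used for illustration.

```python
def mplist_fast(somelist, path="database.txt"):
    """Collect the lines of `path` that contain an entry of `somelist`.

    Assumption: an entry matches only when it equals a whole
    whitespace-separated field, not an arbitrary substring of the line.
    """
    wanted = set(somelist)  # average O(1) membership test per field
    matches = []
    with open(path) as handle:
        for row in handle:
            # split() yields the fields of the line; isdisjoint() is False
            # as soon as one of them is in the wanted set
            if not wanted.isdisjoint(row.split()):
                matches.append(row)
    return matches
```

Under that assumption the work per line no longer grows with the length of the list. Note two differences from the original function: a line is collected at most once even if several entries match it, and a partial entry such as "HUMAN" no longer matches "1433G_HUMAN".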
