2010年12月15日水曜日

pythonのリストでunique(修正)

バイナリ辞書を作る処理でリストをsort+uniqueする必要があり、pythonでどうやろうか考えてみた。

sortする必要があるため、元のリストの順番を維持する必要がないので、itertools.groupbyを使って素直に出来た。そのあとset+sortでもっと素直に出来た…

ついでに順番の維持が必要な場合も考えた。ソート済みの配列に既にその要素があるか、または(性能的な理由から)setで覚えて確認するのがすぐ思いついたのでやってみた。

テストプログラム
import itertools
import hotshot, hotshot.stats
from random import randint


f1 = lambda l: [i for i, g in itertools.groupby(sorted(l))]

def f2(l):
    l2 = []
    for i in l:
        if not i in l2:
            l2.append(i)
    return l2

def f3(l):
    l2 = []
    s = set()
    for i in l:
        if not i in s:
            s.add(i)
            l2.append(i)
    return l2

f4 = lambda l: sorted(set(l))


for l in ([5,4,3,2,1,1,2,3,4,5,10,5,4,3,2,1,10] * 100000,
          [randint(0, x) for x in range(500)],
          [randint(0, x) for x in range(1000)],
          [randint(0, x) for x in range(10000)],
          [randint(0, x) for x in range(50000)]):
    print "orig:", l[:10], "size:", len(l)
    for f in (f1, f2, f3, f4):
        prof = hotshot.Profile('uniq.prof')
        l2 = prof.runcall(f, l)
        prof.close()
        print "result:", l2[:10], "size:", len(l2)
        stats = hotshot.stats.load('uniq.prof')
        stats.strip_dirs()
        stats.print_stats(5)


  • わざとuniqueな件数が減るようにしています(小さい数の方が多くなる)
  • f1とf4がsortedかつunique
  • f2とf3が順番を維持するunique
  • f1 sorted+groupby
  • f2 リストを使って追加済み要素をチェック
  • f3 setで追加済み要素をチェック
  • f4 setにしてsorted


結果要約
  • f4がやっぱり速い
  • f2は予想通り遅くなる
  • f1はそれほど悪くない気がするが、f4には負ける
表にしたもの

6/1700000 251/500 501/1000 5007/10000 24967/50000
f1 0.514 0 0 0.005 0.055
f2 0.303 0.001 0.004 0.468 11.787
f3 0.19 0 0 0.003 0.023
f4 0.069 0 0 0.001 0.014

結果全文
hideaki@siduxbox:~$ python uniq.py 
orig: [5, 4, 3, 2, 1, 1, 2, 3, 4, 5] size: 1700000
result: [1, 2, 3, 4, 5, 10] size: 6
         1 function calls in 0.514 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.514    0.514    0.514    0.514 uniq.py:6(<lambda>)
        0    0.000             0.000          profile:0(profiler)


result: [5, 4, 3, 2, 1, 10] size: 6
         1 function calls in 0.303 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.303    0.303    0.303    0.303 uniq.py:8(f2)


result: [5, 4, 3, 2, 1, 10] size: 6
         1 function calls in 0.190 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.190    0.190    0.190    0.190 uniq.py:15(f3)


result: [1, 2, 3, 4, 5, 10] size: 6
         1 function calls in 0.069 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.069    0.069    0.069    0.069 uniq.py:24(<lambda>)


orig: [0, 1, 1, 3, 3, 4, 4, 0, 3, 8] size: 500
result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 251
         1 function calls in 0.000 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 uniq.py:6(<lambda>)
        0    0.000             0.000          profile:0(profiler)


result: [0, 1, 3, 4, 8, 11, 9, 12, 5, 16] size: 251
         1 function calls in 0.001 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.001    0.001    0.001    0.001 uniq.py:8(f2)


result: [0, 1, 3, 4, 8, 11, 9, 12, 5, 16] size: 251
         1 function calls in 0.000 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 uniq.py:15(f3)


result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 251
         1 function calls in 0.000 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 uniq.py:24(<lambda>)


orig: [0, 1, 1, 0, 3, 3, 4, 7, 2, 8] size: 1000
result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 501
         1 function calls in 0.000 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 uniq.py:6(<lambda>)
        0    0.000             0.000          profile:0(profiler)


result: [0, 1, 3, 4, 7, 2, 8, 9, 5, 17] size: 501
         1 function calls in 0.004 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.004    0.004    0.004    0.004 uniq.py:8(f2)


result: [0, 1, 3, 4, 7, 2, 8, 9, 5, 17] size: 501
         1 function calls in 0.000 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 uniq.py:15(f3)


result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 501
         1 function calls in 0.000 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 uniq.py:24(<lambda>)


orig: [0, 1, 1, 1, 0, 1, 4, 2, 5, 2] size: 10000
result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 5007
         1 function calls in 0.005 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.005    0.005 uniq.py:6(<lambda>)
        0    0.000             0.000          profile:0(profiler)


result: [0, 1, 4, 2, 5, 3, 14, 11, 16, 18] size: 5007
         1 function calls in 0.468 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.468    0.468    0.468    0.468 uniq.py:8(f2)


result: [0, 1, 4, 2, 5, 3, 14, 11, 16, 18] size: 5007
         1 function calls in 0.003 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.003    0.003    0.003    0.003 uniq.py:15(f3)


result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 5007
         1 function calls in 0.001 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.001    0.001    0.001    0.001 uniq.py:24(<lambda>)


orig: [0, 0, 1, 1, 4, 3, 4, 6, 3, 8] size: 50000
result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 24967
         1 function calls in 0.055 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.055    0.055    0.055    0.055 uniq.py:6(<lambda>)
        0    0.000             0.000          profile:0(profiler)


result: [0, 1, 4, 3, 6, 8, 11, 5, 2, 19] size: 24967
         1 function calls in 11.787 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1   11.787   11.787   11.787   11.787 uniq.py:8(f2)


result: [0, 1, 4, 3, 6, 8, 11, 5, 2, 19] size: 24967
         1 function calls in 0.023 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.023    0.023    0.023    0.023 uniq.py:15(f3)


result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] size: 24967
         1 function calls in 0.014 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.014    0.014    0.014    0.014 uniq.py:24(<lambda>)

0 件のコメント:

コメントを投稿