To get the most out of the following blog post, first try writing your partition function, which does the same thing as Data.List.partition.
*Partition> partition even [1..10]
([2,4,6,8,10],[1,3,5,7,9])
Then write a testing function, tpf, which checks your partition function against a variety of input, including large or infinite input. (Which means we don’t just use quickcheck).
Then read the blog post to see what obstacles I hit when I went through this process.
source here if you’re irked by wordpress aggressively cutting off the right side of the page.
{-# LANGUAGE NoMonomorphismRestriction #-}
module Partition where
import Debug.Trace.Helpers
import qualified Data.List as DL
import Debug.Trace
-- first attempt
partition f [] = ([],[])
partition f (x:xs) =
let (as,bs) = partition f xs
in if (f x) then (x:as,bs) else (as,x:bs)
t1 = tpf partition
-- testing function.
tpf partif =
let a = last . snd . partif even $ [1..(10^6)]
b = head . snd . partif even $ [1,2..]
c = last . snd . partif even $ [1..(10^8)]
in (a,b,c)
-- sanity checks / benchmarks. our own code should be at least as fast as Data.List versions.
-- a is immediate, b is immediate, c takes about 10 seconds.
tDL = tpf DL.partition
-- a test and b test are fine but that 10^8 list seems to take forever.
-- let's think about this. What algorithm are we using?
-- Actually, I don't know the answer to that.
-- but looking at that let/in, I don't have a clear sense of crunching through a list and building up a result, which is usually the ideal.
-- So let's try this with a fold.
-- Next question, what kind of fold?
-- Easy answer: just try both and see what happens!
-- Once you've written one, you have the other just by flipping the arguments in the helper function
-- (Note: for the foldl, we try strict foldl' by default, as there is usually no reason to use lazy foldl:
-- http://haskell.org/haskellwiki/Foldr_Foldl_Foldl'
partitionR f xs = foldr g ([],[]) xs
where g next accum@(as,bs) = if f next then (next:as,bs) else (as,next:bs)
partitionL' f xs = DL.foldl' g ([],[]) $ reverse xs
where g accum@(as,bs) next = if f next then (next:as,bs) else (as,next:bs)
-- Harder answer, which requires thought: we probably want a foldr here, because the b test uses an infinite list and a strict fold
-- won't return anything if you feed it an infinite list.
-- OK, let's try foldr
t2 = tpf partitionR -- result: immediate stack overflow on the a test, and if you try them separately, it overflows all 3 tests.
-- that's kind of sad.
-- for no particular reason, let's try a strict foldl version (warning! be careful running this!)
t3 = tpf partitionL'
-- at least the a test succeeds after a few seconds, confirming the intuition that strict foldl' crunches through a list
-- and returns output after it's done.
-- the b test exhausts computer memory, eventually the fan starts whirring louder than I can think, mouse stops responding and have to reboot
-- so like I said, careful running this, and hit control c!
--okay, I give up, let's peek at the standard libs (SL) Data.List
-- this isn't word for word what the ghc standard libs Data.List has, so as to be more similar to the functions we wrote so far,
-- but it's the same algo, and same performance.
-- the important difference is the lazy match
partitionSL p xs = foldr g ([],[]) xs
where g next accum = if (p next) then (next:fst accum,snd accum) else (fst accum, next:snd accum)
-- which is the same as below, using irrefutable pattern syntax:
-- g next ~accum@(as,bs) = if (p next) then (next:as,bs) else (as, next:bs)
-- Sure enough, same performance as Data.List.Partition
-- Note: to get the same performance as lazy list, you need to compile the module before loading in ghci, eg
-- ghc --make PartitionM.hs, then ghci PartitionM
-- if you do ghci PartitionM.hs, this doesn't terminate (at least, not in under a minute I'm an impatient guy.)
tSL = tpf partitionSL
-- So, why does partition need an irrefutable pattern in the accum argument?
-- Let's look at the original right folded partition again, rewritten as follows, for a 3 element list
partitionR2 f xs = foldr (select f) ([],[]) xs
select f next accum@(as,bs) = if f next then (next:as,bs) else (as,next:bs)
-- select' g next accum = if (p next) then (next:fst accum,snd accum) else (fst accum, next:snd accum)
tpr2 :: ([Int],[Int])
tpr2 = partitionR even [1..3]
tpr2a = 1 `g` ( 2 `g` ( 3 `g` ([],[]) ) )
g = select even
-- when the accum argument to select is a strict pattern match, the algorithm can't proceed past the first element
-- without evaluating everything inside the first set of parenthesis.
-- For a million element list, this is going to be a problem.
-- if the accum arg is lazy (just accum without the @ binding), the algorithm can proceed to the if test,
-- calculate the tuple with the list parts as just a thunk to be evaluabed later, and keep chugging along,
-- by calculating the list parts next.
Read Full Post »