Can this code be changed to run faster?

I'm concentrating on the calculation of samplemanycyclesper5years and samplecycledistributions. For the first, you draw 1826 samples at random and total them, and you do this 10^7 times. We can pack the random total into a compiled function that picks 1826 random integer positions, looks up those entries of cyclesperday, and returns their total:

rand = Compile[{{cycl, _Real, 1}, {i, _Integer, 0}},
   Module[{pos = RandomInteger[{1, Length[cycl]}, i]},
     Total[cycl[[pos]]]
    ],
   RuntimeAttributes -> {Listable},
   Parallelization -> True,
   RuntimeOptions -> "Speed"
   ];

The parameter i is the number of random values of cycl to total; in your case it is always 1826. Let's time it:

rand[cyclesperday, Array[1826 &, 10^5]]; // AbsoluteTiming
(* {0.921493, Null} *)

and compare

ParallelTable[Total[RandomChoice[cyclesperday, 1826]], {10^5}]; // AbsoluteTiming
(* {5.89441, Null} *)

So this needs only about 16% of the time your ParallelTable needs. The next step is to do the same for the estimation of the LogNormalDistribution. The maximum likelihood estimates of its parameters have a simple closed form, so you can write them down yourself:

maxLikelihood = Compile[{{values, _Real, 1}},
  Module[{μ = Mean[Log[values]]},
   {μ, Sqrt@Mean[(Log[values] - μ)^2]}
   ],
  RuntimeAttributes -> {Listable},
  Parallelization -> True
]
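
For reference, these are the standard closed-form maximum likelihood estimates for a log-normal sample $x_1, \dots, x_n$, which is what maxLikelihood evaluates (the symbols $x_i$ and $n$ are notation introduced here, not names from the code):

$$
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i,
\qquad
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(\ln x_i - \hat{\mu}\right)^2}
$$

Note the divisor $n$ rather than $n-1$: this is the maximum likelihood estimate, not the bias-corrected sample standard deviation, which is why the code uses Mean of the squared deviations instead of StandardDeviation.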

First a quick check:

EstimatedDistribution[cycledatatofit[[10]], 
 LogNormalDistribution[μ, σ]]
(* LogNormalDistribution[7.42205, 0.042639] *)

maxLikelihood[cycledatatofit[[10]]]
(* {7.42205, 0.042639} *)

Excellent. Now let's time it

samplecycledistributions = 
   ParallelTable[EstimatedDistribution[cycledatatofit[[i]], 
     LogNormalDistribution[μ, σ]], {i, 1, nc}]; // AbsoluteTiming
(* {15.8202, Null} *)

and

maxLikelihood[cycledatatofit]; // AbsoluteTiming
(* {0.167895, Null} *)

So this needs only about 1% of the original time. Your complete calculation then looks like this:

nc = 10^5;
samplemanycyclesper5years = 
 rand[cyclesperday, Array[1826 &, 100*nc]];
cycledatatofit = Partition[samplemanycyclesper5years, 100];
samplecycledistributions = maxLikelihood[cycledatatofit];
cyclesamples = 
 Round[ParallelTable[
   RandomVariate[LogNormalDistribution @@ parms], {parms, 
    samplecycledistributions}]];

With these changes I was able to bring the run time down from 654 seconds to 53 seconds. I checked the final histograms and they match perfectly, but please verify each step yourself.
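
One way to verify the first step is to compare summary statistics of the compiled sampler with the RandomChoice version. This is only a sketch: it assumes rand and cyclesperday are defined as above, and the sample size n and the names compiledTotals and referenceTotals are placeholders chosen here for illustration.

```mathematica
(* Sketch: assumes rand and cyclesperday from above are already defined. *)
(* n is a hypothetical sample size, kept small so the check runs quickly. *)
n = 10^4;

(* Totals from the compiled sampler. *)
compiledTotals = rand[cyclesperday, ConstantArray[1826, n]];

(* Totals from the original RandomChoice approach. *)
referenceTotals = Table[Total[RandomChoice[cyclesperday, 1826]], {n}];

(* Mean and standard deviation of each set of totals. *)
{Mean[#], StandardDeviation[#]} & /@ {compiledTotals, referenceTotals}
```

Both methods sample with replacement from the same data, so the two rows should agree up to sampling noise; a large discrepancy would point to a problem in one of the steps.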

Process cyclesperday thus:

cyclesperday = Developer`ToPackedArray[cyclesperday, Real];

Your data gets loaded as a mix of Real and Integer values. To take best advantage of the CPU, the data should all be of the same type and in a packed array. The second argument causes the integer values to be converted to Real.

Then the rest of the code takes about 150 seconds to run. (I cannot tell you how long the original takes on my computer, because it ran out of memory. Packed arrays save memory, too.)
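
To see whether the conversion took effect, you can query the packing state and memory use directly. A minimal sketch, using a small made-up list (mixed) as a stand-in for data imported with both Integer and Real entries:

```mathematica
(* Hypothetical stand-in for imported data with mixed Integer and Real entries. *)
mixed = {120, 98.5, 101, 87.25, 110};

(* Report whether the list is stored as a packed array before conversion. *)
Developer`PackedArrayQ[mixed]

(* Convert all entries to machine reals and pack, as done for cyclesperday. *)
packed = Developer`ToPackedArray[mixed, Real];

(* Report the packing state after conversion. *)
Developer`PackedArrayQ[packed]

(* Compare the memory footprint of the two representations. *)
{ByteCount[mixed], ByteCount[packed]}
```

Running the same two PackedArrayQ checks on cyclesperday before and after the conversion shows whether your imported data was affected.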
