Fruit bagging factory

Python 3, 9796 bags

Building on Jonathan's answer:

import itertools as it

def powerset(iterable):
    s = list(iterable)
    return it.chain.from_iterable(it.combinations(s, r) for r in range(len(s)+1))

def f(a, m, l):
    r = []; b = []
    while a:
        # subset of the lookahead buffer whose sum brings the bag closest to the target
        c = min(powerset(a[:l]), key=lambda w: abs(sum(b) + sum(w) - m))
        if sum(c) == 0:  # the empty subset won: fall back to the whole buffer
            c = a[:l]
        # move the single best orange of that subset into the bag
        b += [a.pop(a.index(min(c, key=lambda w: abs(sum(b) + w - m))))]
        if sum(b) >= m: r += [b]; b = []
    return r

This relies on powerset from the itertools recipes. It first finds the subset of the lookahead buffer whose total, added to the current bag, is closest to the target weight, and then picks a single orange out of this subset by the same criterion. If that best subset is empty, it selects from the whole buffer instead.
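
Assuming the orange weights live in a file called oranges (one integer per line) and using the challenge parameters from the other answers here (1000 g bags, lookahead 2 through 7), a minimal driver sketch could look like this. Note that f pops elements off the list it is given, so each run gets its own copy:

if __name__ == '__main__':
    with open('oranges') as file:
        oranges = list(map(int, file))
    # f consumes its input list, so pass a fresh copy for every lookahead value
    res = [f(list(oranges), 1000, l) for l in range(2, 7 + 1)]
    print(sum(map(len, res)))  # total number of bags over all lookahead values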


Python 3, 9964 → 9981 bags

The idea of this solution is similar to those of Jonathan, JayCe and fortraan, but with a scoring function =)

This solution appends the best subset of the lookahead area to the bag, according to the score.

score provides an order over the subsets using the following scheme (a worked example follows the list):

  • A subset completing a bag is better than one that's not
  • One subset completing a bag is better than another if it has less excess weight
  • One subset not completing a bag is better than another if its mean is closer to what's expected to be in a bag
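
For instance, with mean 170 and std 13 as in the code below, and a made-up weight_needed of 300 (the subset with the lowest score wins):

score(300, (180, 150))   # completes the bag with 30 g excess -> int(-2e9) + 30
score(300, (160,))       # does not complete it -> abs(expected_mean(300) - 160) = 3.0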

expected_mean tries to predict what the rest of the values should look like (assuming their choice is optimal).
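
For example (with mean 170 and std 13): if 510 g are still missing from the bag, round(510 / 170) = 3 more oranges are expected, so expected_mean(510) = 510 / 3 = 170; if only 200 g are missing, a single orange is expected and the raw value 200 gets clamped to mean + std = 183.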

UPD:

Here is another observation: you can move any oranges from the best subset into the bag without hurting the algorithm's performance. Moving any part of it still allows the rest to be moved afterwards, and the remainder should still be the best option (not counting newly revealed oranges) if the scoring is right. Moreover, this way there is a chance to dynamically improve the set of candidates for the bag, because more oranges become visible before the bag is filled. And since you want as much information as possible, there is no point in moving more than one orange to the bag at any given time.

import itertools as it
import math
from functools import partial
from collections import Counter


mean, std = 170, 13


def powerset(list_, max_items):
    return it.chain.from_iterable(it.combinations(list_, r) for r in range(1, max_items + 1))


def expected_mean(w):
    spread = std * 1
    return max(mean - spread, min(mean + spread, w / max(1, round(w / mean))))


def score(weight_needed, candidate):
    c_sum = sum(candidate)
    c_mean = c_sum / len(candidate)
    if c_sum >= weight_needed:
        return int(-2e9) + c_sum - weight_needed
    return abs(expected_mean(weight_needed) - c_mean)


def f(oranges, min_bag_weight, lookahead):
    check = Counter(oranges)

    oranges = oranges.copy()
    result = []
    bag = []

    while oranges:
        weight_needed = min_bag_weight - sum(bag)

        lookahead_area = oranges[:lookahead]
        tail = oranges[lookahead:]

        # best-scoring subset of the lookahead area, then its single best orange
        to_add = min(powerset(lookahead_area, lookahead),
                     key=partial(score, weight_needed))
        to_add = min(powerset(to_add, 1),
                     key=partial(score, weight_needed))

        bag.extend(to_add)
        for x in to_add:
            lookahead_area.remove(x)
        oranges = lookahead_area + tail

        if sum(bag) >= min_bag_weight:
            result.append(bag)
            bag = []

    assert check == Counter(oranges) + Counter(bag) + sum(map(Counter, result), Counter())

    return result


if __name__ == '__main__':
    with open('oranges') as file:
        oranges = list(map(int, file))
    res = [f(oranges, 1000, l) for l in range(2, 7+1)]
    print(sum(map(len, res)))

Try it!


C++17, 9961.58 (average over some random seeds)

(scroll down for explanation if you don't know C++)

#include<iostream>

#include<vector>
#include<random>

std::mt19937 engine(279); // random engine
// random distribution of the oranges
std::normal_distribution dist (170.,13.);

int constexpr N_NEW_ORANGES=7;

/** Input: the current remaining weight of the bag (remain) and
the weights of the oranges (weights).
Output: the index of the orange to be picked.
*/
struct pick_orange{
    std::vector<int>weights,sum_postfix;int remain;

    /// returns {min excess, mask}, (index) is the LSB
    std::pair<int,int> backtrack(int index, int remain) {
        if(sum_postfix[index]<remain)return {1e9,0};
        int min_excess=1e9, good_mask=0;
        for(int next=index;next<N_NEW_ORANGES;++next){
            if(weights[next]==remain){
                return {0, 1<<(next-index)};
            }else if(weights[next]>remain){
                if(weights[next]-remain<min_excess){
                    min_excess=weights[next]-remain;
                    good_mask=1<<(next-index);
                }
            }else{
                auto[excess,mask]=backtrack(next+1,remain-weights[next]);
                if(excess<min_excess){
                    min_excess=excess;
                    good_mask=(mask<<1|1)<<(next-index);
                }
            }
        }
        return {min_excess,good_mask};
    } 

    int ans;

    pick_orange(std::vector<int> weights_,int remain_)
        :weights(std::move(weights_)),remain(remain_){

        int old_size=weights.size();

        std::vector<int> count (N_NEW_ORANGES, 0);
        weights.resize(N_NEW_ORANGES, 0);

        sum_postfix.resize(N_NEW_ORANGES+1);
        sum_postfix.back()=0;

        for(int _=0; _<500; ++_){

            for(int i=old_size;i<N_NEW_ORANGES;++i)
                weights[i] = dist(engine);

            // prepare sum postfix
            for(int i=N_NEW_ORANGES;i-->0;)
                sum_postfix[i]=weights[i]+sum_postfix[i+1];

            // auto[excess,mask]=backtrack(0,remain);
            int mask = backtrack(0,remain).second;

            // accumulate how often each orange appears in the chosen subset
            for(int i=0; mask; mask>>=1, ++i){
                // if(mask&1)std::cout<<'(';
                // std::cout<<weights[i];
                // if(mask&1)std::cout<<')';
                // std::cout<<' ';
                count[i]+=mask&1;
            }

            // std::cout<<"| "<<remain<<" | "<<excess<<'\n';

        }

        std::vector<double> count_balanced(old_size, -1);
        for(int i=0;i<old_size;++i){
            if(count_balanced[i]>-1)continue;
            int sum=0,amount=0;
            for(int j=i;j<old_size;++j)
                if(weights[j]==weights[i]){sum+=count[j];++amount;}

            double avg=sum;avg/=amount;
            for(int j=i;j<old_size;++j)
                if(weights[j]==weights[i])count_balanced[j]=avg;
        }

        ans=old_size-1;
        for(int i=ans;i-->0;)
            if(count_balanced[i]>count_balanced[ans])ans=i;
        // Fun fact: originally I wrote `<` for `>` here and wondered
        // why the number of bags was even lower than that of the
        // randomized algorithm
    }

    operator int()const{return ans;}
};


#include<iostream>
#include<fstream>
#include<algorithm>

int main(){
    // read input from the file "data"
    std::ifstream data ("data");
    std::vector<int> weights;
    int weight;while(data>>weight)weights.push_back(weight);

    int constexpr BAG_SIZE=1000;
    int total_n_bag=0;
    for(int lookahead=2;lookahead<=7;++lookahead){
        auto weights1=weights;
        std::reverse(weights1.begin(),weights1.end());

        int remain=BAG_SIZE,n_bag=0;
        std::vector<int> w;
        for(int _=lookahead;_--;){
            w.push_back(weights1.back());
            weights1.pop_back();
        }
        while(!weights1.empty()){
            int index=pick_orange(w,remain);

            remain-=w[index];
            if(remain<=0){
                ++n_bag;remain=BAG_SIZE;

                if(n_bag%100==0)
                    std::cout<<n_bag<<" bags so far..."<<std::endl;
            }
            w[index]=weights1.back();weights1.pop_back();
        }

        while(!w.empty()){
            int index=pick_orange(w,remain);
            remain-=w[index];
            if(remain<=0){++n_bag;remain=BAG_SIZE;}
            w.erase(w.begin()+index);
        }

        std::cout<<"lookahead = "<<lookahead<<", "
            "n_bag = "<<n_bag<<'\n';
        total_n_bag += n_bag;
    }

    std::cout<<"total_n_bag = "<<total_n_bag<<'\n';
}


(If `<` is used instead of `>`, the algorithm basically tries to minimize the number of bags.)

Inspired by this answer.

TIO link for 250 repetitions: Try it online!


Defines a function (actually it just looks like a function; it's a struct) pick_orange that, given a vector<int> weights with the weights of the oranges and an int remain with the remaining weight of the bag, returns the index of the orange that should be picked.

Algorithm:

repeat 500 times {
    generate random (fake) oranges (normal distribution with mean 170 and stddev 13) until there are N_NEW_ORANGES = 7 oranges
    pick the subset whose sum is smallest among those not smaller than remain (the function backtrack does that)
    mark all oranges in that subset as good
}

for each (real) orange, average the number of times oranges of equal weight were marked as good
return the orange with the best average
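
For those who don't read C++, here is a rough Python sketch of the same Monte Carlo idea. It is a hypothetical re-implementation, not the code measured above: it uses a plain brute-force subset search where the C++ prunes the search with postfix sums (backtrack), so it is noticeably slower.

import random
from itertools import combinations

MEAN, STD = 170, 13
N_NEW_ORANGES = 7

def pick_orange(weights, remain, repetitions=500):
    # how often each real orange ends up in the chosen subset
    count = [0] * len(weights)
    for _ in range(repetitions):
        # pad the real lookahead with fake oranges from the same distribution
        fake = [int(random.gauss(MEAN, STD))
                for _ in range(N_NEW_ORANGES - len(weights))]
        pool = weights + fake
        # among all subsets whose sum covers `remain`, keep one with minimal excess
        best_excess, best_subset = float('inf'), ()
        for r in range(1, len(pool) + 1):
            for subset in combinations(range(len(pool)), r):
                s = sum(pool[i] for i in subset)
                if s >= remain and s - remain < best_excess:
                    best_excess, best_subset = s - remain, subset
        # mark the real oranges of that subset as good
        for i in best_subset:
            if i < len(weights):
                count[i] += 1
    # average the counts over real oranges of equal weight, then take the best one
    averaged = [sum(c for c, v in zip(count, weights) if v == w) / weights.count(w)
                for w in weights]
    return max(range(len(weights)), key=averaged.__getitem__)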


There are 3 hardcoded constants in the program that cannot be inferred from the problem:

  • The random seed (this is not important)
  • N_NEW_ORANGES (the prediction length). Increasing this makes the program run exponentially longer (because of backtrack)
  • The number of repetitions. Increasing this makes the program run linearly longer.