Combinations, Power Sets No Idea where to even start

Обновить

April 2019

Просмотры

230 раз

1

У меня есть эта проблема, я надеялся, люди могли бы указать меня в правильном направлении выяснить, потому что я даже не знаю, с чего начать.

Вот настройки, у меня есть две таблицы в SQL Server, таблица А сводная таблица, таблица B является детали стола, так что-то вроде этого:

Table A
ParentID        Total Amount
1               100
2               587


Table B
ParentID        ChildID         Amount
1               1               8
1               2               7
1               3               18
1               4               93
2               5               500
2               6               82
2               7               5
2               8               10

Таким образом, для каждого ParentID, мне нужно придумать комбинации детей, суммы которых их сумма равна общая сумма Родителя.

Таким образом, для ParentId 1 (100), было бы ChildIDs 2 и 4 (7 + 93), и я бы просто игнорировать ChildIDs 1 и 3.

Для ParentId 2 было бы дети 5, 6, 7, и я бы игнорировать 8.

Там нет фиксированного размера для детей комбинации, которые могут быть объединены, чтобы равняться Parent.

Так делают некоторые исследования, кажется, мне нужно, чтобы получить набор питания всех детей для каждого родителя. Затем оттуда я могу суммировать их общее количество и посмотреть, если любой из них равен Parent. Тем не менее, поправьте меня, если я ошибаюсь, но если есть N элементы в наборе, то набор мощности будет состоять из 2 ^ N числа комбинаций.

Некоторые из этих родителей имеют более 750 детей и 2 ^ 750 является очень очень очень большим числом. Я основном .NET / парень SQL Server, но я открыт к попытке любых технологий, что люди будут думать имеют право на работу.

Так несколько вопросов.

1) Должен ли я идти по пути , пытаясь выяснить булеана для каждого родителя или я ложное дерево с этим?
2) Является ли это alogrithm , который уже понял, и я просто делаю плохую работу найти его на Google? 3) Предполагая , что это может быть сделано, что бы правильный подход к ее решению?

4 ответы

0

A bit of research tells me you can solve this in N*2^P where N is the number of children and P is the number of bits needed to store the largest number. Look, say, here: http://en.wikipedia.org/wiki/Subset_sum_problem#Polynomial_time_approximate_algorithm

=============================================

As long as the number of children per parent is small, then working out the powerset is fine, but note that the powerset of N children is 2^n, which grows very fast. 2^750 is hopelessly large, about 10^225.

There are plenty of functions for finding the powerset, I mostly work in Java and I know there is one in Guava, and I think there is also one in Apache Commoms Math. Constructing a powerset is not difficult, intuitively you can think of it is a binary tree of depth N, where every level is "Do I include this element Yes/No".

I don't write in c# but in pseudo code

Set<Set<Object>> Powerset(Set<Set<Object>> powerset, Object newItem){
    Set<Set<Object>> newSet = powerset.clone();
    for (Set<Object> set : newSet){
        set.add(newItem)
    }
    return newSet.addAll(powerset)
}

So this takes a powerset of N elements, and returns the power set of N+1 elements. So you can just call it repeatedly to build the powerset starting with the empty set.

For larger numbers of children, rather than build the powerset, use teh same function, but remove any set whose sum exceed the target. (As clearly its monotonically increasing, so once the target is exceeded it cannot be right to continue. E.g. Assume Object.hasValue() returns a number then do:

Set<Set<Object>> Powerset(Set<Set<Object>> powerset, Object newItem, int target){
    Set<Set<Object>> newSet = powerset.clone();
    for (Set<Object> set : newSet){
        set.add(newItem)
    }


    Set<Set<Object>> result = new Set<Set<Object>>();
    for(Set<Object> set : newSet){
        int sum = 0;
        for(Object o : set){
            sum += o.hasvalue();
        }
        if(sum <= target){
            result.add(set)
        }
    }
    return result.addAll(powerset);
}

Various optimisations are possible (e.g. you should add the largest numbers first, as its quickest if you exclude numbers as early as possible. (if have a number greater than the target you will only have to add it to one set if you do if first, but to 2^n-1 sets if you do it last). You can also make it so the Set carries the sum of its components directly, so elminating the sum loop, and you can improve the space complexity by storing it as a tree with elements pointing to their parent element, and doing a DFS so you only have one branch of the tree in memory at a time, and only keeping successful branches.

0

If some parents have 750 children, you are going to run out of time. You should investigate some sort of parallel cloud computing solution if you want to get an answer before the sun burns out.

2^750 = 5922386521
        532855740161817506647119
        732883018558947359509044
        845726112560091729648156
        474603305162988578607512
        400425457279991804428268
        870599332596921062626576
        000993556884845161077691
        136496092218188572933193
        945756793025561702170624

Now, lets assume we are really lucky, about as lucky as winning the lottery and find the correct combination really early.

Now, lets assume that we have fast computer that can calculate a billion sums a second.

Its going to take somthing like

6.332987 * 10^135 years

to find the answer. Now that is still an unimaginably long period of time. You can think of it as,

4.52356 * 10^125 ages of the universe.

Or more expressively, longer that the age of the universe multiplied by the number of atoms in the universe. Thats a long time.

http://xkcd.com/287/

This is some conjecture, but I suspect there is not enough material in the universe to make enough computers to parallelize the calculation enough to complete it before the sun runs out of fuel. (Whithin the bounds of exisiting technology.)

I'd suggest a brute force power set approach should be abandoned. Computers are fast but they're not that fast.

0

Actually your situation isn't as dire as the 2^750 analysis suggests. Abandon the power set solution and go for dynamic programming instead. One option might look like:

public static IEnumerable<int> FindSet(IEnumerable<int> amounts, int target)
{
    var results = new Dictionary<int, List<int>>();
    results[0] = new List<int>();
    foreach(var amount in amounts)
    {
        for(int i = 0; i <= target; i++)
        {
            if(!results.ContainsKey(i) || results[i].Contains(amount))
                continue;
            var combination = new List<int>(results[i]);
            combination.Add(amount);
            if (i + amount == target)
                return combination;
            results[i + amount] = combination;
        }
    }
    return null;
}

The premise is that we start by saying we know how to reach a sum of 0, with the empty set. Then for each available amount we go through and say that if we know how to reach a sum of n without using amount, then we also now how to reach a sum of n+amount- by taking the previous result and adding amount to it.

As you can see from the loops, this runs in order O(NK), where N is the number of values in the set, and K is the target sum. Much better than O(2^N).

This is just to give a sketch of the algorithm. There's probably plenty of room for performance tweaks. Particularly the lists could be replaced with some kind of "Node" class supporting a tree structure.

For a sketch of what I mean by Node class, something like:

public class Node
{
    public int Value;
    public Node Parent;

    public List<int> ToList()
    {
        if(Parent == null)
        {
            return new List<int> { Value };
        }
        var result = Parent.ToList();
        result.Add(Value);
        return result;
    }
}

Then you don't have to keep copying around lists

1

The problem is reducable to subset problem which can be reduced to simple knapsack problem. There is a dynamic programming solution to the problem :-

W = knapsack capacity = Total Amount of parent.

item weight = item cost = child amount.

maximize profit and if W = profit then there exists a subset else not.

Use DP solution of kanpsack to solve this problem and get result by backtracking.

Here is a solution in JAVA maybe you can convert to C# :-

public class SubSetSum {
    static int[][] costs;

    public static void calSets(int target,int[] arr) {

        costs = new int[arr.length][target+1];
        for(int j=0;j<=target;j++) {
            if(arr[0]<=j) {

                costs[0][j] = arr[0]; 
            }
        }
        for(int i=1;i<arr.length;i++) {

            for(int j=0;j<=target;j++) {
                costs[i][j] = costs[i-1][j];
                if(arr[i]<=j) {
                    costs[i][j] = Math.max(costs[i][j],costs[i-1][j-arr[i]]+arr[i]);
                }
            }

        }

        System.out.println("total amount: "+costs[arr.length-1][target]);
       if(costs[arr.length-1][target]==target) {
           System.out.println("Sets :");
           printSets(arr,arr.length-1,target,"");
       } 

       else System.out.println("No such Set found");

    } 

    public static void printSets(int[] arr,int n,int w,String result) {


        if(w==0) {
            System.out.println(result);
            return;
        }

        if(n==0) {
           System.out.println(result+","+0);
            return; 
        }

        if(costs[n-1][w]==costs[n][w]) {
            printSets(arr,n-1,w,new String(result));
        }
        if(arr[n]<=w&&(costs[n-1][w-arr[n]]+arr[n])==costs[n][w]) {
            printSets(arr,n-1,w-arr[n],result+","+n);
        }
    }

    public static void main(String[] args) {
        int[] arr = {1,2,3,8,9,7};
        calSets(10,arr);
    }
}

Note :-

In some cases brute force is more feasible than DP as space and time complexity for DP = O(ParentAmount*totalchildren) and whereas time complexity for brute force = O(2^n) and space complexity = O(1). You may choose according to the problem.