Have you ever tried to store hierarchies in SQL. That’s easy but, have you ever tried to do some calculation in that hierarchy. That’s bit harder.
Let’s take example situation that I have some hierarchy and some cost that should be allocated in that hierarchy. Let assume that you have an HierarchyItem that could have another item many other HierarchyItems as child of it and many parents also. Then you could have some cost allocated to that hierarchy let’s say DirectCost. You share this according to weight between the HierarchyItem connection say ConnectionShare.
Here is the class diagram of this.
So HierarchyItem cost is sum of direct costs and cost of items child according to share ratios.
Example if have file server and it costs are 1000 euros. Application A file usage is 80GB and Application B file usage is 120GB:
Application A has costs from file server (80 / (80 + 120)) * 1000 euros = 400 euros
Application B has costs from file server (120 / (80 + 120)) * 1000 euros = 600 euros
In this easily solved with recursion as flows (Detailed code in C#):
public float GetCost(string costType)
{
float retval = 0;
//Calculate first all direct costs
foreach(DirectCost directCost in DirectCosts)
{
if(directCost.CostType.Equals(costType))
retval += directCost.Value;
}
//Add all child costs to this
foreach(ConnectionShare child in ChildConnectionShares)
{
retval += child.GetChildCost(costType);
}
return retval;
}
Problem converting this to SQL is that you can’t really do recursion in SQL. So the solution in my case was to simulate the cost in time to time. Simulation is a good solution if you don’t data bee accurate all the time.
So if you think about how you would calculate this by hand. You just would start calculating this from the bottom. First calculate those items that do not have any child’s. After that you would calculate items which child’s has been calculated all ready. And the just iterate the process.
So SQL solution comes from the same idea. First calculate all those items that have no childs and after that calculate those items which child’s has been calculated. Or if I be more precise you can calculate those items where your child count is the same as calculated child count.
So if I would have an tables HierarchyItem, ConnectionShare, DirectCost and HierarchyItemCosts.
HierarchyItem you would all the hierarchy items.
ConnectionShare would include all connections between HierarchyItems
DirectCost Table would include all direct cost related to HierarchyItem.
HierarchyItemCosts would store all the simulated costs.
Let’s assume that we already have some data in there. And go step by step…
1. Find all those items that can be calculated - those items where your child count is the same as calculated child count.
SELECT Parent.ID
FROM HierarchyItem Parent LEFT JOIN ConnectionShare Child
ON Parent.ID = Child.ParentID
LEFT JOIN HierarchyItemCost CalulcatedItems
ON Child.ChildID = CalulcatedItems.HierarchyItemID
GROUP BY Parent.ID
HAVING COUNT(Child.ChildID) = COUNT(CalulcatedItems.HierarchyItemID)
2. Now Select all the items that should be calculated
SELECT HI.ID
FROM HierarchyItem HI
WHERE HI.ID IN (Query from the step 1)
3. Add all CostTypes to calculation
SELECT HI.ID,
TmpCostType. CostType
FROM HierarchyItem HI CROSS JOIN
(SELECT DISTINCT CostType FROM DirectCost) TmpCostType
WHERE HI.ID IN (Query from the step 1)
4. Add the direct cost calculation
SELECT HI.ID,
TmpCostType. CostType,
(SELECT COALESCE(SUM(Value),0)
FROM DirectCost
WHERE HierarchyItemID = HI.ID
AND CostType = TmpCostType.CostType)
FROM HierarchyItem HI CROSS JOIN
(SELECT DISTINCT CostType FROM DirectCost) TmpCostType
WHERE HI.ID IN (Query from the step 1)
5. Add the child costs to calculation and helpper view.
CREATE VIEW HierarchyItemTotalShare AS
SELECT HI.ID,COALESCE(SUM(Share),0)
FROM HierarchyItem HI LEFT JOIN ConnectionShare CS
ON HI.ID = CS.ChildID
GROUP BY HI.ID
SELECT COALESCE(SUM(Child.Value * (Relation.Share/TotalShare.TotalShare) ),0)
FROM HierarchyItemCost Child JOIN ConnectionShare Relation
ON Child.HierarchyItemID = Relation.ChildID
JOIN HierarchyItemTotalShare TotalShare
ON Child.HierarchyItemID = TotalShare.ID
WHERE Relation.ParentID = HI.ID
AND Child.CostType = TmpCostType.CostType
AND TotalShare.TotalShare <> 0
6. The final insert statement that is run until all row’s have been instered.
INSERT INTO HierarchyItemCost (HierarchyItemID,CostType,Value)
SELECT HI.ID,
TmpCostType.CostType,
--Items cost is sum of direct cost +
(SELECT COALESCE(SUM(Value),0) FROM DirectCost WHERE HierarchyItemID = HI.ID AND CostType = TmpCostType.CostType)
+
--Sum of all child cost acording to share
(SELECT COALESCE(SUM(Child.Value * (Relation.Share/TotalShare.TotalShare) ),0)
FROM HierarchyItemCost Child JOIN ConnectionShare Relation
ON Child.HierarchyItemID = Relation.ChildID
JOIN HierarchyItemTotalShare TotalShare
ON Child.HierarchyItemID = TotalShare.ID
WHERE Relation.ParentID = HI.ID AND Child.CostType = TmpCostType.CostType AND TotalShare.TotalShare <> 0)
FROM HierarchyItem HI CROSS JOIN (SELECT DISTINCT CostType FROM DirectCost) TmpCostType
WHERE HI.ID IN
–All those HierarchyItems witch childs has been calculated
(SELECT Parent.ID
FROM HierarchyItem Parent LEFT JOIN ConnectionShare Child
ON Parent.ID = Child.ParentID
LEFT JOIN HierarchyItemCost CalulcatedItems
ON Child.ChildID = CalulcatedItems.HierarchyItemID
GROUP BY Parent.ID
HAVING COUNT(Child.ChildID) = COUNT(CalulcatedItems.HierarchyItemID))
–Only calculate every item once
AND HI.ID NOT IN (SELECT HierarchyItemID FROM HierarchyItemCost)
Voilá!
If you have SqlServer you can test the solution with this script.