• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Flatten a hierarchy with recursive LAMBDA formula ?

Lolo

Member
Cross posted w/o links: https://stackoverflow.com/questions/79478227/flatten-a-hierarchy-with-recursive-lambda-formula-or-with-power-query
Hello,

In excel I have this table (it is just an example) :
Table called T_DATA

idlParentId
dtm|fact
dtm|dim
fact|raw1
dim|raw2
dtm2|raw3
Etc…

I want to flat the hierarchy based on a value the user populate in cella D1 for example

For example if in D1 I populate : dtm
the result expected would be :
(row 1) : dtm>fact>raw1
(row 2) : dtm>dim>raw2

if in A1 I populate : Dtm2
the result expected would be :
(row 1) : dtm2>raw3

So in summary, each path found must be concatened (separated by a '>'),
and there is 1 path by row.

I managed to do it and it works for the 1st path only, with the following recursive function/formula called GetPath:
=LAMBDA(start,
LET(
nextItems,FILTER(T_DATA[ParentId];T_DATA[Id]=start);
IF(nextItem="";start;start&">"&GetPath(nextItem))
)
)
and by using it like this : GetPath(D1)

But I cannot figure out a way to get ALL paths based on the criteria searched in A1
I tried other options found on internet, but the formula returns #N/A each time :(

See attached file.

If anyone can help me, it would be very appreciated.
 

Attachments

If someone has a solution in Powerquery, it is also a valid option :)

Honestly I'm stucked with this problem :(
 
Please see the attached workbook for a possible solution. The recursive function is defined as follows:

GET.PATH:
Code:
=LAMBDA(node,lookup_arr,return_arr,
   IF(
      ROWS(node) = 1,
      LET(
         v, @node,
         a, FILTER(return_arr, lookup_arr = TEXTAFTER(v, ">", -1,, 1)),
         IF(ISNUMBER(ROWS(a)), GET.PATH(v & ">" & a, lookup_arr, return_arr), v)
      ),
      VSTACK(
         GET.PATH(TAKE(node, ROWS(node) / 2), lookup_arr, return_arr),
         GET.PATH(DROP(node, ROWS(node) / 2), lookup_arr, return_arr)
      )
   )
)

It should work fine if you only intend to search by parent nodes (e.g. "APP1" and "APP2" in your sample workbook):

Code:
=LET(
   pId, SCAN("", IF(LEFT(T_DATA[Dataset], 3) = "app", T_DATA[Dataset], ""), LAMBDA(a,v, IF(v = "", a, v))),
   arr, FILTER(T_DATA, pId = Param.SearchNode),
   IF(ISNUMBER(ROWS(arr)), GET.PATH(Param.SearchNode, TAKE(arr,, 1), TAKE(arr,,-1)), NA())
)

Searching by child nodes is also possible, as demonstrated in the Example2 worksheet...
 

Attachments

Thank you I will have a look tomorrow. Seems by far more complex than I m able to produce in term of formula ... :(
I will let you know. Thank you again for your help
 
You're welcome. In all likelihood, it may end up returning some invalid paths. It will really only work properly with a one-to-many table, where one parent can have many child records. With a many-to-many table, where child records can also have many parents, it will return all subsequent levels for both parents. For example, in your sample file, Raw4 appears as a child record for both Dim1 and Dim2. If Raw4 were to then have a child record, Test1, it would be picked up by both parents (e.g. APP1>DTM1>Dim1>Raw4>Test1 and APP1>DTM1>Dim2>Raw4>Test1), whether that was intended or not. The same goes for Raw5, which appears as a child record for both Dataset2 and Dim2.

Furthermore, the formula will return #NUM! as written, when circular parent-child records are present, e.g. Dataset2>Dataset3; Dataset3>Raw2; Raw2>Dataset2. To help mitigate this issue, the formula can be modified to only search the remaining records at each iteration:

GET.PATH:
Code:
=LAMBDA(node,lookup_arr,return_arr,
   IF(
      ROWS(node) = 1,
      LET(
         v, @node,
         b, lookup_arr = TEXTAFTER(v, ">", -1,, 1),
         a, FILTER(return_arr, b),
         IF(
            ISNUMBER(ROWS(a)),
            LET(
               n, XMATCH(TRUE, b) - 1,
               GET.PATH(v & ">" & a, DROP(lookup_arr, n), DROP(return_arr, n))
            ),
            v
         )
      ),
      VSTACK(
         GET.PATH(TAKE(node, ROWS(node) / 2), lookup_arr, return_arr),
         GET.PATH(DROP(node, ROWS(node) / 2), lookup_arr, return_arr)
      )
   )
)

But it's definitely not perfect...
 

Attachments

Thank you.

1/ to comment your example :
APP1>DTM1>Dim1>Raw4>Test1 and APP1>DTM1>Dim2>Raw4>Test1
this is what I want indeed. It is normal. It means that Raw4 is sourced from Test1

2/ Not yet time to check formulas in details, but I opened the Excel file and it seems to do the job.
I will try to understand the formula, this is impressive ! :)
Thank you so much.

However seems I cannot select an intermediate node, for example Dataset2
and get this result :
- Dataset2>Dataset3>Raw2
- Dataset2>Raw5
But not a big issue. Your proposal with method 'Selected Node via ALL Nodes:' is also valid (but it need to have all hierarchies flatten before)
- APP1>DTM1>Dataset2>Dataset3>Raw2
- APP1>DTM1>Dataset2>Raw5

If we could have a Third option 'Selected Node via ALL Nodes (but see only the selection and childs):' to get only the result :
- Dataset2>Dataset3>Raw2
- Dataset2>Raw5
It would be perfect

NB1: Now, I need to see if it works with a big and real hierarchy (and see the performance)
NB2: I found also a solution via Power Query, but I need to adapt the M code to fit exactly my needs. when done I will share it.
 
To get the 'Selected Node via ALL Nodes' and see only the selected node with child records:
Code:
=LET(
   num, SEARCH(Param.SearchNode, H3#),
   FILTER(REPLACE(H3#, 1, num - 1, ), ISNUMBER(num))
)

I've also optimized the 'ALL Nodes' formula in cell H3 of the attached file, which should help to improve performance. I tested it with 10,000 rows of data and it worked just fine. Check it out, if desired...
 

Attachments

Hello, sorry for my late reply, but was not able to work on my topic and your proposal on the last months.So I will check and let you know, but sounds very promising for my problem. I just have to adapt, and understand the logic of the formula :)

thank you again !
 
Back
Top