I ran into this problem where I wanted to generate balanced trees for test cases, and I decided to crack it with math.
Let X = the number of remaining open parentheses required to reach your target length, and Y = the number of remaining close parentheses.
Let P(X) = X/(X+Y) * (1 + 1/(Y-X+1)). Generate an open parenthesis with probability P(X), and otherwise a close parenthesis.
This will efficiently generate an unbiased random plane tree. The reason this works has to do with counting the valid ways to complete the plane tree, using a calculation similar to the one in the article.
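To see that the rule really is unbiased, here is a sanity check in Python (the helper names are mine, not from the article or the comment): multiply the step probabilities along every balanced sequence and confirm each one comes out to exactly 1/C_n, the uniform probability over the C_n trees.

```python
from fractions import Fraction
from math import comb

def p_open(x, y):
    """Probability of emitting '(' with x opens and y closes remaining,
    per the formula P(X) = X/(X+Y) * (1 + 1/(Y-X+1))."""
    return Fraction(x, x + y) * (1 + Fraction(1, y - x + 1))

def seq_probability(s):
    """Probability that the biased coin-flip process emits the balanced string s."""
    n = len(s) // 2
    x, y = n, n
    prob = Fraction(1)
    for ch in s:
        p = p_open(x, y)
        if ch == '(':
            prob *= p
            x -= 1
        else:
            prob *= 1 - p
            y -= 1
    return prob

def all_balanced(n):
    """Enumerate every balanced parenthesis string with n pairs."""
    def rec(s, x, y):
        if x == 0 and y == 0:
            yield s
        if x > 0:
            yield from rec(s + '(', x - 1, y)
        if y > x:
            yield from rec(s + ')', x, y - 1)
    yield from rec('', n, n)

n = 4
catalan = comb(2 * n, n) // (n + 1)  # C_4 = 14
assert all(seq_probability(s) == Fraction(1, catalan) for s in all_balanced(n))
```

Using exact `Fraction` arithmetic avoids any floating-point doubt: every one of the 14 sequences for n = 4 gets probability exactly 1/14.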
As someone who has never heard of most of these concepts before (plane trees, catalan numbers, ballot sequences, depth vectors), I found the question "Can you think of a way to efficiently generate a random plane tree?" confusing, and I only understood the problem being solved by first trying to understand the solution. After reading through, it seems like it's asking about generating a random plane tree drawn from a uniform distribution of all possible plane trees with a given number of nodes? Cool idea once I understood it though!
Maybe this should be another article, but there's a cool isomorphism between binary trees and forests of plane trees. It connects my article to the reason you can count the number of ways to parenthesize n elements by C_n, the n-th Catalan number.
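For anyone who wants to see it concretely, that isomorphism is the classic left-child/right-sibling encoding. A minimal sketch (the representation is mine, not the article's: a plane tree is just the list of its subtrees, and a binary tree is a `(left, right)` pair or `None`):

```python
def forest_to_binary(forest):
    """Left-child/right-sibling encoding: the first tree's children
    go into the left branch, the remaining trees of the forest go right."""
    if not forest:
        return None
    return (forest_to_binary(forest[0]), forest_to_binary(forest[1:]))

def binary_to_forest(node):
    """Inverse of forest_to_binary."""
    if node is None:
        return []
    left, right = node
    return [binary_to_forest(left)] + binary_to_forest(right)

# Round-trip: the two maps are mutually inverse.
forest = [[], [[], [[]]], [[]]]
assert binary_to_forest(forest_to_binary(forest)) == forest
```

Since the maps are inverse bijections, forests with n nodes and binary trees with n nodes are equinumerous, which is one way to see why both families are counted by Catalan numbers.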
Didn’t expect a few lines of APL plus ballot-sequence magic to make random plane trees feel this intuitive—super elegant construction. Curious how this could be applied in practice, e.g. generative graphics or random UI/tree structures?
> Looking at the tail sequence starting at s_i, we know it must end in 1. That means s_i is N+1 below the tail.
I think the first sentence is very confusing, as the two most obvious ways to interpret it are wrong:
1. The tail sequence beginning at s_i consists of just 1 and -1 values, and it's certainly not the case that the "end" (last element) of that sequence must be 1 -- indeed, in the example it's -1.
2. What if we interpret "the end of the tail sequence beginning at s_i" as the sum of that sequence? But then this is also not always true: You can easily construct a tail sequence where this sum is much higher than 1.
I think the intended meaning is: "We know that the last element in the sequence of all partial sums must be 1 [because there is in total one more 1 than -1 in the original sequence, and the rest cancel in pairs]." I.e., with no mention of s_i at all (yet).
But even this is not enough to establish that the partial sums in just the tail sequence never dip below 1, which is required for the next step (rotating it to the front) to go through.
I think there's a much clearer way to prove this, which actually makes use of the "rightmost lowest" choice of i:
Suppose to the contrary that some j > i exists such that the sum of elements at positions i+1, i+2, ..., j is N' <= 0. Then this contradicts the "rightmost lowest" choice of i, since the sum of the first j elements of the entire sequence is N+N' <= N (at least as low) and j > i (strictly to the right).
That contradiction establishes that the tail sequence is itself free of dangerous dips. To establish that the sum of all elements in the sequence is 1-N (which is needed to show that rotating it to the start will be enough to boost the original "head" sequence to safety), it's enough to observe that the entire sequence sums to 1, and the head sequence sums to N, so what's left (which is exactly the elements of the tail sequence) must sum to 1-N.
> I think the first sentence is very confusing...

Fair. You clearly engage with my article on a real level. More comments like yours will probably make my writing better. So, thanks!
That said, the code does precisely define s_i, which is s[i] in linearized notation. I scribbled the article together at 2am after a long day at work because a colleague asked for it, and it assumes that you know APL or are at least willing to try reading the code blocks.
> I think there's a much clearer way to prove this...
Honestly, I'm not a fan of arguments by contradiction when a clear constructive proof is available: The sequence of partial sums ends in 1; there is a rightmost partial sum that achieves the minimum partial sum, -N, say at index i. The remaining 1s and ¯1s, from index i+1 to the end, clearly must sum to N+1. Thus, left-rotating the sequence by i elements results in a strict ballot sequence. This is procedural and directly translates to the code.
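The constructive procedure can also be checked exhaustively. A Python sketch (not the article's APL, just an illustration of the same steps): for every ±1 sequence summing to 1, rotate at the rightmost minimum prefix sum and confirm that every prefix sum of the result is at least 1.

```python
from itertools import product

def rotate_at_rightmost_min(s):
    """Left-rotate s so it begins just after the rightmost minimum prefix sum."""
    prefix, t = [], 0
    for v in s:
        t += v
        prefix.append(t)
    m = min(prefix)
    i = len(prefix) - 1 - prefix[::-1].index(m)  # rightmost index achieving the minimum
    return s[i + 1:] + s[:i + 1]

# Every +/-1 sequence of odd length summing to 1 becomes a strict
# ballot sequence (all prefix sums >= 1) after the rotation.
for n in range(1, 5):
    for s in product((1, -1), repeat=2 * n + 1):
        if sum(s) != 1:
            continue
        t = 0
        for v in rotate_at_rightmost_min(list(s)):
            t += v
            assert t >= 1
```

The brute-force loop covers all sequences up to length 9; it is only a check, not a proof, but it exercises exactly the rotation step described above.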
Thanks for responding. I don't know APL, so was just trying to follow the body text. I should have also mentioned at the outset: This idea and algorithm is very neat!
> The remaining 1s and ¯1s, from index i+1 to the end, clearly must sum to N+1. Thus, left-rotating the sequence by i elements results in a strict ballot sequence
I'm claiming that one part of this argument remains to be shown: Namely, that the sequence of partial sums of the tail sequence itself (i.e., the sequence s_{i+1}, s_{i+1}+s_{i+2}, ..., s_{i+1}+...+s_n) never dips below 1. This is clearly a necessary condition (since if it's violated, the result after rotation will be invalid), and I don't yet see how it's implied by what you've written: certainly I can construct a sequence of 1s and -1s that sums to N+1 but has an initial partial sum below 1 -- e.g., for N = 1, the tail -1, 1, 1, 1 sums to 2, yet its first partial sum is -1.
For me, the fact that the argument you have made so far doesn't "make use of" the "rightmost lowest" property of i's construction is also a hint that there might be a gap here. (In my experience, when you choose/construct a witness this way, i.e., to optimise some quantity, it's always so that you can later show a contradiction to rule out some troublesome case.)