top of page
Search

Day 38 | Trees made simpler (Basics) | C++

  • Writer: Ayush Mahajan
    Ayush Mahajan
  • Sep 25, 2022
  • 4 min read

Updated: Sep 27, 2022



I want to solve some problems with you but couldn't because we need range queries for that for which I will discuss the segment tree in the future. But first I want us to have a good understanding of what a tree is.


Btw, you have an understanding of what a graph is already, right? Okay cool, then let's start with the tree.


Official Definition [Okay if you don't remember or understand] -

A tree is one of the data structures that represent hierarchical data


Layman Understanding -

For a graph with N vertices and all connected as a single component, it has exactly N-1 edges, then it is a tree.


Remember this definition will help you a lot, it also has a lot of significance-


No Cycles Since the graph is completely connected, therefore it takes N-1 edges to connect N vertices. There is no extra vertex, thus it cannot form a cycle. Take a copy pen, try to make a cycle in a graph with the following (You really cannot do it, if you can reply to me and I will tell you where you went wrong)

  1. Single component

  2. N vertices

  3. N-1 edges

Single Component

You can reach any node from any node


No need to input the number of Edges.

In the problem, you will be given a tree with a number of Nodes, then N-1 lines as edges. Unlike graph, there will be no input on the count of edges


This is how a tree looks

This tree is given input like this

7 // this is N
1 2
2 4
2 5
1 3
3 6
3 7

We take input just like we take in the graph

int n;
cin >> n;
vector<vector<int>> g(n+1);
for(int i = 0; i < n-1; i++){
    int a,b;
    cin >> a >> b;
    g[a].push_back(b);
    g[b].push_back(a);
}


What is root of tree?

Have you heard about the roots of a real tree? It is the part of a tree from where it originates. Everything comes after it. In other words, it is the topmost node of the tree.


The tree is a hierarchical data structure, which means it stores data in the form of a parent-child format. The root is a node that will have no parent.


Have you heard of the term Family tree or Generation Tree

This data is hierarchical and shows relationships between nodes so you can understand the tree.

Here, Grandfather is root because we don't have data for Grandfather's parents.


Note - Root does not mean visually most node, it is just the origin. For the example above tree can also have Uncle as a root (which does not make sense but does not break any rule if we remove the meaning of uncle, father, and other words)

Anything can be a root until stated


Some important terms that you will encounter regularly

1. Root - Node with no parent


2. Parents and Children - Take any node, and think of going towards the root. The first node in the path towards the root is a parent. If X is the parent of Y, then Y is a child of X (Normal Logic)

3. Siblings - Set of Nodes that have the same parent are called siblings

4. Ancestors - Any Node that comes in the path towards the root. In simpler words parent of parent ... of X

5. Leaves - Nodes that do not have any children

6. Sub Tree - If you have nodes from the tree and observe this node and its children (and their children and so on) then you will notice that it is also a tree. This is a subtree with the mentioned node as root.

Writing DFS and BFS in tree

DFS in the tree is quite simple because there are no cycles, therefore except for parents a the an the sothe we are not worried about encountering any visited Nodes.

​Learning Moment

Do you know the above line is an answer to a famous interview problem -


To check if an undirected graph has a cycle or not.


Answer - Run DFS and maintain parent. If you find any visited node other than parent in adjacency list, then it there must be a cycle. I will explain more about in problem I am going to discuss tomorrow.

Depth First Search

void dfs(int u, int parent){
    for(int v: graph[u]){
        if( v != parent ) dfs(v, u);
    }
}

​Learning Moment

Since the biggest advantage of using BFS in Graph is to calculate shortest path on unweighted graph and tree does not have cycles, to there is only one path between two nodes hence even DFS yields shortest path. Thus I have never used BFS on trees as far as I can remember.

For BFS Code (Won't use it in trees) - Same as Graph Code


What better way to learn than solving some problems

Note-

  1. We are told that the root is 1, we can make it a direction graph instead of the undirected graph as the root is fixed. We make it undirected because we generally don't know the root and may need to run DFS from any point

  2. We can also get tree input in form of the parent of the ith node like in the problem.

Solution-

The problem is quite straightforward - count the number of subordinates. For every node, their children and their children, and so on are their subordinates.


We learned that it is called subtree. Thus we need to count subtree nodes (except self) to count the number of subordinates.


We can code it like this

#include<bits/stdc++.h>
using namespace std;

vector<vector<int>> g;
vector<int> ans;

int dfs(int u){
    int cnt = 0;
    for(int i : g[u]){
        cnt+=dfs(i);
    }
    ans[u] = cnt;
    cnt++;
    return cnt;
}

int main(){
    int n;
    cin >> n;
    g = vector<vector<int>> (n+1);
    ans = vector<int> (n+1);
    for(int i = 2; i <= n; i++){
        int p;
        cin >> p;
        g[p].push_back(i);
    }
    dfs(1);
    for(int i = 1; i <= n; i++){
        cout << ans[i] << " ";
    }
}

This is a very classic problem. Do you want to know a trick to solve it?

It can be done in two steps

  1. Run a DFS from any node and find the farthest node

  2. Run DFS from that farthest node again after resetting all distances.

  3. The answer is the distance of the node that now have the maximum distance

How this works - Diameter will always be ranging from 1 Leaf to another and if you arbitrarily take a node and run DFS, at least one of those nodes will be at the furthest distance. Again when DFS from that furthest point, this time another node will be furthest away.


You can implement it like


#include<bits/stdc++.h>
using namespace std;

vector<vector<int>> g;
vector<int> dist;

void dfs(int u, int parent = -1){
    if(parent == -1)dist[u] = 0;
    else dist[u] = dist[parent] + 1;
    for(int v: g[u]){
        if(v == parent)continue;
        dfs(v, u);
    }
}

int main(){
    int n;
    cin >> n;
    g = vector<vector<int>> (n+1);
    dist = vector<int> (n+1, -1);
    for(int i = 1; i < n; i++){
        int a,b;
        cin >> a >> b;
        g[a].push_back(b);
        g[b].push_back(a);
    }
    dfs(1);
    int farthest_node = max_element(dist.begin(), dist.end()) - dist.begin();
    
    dist = vector<int> (n+1, -1);
    dfs(farthest_node);
    farthest_node = max_element(dist.begin(), dist.end()) - dist.begin();

    cout << dist[farthest_node];

}

Some Problems for you to try before we learn new concepts -

Please like and share my posts if they are helpful.



 
 
 

Comments


bottom of page