Decision Trees

Decision Trees are a type of supervised machine learning (that is, you explain what the input is and what the corresponding output is in the training data) in which the data is continuously split according to a certain attribute. In a decision tree, each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents the decision taken.

As an example, consider a decision tree that predicts whether a person is fit given information such as their age, eating habits, and physical activity. The decision nodes are questions like "What is the age?", "Does he exercise?", and "Does he eat a lot of pizza?", and the leaves are outcomes such as "fit" or "unfit". This is a simple binary classification problem.
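To make the fitness example concrete, here is a minimal sketch using scikit-learn (the tiny dataset and the numeric encoding of the features are made up purely for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [age, exercises (1/0), eats a lot of pizza (1/0)]
X = [
    [25, 1, 0],
    [40, 0, 1],
    [30, 1, 1],
    [55, 0, 0],
    [22, 1, 0],
    [45, 0, 1],
]
y = ["fit", "unfit", "fit", "unfit", "fit", "unfit"]  # target labels

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Inspect the learned tests and predict for a new person
print(export_text(clf, feature_names=["age", "exercises", "pizza"]))
print(clf.predict([[35, 1, 0]]))  # e.g. -> ['fit']
```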

There are two main types of Decision Trees: classification trees, where the predicted outcome is a discrete class (as in the fitness example above), and regression trees, where the predicted outcome is a continuous value.

Before we move to the algorithm for building a decision tree, here are a few definitions to keep in mind.

Entropy: Entropy, denoted H(S) for a finite set S, measures the randomness in the set; in the context of decision trees it can also be understood as the amount of information required to describe an outcome. It is defined as H(S) = −Σ p(x) log₂ p(x), where the sum runs over the possible classes x and p(x) is the proportion of examples in S belonging to class x.

Intuitively, entropy tells us about the predictability of an event. For example, consider a coin toss where the probability of heads is 0.5 and the probability of tails is 0.5. Here the entropy is the highest possible, since there is no way of determining what the outcome will be. Alternatively, consider a coin with heads on both sides: the outcome of such a toss can be predicted perfectly, since we know beforehand that it will always be heads. In other words, this event has no randomness, hence its entropy is zero.
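As a quick check of the coin-toss intuition, here is a minimal sketch in Python (the entropy helper is mine, not from the post):

```python
import math

def entropy(probabilities):
    """Shannon entropy H(S) = -sum(p * log2(p)) over the class probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # fair coin       -> 1.0 (maximum uncertainty)
print(entropy([1.0, 0.0]))  # two-headed coin -> 0.0 (no randomness)
```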

Information gain: Information gain, in simple words, is the change in information required (or entropy) after splitting a node on a particular attribute.

IG(S, A) = H(S) − Σ P(x) · H(x)

where IG(S, A) is the information gain from splitting on feature A, H(S) is the entropy of the entire set, and the second term is the weighted entropy after applying feature A, with P(x) being the probability (proportion) of examples falling into subset x.
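A minimal, self-contained sketch of this calculation (the function name and the label groups are illustrative assumptions, reusing the fitness example from above):

```python
import math

def information_gain(parent_labels, child_label_groups):
    """IG(S, A) = H(S) - sum(|S_x|/|S| * H(S_x)) over the subsets produced by the split."""
    def h(labels):
        n = len(labels)
        return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                    for c in set(labels))

    n = len(parent_labels)
    weighted = sum(len(g) / n * h(g) for g in child_label_groups)
    return h(parent_labels) - weighted

# Splitting on "exercises" in the fitness example separates the classes perfectly,
# so the information gain equals the parent entropy (1 bit for a 50/50 split).
print(information_gain(
    ["fit", "fit", "fit", "unfit", "unfit", "unfit"],
    [["fit", "fit", "fit"], ["unfit", "unfit", "unfit"]],
))  # -> 1.0
```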

Gini index: The Gini index measures the impurity of a data partition or set of tuples D as Gini(D) = 1 − Σ pᵢ², where pᵢ is the probability that a tuple in D belongs to class Cᵢ, estimated by |Cᵢ,D| / |D|, and the sum is computed over the m classes. A lower Gini index is always better; when this metric is used, we split on the feature that produces the largest reduction in the Gini index.
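A minimal sketch of the Gini calculation (the function name and the example labels are mine):

```python
def gini(labels):
    """Gini(D) = 1 - sum(p_i^2), where p_i is the fraction of tuples in class C_i."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["fit", "fit", "unfit", "unfit"]))  # 50/50 split -> 0.5 (maximally impure)
print(gini(["fit", "fit", "fit", "fit"]))      # pure node   -> 0.0
```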

ID3 algorithm: There are many algorithms for constructing decision trees, but one of the best known is the ID3 algorithm. ID3 stands for Iterative Dichotomiser 3.

The ID3 algorithm performs the following tasks recursively:

1. Calculate the entropy of every attribute of the current dataset S.
2. Split S on the attribute for which the information gain is maximum (equivalently, for which the weighted entropy after the split is minimum).
3. Create a decision node containing that attribute.
4. Recurse on each resulting subset with the remaining attributes, stopping when a subset is pure (all examples share one class) or no attributes are left.

Instead of using information gain (IG) here, we can also use the gain ratio (a simple modification of information gain) or the Gini index metric to choose the attribute to split on.
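Putting the pieces together, here is a minimal recursive ID3 sketch for categorical attributes (all names and the toy data are my own simplifications; a real implementation would add more stopping criteria and support for numeric attributes):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p * log2(p)) over the class counts in a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    """Build a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    # Stop if the node is pure or there is nothing left to split on.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Pick the attribute whose split gives the highest information gain
    # (equivalently, the lowest weighted child entropy).
    def split_entropy(attr):
        total = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            total += len(subset) / len(labels) * entropy(subset)
        return total

    best = min(attributes, key=split_entropy)
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        sub_rows = [row for row in rows if row[best] == value]
        sub_labels = [lab for row, lab in zip(rows, labels) if row[best] == value]
        tree[best][value] = id3(sub_rows, sub_labels,
                                [a for a in attributes if a != best])
    return tree

# Toy fitness data (made up): each row is a dict of categorical attributes.
rows = [
    {"exercises": "yes", "pizza": "no"},
    {"exercises": "yes", "pizza": "yes"},
    {"exercises": "no", "pizza": "yes"},
    {"exercises": "no", "pizza": "no"},
]
labels = ["fit", "fit", "unfit", "unfit"]
print(id3(rows, labels, ["exercises", "pizza"]))
# -> {'exercises': {'yes': 'fit', 'no': 'unfit'}} (value order may vary)
```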

Applications of Decision Trees:

Below is a summary of what we studied in this blog:

- Decision trees are supervised models in which internal nodes test attributes, branches represent test outcomes, and leaves hold decisions.
- Entropy measures the randomness (information required) in a set, and information gain measures how much a split reduces it.
- The Gini index is an alternative impurity measure; lower values are better.
- The ID3 algorithm builds a tree recursively by splitting on the attribute with the highest information gain (gain ratio or Gini reduction can be used instead).

I hope you developed a better understanding of decision trees. Feel free to give some claps.

