Introduction to concept learning and predicting the most specific hypothesis in machine learning
Introduction to Concept Learning
Let's start with a concept. Consider baskets with fruits. Here, I have two baskets. One basket has six fruits and the other has three fruits. So I am going to say that the first basket has more fruits than the second. How can I say this?
Based on the number of fruits in each basket. Now consider these two animals. What is the difference between the two pictures? The giraffe is taller than the tiger. So the concept behind this one is that, based on height, we can say tall and short.
Consider one more example. In this picture, one person is thin and another is fat. So the concept behind it is that, based on the weight and physique of a person, we can say thin and fat. From all these examples, we can see that concept learning plays an important role.
Concept learning can be formulated as a problem of searching through a predefined space of hypotheses. In many cases this search can be organized efficiently by taking advantage of a naturally occurring structure over the hypothesis space. Consider the following example.
From the observation, the picture shows two facial expressions: happy or sad. Let happy be one and sad be zero. Suppose we have a dataset that is already classified into positive and negative examples. Concept learning is then defined as the process of inferring a Boolean-valued function from such labelled examples.
We live in a world where we can see many objects, and along with them we can form many different concepts. Consider this rectangle as the universe.
Within it, I am taking the concept of electronic gadgets, and within that, phones. We can have different concepts like tablets, phones, and smartphones, of any brand. A tablet has many different features.
For our convenience, I have considered four features: size, color, screen type, and shape.
The first feature is size; it may be small or large. The next feature is color; it may be white or black. The next feature is screen type; it is flat or folded. The final feature is shape, such as square or rectangle. If you look at these features, the values are binary.
So each is called a binary-valued attribute. Each of the phones can be uniquely identified by these features. Each feature is represented by a name: x1, x2, x3, and x4.
C = <x1, x2, x3, x4>
Here, each attribute can take only one value at a time. Now I am expressing the concept of a tablet. Suppose I represent the tablet like
Tablet = <large, white, flat, square>
Smartphone = <small, black, folded, rectangle>
I have uniquely defined each of these concepts by a set of binary-valued features.
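As a minimal sketch (in Python, with illustrative variable names that are not from the text), each concept can be stored as a tuple of the four attribute values:

```python
# Each concept is a tuple of (size, color, screen_type, shape).
tablet = ("large", "white", "flat", "square")
smartphone = ("small", "black", "folded", "rectangle")

print(tablet)      # ('large', 'white', 'flat', 'square')
print(smartphone)  # ('small', 'black', 'folded', 'rectangle')
```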
Hypothesis representation
The representation of the most general hypothesis is given below.
Most General Hypothesis: h = <?, ?, ?, ?>
It means it can accept all.
The representation of the most specific hypothesis is given below.
Most Specific Hypothesis: h = <Ø, Ø, Ø, Ø>
It means it will reject all.
Example
If I want to communicate with my family during an emergency, then I do not care about the size, colour, and so on, so I accept any kind of phone in that situation. This is called accept-all, represented by ?, and known as the general hypothesis.
Then we consider one more representation, using Ø (phi). This is called reject-all, or the most specific hypothesis.
For example, I want to purchase my dream mobile phone with the features small, white, rectangle, and flat. I will purchase a mobile only if it has all these features. If any one of the features fails to satisfy my expectation, then I will reject it. This is called the specific hypothesis.
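The following sketch (Python, with hypothetical names) shows how a hypothesis could be matched against an instance: a '?' attribute accepts any value, a 'Ø' attribute accepts nothing, and a concrete value must match exactly.

```python
def matches(hypothesis, instance):
    # '?' accepts any value, 'Ø' accepts nothing,
    # and a concrete value must match exactly.
    for h_attr, x_attr in zip(hypothesis, instance):
        if h_attr == "Ø":
            return False                        # reject-all slot
        if h_attr != "?" and h_attr != x_attr:
            return False                        # concrete value disagrees
    return True

phone = ("small", "white", "flat", "rectangle")
print(matches(("?", "?", "?", "?"), phone))     # True: general hypothesis accepts all
print(matches(("Ø", "Ø", "Ø", "Ø"), phone))     # False: specific hypothesis rejects all
print(matches(("small", "white", "flat", "rectangle"), phone))  # True: exact match
```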
Find-S Algorithm
The Find-S algorithm is used to find the most specific hypothesis for a given dataset. From the training examples, we find the most specific hypothesis. Afterwards, this output hypothesis is used to predict the output for future inputs.
Let us see an intelligent irrigation system dataset.
| Instances | Crop | Moisture | Temperature | Water Supply |
| --- | --- | --- | --- | --- |
| x1 | Cotton | Over-dry | high | 1 |
| x2 | Banana | Wet | low | 0 |
| x3 | Cassava | Saturated dry | high | 1 |
Steps involved in the Find-S algorithm
- Initialize the most specific hypothesis h0 = <Ø, Ø, Ø>
- Then h1 = x1 (if x1 is positive)
- Consider the next positive instance in the dataset
- Compare the attributes of h with those of x
- If they match, do nothing
- If there is a mismatch, replace that attribute in the hypothesis with a question mark
- Ignore negative samples (see the sketch after this list)
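These steps can be put into code as follows. This is a minimal sketch in Python; the function name find_s and the (attributes, label) input format are my own choices, not from the text, with label 1 marking a positive instance and 0 a negative one.

```python
def find_s(examples):
    """Return the most specific hypothesis consistent with the positive examples."""
    hypothesis = None                      # stands for <Ø, Ø, ..., Ø>
    for attributes, label in examples:
        if label != 1:
            continue                       # ignore negative samples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive instance: h1 = x1
            continue
        for i, value in enumerate(attributes):
            if hypothesis[i] != value:
                hypothesis[i] = "?"        # mismatch: generalize this attribute
    return hypothesis
```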
Example
Step 1
H0 = <Ø, Ø, Ø>
x1 = <Cotton, Over-dry, high>
The first instance is x1, and it is positive.
H1 = <Cotton, Over-dry, high>
Step 2
Consider the next instance x2.
x2 = <Banana, Wet, low>
This is a negative sample, so ignore it.
Now H2 = H1, that is,
H2 = <Cotton, Over-dry, high>
Step 3
Consider the next instance x3.
x3 = <Cassava, Saturated dry, high>
This is a positive sample. Now compare x3 and H2; wherever there is a mismatch, put a question mark.
H3 = <?, ?, high>
So H3 is the final most specific hypothesis produced by the Find-S algorithm.
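Running the find_s sketch from above on the irrigation dataset reproduces this result (the variable names are illustrative):

```python
irrigation_data = [
    (("Cotton", "Over-dry", "high"), 1),        # x1: positive
    (("Banana", "Wet", "low"), 0),              # x2: negative, ignored
    (("Cassava", "Saturated dry", "high"), 1),  # x3: positive
]

print(find_s(irrigation_data))  # ['?', '?', 'high']
```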