According to Wikipedia, TF-IDF is: “In information retrieval, tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.”
It is a statistical heuristic, and it has a purpose. What is that purpose? When we use eigenvalues in PCA to reduce dimensionality, we keep the derived features that explain the most variance in the data. In vectorizing text, we likewise try to extract the most informative representation for machine learning and other purposes. But how can we assume that all of the information we gain is meaningful?
In terms of Shannon entropy, the more information we have, the more the negative entropy increases, so the entropy decreases. This is crucial, because structure and order run against the natural physical tendency toward disorder.
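To make the entropy idea concrete, here is a minimal sketch that measures the Shannon entropy of the word distribution in a token list. The function name `shannon_entropy` and the whitespace tokenization are assumptions made for illustration only; they are not part of any particular library.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Entropy in bits of the empirical word distribution of `tokens`."""
    counts = Counter(tokens)
    total = len(tokens)
    # H = sum over words of p * log2(1 / p), where p is the relative frequency
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# A fully repetitive text is perfectly ordered: entropy 0 bits.
ordered = "a a a a".split()
# Four equally likely words: entropy 2 bits.
uniform = "a b c d".split()

print(shannon_entropy(ordered), shannon_entropy(uniform))
```

The ordered text scores lower than the uniform one, which matches the intuition that more structure means less entropy.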
Thus we try to maximize the information gained from documents or text. TF-IDF is just one heuristic formula for capturing information from documents. In principle, we can use any meaningful information a document offers.
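For reference, one common variant of the TF-IDF heuristic can be sketched in a few lines. This version assumes relative term frequency and an unsmoothed idf of log(N / df); real libraries often use smoothed or normalized variants, and the `tokenize` and `tf_idf` helpers below are hypothetical names chosen for this sketch.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tf_idf(documents):
    """Return one {word: score} dict per document.

    tf  = count of word in document / number of tokens in document
    idf = log(number of documents / number of documents containing word)
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

docs = ["the cat sat", "the dog barked"]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

A word that appears in every document, such as "the" here, gets a score of zero, while words unique to one document score higher.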
Information Gain Algorithm
For some agents and mathematical structures in nature, we should maximize the information gain. When information gain is maximized, every particle, agent, or perceptron can receive a high potential. We should then analyze how well formed that information is. Like the human brain, an agent receives information from nature and from outside sources; how that information is structured, however, is a separate problem for intelligent agents.
From an agent's point of view, an information source is generally classified as useful, harmful, or redundant. However, when we run an exploration algorithm on an agent, all outside information may end up classified as useful or needed. In that case we should classify the sources into three, four, or more segments.
In a document or text domain, there are several information sources we can benefit from:
- The cardinality of words in the document.
- The ordinality of a word in the document.
- The relation between the cardinality and the ordinality of a word.
- The ratio of sentences in the document in which a word is found to those in which it is not.
- The ratio of the summed cardinality of words to the total number of words in the sentences in which they are found.
- And so on.
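A few of the sources listed above can be sketched as simple per-word features. The interpretation here is an assumption: cardinality is taken as the occurrence count, ordinality as the 1-based position of the first occurrence, and the sentence ratio as the fraction of sentences containing the word. The helper names and the naive sentence splitting are hypothetical.

```python
import re

def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption for this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_features(text, word):
    """Compute a few simple information sources for `word` in `text`."""
    word = word.lower()
    sentences = [tokenize(s) for s in split_sentences(text)]
    tokens = [t for s in sentences for t in s]

    count = tokens.count(word)                               # cardinality
    first_position = tokens.index(word) + 1 if count else None  # ordinality
    containing = [s for s in sentences if word in s]
    # Fraction of sentences in which the word is found
    sentence_fraction = len(containing) / len(sentences) if sentences else 0.0
    # Occurrences of the word relative to all words in the sentences containing it
    words_in_containing = sum(len(s) for s in containing)
    share = count / words_in_containing if words_in_containing else 0.0

    return {
        "count": count,
        "first_position": first_position,
        "sentence_fraction": sentence_fraction,
        "share_in_sentences": share,
    }

print(word_features("The cat sat. The dog ran. A bird flew.", "the"))
```

Each feature is a separate signal; how to combine them into one score is the open question the rest of this article explores.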
We can create and manifest a finite or infinite(?) number of mathematical structures.
Nature of Information
In this example, we heuristically try to maximize our vectorizer by forming a good mathematical formula for the body of text we need. Generally, we try to explain the behavior of nature through mathematics. But with which operations can we maximize our transformer function? Basically, the mathematical formulation transforms information A into information B. Information A is encoded relative to the AI agent that tries to use the outside or targeted information in A. The problem is: how can we generate such a mathematical formulation, and how do we know that this transformer, as a set of mathematical operations, is optimal? We can create a procedural algorithm that replaces the explicit mathematical operations by means of AI. In the system
f(A) → transformer function → B, the transformer does not have to be a mathematical formula; it could be any functional system.
For example, we can create a systematic function that begins from a parameter. We can assign a functional transformer to each information unit in the same group of units. A sentence is a common-purpose group of information units, and a paragraph is a cybernetic body of such groups. With this assumption, we can assign mathematical functions to extend the uniqueness of each unit in linear space. We can assign each unit a function such as x → x, x → x², x → x³, and so on, and when the units appear together in a group, we can apply these functions across that group of words.
For example, in a paragraph we see the information group as:
The list of functions is:
a: x → x²
b: x → x³
c: x → x⁴
and in every group of words, each unit shares its functional structure with the other units of the group, so
a in group 2 can follow these functions as ((a²)⁴)⁵, and so on.
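The chaining of power functions within a group can be sketched as function composition. The word-to-exponent mapping below reuses a, b, and c from the list above; the helper names and the example group are hypothetical. The key observation is that composing x → x^p with x → x^q gives x → x^(p·q), so the exponents multiply.

```python
from functools import reduce

# Power functions assigned to units, mirroring a, b, c above.
EXPONENTS = {"a": 2, "b": 3, "c": 4}

def power_fn(p):
    """Return the function x -> x**p."""
    return lambda x: x ** p

def compose_group(units):
    """Apply the power function of every unit in the group, left to right."""
    fns = [power_fn(EXPONENTS[u]) for u in units]
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)

def group_exponent(units):
    """Single exponent equivalent to the whole composition."""
    result = 1
    for u in units:
        result *= EXPONENTS[u]
    return result

group = ["a", "b", "c"]          # hypothetical group of units
f = compose_group(group)
print(f(2))                      # ((2**2)**3)**4
print(2 ** group_exponent(group))  # the same value, 2**24
```

Because only the product of exponents matters, the order of units inside a group does not change the result in this sketch.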
A Simple Mathematical Structure
We can summarize this body as follows:
First we set x to e; then, taking the logarithm:
log( (totalProduct(powers of the word in every sentence) / start value of the word) * (totalProduct(numbers in the document * start value of the word) / totalProduct(document degrees in which the word is included)) )
So we multiply powers of e, divide them, and take the logarithm, which gives us a one-dimensional value.
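The effect of choosing e as the base and then taking the logarithm can be checked numerically: a product of powers of e divided by another product of powers of e collapses to a sum of exponents minus another sum. This is only a sketch of that one step, with made-up exponents; it does not implement the full formula above, and both function names are hypothetical.

```python
import math

def log_of_e_powers(numerator_exponents, denominator_exponents):
    """Compute log(prod(e**p) / prod(e**q)) directly."""
    numerator = math.prod(math.e ** p for p in numerator_exponents)
    denominator = math.prod(math.e ** q for q in denominator_exponents)
    return math.log(numerator / denominator)

def log_of_e_powers_simplified(numerator_exponents, denominator_exponents):
    """Same quantity computed in log space: sum(p) - sum(q)."""
    return sum(numerator_exponents) - sum(denominator_exponents)

# Hypothetical exponents for one word
num, den = [2, 3, 4], [1]
print(log_of_e_powers(num, den))             # close to 8, up to floating-point error
print(log_of_e_powers_simplified(num, den))  # exactly 8
```

Working in log space avoids overflow when the exponents get large, which they quickly do when powers are chained.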
In conclusion, this sounds like an application of group theory.
Thus, I think I should research group theory and its applications in information theory.
Hope you enjoyed it. Have a nice day.