.. toctree::

.. _DexWeight:

Weights
-------

*Weights* are commonly used in :ref:`Decision Analysis ` to model the importance of :ref:`attributes `. Weights are numbers, usually normalized so that their sum or maximum equals 1 or 100, which define the relative contribution of the corresponding attribute to the final :ref:`evaluation `. In Decision Analysis, :ref:`aggregation functions ` are commonly defined using some form of the weighted sum, for example:

.. math::

   f(X_1, X_2, \ldots, X_n) = w_0 + w_1 X_1 + w_2 X_2 + \cdots + w_n X_n

Here, :math:`w_i` denote weights and :math:`X_i` denote attributes.

In :ref:`qualitative DEX models `, there is inherently no room for weights: attributes are symbolic and aggregation functions are defined by decision rules. However, to bridge the gap between qualitative and quantitative models, it is possible to introduce weights, in a somewhat approximate and imprecise way, into qualitative models as well.

Principle
~~~~~~~~~

.. image:: images/CarWei.png
   :alt: Utility function and hyperplane

The figure above illustrates the basic approach. It shows the CAR aggregation function as defined in the :ref:`Car Evaluation ` model, represented by points (dots) in a three-dimensional space. Each point represents one defined decision rule. To determine the weights, we place a (hyper)plane (shown in red) in this space so that it fits the points as closely as possible (in the least-squares sense). Once done, weights can be read directly from the slopes of the hyperplane: the steeper the slope in the direction of an attribute, the higher the corresponding relative weight. In the figure above, the weights of PRICE and TECH.CHAR. are both 50. These are *local normalized* weights (see the definitions below).

In DEX, weights are used for two purposes:

- as an approximate representation of aggregation functions, used primarily for verification and overview, and
- for defining aggregation functions or their parts (see :ref:`using weights `).

.. _DexWeightTypes:

Weight Types
~~~~~~~~~~~~

DEX actually uses four types of weights, illustrated here with weights from the :ref:`Car Evaluation ` model:

.. image:: images/CarWei4.png
   :alt: Car: Weights

The difference between *local* and *global* weights is due to the :ref:`tree of attributes `:

- *Local* weights always refer to a single aggregate attribute and a single corresponding aggregation function, so that the weights of the attribute's immediate descendants (function arguments) sum to 100%.
- *Global* weights, on the other hand, take into account the structure of the tree and the relative importance of its sub-trees. The global weight of an attribute is calculated as the product of its local weight and the global weight of its parent attribute. The global weight of the root attribute is 100%. For example: the global normalized weight of BUY.PRICE is 50% (its local normalized weight) × 50% (the global normalized weight of PRICE), which gives 25%.

Weights can also be *normalized* or left unnormalized. This distinction exists because some :ref:`scales ` can have more values than others. Geometrically, larger scales appear longer, so they produce lower slopes and, consequently, smaller weights. *Normalization* refers to the procedure in which all scales are adjusted to the same length (the unit interval) before the weights are determined. Usually, this is the better method for assessing weights and comparing attributes.
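The following Python sketch illustrates this principle end to end: normalize the argument scales to the unit interval, fit a least-squares hyperplane to the decision rules, and read the local normalized weights off its slopes. The decision table below is a made-up stand-in loosely resembling the CAR function, and the code is only a sketch of the idea, not DEXi's actual implementation.

.. code-block:: python

   import numpy as np

   # Hypothetical decision table loosely resembling the CAR function:
   # arguments PRICE (3 values) and TECH.CHAR. (4 values), output CAR.
   # Each row is one decision rule, given as integer value indices.
   rules = np.array([
       [0, 0, 0], [0, 1, 0], [0, 2, 0], [0, 3, 0],
       [1, 0, 0], [1, 1, 1], [1, 2, 2], [1, 3, 3],
       [2, 0, 0], [2, 1, 1], [2, 2, 2], [2, 3, 3],
   ])
   scale_sizes = [3, 4]                  # sizes of the argument scales

   X = rules[:, :-1].astype(float)
   y = rules[:, -1].astype(float)

   # Normalization: rescale each argument to the unit interval, so that
   # longer scales do not get flatter slopes and hence smaller weights.
   X /= np.array(scale_sizes, dtype=float) - 1.0

   # Least-squares fit of the hyperplane y = w0 + w1*x1 + w2*x2.
   A = np.column_stack([np.ones(len(X)), X])
   coef, *_ = np.linalg.lstsq(A, y, rcond=None)
   slopes = np.abs(coef[1:])

   # Local normalized weights: slopes rescaled to sum to 100.
   weights = 100.0 * slopes / slopes.sum()
   for name, w in zip(["PRICE", "TECH.CHAR."], weights):
       print(f"{name}: {w:.0f}")

   # Global weights would follow from the tree: with PRICE's global
   # weight at 50%, a local weight of 50% for BUY.PRICE gives
   # 0.5 * 50% = 25%, as in the example above.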
.. _DexAdvancedWeight:

Advanced Weights
~~~~~~~~~~~~~~~~

The weight-assessment method described above is standard in DEX, but it is not the only possible one. While it is suitable for typical DEX decision tables, it may become inaccurate for non-linear or non-monotone functions. For this reason, standard weights can be complemented with weights assessed using alternative methods, which are widely employed in machine learning: Information Gain, Gain Ratio, Gini Gain and Chi-Square.

.. image:: images/CarWeiAdvanced.png
   :alt: Car: Advanced Weights

**Information Gain**

Information gain is based on entropy, which measures disorder:

.. math::

   Entropy(S) = - \sum_{i=1}^C p_i \log_2(p_i)

Here, :math:`S` is a decision table, :math:`C` is the scale size of the output attribute, and :math:`p_i` is the probability of its :math:`i`-th value appearing in the decision table. The *Information Gain* of attribute :math:`A` is then:

.. math::

   Gain_{Info}(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

:math:`S_v` denotes the subset of :math:`S` in which :math:`A` takes the value :math:`v`, and :math:`|S|` denotes the table size in terms of decision rules.

**Gain Ratio**

The *Gain Ratio* normalizes *Information Gain* by the *split information*:

.. math::

   SplitInfo(S, A) = - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \log_2 \left( \frac{|S_v|}{|S|} \right)

*Gain Ratio* is defined as:

.. math::

   GainRatio(S, A) = \frac{Gain_{Info}(S, A)}{SplitInfo(S, A)}

**Gini Gain**

The *Gini Index* measures the impurity of a dataset:

.. math::

   Gini(S) = 1 - \sum_{i=1}^C p_i^2

The *Gini Gain* of an attribute :math:`A` is then:

.. math::

   Gain_{Gini}(S, A) = Gini(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Gini(S_v)

**Chi-Square (χ²) Statistic**

The χ² statistic measures the association between attribute :math:`A` and the class, i.e., the departure from independence:

.. math::

   \chi^2 = \sum_{i=1}^C \sum_{v \in Values(A)} \frac{(O_{iv} - E_{iv})^2}{E_{iv}}

where:

- :math:`O_{iv}` is the observed frequency of class :math:`i` with attribute value :math:`v`,
- :math:`E_{iv}` is the expected frequency under the independence assumption.

The four measures :math:`Gain_{Info}(S, A)`, :math:`GainRatio(S, A)`, :math:`Gain_{Gini}(S, A)` and :math:`\chi^2` are interpreted as relative weights of the corresponding attributes. They are normalized and displayed in the same way as :ref:`standard DEX weights `.
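To make the definitions concrete, the sketch below computes all four measures for a single attribute of a decision table. The table representation (a list of ``(attribute value, class)`` pairs) and the helper names are assumptions made for this illustration; they do not reflect DEXi's internal implementation.

.. code-block:: python

   import math
   from collections import Counter

   def entropy(classes):
       """Entropy(S) = -sum p_i * log2(p_i) over the class distribution."""
       n = len(classes)
       return -sum((c / n) * math.log2(c / n)
                   for c in Counter(classes).values())

   def gini(classes):
       """Gini(S) = 1 - sum p_i^2 over the class distribution."""
       n = len(classes)
       return 1.0 - sum((c / n) ** 2 for c in Counter(classes).values())

   def measures(rows):
       """Compute (InfoGain, GainRatio, GiniGain, chi2) for one attribute.

       rows: list of (value of attribute A, class value) pairs,
       one pair per decision rule of table S.
       """
       classes = [c for _, c in rows]
       n = len(rows)

       # Partition S into subsets S_v, one per value v of attribute A.
       subsets = {}
       for v, c in rows:
           subsets.setdefault(v, []).append(c)

       info_gain = entropy(classes) - sum(
           len(sv) / n * entropy(sv) for sv in subsets.values())
       split_info = -sum(
           len(sv) / n * math.log2(len(sv) / n) for sv in subsets.values())
       gain_ratio = info_gain / split_info if split_info > 0 else 0.0
       gini_gain = gini(classes) - sum(
           len(sv) / n * gini(sv) for sv in subsets.values())

       # Chi-square: compare observed class counts O_iv in each S_v with
       # the counts E_iv expected if A and the class were independent.
       class_counts = Counter(classes)
       chi2 = 0.0
       for sv in subsets.values():
           observed = Counter(sv)
           for cls, total in class_counts.items():
               expected = total * len(sv) / n
               chi2 += (observed.get(cls, 0) - expected) ** 2 / expected
       return info_gain, gain_ratio, gini_gain, chi2

   # Example: a four-rule table with attribute values "low"/"high".
   print(measures([("low", 0), ("low", 0), ("high", 0), ("high", 1)]))

Each measure, computed per attribute in this way, can then be normalized across the arguments of an aggregation function and displayed like the standard DEX weights described above.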