Abstract:
We consider the Bayesian multiple statistical classification problem in the case where the unknown source distributions are estimated from labeled training sequences and the estimates are then used as nominal distributions in a robust hypothesis test. Specifically, we employ the DGL test due to Devroye et al. and provide non-asymptotic, exponential upper bounds on the error probability of classification. The proposed upper bounds are simple to evaluate and reveal the effects of the length of the training sequences, the alphabet size, and the number of hypotheses on the error exponent. The proposed method can also be used for large-alphabet sources when the alphabet size grows sub-quadratically in the length of the test sequence. Simulations indicate that the performance of the proposed method approaches that of optimal hypothesis testing as the length of the training sequences increases.
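
The estimate-then-test procedure can be illustrated with a minimal sketch. It assumes a finite alphabet encoded as the integers 0..k-1, uses plain relative frequencies as the estimates, and applies the DGL rule in its usual form: select the hypothesis whose nominal distribution deviates least from the empirical distribution of the test sequence, with the deviation maximized over the Scheffe sets A_ij = {a : p_i(a) > p_j(a)}. The function names (empirical_pmf, scheffe_sets, dgl_classify) are hypothetical and chosen for illustration only; prior probabilities and any refinements of the estimator are not modeled here.

```python
from collections import Counter
from itertools import permutations


def empirical_pmf(seq, alphabet_size):
    """Relative frequency of each symbol 0..alphabet_size-1 in seq."""
    counts = Counter(seq)
    n = len(seq)
    return [counts.get(a, 0) / n for a in range(alphabet_size)]


def scheffe_sets(pmfs):
    """Scheffe sets A_ij = {a : p_i(a) > p_j(a)} for all ordered pairs i != j."""
    k = len(pmfs[0])
    return [
        [a for a in range(k) if pmfs[i][a] > pmfs[j][a]]
        for i, j in permutations(range(len(pmfs)), 2)
    ]


def dgl_classify(test_seq, training_seqs, alphabet_size):
    """Return the index of the hypothesis selected by the DGL-style rule.

    Assumption: nominal distributions are the empirical pmfs of the
    labeled training sequences (one training sequence per hypothesis).
    """
    nominal = [empirical_pmf(s, alphabet_size) for s in training_seqs]
    mu = empirical_pmf(test_seq, alphabet_size)
    sets = scheffe_sets(nominal)

    def score(p):
        # Largest gap between nominal and empirical mass over the Scheffe sets.
        return max(
            (abs(sum(p[a] for a in A) - sum(mu[a] for a in A)) for A in sets),
            default=0.0,
        )

    scores = [score(p) for p in nominal]
    return min(range(len(nominal)), key=scores.__getitem__)


# Toy usage: two binary sources, one training sequence each.
training = [[0] * 8 + [1] * 2, [0] * 2 + [1] * 8]
label = dgl_classify([0, 0, 0, 1], training, alphabet_size=2)
```

In this toy example the test sequence is mostly zeros, so the rule selects the first hypothesis. With M hypotheses there are M(M-1) Scheffe sets, which is what keeps the maximization cheap compared with a search over all subsets of the alphabet.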