Original title: Dataset2Vec: Learning Dataset Meta-Features
Authors: Hadi S. Jomaa, Josif Grabocka, Lars Schmidt-Thieme
Keywords: Meta-Features, Dataset Summarization Techniques, Meta-Learning, Hyper-parameter Optimization
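Chinese abstract (translated): Machine learning tasks such as optimizing a model's hyper-parameters on a new dataset, or few-shot learning, can be greatly accelerated if they carry over results from previous runs instead of starting from scratch on every new dataset. Meta-learning uses features of a whole dataset, such as its number of instances, its number of predictors and the means of the predictors. These so-called meta-features, dataset summary statistics or simply dataset characteristics have so far been hand-crafted. Recently, unsupervised dataset encoding models based on variational auto-encoders have successfully learned such characteristics for datasets that all follow the same schema. In this paper we propose a new model, Dataset2Vec, which describes a dataset with a latent feature vector and can therefore generalize beyond datasets that share the same schema to arbitrary datasets. To this end, we use auxiliary learning tasks on batches of datasets, in particular distinguishing batches drawn from different datasets. Our results show that the meta-features of batches sampled from similar datasets preserve that similarity in the embedding space. We also find that using the dataset characteristics learned by Dataset2Vec in a hyper-parameter optimization model outperforms the methods used so far, and with this we report new state-of-the-art hyper-parameter optimization results.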
English abstract: Machine learning tasks such as optimizing the hyper-parameters of a model for a new dataset or few-shot learning can be vastly accelerated if they are not done from scratch for every new dataset, but carry over findings from previous runs. Meta-learning makes use of features of a whole dataset such as its number of instances, its number of predictors, the means of the predictors etc., so-called meta-features, dataset summary statistics or simply dataset characteristics, which so far have been hand-crafted, often specifically for the task at hand. More recently, unsupervised dataset encoding models based on variational auto-encoders have been successful in learning such characteristics for the special case when all datasets follow the same schema, but not beyond. In this paper we design a novel model, Dataset2Vec, that is able to characterize datasets with a latent feature vector based on batches and thus is able to generalize beyond datasets having the same schema to arbitrary (tabular) datasets. To do so, we employ auxiliary learning tasks on batches of datasets, esp. to distinguish batches from different datasets. We show empirically that the meta-features collected from batches of similar datasets are concentrated within a small area in the latent space, hence preserving similarity. We also show that using the dataset characteristics learned by Dataset2Vec in a state-of-the-art hyper-parameter optimization model outperforms the hand-crafted meta-features that have been used in the hyper-parameter optimization literature so far. As a result, we advance the current state-of-the-art results for hyper-parameter optimization.
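The abstract describes Dataset2Vec only at a high level. A batch sampled from a tabular dataset, with any number of features and targets, is mapped to a fixed-size latent vector, and training uses the auxiliary task of telling whether two batches come from the same dataset. The Python sketch below illustrates that idea under stated assumptions. The following are hypothetical choices made for illustration, not the paper's exact architecture:

- the three-stage set encoder (`f` on single feature-value/target-value pairs, `g` on per-pair averages, `h` on the pooled result);
- the `exp(-gamma * distance)` similarity;
- the layer sizes;
- the names `BatchEncoder`, `similarity`, `aux_loss` and `sample_batch`.

The weights are random and untrained, so the sketch only shows the forward pass and the auxiliary loss.

```python
import math
import random

# Hypothetical sketch of the Dataset2Vec idea: encode a batch of a tabular
# dataset with any number of features K and targets T into a fixed-size vector.
# The f/g/h layout, sizes, and random untrained weights are assumptions.


def make_layer(n_in, n_out, rng):
    """Random dense layer stored as (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_tanh(layer, v):
    """Apply a dense layer followed by tanh."""
    weights, biases = layer
    return [math.tanh(sum(w * x for w, x in zip(row, v)) + b)
            for row, b in zip(weights, biases)]


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class BatchEncoder:
    def __init__(self, hidden=8, out=4, seed=0):
        rng = random.Random(seed)
        self.f = make_layer(2, hidden, rng)       # one (x_nk, y_nt) value pair
        self.g = make_layer(hidden, hidden, rng)  # one (feature k, target t) summary
        self.h = make_layer(hidden, out, rng)     # pooled summary -> meta-features

    def encode(self, X, Y):
        """X: N rows x K features, Y: N rows x T targets -> list of `out` floats."""
        n_features, n_targets = len(X[0]), len(Y[0])
        pair_summaries = []
        for k in range(n_features):
            for t in range(n_targets):
                # Averaging over instances makes the result independent of row order.
                per_row = [dense_tanh(self.f, [x[k], y[t]]) for x, y in zip(X, Y)]
                pair_summaries.append(dense_tanh(self.g, mean_vectors(per_row)))
        # Averaging over (feature, target) pairs gives a fixed-size output
        # regardless of column order or column count.
        return dense_tanh(self.h, mean_vectors(pair_summaries))


def similarity(a, b, gamma=1.0):
    """Score in (0, 1]: exp(-gamma * Euclidean distance); 1.0 when a == b."""
    dist = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return math.exp(-gamma * dist)


def aux_loss(a, b, same_dataset, eps=1e-12):
    """Binary cross-entropy for the auxiliary task 'same dataset or not?'."""
    p = min(max(similarity(a, b), eps), 1.0 - eps)
    return -math.log(p) if same_dataset else -math.log(1.0 - p)


def sample_batch(X, Y, n_rows, rng):
    """Draw a random subset of rows (the 'batch') from one dataset."""
    idx = rng.sample(range(len(X)), n_rows)
    return [X[i] for i in idx], [Y[i] for i in idx]


# Usage: two toy datasets with different schemas (3 vs 5 features).
rng = random.Random(1)
enc = BatchEncoder()
X1 = [[rng.random() for _ in range(3)] for _ in range(50)]
Y1 = [[float(sum(row) > 1.5)] for row in X1]
X2 = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(50)]
Y2 = [[2.0 * row[0]] for row in X2]

a1 = enc.encode(*sample_batch(X1, Y1, 20, rng))
a2 = enc.encode(*sample_batch(X1, Y1, 20, rng))
b = enc.encode(*sample_batch(X2, Y2, 20, rng))
print(len(a1), len(b))  # 4 4: same size despite different schemas
print("same-dataset loss:", aux_loss(a1, a2, same_dataset=True))
print("cross-dataset loss:", aux_loss(a1, b, same_dataset=False))
```

The encoder averages over instances and over (feature, target) pairs. This makes the output size independent of row order, column order and the number of columns. That independence is what lets a single encoder handle datasets with different schemas. Training would then adjust the weights so that `aux_loss` is small both for pairs of batches from the same dataset and for pairs from different datasets.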
Paper summary: