LightGBM 'Using categorical_feature in Dataset.' Warning?

I presume that you get this warning in a call to lgb.train. This function also has an argument categorical_feature, whose default value is 'auto', which means taking the categorical columns from a pandas.DataFrame (see the documentation). The warning indicates that, although lgb.train has requested that categorical features be identified automatically, LightGBM will use the features specified in the dataset instead.

To avoid the warning, you can give the same argument categorical_feature to both lgb.Dataset and lgb.train. Alternatively, you can construct the dataset with categorical_feature=None and only specify the categorical features in lgb.train.
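
To make the rule concrete, here is a minimal pure-Python sketch of the check as I understand it. DatasetSketch, train_sketch and warnings_from are hypothetical stand-ins, not LightGBM classes or functions. Using Python's warnings module is also an assumption made to keep the example self-contained; the real library may report the message differently. The sketch only models how the value passed to lgb.train is reconciled with the value stored in the dataset.

```python
import warnings


class DatasetSketch:
    """Hypothetical stand-in for lgb.Dataset; models only the categorical_feature check."""

    def __init__(self, categorical_feature="auto"):
        self.categorical_feature = categorical_feature

    def set_categorical_feature(self, requested):
        # Same value on both sides: nothing to reconcile, no warning.
        if self.categorical_feature == requested:
            return self
        # Dataset built with None: adopt what the caller asked for.
        if self.categorical_feature is None:
            self.categorical_feature = requested
            return self
        # Caller left the default 'auto' but the dataset names explicit columns:
        # the dataset's value is kept and the warning from the question is emitted.
        if requested == "auto":
            warnings.warn("Using categorical_feature in Dataset.")
            return self
        # Otherwise the caller's explicit value replaces the dataset's.
        warnings.warn("categorical_feature in Dataset is overridden.")
        self.categorical_feature = requested
        return self


def train_sketch(dataset, categorical_feature="auto"):
    """Hypothetical stand-in for lgb.train; it only forwards the argument."""
    dataset.set_categorical_feature(categorical_feature)
    return dataset.categorical_feature


def warnings_from(dataset_arg, train_arg="auto"):
    """Run one scenario and return the warning messages it produced."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        train_sketch(DatasetSketch(dataset_arg), train_arg)
    return [str(w.message) for w in caught]


cat_feats = ["item_id", "store_id"]
print(warnings_from(cat_feats))             # list in Dataset, default 'auto' in train
print(warnings_from(cat_feats, cat_feats))  # same list in both
print(warnings_from(None, cat_feats))       # None in Dataset, list in train
```

The first scenario reproduces the warning from the question. The other two correspond to the two fixes described above and produce no warning.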


As user andrey-popov described, you can use lgb.train's categorical_feature parameter to get rid of this warning.

Below is a simple example of how you could do it:

import lightgbm as lgb

# Define categorical features
cat_feats = ['item_id', 'dept_id', 'store_id',
             'cat_id', 'state_id', 'event_name_1',
             'event_type_1', 'event_name_2', 'event_type_2']

# ... (X, Y, train_idx, valid_idx and lgb_params are assumed to be defined)

# Define the datasets with the categorical_feature parameter
train_data = lgb.Dataset(X.loc[train_idx], 
                         Y.loc[train_idx], 
                         categorical_feature=cat_feats, 
                         free_raw_data=False)

valid_data = lgb.Dataset(X.loc[valid_idx], 
                         Y.loc[valid_idx], 
                         categorical_feature=cat_feats, 
                         free_raw_data=False)

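# And train using the same categorical_feature parameter.
# Note: this targets LightGBM versions before 4.0. As far as I know, 4.0
# removed the categorical_feature and verbose_eval arguments from lgb.train,
# so there you would set categorical_feature on the Dataset only.
lgb.train(lgb_params,
          train_data,
          valid_sets=[valid_data],
          verbose_eval=20,
          categorical_feature=cat_feats,
          num_boost_round=1200)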
