Multiple output regression or classifier with one (or more) parameters with Python

You could frame the problem as an optimization problem.

Treat the input values of your (trained) regression model as the parameters to search over.

Define the distance between the model's predicted price (at a given input combination) and the desired price (the price you want) as the cost function.

Then use a global optimization algorithm (e.g. a genetic algorithm) to find the input combination that minimizes the cost, i.e. the one whose predicted price is closest to your desired price.
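The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the toy data, the `desired_price` value, the squared-error cost, and the choice of SciPy's `differential_evolution` (an evolutionary global optimizer, used here in place of a genetic algorithm) are all mine, not part of the original question.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.linear_model import LinearRegression

# Toy training data (illustrative values only): two inputs, one output.
X = np.array([[10, 1], [30, 3], [13, 1], [19, 2],
              [25, 3], [33, 3], [23, 2]], dtype=float)
y = np.array([101, 905, 182, 268, 646, 624, 465], dtype=float)

# Step 1: the trained forward model (inputs -> price).
model = LinearRegression().fit(X, y)

# Step 2: the cost is the squared distance between the predicted
# and the desired price. `desired_price` is a hypothetical target.
desired_price = 500.0

def cost(inputs):
    predicted = model.predict(inputs.reshape(1, -1))[0]
    return (predicted - desired_price) ** 2

# Step 3: search only inside the range covered by the training data.
bounds = list(zip(X.min(axis=0), X.max(axis=0)))
result = differential_evolution(cost, bounds, seed=0, maxiter=200)

best_inputs = result.x
best_prediction = model.predict(best_inputs.reshape(1, -1))[0]
print(best_inputs, best_prediction)
```

Note that many input combinations can give the same predicted price, so the optimizer returns one of possibly many solutions. If some combinations are preferable, that preference would have to be added to the cost function.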

As mentioned by @Justas, if you want to find the combination of input values for which the output variable is at its maximum or minimum, then it is an optimization problem.

SciPy offers a good range of non-linear optimizers, or you can use meta-heuristics such as genetic algorithms, memetic algorithms, etc.
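As a minimal sketch of the max/min case with one of SciPy's optimizers, the snippet below maximizes the predicted output by minimizing its negative with `scipy.optimize.minimize`. The toy data, the linear model, the L-BFGS-B method, and the bounds taken from the training range are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression

# Toy training data (illustrative values only).
X = np.array([[10, 1], [30, 3], [13, 1], [19, 2],
              [25, 3], [33, 3], [23, 2]], dtype=float)
y = np.array([101, 905, 182, 268, 646, 624, 465], dtype=float)

model = LinearRegression().fit(X, y)

def negative_prediction(inputs):
    # minimize() looks for a minimum, so negate the output to maximize it.
    return -model.predict(inputs.reshape(1, -1))[0]

# Keep the search inside the range seen during training.
bounds = list(zip(X.min(axis=0), X.max(axis=0)))
start = X.mean(axis=0)

result = minimize(negative_prediction, start, method="L-BFGS-B", bounds=bounds)

best_inputs = result.x
max_prediction = -result.fun
print(best_inputs, max_prediction)
```

With a linear model the optimum always lies on the boundary of the search box, so the bounds matter a lot here. The same pattern becomes more interesting with a non-linear model, where a local optimizer may need several starting points.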

On the other hand, if your aim is to learn the inverse function, which maps the output variable to a set of input variables, then go for MultiOutputRegressor or MultiOutputClassifier. Both can be used as a wrapper on top of any base estimator such as LinearRegression, LogisticRegression, KNN, DecisionTree, SVM, etc.

Example:

import pandas as pd
from sklearn.multioutput import MultiOutputRegressor, RegressorChain
from sklearn.linear_model import LinearRegression


dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': [101, 905, 182, 268, 646, 624, 465]}

df = pd.DataFrame(dic)

variables = df.iloc[:,:-1]
results = df.iloc[:,-1]

multi_output_reg = MultiOutputRegressor(LinearRegression())
multi_output_reg.fit(results.values.reshape(-1, 1),variables)

multi_output_reg.predict([[100]])

# array([[12.43124217,  1.12571947]])
# sounds sensible according to the training data

# If the input variables need to be treated as categories,
# go for MultiOutputClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

multi_output_clf = MultiOutputClassifier(LogisticRegression(solver='lbfgs'))
multi_output_clf.fit(results.values.reshape(-1, 1),variables)

multi_output_clf.predict([[100]])

# array([[10,  1]])

In most situations, knowing the value of one input variable helps in predicting the others. This approach is implemented by ClassifierChain and RegressorChain.

To understand the advantage of ClassifierChain, please refer to this example.
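A minimal sketch of the chained approach with RegressorChain, reusing the same toy data as the regression example. The `order=[0, 1]` choice (predict par_1 first, then feed it into the model for par_2) is an assumption for illustration, not something the data dictates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# Same toy data as above: the outcome is the single feature,
# and the two parameters are the targets of the inverse model.
outcome = np.array([101, 905, 182, 268, 646, 624, 465],
                   dtype=float).reshape(-1, 1)
params = np.array([[10, 1], [30, 3], [13, 1], [19, 2],
                   [25, 3], [33, 3], [23, 2]], dtype=float)

# order=[0, 1]: fit par_1 from the outcome alone, then fit par_2
# from the outcome plus par_1.
chain = RegressorChain(LinearRegression(), order=[0, 1])
chain.fit(outcome, params)

predicted = chain.predict([[100.0]])
print(predicted)
```

The first target in the chain sees only the outcome, so its prediction is the same as with MultiOutputRegressor. Only the later targets benefit from the earlier predictions.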

Update:


dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': [0, 1, 1, 1, 1, 1 , 0]}

df = pd.DataFrame(dic)

variables = df.iloc[:,:-1]
results = df.iloc[:,-1]

multi_output_clf = MultiOutputClassifier(LogisticRegression(solver='lbfgs',
                                                            multi_class='ovr'))
multi_output_clf.fit(results.values.reshape(-1, 1),variables)

multi_output_clf.predict([[1]])
# array([[13,  3]])

Tags: Python, Machine Learning, Scikit Learn
