How Do You Build OOP Classes for Data Science?

Let’s build a Linear Regression class with some added functionality on top of the built-in methods you get with the scikit-learn class.

The point of this exercise is to learn OOP principles as they apply to your data science work.

Let’s get started with some basics of our Linear Regression class.


In your Linear Regression class, self refers to the specific object (instance) a method is called on.

__init__ (dunder init) is called automatically when an object of this class is created. Use it to initialize the object with any parameters it needs.

class MyLinearRegression:
    
    def __init__(self, fit_intercept=True):
        """
        Initialize the class.
        fit_intercept: Boolean switch to indicate
                       whether to include an intercept term in the model.
        """
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept
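
To see what self and __init__ do in practice, the short sketch below creates two objects from the same class and inspects their attributes. The variable names mlr and no_intercept are only illustrative; the class body repeats the one above so the snippet runs on its own.

```python
class MyLinearRegression:

    def __init__(self, fit_intercept=True):
        # Each object gets its own copy of these attributes via self.
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept


# __init__ runs automatically when each object is created.
mlr = MyLinearRegression()
no_intercept = MyLinearRegression(fit_intercept=False)

# Nothing has been fitted yet, so the coefficients are still None.
print(mlr.coef_, mlr.intercept_)

# The two objects hold independent settings.
print(mlr._fit_intercept, no_intercept._fit_intercept)
```

Because the attributes live on self, changing one object's settings does not affect the other.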

Now, you can add a __repr__ method, which returns the string shown when you print the object (as long as the class does not also define __str__). You can use it to describe the object for future context. It is not strictly necessary, but it is useful for example purposes.

When you create a MyLinearRegression object and print it, you will see the string returned by the __repr__ method.
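
A minimal sketch of that behavior, using the same __repr__ string that appears in the full class later in this post:

```python
class MyLinearRegression:

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept

    def __repr__(self):
        # Returned whenever the object is printed or inspected.
        return "I am a Linear Regression object."


mlr = MyLinearRegression()
print(mlr)  # prints: I am a Linear Regression object.
```

Without a __repr__ method, printing the object would fall back to Python's default, which shows only the class name and a memory address.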

Now, you add the core methods of the class. What are the main functions of your object? What does your Linear Regression object do for you? Fit? Predict? Score? What else can you add that you do with these predictions that will help you in the future?

You can start with the logical first step, the fit method.

Note the docstring describing the purpose of the method, what it does, and what type of data it expects. Documenting your methods this way is part of good OOP practice:

import numpy as np


class MyLinearRegression:
    
    def __init__(self, fit_intercept=True):
        """
        Initialize the class.
        fit_intercept: Boolean switch to indicate
                       whether to include an intercept term in the model.
        """
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept
    
    def __repr__(self):
        return "I am a Linear Regression object."
    
    def fit(self, X, y):
        """
        Fit model coefficients given a dataset with features (X) and a target variable (y).

        Arguments:
        X: 1D or 2D numpy array
        y: 1D numpy array
        """

        # If X is a 1D array, reshape it into a single-column 2D array.
        # Note on the numpy reshape method:
        # >>> a = np.arange(6).reshape((3, 2))
        # >>> a
        # array([[0, 1],
        #        [2, 3],
        #        [4, 5]])
        # X.reshape(rows, columns)
        # If either rows or columns is set to -1, numpy infers that
        # dimension from the array's length and the other dimension.
        if len(X.shape) == 1:
            X = X.reshape(-1, 1)

        # Add bias if fit_intercept is True
        if self._fit_intercept:
            X_biased = np.c_[np.ones(X.shape[0]), X]
        else:
            X_biased = X

        # Closed-form solution (normal equation): coef = (X^T X)^-1 X^T y
        xTx = np.dot(X_biased.T, X_biased)
        inverse_xTx = np.linalg.inv(xTx)
        xTy = np.dot(X_biased.T, y)
        coef = np.dot(inverse_xTx, xTy)

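        # Set attributes
        if self._fit_intercept:
            self.intercept_ = coef[0]
            self.coef_ = coef[1:]
        else:
            # No intercept was fitted, so every coefficient belongs to a feature.
            self.intercept_ = 0.0
            self.coef_ = coef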
    
mlr = MyLinearRegression()

# Now you can fit the mlr object to your data.
# X and y are assumed to be numpy arrays that you have already loaded.
mlr.fit(X, y)

print("We've fit the data. We can get the regression coefficients now.")
print("Regression Coefficients:", mlr.coef_)
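
The post mentions Predict and Score as other core methods but only walks through fit. The block below is a minimal, self-contained sketch of how they could look; it is not code from the original source. The class name SketchLinearRegression, the helper _as_2d, the choice of R^2 as the score, and returning self from fit (a convention borrowed from scikit-learn) are all assumptions made for illustration, and the data is synthetic and noise-free so the expected result is known in advance.

```python
import numpy as np


class SketchLinearRegression:
    """Illustrative stand-in for MyLinearRegression with predict and score added."""

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept

    @staticmethod
    def _as_2d(X):
        # Hypothetical helper: treat a 1D array as a single feature column.
        X = np.asarray(X, dtype=float)
        return X.reshape(-1, 1) if X.ndim == 1 else X

    def fit(self, X, y):
        X = self._as_2d(X)
        if self._fit_intercept:
            X_biased = np.c_[np.ones(X.shape[0]), X]
        else:
            X_biased = X
        # Same closed-form solution as in the post: (X^T X)^-1 X^T y
        coef = np.linalg.inv(X_biased.T @ X_biased) @ (X_biased.T @ y)
        if self._fit_intercept:
            self.intercept_ = coef[0]
            self.coef_ = coef[1:]
        else:
            self.intercept_ = 0.0
            self.coef_ = coef
        return self  # assumption: lets you write Model().fit(X, y)

    def predict(self, X):
        # Linear prediction: X @ coef + intercept
        return self._as_2d(X) @ self.coef_ + self.intercept_

    def score(self, X, y):
        # Coefficient of determination (R^2) on the given data.
        y = np.asarray(y, dtype=float)
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - y.mean()) ** 2)
        return 1.0 - residual / total


# Synthetic, noise-free data generated from y = 2 + 3x.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * X

model = SketchLinearRegression().fit(X, y)
print("Intercept:", model.intercept_)      # should be close to 2
print("Coefficients:", model.coef_)        # should be close to [3]
print("R^2:", model.score(X, y))           # should be close to 1
```

Inverting X^T X directly mirrors the post's approach but can be numerically fragile when features are highly correlated; np.linalg.lstsq is a common, more stable alternative.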

Sources:

[1] https://towardsdatascience.com/object-oriented-programming-for-data-scientists-build-your-ml-estimator-7da416751f64
