The point of this exercise is to learn OOP principles as they apply to your data science work.

Let’s get started with some basics of our Linear Regression class.

In your Linear Regression class, self refers to the object itself.

__init__ (dunder init) = method called automatically when an object of this class is created; it can be used to initialize the object with any necessary parameters.

```
class MyLinearRegression:
    def __init__(self, fit_intercept=True):
        """
        Initialize the class.
        fit_intercept: Boolean switch to indicate
        whether to include an intercept term in the model.
        """
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept
```

Now you can add a __repr__ method, which is called when you print the object. Use it to return a short description of the object for future context. It is not strictly necessary, but it makes a good example.

When you create a MyLinearRegression object, you can print it out and you will see the return of the __repr__ method:
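A minimal sketch of this, reusing the __init__ from above and the same description string used later in this post:

```python
class MyLinearRegression:
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept

    def __repr__(self):
        # Returned whenever the object is printed or echoed in a REPL.
        return "I am a Linear Regression object."


mlr = MyLinearRegression()
print(mlr)  # I am a Linear Regression object.
```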

Now you add the core methods of the class. What are the main functions of your object? What does your Linear Regression object do for you? Fit? Predict? Score? What else do you routinely do with these predictions that would be worth adding as a method?

You can start by adding the logical first step, the fit method:

Note the **docstring** describing the purpose of the method, what it does, and what type of data it expects. Documenting every method this way is good practice for any class you write.

```
import numpy as np


class MyLinearRegression:
    def __init__(self, fit_intercept=True):
        """
        Initialize the class.
        fit_intercept: Boolean switch to indicate
        whether to include an intercept term in the model.
        """
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept

    def __repr__(self):
        return "I am a Linear Regression object."

    def fit(self, X, y):
        """
        Fit model coefficients given a dataset with features (X) and a target variable (y).
        Arguments:
        X: 1D or 2D numpy array
        y: 1D numpy array
        """
        # Check if X is 1D or 2D numpy array.
        # Note on numpy reshape method:
        # >>> a = np.arange(6).reshape((3, 2))
        # >>> a
        # array([[0, 1],
        #        [2, 3],
        #        [4, 5]])
        # X.reshape(rows, columns)
        # If either rows or columns is set to -1 in reshape,
        # numpy infers that dimension from the size of the array.
        if len(X.shape) == 1:
            X = X.reshape(-1, 1)
        # Add bias if fit_intercept is True
        if self._fit_intercept:
            X_biased = np.c_[np.ones(X.shape[0]), X]
        else:
            X_biased = X
        # Closed form solution: coef = (X^T X)^-1 X^T y
        xTx = np.dot(X_biased.T, X_biased)
        inverse_xTx = np.linalg.inv(xTx)
        xTy = np.dot(X_biased.T, y)
        coef = np.dot(inverse_xTx, xTy)
        # Set attributes
        if self._fit_intercept:
            self.intercept_ = coef[0]
            self.coef_ = coef[1:]
        else:
            # No intercept column was added, so every coefficient is a slope.
            self.intercept_ = 0
            self.coef_ = coef
```

```
mlr = MyLinearRegression()
# Now you can fit the mlr object to the test data.
# X and y are assumed to be numpy arrays that are already loaded.
mlr.fit(X, y)
print("We've fit the data. We can get the regression coefficients now.")
print("Regression Coefficients:", mlr.coef_)
```
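To try the class end to end, and to sketch the Predict and Score ideas raised earlier, here is one possible version. The predict and score methods and the synthetic data are assumptions added for illustration; they are not part of the original class. The fit method is the same closed-form solution, condensed.

```python
import numpy as np


class MyLinearRegression:
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept

    def fit(self, X, y):
        # Same closed-form fit as above, condensed.
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        X_biased = np.c_[np.ones(X.shape[0]), X] if self._fit_intercept else X
        coef = np.linalg.inv(X_biased.T @ X_biased) @ X_biased.T @ y
        if self._fit_intercept:
            self.intercept_, self.coef_ = coef[0], coef[1:]
        else:
            self.intercept_, self.coef_ = 0.0, coef

    def predict(self, X):
        # Hypothetical addition: y_hat = X @ coef_ + intercept_
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        return X @ self.coef_ + self.intercept_

    def score(self, X, y):
        # Hypothetical addition: coefficient of determination (R^2).
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - np.mean(y)) ** 2)
        return 1.0 - residual / total


# Synthetic data made up for illustration: y = 3 + 2*x plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * X + rng.normal(0, 0.1, size=50)

mlr = MyLinearRegression()
mlr.fit(X, y)
print("Intercept:", mlr.intercept_)
print("Regression Coefficients:", mlr.coef_)
print("R^2:", mlr.score(X, y))
```

The fitted intercept and slope should land close to the 3 and 2 used to generate the data, which is a quick sanity check that the closed-form solution is wired up correctly.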

Sources:

[1] https://towardsdatascience.com/object-oriented-programming-for-data-scientists-build-your-ml-estimator-7da416751f64

Source: https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/

- Datawig
  - Uses neural networks to learn models that impute missing values.

- ARIMA Model
  - Predicts missing values in time series data.

- Linear Regression
- KNN
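
As a rough sketch of the linear regression option, assume one numeric column with gaps (encoded as NaN) and one fully observed predictor column. The helper name and the toy data below are made up for illustration; a real dataset would usually need more predictors and some validation of the imputed values.

```python
import numpy as np


def impute_with_linear_regression(x, y):
    """Fill NaN entries of y using a least-squares line fitted on observed rows.

    Hypothetical helper: x is a fully observed 1D predictor, y a 1D target
    with missing values encoded as NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    # Design matrix with an intercept column, built from observed rows only.
    A = np.c_[np.ones(np.sum(~missing)), x[~missing]]
    coef, *_ = np.linalg.lstsq(A, y[~missing], rcond=None)
    filled = y.copy()
    # Predict the missing targets from the fitted intercept and slope.
    filled[missing] = coef[0] + coef[1] * x[missing]
    return filled


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, np.nan, 8.0, np.nan])  # observed rows follow y = 2x
filled = impute_with_linear_regression(x, y)
print(filled)
```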

- Lambda = Serverless
- Serverless = next generation of cloud computing that will essentially replace EC2 instances (for the most part)
- Simply write code and run it without provisioning or managing compute servers like EC2 instances.
- Lambda executes your code only when needed and scales automatically.
- You pay only for the compute time you consume; there is no charge when your code is not running.
- Runs your code on a high-availability compute infrastructure and performs all of the admin of the compute resources **automagically**, including server and operating system maintenance, capacity provisioning and automatic scaling, and code monitoring / logging.
- Lambda currently supports Node.js, Java, C#, Ruby, Go, .NET Core, and Python
- Pros: no servers to manage, continuous scaling, pay only for your compute time, integrates with all other AWS services nicely
- Use cases: Data processing, real-time file processing, real-time stream processing, build serverless back-ends for web, mobile, IoT, and third-party API requests
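
To make "simply write code" concrete, here is a minimal sketch of a Python Lambda function. Lambda calls a handler with an event and a context argument; `lambda_handler` is the conventional default name, while the `name` field and the `statusCode`/`body` response shape (the form typically used behind API Gateway) are assumptions for this example.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda invokes; 'event' carries the trigger payload."""
    # 'name' is a made-up field used only for this illustration.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test; on AWS, Lambda supplies event and context itself.
response = lambda_handler({"name": "data scientist"}, None)
print(response)
```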

- Relational DB = SQL DB, e.g. Amazon Aurora, MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server
- RDS DBs = cost-efficient, resizable, and automate time-consuming admin tasks such as hardware provisioning, DB setup, patching, and backups.
- DynamoDB NoSQL DB = consistent, single-digit millisecond latency at any scale. Fully managed cloud DB that supports both document and key-value store models. Great fit for mobile, web, gaming, ad tech, IoT, and many other applications with low-latency needs.
- NoSQL = used when data is fluid and can change (much like JSON data)

- ElastiCache = data caching service used to help improve speed/performance of web apps
- e.g. Redis = fast, open-source, in-memory data store and cache; Memcached = widely adopted in-memory object caching system
- Redshift = Large data warehouse (DWH) DB service designed to handle petabytes of data for analysis. Simple & cost effective, use SQL to analyze your data. Uses query optimization, columnar storage on high performance local disks, and massively parallel query execution for fast queries.
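
The caching idea behind ElastiCache can be sketched with the common cache-aside pattern. This is only an illustration: a plain dict stands in for a Redis or Memcached client, and the slow query function is hypothetical.

```python
import time

cache = {}  # stand-in for a Redis or Memcached client


def slow_database_query(user_id):
    # Hypothetical slow lookup; the sleep simulates database latency.
    time.sleep(0.05)
    return {"user_id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    # Cache-aside: try the cache first, fall back to the database on a miss.
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    value = slow_database_query(user_id)
    cache[key] = value
    return value


get_user(42)         # miss: hits the "database" and fills the cache
print(get_user(42))  # hit: served from the cache
```

With a real cache you would also set an expiry on each key so stale data eventually gets refreshed.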

Use the above image and zoom in to find a model to use to solve your problem.

```
echo $VIRTUAL_ENV # prints the path of the active virtualenv (empty if none)
# Create virtualenv
mkvirtualenv --python=`which python3` new_env
# Activate virtualenv
workon new_env
# Deactivate virtualenv when done with dev
deactivate
```
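
The same check can be made from inside Python. This is a small sketch, assuming an environment created by the standard venv module or a recent virtualenv, both of which set `sys.base_prefix`; the function name is made up.

```python
import os
import sys


def in_virtualenv():
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("In a virtualenv:", in_virtualenv())
print("VIRTUAL_ENV =", os.environ.get("VIRTUAL_ENV"))
```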

- A Network Load Balancer operates at the transport layer (layer 4) and is built for high-throughput, low-latency handling of TCP and TLS traffic.
- The Classic Load Balancer is the previous-generation load balancer that handles HTTP, HTTPS, and TCP traffic. It is now used primarily by anyone still on the EC2-Classic network.
- An Application Load Balancer operates at the application layer (layer 7) and balances traffic over HTTP and HTTPS.