Embarking on a journey to understand and implement machine learning can seem daunting, but it doesn’t have to be. This beginner machine learning tutorial aims to demystify the process, offering a comprehensive introduction to scikit-learn — a powerful Python library widely used for building ML models. Whether you are new to data science or looking to enhance your current understanding, this guide will provide you with essential knowledge and hands-on examples to get you started. Read on to learn how to use scikit-learn effectively and start building your own machine learning models today.
Introduction to Machine Learning and Scikit-Learn
Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on the development of algorithms which enable computers to learn from and make decisions based on data. These algorithms can detect patterns, make predictions, and improve over time with minimal human intervention. Common applications of ML include recommendation systems, fraud detection, image recognition, and natural language processing.
Scikit-learn, also referred to as sklearn, is an open-source Python library that provides simple and efficient tools for data mining and data analysis. It is built on popular foundations like NumPy, SciPy, and Matplotlib, making it an essential library for anyone diving into machine learning with Python. Scikit-learn is beginner-friendly while also being powerful enough to support more advanced ML research and applications.
The main features of scikit-learn include:
- Classification: Identifying to which category an object belongs. Example algorithms are SVM, nearest neighbors, random forest, etc.
- Regression: Predicting a continuous-valued attribute associated with an object. Example algorithms include linear regression, ridge regression, etc.
- Clustering: Grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Example algorithms are k-means, spectral clustering, etc.
- Dimensionality Reduction: Reducing the number of random variables to consider. Examples include PCA, feature selection, and non-negative matrix factorization.
- Model Selection: Comparing, validating, and choosing parameters and models. This includes functionalities such as grid search, cross-validation, and metrics.
- Preprocessing: Feature extraction and normalization. Examples include vectorizing, scaling, and handling missing values.
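Underlying all of these features is a consistent estimator API: you construct an estimator, call fit on training data, then predict or transform on new data. A minimal sketch of the pattern (assuming X_train, y_train, and X_test are already defined):
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()   # 1. instantiate with hyperparameters
model.fit(X_train, y_train)        # 2. learn from the training data
y_pred = model.predict(X_test)     # 3. predict on unseen data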
Scikit-learn follows a well-defined structure, which includes important modules:
- Datasets: Utilities to load and fetch datasets. Example: datasets.load_iris().
- Model Selection: Tools to tune models and split data into training and testing sets. Example: model_selection.train_test_split().
- Preprocessing: Tools for standardizing, normalizing, and encoding data. Example: preprocessing.StandardScaler().
- Metrics: Functions to measure the performance of ML models. Example: metrics.accuracy_score().
To understand the power and simplicity of scikit-learn, let’s look at a small code snippet demonstrating a basic usage scenario: classifying the iris dataset using a k-nearest neighbors classifier.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load the dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Initialize and train the classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Make predictions and evaluate the model
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
In this example, we load the iris dataset, split it into training and testing sets, preprocess the data by standardizing it, train a k-nearest neighbors classifier, and then evaluate its accuracy. This pipeline demonstrates how scikit-learn seamlessly integrates various steps in a machine learning workflow, ensuring a streamlined and intuitive experience.
For further details and comprehensive user guides, the official scikit-learn documentation is an excellent resource to explore more features and techniques essential for building robust ML models.
Scikit-Learn Installation: Setting Up Your Environment
Setting up your environment to use Scikit-Learn effectively involves a few important steps. This guide will take you from having no setup to having a fully functional Scikit-Learn installation, ready for you to build ML models.
Prerequisites
Before installing Scikit-Learn, ensure you have Python installed on your machine; recent Scikit-Learn releases require Python 3.8 or newer. If not, you can download it from the official Python website.
Using Virtual Environments
To avoid conflicts with other projects, it’s good practice to use virtual environments. You can create a virtual environment with venv or virtualenv.
# Using venv
python -m venv myenv
# Activate the virtual environment
# On Windows
myenv\Scripts\activate
# On macOS/Linux
source myenv/bin/activate
Installing Scikit-Learn
Once your virtual environment is active, you can install Scikit-Learn using pip. The following command will install Scikit-Learn and its dependencies, including NumPy, SciPy, and joblib.
pip install scikit-learn
Alternatively, you can install Scikit-Learn via conda if you are using Anaconda or Miniconda, which can be more convenient since conda resolves dependencies for you.
conda install scikit-learn
Verifying Installation
To confirm the installation, you can open a Python interpreter and run:
import sklearn
print(sklearn.__version__)
If the above script runs without errors and shows the version of Scikit-Learn, your installation is successful.
Installing Additional Tools
For a complete data science environment, consider installing Jupyter Notebook and Pandas. Jupyter Notebook allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Pandas is essential for data manipulation and analysis.
pip install jupyter pandas
Checking Compatibility
Scikit-Learn releases updates that may introduce new features or deprecate old ones. To ensure compatibility with your existing code, you can freeze your environment’s current package versions to a requirements file:
pip freeze > requirements.txt
Whenever you need to recreate this environment, you can use the following command:
pip install -r requirements.txt
To check for any issues related to the Scikit-Learn version and dependencies, refer to the official Scikit-Learn documentation.
Integrated Development Environments (IDEs)
Using an Integrated Development Environment (IDE) such as PyCharm, Visual Studio Code, or JupyterLab can enhance your ML development experience. These environments support Scikit-Learn and offer valuable tools for coding, debugging, and visualization.
For a streamlined setup, most modern IDEs integrate well with virtual environments. For instance, in Visual Studio Code, you can configure your settings.json to use the virtual environment automatically:
{
"python.pythonPath": "myenv/bin/python"
}
Following these steps will ensure that your environment is properly configured, allowing you to proceed smoothly with building, testing, and deploying your ML models using Scikit-Learn.
Understanding Core Concepts: Machine Learning Basics
Machine learning rests on several fundamental concepts that serve as the building blocks for understanding and implementing ML models. Before diving into practical aspects such as building and tuning models with Scikit-Learn, it is crucial to grasp these core ideas.
Supervised vs. Unsupervised Learning
Supervised Learning involves training a model on a labeled dataset, which means the target outcomes are known. Typical applications include classification (e.g., spam detection in emails) and regression (e.g., predicting house prices).
Here’s an example of supervised learning using Scikit-Learn’s LinearRegression:
from sklearn.linear_model import LinearRegression
import numpy as np
# Sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
# Initialize and fit the model
model = LinearRegression().fit(X, y)
predictions = model.predict(np.array([[3, 5]]))
print(predictions)  # Output: [16.] (since 1*3 + 2*5 + 3 = 16)
Unsupervised Learning, on the other hand, deals with unlabeled data. The algorithm tries to learn the patterns and structure from the data itself. Common examples include clustering (e.g., customer segmentation) and dimensionality reduction (e.g., reducing the number of features for visualization).
Example of K-Means clustering with Scikit-Learn:
from sklearn.cluster import KMeans
import numpy as np
# Sample data
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
# Initialize and fit the model
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
labels = kmeans.labels_
print(labels) # Output: Cluster labels for each data point
Features and Labels
In machine learning terminology:
- Features are the input variables (independent variables) used to make predictions. For instance, in predicting house prices, features could include size, location, and age of the property.
- Labels are the output variables (dependent variables), which are the results we want to predict (see the short example below).
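In scikit-learn, features are conventionally stored in a two-dimensional array X (one row per sample, one column per feature) and labels in a one-dimensional array y. Using the iris dataset as a concrete example:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data    # features: shape (150, 4), sepal and petal measurements
y = iris.target  # labels: shape (150,), species encoded as 0, 1, 2
print(X.shape, y.shape)  # (150, 4) (150,)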
Model Training and Evaluation
Model training involves feeding the algorithm with data so it can learn the mapping between input features and outputs (labels). Evaluation determines how well the model performs by testing it on new, unseen data. Metrics such as accuracy, precision, recall, and the F1 score are commonly used to evaluate classification models, while metrics like Mean Absolute Error (MAE) and Mean Squared Error (MSE) are used for regression models.
Example of evaluating a classification model using accuracy:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}') # Output: Model accuracy score
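For regression models, MAE and MSE play the analogous role. A minimal sketch using the California housing dataset (which scikit-learn downloads on first use); the variable names here are chosen to avoid clashing with the classification example above:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
housing = fetch_california_housing()
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42)
reg = LinearRegression().fit(Xr_train, yr_train)
yr_pred = reg.predict(Xr_test)
print(f'MAE: {mean_absolute_error(yr_test, yr_pred):.3f}')  # average absolute error
print(f'MSE: {mean_squared_error(yr_test, yr_pred):.3f}')   # penalizes large errors more heavily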
Overfitting and Underfitting
Overfitting occurs when a model is too complex and captures noise in the training data rather than the underlying pattern, resulting in poor performance on unseen data. Underfitting happens when a model is too simple to capture the underlying structure of the data, leading to poor performance even on the training data.
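One quick diagnostic is to compare training and test accuracy; a minimal sketch reusing the KNN model and iris split from the example above:
# A large gap between training and test accuracy suggests overfitting;
# low accuracy on both suggests underfitting.
print(f'Train accuracy: {model.score(X_train, y_train):.3f}')
print(f'Test accuracy: {model.score(X_test, y_test):.3f}')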
Regularization techniques such as Lasso and Ridge regression are often used to combat overfitting in Scikit-Learn:
from sklearn.linear_model import Ridge
# Ridge regression adds an L2 penalty that shrinks coefficients to curb overfitting
# (reusing the iris split from the previous example purely for illustration)
ridge_model = Ridge(alpha=1.0).fit(X_train, y_train)
ridge_predictions = ridge_model.predict(X_test)
print(ridge_predictions)  # Output: predicted values from the regularized model
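Lasso works the same way but applies an L1 penalty, which can drive uninformative coefficients to exactly zero; a minimal sketch under the same setup:
from sklearn.linear_model import Lasso
# Lasso's L1 penalty performs implicit feature selection by zeroing coefficients
lasso_model = Lasso(alpha=0.1).fit(X_train, y_train)
print(lasso_model.coef_)  # Some coefficients may be exactly 0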
Understanding these core concepts equips you with the necessary foundation to delve deeper into machine learning, allowing for more effective application and troubleshooting as you progress with Scikit-Learn. For further reading, check out the detailed Scikit-Learn documentation here.
Building Your First ML Model: A Step-by-Step Guide
Building your first ML model with Scikit-Learn is a manageable and rewarding experience, especially if you’re new to machine learning. This guide will walk you through a step-by-step process, from loading your data to evaluating your model’s performance.
Step 1: Import Necessary Libraries
First, ensure that you have Scikit-Learn installed. If it isn’t, install it with:
pip install scikit-learn
Then, import the necessary libraries:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
Step 2: Load Your Dataset
For this example, we’ll use the famous Iris dataset, which is included in Scikit-Learn:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
Step 3: Split the Dataset
To evaluate the performance of our model, it is essential to split our dataset into training and testing sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
Here, test_size=0.3 denotes that we are using 30% of the data for testing, and random_state=42 ensures reproducibility.
Step 4: Choose an Algorithm and Instantiate the Model
We will use Logistic Regression for this example. Scikit-Learn provides a variety of ML algorithms, but Logistic Regression is straightforward and works well with this dataset:
model = LogisticRegression()
Step 5: Train the Model
Fit the model using the training data:
model.fit(X_train, y_train)
Step 6: Make Predictions
Use the trained model to make predictions on the test set:
y_pred = model.predict(X_test)
Step 7: Evaluate the Model
Evaluate the model’s performance using accuracy as the metric:
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
Scikit-Learn also offers other evaluation metrics, such as precision, recall, and F1-score, accessible through sklearn.metrics.
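For instance, classification_report summarizes precision, recall, and F1-score per class in a single call; a sketch reusing y_test and y_pred from the steps above:
from sklearn.metrics import classification_report
# Per-class precision, recall, F1-score, and support in one summary
print(classification_report(y_test, y_pred, target_names=iris.target_names))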
Example and Best Practices
Here’s the complete script to clarify each step:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
# Load Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Instantiate and Train Model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make Predictions
y_pred = model.predict(X_test)
# Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
Alternative Algorithms
While Logistic Regression is suitable for beginners, Scikit-Learn provides various algorithms that might be more appropriate depending on your specific problem. For instance, Decision Trees (DecisionTreeClassifier), Random Forests (RandomForestClassifier), and Support Vector Machines (SVC) are popular alternatives. Explore the official Scikit-Learn documentation for a comprehensive list of algorithms and their usage.
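Because every scikit-learn classifier shares the same fit/predict interface, trying an alternative usually means changing a single line. For example, swapping in a random forest (reusing the split from the script above):
from sklearn.ensemble import RandomForestClassifier
# Only the estimator changes; the rest of the workflow stays identical
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f'Accuracy: {accuracy_score(y_test, model.predict(X_test)) * 100:.2f}%')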
Exploring Scikit-Learn Examples: Practical Applications
Scikit-learn, a powerful and versatile Python library, provides a plethora of examples that demonstrate how to implement various machine learning algorithms in real-world scenarios. These examples cover a broad range of tasks, showcasing the applicability of different ML models on diverse datasets. Let’s dive into some practical applications of the most commonly used models using Scikit-learn.
Linear Regression: Housing Prices Prediction
One classic example is predicting housing prices from features such as median neighborhood income, house age, and average number of rooms. Scikit-learn’s LinearRegression class is ideal for this type of problem. (The Boston housing dataset used in older tutorials was removed in scikit-learn 1.2, so this example uses the California housing dataset instead.)
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Load dataset (California housing replaces the removed Boston dataset)
dataset = fetch_california_housing()
X, y = dataset.data, dataset.target
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
Classification: Handwritten Digits Classification
Another popular example is classifying handwritten digits using a Support Vector Machine (SVM). Scikit-learn’s svm.SVC class can accomplish this task with high accuracy.
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
# Load dataset
digits = datasets.load_digits()
# Flatten the images and split dataset
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
# Initialize and train model
model = svm.SVC(gamma=0.001)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print(f'Classification report:\n{metrics.classification_report(y_test, y_pred)}')
Clustering: Customer Segmentation
Clustering is an unsupervised learning technique often used in customer segmentation. The KMeans algorithm is one of the simplest and most commonly used clustering methods in Scikit-learn.
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# Generate synthetic dataset
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
# Initialize and fit model
model = KMeans(n_clusters=4, random_state=0)
model.fit(X)
# Predict cluster labels
y_kmeans = model.predict(X)
# Plot results
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = model.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.show()
Decision Trees: Breast Cancer Classification
Decision Trees are another popular classification algorithm, particularly useful when interpretability is crucial. Scikit-learn’s DecisionTreeClassifier is straightforward to implement.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
These examples illustrate just a fraction of what you can achieve using Scikit-learn. Whether you are a machine learning beginner or looking to experiment with advanced techniques, these practical applications can serve as a solid foundation to build upon. You can explore more examples directly from the official Scikit-learn documentation.
Tuning and Evaluating ML Models: Best Practices and Tips
Once you have built an initial ML model using Scikit-Learn, the next critical phase involves tuning and evaluating the model to ensure it offers the best performance possible. Here are some best practices and tips that can help you in this phase:
Hyperparameter Tuning
Hyperparameters are parameters that are set before the learning process begins and are not learned from the data. They play a crucial role in model performance. Scikit-Learn provides several tools to facilitate hyperparameter optimization:
- GridSearchCV: This exhaustive search helps you find the optimal hyperparameters by trying every possible combination.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
# Example using a RandomForestClassifier
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}
grid_search = GridSearchCV(estimator=RandomForestClassifier(),
                           param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)
print(f'Best parameters: {grid_search.best_params_}')
- RandomizedSearchCV: This technique is more efficient than GridSearchCV as it randomly samples a subset of hyperparameter combinations.
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
param_distributions = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}
random_search = RandomizedSearchCV(estimator=RandomForestClassifier(),
                                   param_distributions=param_distributions,
                                   n_iter=10, cv=5, n_jobs=-1)
random_search.fit(X_train, y_train)
print(f'Best parameters: {random_search.best_params_}')
Model Evaluation
Model evaluation metrics are critical for assessing the performance of your machine learning models. Scikit-Learn offers various metrics and tools to help you in this process:
- Cross-validation: This helps determine how the model generalizes to an independent dataset. A common approach is to use K-Fold Cross-Validation.
from sklearn.model_selection import cross_val_score
scores = cross_val_score(estimator=RandomForestClassifier(),
                         X=X_train, y=y_train, cv=5, scoring='accuracy')
print(f'Cross-validation scores: {scores}')
print(f'Mean cross-validation score: {scores.mean()}')
- Confusion Matrix and Classification Report: These provide more detailed error analysis for classification tasks.
from sklearn.metrics import confusion_matrix, classification_report
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
cr = classification_report(y_test, y_pred)
print(f'Confusion Matrix:\n{cm}')
print(f'Classification Report:\n{cr}')
- Receiver Operating Characteristic (ROC) Curve: This is useful for evaluating binary classifiers based on their true positive vs. false positive rates across different thresholds.
from sklearn.metrics import roc_curve, auc
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
print(f'ROC AUC: {roc_auc}')
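To visualize the curve itself rather than just the AUC, RocCurveDisplay (available since scikit-learn 1.0) can plot directly from the probabilities computed above; a minimal sketch:
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay
# Plot true positive rate vs. false positive rate across thresholds
RocCurveDisplay.from_predictions(y_test, y_prob)
plt.show()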
Other Tips
- Feature Scaling: Ensure that your features are scaled, especially when using algorithms sensitive to the scale of data such as SVM or KNN.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
- Handling Imbalanced Datasets: If your dataset is imbalanced, consider oversampling the minority class (e.g., with SMOTE from the imbalanced-learn package), undersampling the majority class, or using class weights in your models (see the sketch after the SMOTE example below).
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
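Alternatively, many scikit-learn estimators accept a class_weight parameter that reweights classes during training without resampling; a minimal sketch:
from sklearn.ensemble import RandomForestClassifier
# 'balanced' weights classes inversely proportional to their frequencies
weighted_model = RandomForestClassifier(class_weight='balanced', random_state=42)
weighted_model.fit(X_train, y_train)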
Following these best practices and tips will significantly enhance your model’s robustness and accuracy. For further reading, refer to specific Scikit-Learn documentation pages on model selection and metrics.
Advanced Techniques: Beyond the Basics of Scikit-Learn
Once you’re comfortable with the foundational aspects of Scikit-Learn and have built your initial ML models, you might find yourself looking to enhance the sophistication and performance of your models. Scikit-Learn offers a suite of advanced techniques that can help you fine-tune your models, handle more complex datasets, and improve predictive accuracy.
Feature Engineering with Pipelines
Scikit-Learn’s Pipeline class is a powerful utility that chains a sequence of transformations with a final estimator. Pipelines are instrumental for feature engineering because they keep complex preprocessing workflows consistent and reproducible.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
pipe = Pipeline([
('scaler', StandardScaler()),
('pca', PCA(n_components=2)),
('svc', SVC(kernel='linear'))
])
pipe.fit(X_train, y_train)
In this example, standardization, Principal Component Analysis (PCA), and fitting of a Support Vector Classifier (SVC) are executed sequentially. This ensures that each step in the model training process is applied consistently during both training and testing.
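A further benefit is leakage-free cross-validation: because the scaler and PCA live inside the pipeline, they are re-fit on each training fold rather than on the full dataset. A sketch, assuming X and y are defined:
from sklearn.model_selection import cross_val_score
# Preprocessing is refit within each fold, so validation data never
# influences the scaler or PCA
scores = cross_val_score(pipe, X, y, cv=5)
print(f'Mean CV accuracy: {scores.mean():.3f}')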
Hyperparameter Tuning with GridSearchCV and RandomizedSearchCV
Choosing the right hyperparameters can drastically improve your model’s performance. Scikit-Learn provides GridSearchCV and RandomizedSearchCV for hyperparameter tuning.
GridSearchCV
GridSearchCV exhaustively searches over a specified parameter grid:
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)
grid.fit(X_train, y_train)
print(grid.best_params_)
RandomizedSearchCV
On the other hand, RandomizedSearchCV selects random combinations of parameters and is more efficient when the parameter space is large.
from sklearn.model_selection import RandomizedSearchCV
param_dist = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}
random_search = RandomizedSearchCV(SVC(), param_distributions=param_dist, n_iter=10, cv=5, verbose=2)
random_search.fit(X_train, y_train)
print(random_search.best_params_)
Cross-Validation Strategies
Cross-validation is essential for assessing the generalizability of your model. Beyond the basic K-Fold Cross-Validation, Scikit-Learn offers variations like StratifiedKFold and TimeSeriesSplit.
StratifiedKFold
Particularly useful for classification tasks:
from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5)
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
TimeSeriesSplit
Designed for time-series data where sequential dependency is important:
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=3)
for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
Advanced Model Evaluation Metrics
While accuracy is a common metric, more nuanced metrics such as Precision, Recall, F1-Score, and ROC-AUC are often more insightful, especially for imbalanced datasets.
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score
# Assumes a binary classifier with probability estimates enabled
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_pred_proba[:, 1]))
Ensemble Techniques
Leverage the power of ensemble methods with Scikit-Learn’s implementations of Voting Classifier and Stacking:
Voting Classifier
Combines multiple models into a single model to enhance performance.
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
ensemble = VotingClassifier(estimators=[('lr', LogisticRegression()),
                                        ('rf', RandomForestClassifier()),
                                        ('svc', SVC(probability=True))],
                            voting='soft')
ensemble.fit(X_train, y_train)
Stacking
Ensemble strategy where multiple models’ outputs are used as inputs for a final estimator.
from sklearn.ensemble import StackingClassifier
estimators = [('lr', LogisticRegression()), ('rf', RandomForestClassifier())]
stacking = StackingClassifier(estimators=estimators, final_estimator=SVC())
stacking.fit(X_train, y_train)
These advanced techniques can significantly improve the functionality, efficiency, and accuracy of your Scikit-Learn models, helping you tackle more complex and demanding machine learning tasks.
For further reading, refer to the Scikit-Learn Documentation.