Get Started with Intel® oneAPI Data Analytics Library
Intel® oneAPI Data Analytics Library (oneDAL) is a library that helps speed up big data analysis by providing
highly optimized algorithmic building blocks for all stages of data analytics (preprocessing, transformation,
analysis, modeling, validation, and decision making) in batch, online, and distributed processing modes of
computation.
For general information about oneDAL, visit the official oneDAL page.
Before You Begin
oneDAL is located in the <install_dir>/dal directory, where <install_dir> is the directory in which the
Intel® oneAPI Base Toolkit was installed.
The current version of oneDAL with SYCL support is available for Linux* and Windows* 64-bit operating
systems. The prebuilt oneDAL libraries can be found in the <install_dir>/dal/<version>/redist
directory.
To learn about the system requirements and the dependencies needed to build examples, refer to the System
Requirements page.
End-to-end Example
Below is a typical usage workflow for a oneDAL algorithm on a GPU. The example uses the
Principal Component Analysis (PCA) algorithm.
The following steps show how to:
  • Read the data from a CSV file
  • Run the training and inference operations for PCA
  • Access intermediate results obtained at the training stage
1. Include the following header that makes all oneDAL declarations available.
#include "oneapi/dal.hpp"
/* Standard library headers required by this example */
#include <cassert>
#include <iostream>
2. Create a SYCL* queue with the desired device selector. In this case, the GPU selector is used:
const auto queue = sycl::queue{ sycl::gpu_selector{} };
3. Since all oneDAL declarations are in the oneapi::dal namespace, import all declarations from the
oneapi namespace to use dal instead of oneapi::dal for brevity:
using namespace oneapi;
4. Use the CSV data source to read the data from the CSV file into a table:
const auto data = dal::read<dal::table>(queue, dal::csv::data_source{"data.csv"});
5. Create a PCA descriptor, configure its parameters, and run the training algorithm on the data loaded
from CSV:
const auto pca_desc = dal::pca::descriptor<float>{}
    .set_component_count(3)
    .set_deterministic(true);
const dal::pca::train_result train_res = dal::train(queue, pca_desc, data);
6. Print the learned eigenvectors:
const dal::table eigenvectors = train_res.get_eigenvectors();
const auto acc = dal::row_accessor<const float>{eigenvectors};
for (std::int64_t i = 0; i < eigenvectors.get_row_count(); i++) {
    /* Get the i-th row from the table; the eigenvector array stores a pointer to USM */
    const dal::array<float> eigenvector = acc.pull(queue, {i, i + 1});
    assert(eigenvector.get_count() == eigenvectors.get_column_count());
    std::cout << i << "-th eigenvector: ";
    for (std::int64_t j = 0; j < eigenvector.get_count(); j++) {
        std::cout << eigenvector[j] << " ";
    }
    std::cout << std::endl;
}
7. Use the trained model for inference to reduce the dimensionality of the data:
const dal::pca::model model = train_res.get_model();
const dal::table data_transformed =
    dal::infer(queue, pca_desc, model, data).get_transformed_data();
assert(data_transformed.get_column_count() == 3);
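To try the steps above without a dataset at hand, a small input file can be generated first. The sketch below is a hypothetical helper, not part of oneDAL: make_sample_csv and its arguments are illustrative names, and it assumes the CSV data source accepts headerless, comma-separated numeric rows. Because the example sets the component count to 3, the file should have at least three columns.

```shell
# Hypothetical helper (not part of oneDAL): write a small, headerless numeric
# CSV file. Assumption: the CSV data source reads plain comma-separated numbers.
make_sample_csv() {
    out="$1"    # output file, for example data.csv
    rows="$2"   # number of rows (observations)
    cols="$3"   # number of columns (features)
    awk -v rows="$rows" -v cols="$cols" 'BEGIN {
        for (i = 1; i <= rows; i++) {
            line = ""
            for (j = 1; j <= cols; j++) {
                # Deterministic values keep the file reproducible
                value = (i * j) % 7 + j / 10
                line = line (j > 1 ? "," : "") sprintf("%.1f", value)
            }
            print line
        }
    }' > "$out"
}

# Example: make_sample_csv data.csv 10 4
```

The values carry no statistical meaning; they only give the example something to read.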
Build and Run Examples
Perform the following steps to build and run examples demonstrating the basic usage scenarios of oneDAL
with SYCL support. Go to <install_dir>/dal/<version> and then set up an environment as shown in the
example below:
NOTE All content below that starts with # is considered a comment and should not be run with the
code.
1. Set up the required environment for oneDAL (variables such as CPATH, LIBRARY_PATH, and
LD_LIBRARY_PATH):
On Linux, there are two possible ways to set up the required environment: via vars.sh script or via
modulefiles.
Setting up oneDAL environment via vars.sh script
Run the following command:
source ./env/vars.sh
Setting up oneDAL environment via modulefiles
1. Initialize modules:
source $MODULESHOME/init/bash
NOTE Refer to Environment Modules documentation for details.
2. Provide modules with a path to the modulefiles directory:
module use ./modulefiles
3. Run the module:
module load dal
On Windows, run the following command:
env\vars.bat
2. Copy ./examples/oneapi/dpc to a writable directory if necessary (since it creates temporary files):
cp -r ./examples/oneapi/dpc ${WRITABLE_DIR}
3. Set up the compiler environment for Intel® oneAPI DPC++/C++ Compiler. See Get Started with Intel®
oneAPI DPC++/C++ Compiler for details.
4. Build and run the examples that show how to use oneDAL with SYCL support:
NOTE You need write permissions to the examples folder to build the examples, and execute
permissions to run them. Otherwise, copy the examples/oneapi/dpc and examples/oneapi/data
folders to a directory where you have these permissions. The two folders must stay at the same
directory level relative to each other.
On Linux:
# Navigate to the directory containing examples and then build them:
cd /examples/oneapi/dpc
# Compile and run the svm_two_class_thunder_dense_batch example using Intel(R) oneAPI DPC++/C++ Compiler:
make so example=svm_two_class_thunder_dense_batch
# Compile all examples in the current directory:
make so mode=build
On Windows:
# Navigate to the directory containing examples and then build them:
cd /examples/oneapi/dpc
# Compile and run the svm_two_class_thunder_dense_batch example using Intel(R) oneAPI DPC++/C++ Compiler:
nmake dll example=svm_two_class_thunder_dense_batch
# Compile all examples in the current directory:
nmake dll mode=build
To see all available parameters of the build procedure, type make on Linux* or nmake on Windows*.
5. The resulting example binaries and log files are written into the _results directory.
NOTE Run the examples from the examples/oneapi/dpc folder, not from the _results folder.
Most examples expect their data in the examples/oneapi/data folder and reference it by a path
relative to the examples/oneapi/dpc folder.
You can build the traditional C++ examples located in the examples/oneapi/cpp folder in a similar way.
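When the examples have to be moved to a writable location, the dpc and data folders need to travel together so that the relative paths between them keep working. The following is a minimal sketch of one way to do that on Linux; copy_dal_examples is a hypothetical helper name, not a oneDAL script, and its two arguments are assumptions made for illustration.

```shell
# Hypothetical helper (not part of oneDAL): copy the dpc examples together
# with the data folder so both stay at the same directory level.
copy_dal_examples() {
    src_root="$1"   # directory that contains examples/oneapi
    dest="$2"       # writable destination directory
    for sub in dpc data; do
        if [ ! -d "$src_root/examples/oneapi/$sub" ]; then
            echo "missing folder: $src_root/examples/oneapi/$sub" >&2
            return 1
        fi
    done
    # Recreate the examples/oneapi level so relative paths are preserved
    mkdir -p "$dest/examples/oneapi" &&
    cp -r "$src_root/examples/oneapi/dpc" "$src_root/examples/oneapi/data" \
        "$dest/examples/oneapi/"
}

# Example, run from <install_dir>/dal/<version>:
# copy_dal_examples . "${WRITABLE_DIR}"
```

After the copy, the examples can be built from the examples/oneapi/dpc folder under the destination directory.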
Compile and build applications with pkg-config
pkg-config is a widely used tool for building software with dependencies. Intel® oneAPI Data
Analytics Library provides pkg-config metadata files for compiling and linking an application
against the library.
Set up the environment
To use pkg-config, build the library and then set up the environment using the vars.sh or vars.bat script:
On Linux: source ./env/vars.sh
On Windows: env\vars.bat
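On Linux, one way to confirm that sourcing vars.sh took effect is to check that the variables the setup step mentions (CPATH, LIBRARY_PATH, LD_LIBRARY_PATH) are non-empty. The helper below is a hypothetical sketch rather than a oneDAL tool; missing_env_vars is an illustrative name.

```shell
# Hypothetical helper (not part of oneDAL): print the name of every listed
# environment variable that is unset or empty.
missing_env_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name})
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Example: after `source ./env/vars.sh`, empty output means all three are set
# missing_env_vars CPATH LIBRARY_PATH LD_LIBRARY_PATH
```

The helper only reports whether the variables have a value; it does not check that the paths they contain are correct.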
Choose a metadata file
The metadata files provided by oneDAL cover only the host device configuration for C++ on 64-bit Linux,
macOS, and Windows operating systems.
Choose the metadata file based on the oneDAL threading mode and the linking method you use:
oneDAL pkg-config metadata files
                  Single-threaded (non-threaded)   Multi-threaded (internally threaded)
Static linking    dal-static-sequential-host       dal-static-threading-host
Dynamic linking   dal-dynamic-sequential-host      dal-dynamic-threading-host
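The four file names in the table follow one pattern: dal-<linking>-<threading>-host. As an illustration, a build script could derive the name from two settings. The function below is a hypothetical sketch, not something oneDAL ships; dal_pkg_name and its argument values are assumptions chosen to mirror the table.

```shell
# Hypothetical helper (not part of oneDAL): map a linking method and a
# threading mode to the matching pkg-config metadata file name.
dal_pkg_name() {
    linking="$1"     # "static" or "dynamic"
    threading="$2"   # "sequential" (single-threaded) or "threading" (multi-threaded)
    case "$linking" in
        static|dynamic) ;;
        *) echo "unknown linking method: $linking" >&2; return 1 ;;
    esac
    case "$threading" in
        sequential|threading) ;;
        *) echo "unknown threading mode: $threading" >&2; return 1 ;;
    esac
    echo "dal-${linking}-${threading}-host"
}

dal_pkg_name dynamic threading   # prints dal-dynamic-threading-host
```

Rejecting unknown values keeps a typo from silently producing a name that pkg-config cannot resolve.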
Compile a program using pkg-config
To compile a test.cpp program with oneDAL and pkg-config, provide the name of the oneDAL pkg-config
metadata file as an input parameter. For example:
On Linux or macOS:
icc test.cpp `pkg-config --cflags --libs dal-dynamic-threading-host`
On Windows:
for /F "delims=," %i in ('pkg-config --cflags --libs dal-dynamic-threading-host') do icl test.cpp %i
To compile the svm_two_class_thunder_dense_batch example, run the following from the
examples/oneapi/cpp directory:
On Linux or macOS:
icc -I source/ source/svm/svm_two_class_thunder_dense_batch.cpp `pkg-config --cflags --libs dal-dynamic-threading-host`
On Windows:
for /F "delims=," %i in ('pkg-config --cflags --libs dal-dynamic-threading-host') do icl -I source/ source/svm/svm_two_class_thunder_dense_batch.cpp %i
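If pkg-config cannot find the metadata file, the substituted flags are empty and the compiler fails later with missing-header errors that hide the real cause. The sketch below, for Linux or macOS, checks for the metadata first using the standard pkg-config --exists option. build_with_dal is a hypothetical wrapper, and the PKG_CONFIG and CXX override variables are assumptions made for illustration; the default compiler follows the commands in this section.

```shell
# Hypothetical wrapper (not part of oneDAL): compile sources against a oneDAL
# pkg-config metadata file, failing early if the metadata cannot be found.
# PKG_CONFIG and CXX are assumed override variables, not oneDAL settings.
build_with_dal() {
    pkg="$1"   # metadata file name, e.g. dal-dynamic-threading-host
    shift      # remaining arguments go to the compiler unchanged
    pc="${PKG_CONFIG:-pkg-config}"
    if ! "$pc" --exists "$pkg"; then
        echo "pkg-config cannot find '$pkg'; was the environment set up?" >&2
        return 1
    fi
    flags="$("$pc" --cflags --libs "$pkg")" || return 1
    # $flags is left unquoted on purpose so it splits into separate arguments
    "${CXX:-icc}" "$@" $flags
}

# Example: build_with_dal dal-dynamic-threading-host test.cpp
```

The wrapper issues the same compiler invocation as the commands above and adds only the early check.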
Find More
Document                        Description
Developer Guide and Reference   Refer to the oneDAL Developer Guide and Reference for detailed
                                information about implemented algorithms.
System Requirements             Check system requirements before you install Intel® oneAPI
                                Data Analytics Library.
Release Notes                   Refer to the release notes for Intel® oneAPI Data Analytics
                                Library to learn about new updates in the latest release.
Code Samples                    Learn how to use oneDAL with daal4py, a Python* API.
oneDAL Specification            Learn about requirements for implementations of oneAPI
                                Data Analytics Library.
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its
subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.
The products described may contain design defects or errors known as errata which may cause the product
to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from
course of performance, course of dealing, or usage in trade.