Wednesday, February 26, 2020

Costs of Partitioning

Depending on the actual value distribution and the partitioning criteria, the main memory consumption of a table can increase or decrease when it is changed from a non-partitioned to a partitioned table. While this may not seem intuitive at first, the root cause lies in the dictionary compression that is applied.

  • Increased memory consumption due to partitioning

    A table has two attributes, MONTH and YEAR, and contains data for all 12 months and two distinct years (2013 and 2014). When the table is partitioned by YEAR, the dictionary for the MONTH attribute needs to be held in memory twice, once for 2013 and once for 2014, thereby increasing memory consumption (see the SQL sketch after this list).
  • Decreased memory consumption due to partitioning

    A table has two attributes, GENDER and FIRSTNAME, and stores data about German customers. When the table is partitioned by GENDER, it is divided into two groups (female and male). In Germany, there is a limited set of first names for each gender. As a result, the FIRSTNAME dictionaries are implicitly partitioned as well into two almost disjoint groups, each containing roughly n/2 distinct values, compared to the unpartitioned table with n distinct values. Halving the number of distinct values saves one bit per index vector entry, because ceil(log2(n/2)) = ceil(log2(n)) - 1 bits suffice to encode a value ID. As the two dictionaries barely overlap, almost no redundancy is introduced, and overall memory consumption can be reduced by partitioning.
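
A minimal SQL sketch of the first case, using hypothetical table and column names; the range partitioning syntax follows the SAP HANA SQL reference, and per-partition memory can be compared in the M_CS_COLUMNS monitoring view:

    -- Hypothetical SALES table for the MONTH/YEAR example above.
    CREATE COLUMN TABLE SALES (
        YEAR   INTEGER,
        MONTH  INTEGER,
        AMOUNT DECIMAL(10,2)
    )
    PARTITION BY RANGE (YEAR)
        (PARTITION VALUE = 2013,
         PARTITION VALUE = 2014);

    -- Each partition keeps its own dictionaries, so the 12-entry MONTH
    -- dictionary is now held once per partition. Compare memory per part:
    SELECT PART_ID, COLUMN_NAME, MEMORY_SIZE_IN_TOTAL
      FROM M_CS_COLUMNS
     WHERE TABLE_NAME = 'SALES'
     ORDER BY PART_ID, COLUMN_NAME;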

Tuesday, February 25, 2020

SAP HANA Query Execution Engine


1. HEX engine

The SAP HANA Execution Engine (HEX) is a new engine that combines the functionality of other engines, such as the join engine and the OLAP engine. Queries that are not supported by HEX, or for which HEX execution is not considered beneficial, are automatically routed to the previously used engines.
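
To check which engine actually executes a statement, the EXECUTION_ENGINE column of EXPLAIN_PLAN_TABLE can be inspected, and routing can be steered per statement with hints. A minimal sketch, assuming a placeholder table MY_TABLE (the hint name follows SAP's published hint list and should be verified for your revision):

    EXPLAIN PLAN SET STATEMENT_NAME = 'hex_check' FOR
        SELECT COUNT(*) FROM MY_TABLE;

    -- EXECUTION_ENGINE reports, per operator, values such as HEX, COLUMN, or ROW.
    SELECT OPERATOR_NAME, EXECUTION_ENGINE
      FROM EXPLAIN_PLAN_TABLE
     WHERE STATEMENT_NAME = 'hex_check';

    -- Route a single statement away from HEX if needed:
    SELECT COUNT(*) FROM MY_TABLE WITH HINT (NO_USE_HEX_PLAN);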


2. ESX engine

The SAP HANA Extended SQL Executor (ESX) is a new frontend execution engine that partially, but not completely, replaces the row engine. It receives database requests at the session level and delegates them to lower-level engines such as the join engine and the calculation engine.
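
As with HEX, ESX usage can be disabled for an individual statement with a hint; again assuming the placeholder table MY_TABLE, with the hint name taken from SAP's hint documentation (verify for your revision):

    -- Fall back from ESX to the classic executor for this statement only:
    SELECT COUNT(*) FROM MY_TABLE WITH HINT (NO_USE_ESX_PLAN);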





3. Join engine

The join engine is used to run plain SQL. Column tables are processed in the join engine.


4. OLAP engine

The OLAP engine is primarily used to process aggregate operations. Calculated measures (unlike calculated columns) are processed in the OLAP engine.


5. Calculation engine

Calculation views, including star joins, are processed by the calculation engine. To do so, the calculation engine may call any of the other engines directly or indirectly.


6. Row engine

The row engine is designed for OLTP scenarios. Some functionality, such as certain date conversions or window functions, is only supported in the row engine. The row engine is also used when plain SQL and calculation engine functions are mixed in a calculation view.


7. MDS engine

SAP HANA multidimensional services (MDS) is the component used to process multidimensional queries, including aggregation, transformation, and calculation.
The queries are translated into an SAP HANA calculation engine execution plan or SQL, which is executed by the SAP HANA core engines.
MDS is integrated with the SAP HANA Enterprise Performance Management (EPM) platform and is used by reporting and planning applications.
The query languages currently supported use the Information Access (InA) model. The InA model simplifies the definition of queries with rich or even complex semantics. Data can be read from all kinds of SAP HANA views, EPM plan data containers, and so on. The InA model also includes spatial (GIS) and search features.

SQL Processing




SQL processing in SAP HANA starts with a SELECT statement, which is looked up in the plan cache and, if no cached plan exists, parsed, checked, and optimized into a final execution plan. An overview is shown below:
The main components involved in processing an SQL query are the session, the SQL frontend, the SQL optimizer, and the execution plan.
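
The initial cache lookup can be observed through the M_SQL_PLAN_CACHE monitoring view, which stores compiled plans for reuse; statements found there skip parsing, checking, and optimization. A minimal sketch:

    -- Frequently reused plans and their average execution times:
    SELECT STATEMENT_STRING, EXECUTION_COUNT, AVG_EXECUTION_TIME
      FROM M_SQL_PLAN_CACHE
     WHERE STATEMENT_STRING LIKE 'SELECT%'
     ORDER BY EXECUTION_COUNT DESC;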





Query Optimization Steps: Overview

Monday, February 24, 2020

Costs Associated with Indexes



Indexes entail certain costs. Memory consumption and incremental maintenance are two major cost factors that need to be considered.

Memory Consumption

To store the mapping of value IDs to records, each index uses an inverted list that needs to be kept in main memory. This list typically requires an amount of memory of the same order of magnitude as the index vector of the corresponding attribute. Creating a concatenated index (see above) adds even more overhead, because an additional dictionary containing the concatenated values of all participating columns needs to be created as well. This overhead is difficult to estimate, but it is usually notably higher than the combined size of the dictionaries of the participating columns. Concatenated indexes should therefore be created with care.
The memory needed for an inverted individual index consists of the memory for the inverted indexes on all indexed columns. This memory usage can be queried as the sum of M_CS_COLUMNS.MEMORY_SIZE_INDEX over the indexed columns.
The main advantage of the inverted individual index is its low memory footprint. The memory needed for inverted individual indexes is much smaller than the memory used for the internal index key column that is required for inverted value and inverted hash indexes. In addition, the data and log I/O overhead, table load time, CPU needed for the delta merge, and the overhead of DML update operations are also reduced because an internal column does not need to be maintained. Similarly, the DDL to create an inverted individual index is also much faster than for the other index types because the concatenated string does not need to be generated.
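
A sketch contrasting the two index types on a hypothetical ORDERS table with columns A and B (the INVERTED INDIVIDUAL clause applies to multi-column unique indexes and requires a sufficiently recent SAP HANA 2 revision):

    -- Concatenated (inverted value) index: creates an internal column plus a
    -- dictionary of concatenated values, with the overhead described above.
    CREATE INDEX IDX_AB ON ORDERS (A, B);

    -- Inverted individual index: per-column inverted lists only, without the
    -- internal concatenated column (multi-column unique indexes only):
    CREATE UNIQUE INDEX UX_AB ON ORDERS (A, B) INVERTED INDIVIDUAL;

    -- Index memory on the table's columns, summed as described above:
    SELECT SUM(MEMORY_SIZE_INDEX)
      FROM M_CS_COLUMNS
     WHERE TABLE_NAME = 'ORDERS';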

Incremental Maintenance

Whenever a DML operation is performed on the base table, the corresponding index structures need to be updated as well (for example, by inserting or deleting entries). These maintenance costs add to the costs on the base table and, depending on the number of indexes, the number of attributes in the base table, and the number of attributes in the individual indexes, might even dominate the actual update time. Again, care must be taken when creating additional indexes.

Index structure in SAP HANA

The example below illustrates the direct mapping of dictionary value IDs to table row IDs using an inverted index. The column dictionary contains all existing column values in sorted order, but it does not provide any information about which rows of the table contain each value. The mapping between dictionary value IDs and the related table row IDs is only available through the inverted index. Without the index, the whole column would have to be scanned to find a specific value:
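
A concrete instance of these structures, using a hypothetical CUSTOMERS table; the dictionary, index vector, and inverted list in the comments are the internal structures SAP HANA derives for the CITY column:

    CREATE COLUMN TABLE CUSTOMERS (ID INTEGER, CITY NVARCHAR(20));
    INSERT INTO CUSTOMERS VALUES (0, 'Berlin');
    INSERT INTO CUSTOMERS VALUES (1, 'Aachen');
    INSERT INTO CUSTOMERS VALUES (2, 'Berlin');
    INSERT INTO CUSTOMERS VALUES (3, 'Celle');

    -- Sorted dictionary (value ID -> value): 0 -> Aachen, 1 -> Berlin, 2 -> Celle
    -- Index vector (row ID -> value ID):     [1, 0, 1, 2]
    -- Inverted index (value ID -> row IDs):  0 -> {1}, 1 -> {0, 2}, 2 -> {3}

    -- With the inverted index, the rows for 'Berlin' are found by looking up
    -- value ID 1; without it, the whole index vector must be scanned:
    CREATE INDEX IDX_CITY ON CUSTOMERS (CITY);
    SELECT ID FROM CUSTOMERS WHERE CITY = 'Berlin';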






Recommendations when creating indexes in SAP HANA


  • Avoid non-unique indexes
Columns in a column table are inherently index-like and therefore do not usually benefit from additional indexes. In some scenarios (for example, multiple-column joins or unique constraints), indexes can further improve performance.
Start without any indexes and then add them if needed.

  • Create as few indexes as possible
Every index imposes an overhead in terms of space and performance, so you should create as few indexes as possible.


  • Ensure that the indexes are as small as possible
Specify as few columns as possible in an index so that the space overhead is minimized.


  • Prefer single-column indexes in the column store
Single-column indexes in the column store have a much lower space overhead because they are just light-weight data structures created on top of the column structure. Therefore, you should use single-column indexes whenever possible.
Due to the in-memory approach in SAP HANA environments, it is generally sufficient to define an index on only the most selective column. (In other relational databases, optimal performance can often only be achieved by using a multi-column index.) A minimal example applying these recommendations is sketched below.
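
A minimal sketch applying these recommendations, assuming a hypothetical ORDERS table on which CUSTOMER_ID is the most selective filter column:

    -- Review what already exists before adding anything:
    SELECT INDEX_NAME, INDEX_TYPE
      FROM INDEXES
     WHERE TABLE_NAME = 'ORDERS';

    -- Add a single small, single-column index only once a concrete query has
    -- been shown to need it:
    CREATE INDEX IDX_ORDERS_CUSTOMER ON ORDERS (CUSTOMER_ID);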