Arenadata Documentation
Our passion is building efficient, flexible solutions that scale to dozens of petabytes
Products
Explore our range of solutions in the world of Big Data

Arenadata Hadoop

Arenadata Hadoop (ADH) is a commercial distribution of the open-source Apache Hadoop software. It is a big data platform designed for storing, processing, and analyzing large volumes of structured and unstructured data.
Arenadata Hadoop includes various tools and components that are part of the Hadoop ecosystem, such as the Hadoop Distributed File System (HDFS), MapReduce, YARN, and various other Apache projects. It also includes additional software components and tools that are designed to make it easier to deploy, manage, and use Hadoop in enterprise environments.
Use cases
Big data analytics

ADH can be used to process and analyze large volumes of data, such as clickstream data, sensor data, social media data, and financial data. This can help businesses gain valuable insights into customer behavior, market trends, and other important metrics.

Machine learning and artificial intelligence

ADH can be used as a data processing platform for machine learning and artificial intelligence applications. This can help businesses to build predictive models, detect anomalies, and automate decision-making processes.

Data integration

ADH can be used to integrate data from multiple sources and formats into a unified, centralized data repository. This can help businesses to eliminate data silos and provide a single, consistent view of data.

Fraud detection and prevention

ADH can be used to detect and prevent fraud by analyzing large volumes of data in real-time. This can help businesses to identify and respond to fraudulent activities quickly, reducing losses and protecting their reputation.

Log analytics

ADH can be used to process and analyze log data generated by IT systems and applications. This can help businesses to troubleshoot issues, identify performance bottlenecks, and improve system reliability.

Enterprise
Community
Support for key Hadoop components
High availability and disaster recovery features
Advanced security features, including encryption, role-based access control
Automated management and monitoring tools
Deploy & upgrade automation
Offline installation
Technical support 24/7
Corporate training courses
Tailored solutions
Available integrations
ADQM
Arenadata QuickMarts
  • ADQM Spark connector provides high-speed parallel data exchange between Apache Spark in ADH and Arenadata QuickMarts (ADQM).
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
ADB
ADB
  • ADB Spark connector provides high-speed parallel data exchange between Apache Spark and Arenadata DB (ADB).
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
ADPG
ADPG
  • Spark JDBC connector connects Spark to any JDBC-compatible database like Arenadata Postgres (ADPG) and unlocks new opportunities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
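As an illustrative sketch of the JDBC path described above (host, database, table, and credentials are hypothetical; the PostgreSQL JDBC driver must be on the Spark classpath, e.g. via `--jars` on spark-submit), reading an ADPG table into Spark might look like this:

```python
# Sketch: reading an ADPG (PostgreSQL) table into Spark via the JDBC source.
# All connection details below are hypothetical placeholders.

def adpg_jdbc_options(host: str, port: int, database: str,
                      user: str, password: str) -> dict:
    """Build the option map Spark's JDBC data source expects for PostgreSQL."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }

def read_adpg_table(table: str):
    # Lazy import so the helper above stays usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adpg-jdbc-demo").getOrCreate()
    opts = adpg_jdbc_options("adpg.example.com", 5432, "demo", "etl", "secret")
    return (spark.read.format("jdbc")
            .options(**opts)
            .option("dbtable", table)
            .load())
```

The same pattern applies to any JDBC-compatible database; only the URL prefix and driver class change.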
ADS
ADS
  • Spark Streaming streamlines your real-time data processing with Spark Streaming, Kafka, or Arenadata Streaming (ADS), enabling seamless data ingestion, processing, and analysis at scale.
  • Flink Apache Kafka connector provides high-performance stream processing, enabling real-time data analysis, transformation, and visualization at scale.
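A minimal sketch of consuming an ADS (Kafka) topic with Spark Structured Streaming, assuming hypothetical broker addresses and topic name and the spark-sql-kafka-0-10 package available to Spark:

```python
# Sketch: streaming ingestion from ADS (Kafka) with Spark Structured Streaming.
# Broker addresses and the topic name are hypothetical placeholders.

def kafka_source_options(bootstrap_servers: list, topic: str) -> dict:
    """Options for Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": ",".join(bootstrap_servers),
        "subscribe": topic,
        "startingOffsets": "earliest",
    }

def stream_from_ads():
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("ads-stream-demo").getOrCreate()
    opts = kafka_source_options(["ads-broker1:9092", "ads-broker2:9092"],
                                "events")
    df = spark.readStream.format("kafka").options(**opts).load()
    # Kafka delivers keys and values as binary; cast to strings for processing.
    return df.select(col("key").cast("string"), col("value").cast("string"))
```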
Oracle
Oracle
  • Spark JDBC connector connects Spark to any JDBC-compatible database like Oracle and unlocks new opportunities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
MS SQL
MS SQL
  • Spark JDBC connector connects Spark to any JDBC-compatible database like MS SQL and unlocks new possibilities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
AWS S3
AWS S3
  • Hadoop AWS module provides support for AWS integration.
  • S3a connector provides a fast and efficient way to access data stored in Simple Storage Service (S3) from Spark applications.
  • Flink S3 connector lets Flink read and write data in S3 and use it with the streaming state backends.
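To make the S3a path concrete, here is a sketch of reading Parquet data from S3 with Spark (bucket name, endpoint, and credential placeholders are hypothetical; hadoop-aws and its AWS SDK dependency must be on the classpath):

```python
# Sketch: accessing S3 data from Spark through the S3a connector.
# Credentials, endpoint, and bucket below are hypothetical placeholders.

def s3a_conf(access_key: str, secret_key: str, endpoint: str) -> dict:
    """Hadoop configuration entries for the S3a filesystem."""
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.endpoint": endpoint,
    }

def s3a_path(bucket: str, key: str) -> str:
    """Build an s3a:// URI for a bucket and object key prefix."""
    return f"s3a://{bucket}/{key}"

def read_from_s3():
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("s3a-demo")
    for k, v in s3a_conf("ACCESS_KEY", "SECRET_KEY",
                         "s3.amazonaws.com").items():
        builder = builder.config(f"spark.hadoop.{k}", v)
    spark = builder.getOrCreate()
    return spark.read.parquet(s3a_path("my-bucket", "events/2024/"))
```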
Azure Storage
Azure Storage
  • Hadoop Azure module provides support for integration with Azure Blob Storage.
  • Spark WASB (Windows Azure Storage Blob) connector is an Apache Spark library that enables Spark applications to read and write data from Azure Blob Storage.
Azure Datalake
Azure Datalake
  • Spark ABFS (Azure Blob File System) connector provides an API for Spark applications to read and write data directly from ADLS Gen2 without the need to stage data on a local disk.
  • Flink ABS connector lets Flink read and write data in Azure Blob Storage.
GCS
GCS
  • Spark GS connector provides an API for Spark applications to read and write data directly from Google Cloud Storage (GCS) without the need to stage data on a local disk.
  • Flink GCS connector can be used for reading and writing data and for checkpoint storage.
JDBC
JDBC
  • Spark JDBC connector connects Spark to any JDBC-compatible database and unlocks new possibilities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
Solr
Solr
The Spark Solr integration is a library that allows Spark applications to read and write data to Apache Solr. With the Spark Solr integration, Spark applications can read data from Solr using SolrRDD, which allows the parallelization of data processing across a Spark cluster.
Phoenix
Phoenix

The Spark Apache Phoenix integration is a library that enables Spark applications to interact with Apache Phoenix, an open-source SQL layer over Apache HBase that provides an SQL-like syntax to query and manage data stored in HBase.

With the Spark Apache Phoenix integration, Spark applications can read data from Phoenix tables using PhoenixRDD, which provides a distributed representation of the data stored in a Phoenix table.
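A sketch of reading a Phoenix table from Spark follows. The table name and ZooKeeper quorum are hypothetical, the phoenix-spark library must be on the classpath, and the exact data source name can differ between Phoenix versions:

```python
# Sketch: reading an Apache Phoenix table into a Spark DataFrame.
# Table name and ZooKeeper quorum are hypothetical placeholders.

def phoenix_read_options(table: str, zk_quorum: str) -> dict:
    """Options commonly accepted by the phoenix-spark data source."""
    return {"table": table, "zkUrl": zk_quorum}

def read_phoenix_table():
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("phoenix-demo").getOrCreate()
    opts = phoenix_read_options("WEB_STAT", "zk1,zk2,zk3:2181")
    return (spark.read.format("org.apache.phoenix.spark")
            .options(**opts)
            .load())
```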

Zeppelin
Zeppelin

Apache Zeppelin is a web-based notebook interface for interactive data analytics with Apache Hadoop. It allows you to create and execute data-driven workflows using a variety of languages within a single, integrated environment.

Airflow
Airflow
Airflow2 is a platform for creating, scheduling, and monitoring data workflows. It provides a web-based interface for creating and managing workflows, which can include tasks such as data ingestion, transformation, and loading.
AVRO
AVRO
AVRO is a binary data format that is designed to be compact and fast. It supports schema evolution, which allows data schemas to change over time without requiring data to be rewritten or reloaded.
PARQUET
PARQUET
PARQUET is a columnar storage format that is optimized for processing large datasets. It stores data in a columnar fashion, which allows for faster access to individual columns and improved compression ratios.
ORC
ORC
ORC (Optimized Row Columnar) is another columnar storage format that is designed to be highly efficient and scalable. It supports compression and predicate push-down, which can greatly improve query performance.
DELTA
DELTA
DELTA is a transactional storage format that is built on top of Parquet and provides support for ACID transactions. It also supports schema evolution and provides features like versioning and time travel.
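As a sketch of the time-travel feature mentioned above (the table path is hypothetical, and the delta-spark package with its SQL extensions must be configured on the session):

```python
# Sketch: Delta Lake time travel from Spark -- reading a historical
# snapshot of a table by version number. The path is a hypothetical placeholder.

def delta_version_options(version: int) -> dict:
    """Reader option that selects a historical snapshot of a Delta table."""
    return {"versionAsOf": str(version)}

def read_delta_snapshot(path: str, version: int):
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-demo").getOrCreate()
    return (spark.read.format("delta")
            .options(**delta_version_options(version))
            .load(path))
```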
XML
XML
XML is a markup language used for representing structured data. Spark can handle XML data by using libraries like spark-xml.
JSON
JSON
JSON (JavaScript Object Notation) is a lightweight data format that is commonly used for exchanging data between applications. Spark has built-in support for reading and writing JSON data.
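The formats above are interchangeable Spark data sources behind a single read/write API, which the following sketch illustrates (output paths are hypothetical; Avro additionally requires the spark-avro package):

```python
# Sketch: writing one DataFrame out in several of the formats listed above.
# The base path is a hypothetical placeholder.

FORMATS = ("avro", "parquet", "orc", "json")

def output_path(base: str, fmt: str) -> str:
    """Derive a per-format output directory under a common base path."""
    assert fmt in FORMATS, f"unsupported format: {fmt}"
    return f"{base}/{fmt}"

def fan_out(base: str = "/tmp/formats-demo"):
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("formats-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    for fmt in FORMATS:
        df.write.format(fmt).mode("overwrite").save(output_path(base, fmt))
```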
Operating systems
Alt Linux
Alt Linux 8.4 SP is supported
CentOS
CentOS 7 is supported
RedHat
RedHat 7 is supported
Astra Linux
Astra Linux SE 1.7 Orel is supported
Ubuntu
Ubuntu 22.04.2 LTS in development
RedOS
RedOS 7.3 in development
Support for key Hadoop components
High availability and disaster recovery features
Advanced security features, including encryption, role-based access control
Automated management and monitoring tools
Deploy & upgrade automation
Offline installation
Technical support 24/7
Corporate training courses
Tailored solutions
Available integrations
ADQM
Arenadata QuickMarts
Available only for Enterprise
ADB
ADB
Available only for Enterprise
ADPG
ADPG
  • Spark JDBC connector connects Spark to any JDBC-compatible database like Arenadata Postgres (ADPG) and unlocks new opportunities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
ADS
ADS
  • Spark Streaming streamlines your real-time data processing with Spark Streaming, Kafka, or Arenadata Streaming (ADS), enabling seamless data ingestion, processing, and analysis at scale.
  • Flink Apache Kafka connector provides high-performance stream processing, enabling real-time data analysis, transformation, and visualization at scale.
Oracle
Oracle
  • Spark JDBC connector connects Spark to any JDBC-compatible database like Oracle and unlocks new opportunities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
MS SQL
MS SQL
  • Spark JDBC connector connects Spark to any JDBC-compatible database like MS SQL and unlocks new possibilities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
AWS S3
AWS S3
  • Hadoop AWS module provides support for AWS integration.
  • S3a connector provides a fast and efficient way to access data stored in Simple Storage Service (S3) from Spark applications.
  • Flink S3 connector lets Flink read and write data in S3 and use it with the streaming state backends.
Azure Storage
Azure Storage
  • Hadoop Azure module provides support for integration with Azure Blob Storage.
  • Spark WASB (Windows Azure Storage Blob) connector is an Apache Spark library that enables Spark applications to read and write data from Azure Blob Storage.
Azure Datalake
Azure Datalake
  • Spark ABFS (Azure Blob File System) connector provides an API for Spark applications to read and write data directly from ADLS Gen2 without the need to stage data on a local disk.
  • Flink ABS connector lets Flink read and write data in Azure Blob Storage.
GCS
GCS
  • Spark GS connector provides an API for Spark applications to read and write data directly from Google Cloud Storage (GCS) without the need to stage data on a local disk.
  • Flink GCS connector can be used for reading and writing data and for checkpoint storage.
JDBC
JDBC
  • Spark JDBC connector connects Spark to any JDBC-compatible database and unlocks new possibilities for data analysis, processing, and visualization.
  • Hive JdbcStorageHandler supports reading from a JDBC data source in Hive.
  • Flink JDBC connector allows reading data from and writing data into any relational databases with a JDBC driver.
Solr
Solr
The Spark Solr integration is a library that allows Spark applications to read and write data to Apache Solr. With the Spark Solr integration, Spark applications can read data from Solr using SolrRDD, which allows the parallelization of data processing across a Spark cluster.
Phoenix
Phoenix

The Spark Apache Phoenix integration is a library that enables Spark applications to interact with Apache Phoenix, an open-source SQL layer over Apache HBase that provides an SQL-like syntax to query and manage data stored in HBase.

With the Spark Apache Phoenix integration, Spark applications can read data from Phoenix tables using PhoenixRDD, which provides a distributed representation of the data stored in a Phoenix table.

Zeppelin
Zeppelin

Apache Zeppelin is a web-based notebook interface for interactive data analytics with Apache Hadoop. It allows you to create and execute data-driven workflows using a variety of languages within a single, integrated environment.

Airflow
Airflow
Airflow2 is a platform for creating, scheduling, and monitoring data workflows. It provides a web-based interface for creating and managing workflows, which can include tasks such as data ingestion, transformation, and loading.
AVRO
AVRO
AVRO is a binary data format that is designed to be compact and fast. It supports schema evolution, which allows data schemas to change over time without requiring data to be rewritten or reloaded.
PARQUET
PARQUET
PARQUET is a columnar storage format that is optimized for processing large datasets. It stores data in a columnar fashion, which allows for faster access to individual columns and improved compression ratios.
ORC
ORC
ORC (Optimized Row Columnar) is another columnar storage format that is designed to be highly efficient and scalable. It supports compression and predicate push-down, which can greatly improve query performance.
DELTA
DELTA
DELTA is a transactional storage format that is built on top of Parquet and provides support for ACID transactions. It also supports schema evolution and provides features like versioning and time travel.
XML
XML
XML is a markup language used for representing structured data. Spark can handle XML data by using libraries like spark-xml.
JSON
JSON
JSON (JavaScript Object Notation) is a lightweight data format that is commonly used for exchanging data between applications. Spark has built-in support for reading and writing JSON data.
Operating systems
Alt Linux
Available only for Enterprise
CentOS
CentOS 7 is supported
RedHat
RedHat 7 is supported
Astra Linux
Available only for Enterprise
Ubuntu
Ubuntu 22.04.2 LTS in development
RedOS
Available only for Enterprise
Components
Hue

In development. Hue (Hadoop User Experience) is a web-based interface for data analytics in the Hadoop ecosystem.

Hue allows users to perform data analysis without losing context. The goal is to promote self-service analytics and stay as simple as Excel so users can find, explore, query, and analyze data. One of the main advantages of Hue is its ability to connect to various data sources: Apache Hive, Impala, Flink SQL, Spark SQL, Phoenix, ksqlDB, Apache Hadoop HDFS, Ozone, HBase, etc.

Apache Ozone

In development. Apache Ozone is an open-source, scalable, and distributed object store designed for big data workloads. It is part of the Apache Hadoop ecosystem and is designed to overcome the scalability limitations of the Hadoop Distributed File System (HDFS).

Ozone is designed to provide high performance and scalability for storing and processing large amounts of unstructured data such as log files, images, videos, and other large data objects. It is optimized for workloads that require high throughput and low latency, such as big data analytics, machine learning, and streaming data processing.

One of the key features of Ozone is its support for multiple storage classes, including hot, warm, and cold storage. This allows users to store data based on its access patterns and lifecycle, optimizing cost and performance.

Ozone also includes built-in data replication and distribution capabilities, enabling data to be stored across multiple nodes in a Hadoop cluster for improved availability and durability.

Smart Storage Manager
Technology preview.
Technology Preview Services are not intended for use in a production environment and may not be fully functional. They are under development and are provided to the client for review and testing.
Smart Storage Manager (SSM) is a service that optimizes the efficiency of storing and managing data in the Hadoop Distributed File System. SSM collects HDFS operation data and system state information and, based on the collected metrics, can automatically apply techniques such as caching, storage policies, heterogeneous storage management (HSM), data compression, and erasure coding. In addition, SSM can be configured to asynchronously replicate data and namespaces to a backup cluster for disaster recovery (DR).
Apache Kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide SQL on Data Warehouses and Lakehouses.

Kyuubi builds distributed SQL query engines on top of modern computing frameworks such as Apache Spark, Flink, Hive, and Impala to query massive datasets distributed over fleets of machines from heterogeneous data sources.

Apache Impala

Apache Impala is an open-source massively parallel processing (MPP) SQL query engine for processing large volumes of data in real-time. It allows users to perform interactive queries on Apache Hadoop data stored in HDFS or Apache HBase. Impala was developed to address the need for a faster, more efficient SQL query engine for big data processing than traditional batch-oriented SQL engines.

Impala provides high-speed performance through its MPP architecture, which enables it to distribute processing across multiple nodes in a Hadoop cluster. It also includes support for advanced features such as complex joins, subqueries, and aggregation functions.

Impala is designed to be easy to use and integrate with existing BI and analytics tools. It supports standard SQL queries and JDBC/ODBC drivers for easy integration with a wide range of applications.

Apache ZooKeeper

Apache ZooKeeper is a distributed coordination service that is designed to help manage large distributed systems. It provides a centralized infrastructure for maintaining configuration information, naming, providing distributed synchronization, and providing group services. ZooKeeper is used extensively in Hadoop clusters to help manage the coordination of distributed systems and to ensure that each node in the cluster is aware of the state of the other nodes.

Hadoop Distributed File System (HDFS)

HDFS is a highly scalable and fault-tolerant distributed file system that forms the foundation of the ADH platform. It allows you to store large volumes of data across multiple nodes in a cluster, with built-in redundancy to ensure that data is always available, even in case of a node failure. HDFS is optimized for handling large files, making it an ideal choice for big data applications.

Apache YARN

YARN is a resource management and job scheduling framework that allows you to run multiple applications simultaneously on a Hadoop cluster. YARN enables you to allocate cluster resources dynamically, based on the needs of each application, and to monitor and manage those resources to ensure optimal performance.

Apache HBase

This is a NoSQL database that provides real-time read/write access to large datasets stored in Hadoop. HBase is designed to handle massive volumes of data and is optimized for random, real-time access to data, making it a popular choice for big data applications that require low-latency access to large datasets.

Apache Phoenix

Apache Phoenix is an open-source, SQL-like query engine for Hadoop that is designed to provide fast and efficient querying of large datasets. Phoenix is built on top of HBase, which means that it can handle massive amounts of data with low latency and provides support for real-time updates and access to data.

Apache Spark

Apache Spark is a fast and powerful open-source data processing engine that provides scalable, fault-tolerant data processing capabilities for big data workloads. The Apache Spark component of Arenadata Hadoop provides a high-performance and distributed computing framework that can process large datasets in parallel across a cluster of nodes. With its advanced analytics capabilities, including machine learning, graph processing, and SQL-like querying, Apache Spark can help businesses extract valuable insights from their data.

Apache Hive

Apache Hive is an open-source data warehouse infrastructure that provides data summarization, query, and analysis capabilities for large datasets stored in Hadoop. The Apache Hive component of Arenadata Hadoop provides a SQL-like interface for querying data in Hadoop, enabling businesses to perform ad-hoc queries, data analysis, and reporting. Hive translates SQL queries into MapReduce jobs, which can be executed on a Hadoop cluster. With its support for partitioning, indexing, and compression, Hive can help businesses optimize data storage and processing in Hadoop.

Apache Tez

Apache Tez is an open-source data processing framework that provides a flexible, efficient, and scalable way to execute complex data processing tasks on a Hadoop cluster. When used together with Apache Hive, Tez provides a faster and more efficient way to execute Hive queries, by replacing the MapReduce execution engine with a more optimized one. The Hive + Tez combination in Arenadata Hadoop provides a powerful and scalable platform for data warehousing, allowing businesses to perform ad-hoc queries, data analysis, and reporting at scale. With Tez's support for dynamic task scheduling and data partitioning, it can accelerate query processing by optimizing the data flow between Hive operators.

Apache Flink

Apache Flink is an open-source stream processing framework that enables the processing of large volumes of real-time data with low latency. The Apache Flink component of Arenadata Hadoop provides a distributed computing framework for real-time data processing that can be seamlessly integrated with batch processing. Flink supports event-driven processing and provides a unified programming model for both batch and stream processing, making it ideal for building end-to-end data processing pipelines. With its advanced features, including support for stateful streaming, windowing, and machine learning, Apache Flink can help businesses gain real-time insights from their data.

Apache Solr

Apache Solr is an open-source, enterprise-level search platform that is built on top of the Apache Lucene search library. Solr provides a robust and scalable search solution that is used by organizations of all sizes to power search functionality on their websites, mobile apps, and other applications.

Features
Time-saving
Reduced installation and configuration time compared to manual installation
Easy to use
Users can easily install and configure Hadoop without requiring extensive technical knowledge
Standardization
Standardized installation across multiple machines, reducing the risk of errors and inconsistencies
Increased efficiency
Reduced risk of system downtime and overall improved system efficiency
Expertise
Our team evaluates bug fixes and enhancements from the broader Hadoop community and determines which ones to incorporate into the product
Arenadata Platform Security
Enterprise edition
Arenadata Platform Security (ADPS) is a combination of two security components:
Apache Ranger
Apache Ranger is an open-source security framework that provides centralized policy management for Hadoop and other big data ecosystems. The Arenadata platform integrates with Apache Ranger to provide policy-based access control and fine-grained authorization for data and analytics applications.
Apache Knox
Apache Knox is an open-source gateway that provides secure access to Hadoop clusters and other big data systems. The Arenadata platform integrates with Apache Knox to provide secure access to the platform and its services.
Together, ADPS provides a comprehensive security framework that includes policy-based access control, fine-grained authorization, and secure access to the platform and its services. This helps organizations protect sensitive data and ensure compliance with regulations.
ADB Spark Connector
The ADB Spark connector provides high-speed parallel data exchange between Apache Spark and Arenadata DB.
It is highly configurable and offers many features, including:
  • high speed of data transmission;
  • automatic data schema generation;
  • flexible partitioning;
  • support for push-down operators;
  • support for batch operations.
ADQM Spark Connector
Multifunctional connector with support for parallel read/write operations between Apache Spark and Arenadata QuickMarts.
It is highly configurable and offers many features, including:
  • high speed of data transmission;
  • automatic data schema generation;
  • flexible partitioning;
  • support for push-down operators;
  • support for batch operations.
Product comparison
Infrastructure
Management system
Arenadata Cluster Manager (ADCM)

A single tool for managing the lifecycle of all Arenadata products.

ADCM is installed with one command and only requires Docker.

Cloudera Manager

Automatic deployment and configuration.

Custom monitoring and reporting.

Built-in monitoring
Yes
Yes
Centralized upgrade
Yes
Yes
IT landscape support
Ability to deploy various combinations of bare metal and cloud
Yes

By using infrastructure bundles, ADH supports installation on physical and virtual servers (on-premises) and in private and public clouds according to the IaaS model. Additionally, infrastructure bundles provide automatic installation on existing nodes and on-the-fly node creation for some cloud providers (Yandex Cloud, VK Cloud).

Yes

Supported.

Support for cloud providers
Yandex Cloud;
VK Cloud;
Sber Cloud;
Google Cloud Platform.
Google Cloud Platform;
AWS;
Azure.
Domestic OS support
Alt Linux
Yes
No
Astra Linux
Yes
No
Features
Offline installation
Yes
Yes
High availability
Yes

ADH supports high availability for key critical platform data services (YARN, HDFS, Hive).

Yes
Integration with other products
Yes

ADH supports a number of proprietary solutions for integration:

  • Spark Tarantool (Picodata) Connector;
  • Spark Arenadata DB Connector;
  • Spark Arenadata QuickMarts Connector.

ADH also provides:

  • Kerberos support for PXF;
  • Informatica DEI 10.4 support for ADH 2.X.
Yes
Security settings
SSL encryption
Yes

Via ADCM.

Yes
Standard access separation based on Role-Based Access Control
Yes

Flexible settings with Ranger in a separate ADPS product, which can serve multiple instances of ADH and other Arenadata products.

Yes
Single point of secure access
Yes

Knox as a part of ADPS.

Yes
Additionally
Technical support 24/7
Yes
Yes
On-demand fixes and improvements
Yes
Yes
Training/workshops
Yes

Full training on working with Arenadata products.

Not available for Russia
Community version
Yes

ADH is the only commercial Hadoop distribution with a freely downloadable Community version.

No
Documentation
Yes

Detailed documentation in Russian and English for all services, covering their installation, configuration, and operation.

Publicly available.

Yes

Publicly available.

Registration in the register of domestic software
Yes
No
Successful deployments
Yes

ADH has been used for hundreds of thousands of hours in more than 20 leading Russian companies as a central data platform, storing and processing up to 25 petabytes of data.

Yes
Release history with descriptions
Yes

A complete release history with service versions and descriptions of the upgraded functionality is publicly available.

Yes

A complete release history with service versions and descriptions of the upgraded functionality is publicly available.

Comparison of current service versions
Service

ADH 3.2.4.2

Cloudera 6.3.4

HDFS & YARN
3.2.4
3.0.0
Impala
4.2.0_arenadata1
3.2.0
Hive
3.1.3_arenadata6
2.1.1
HBase
2.4.17_arenadata1
2.1.4
Phoenix
5.1.3_arenadata2
5.0
Tez
0.10.1_arenadata1
0.9.2
Zeppelin
0.8.1
0.8.2
ZooKeeper
3.5.10
3.4.5
Sqoop
1.4.7_arenadata2
1.4.7
Airflow2
2.6.3
Solr
8.11.2
7.4.0
Spark2
2.3.2_arenadata2
2.4.0
Spark3
3.4.2_arenadata1
3.0.1
Knox
1.6.0
1.2.0
Ranger
2.4.0_arenadata1
2.1.0
Flink
1.17.1_arenadata1
Kyuubi
1.18.0_arenadata1
SSM
1.6.0_arenadata1
Hue
Currently in development
4.4.0

The “Product comparison” section is accurate as of 15.01.2024.

Releases
2023
ADH 3.2.4.2_b2
  • Removed the need to install Axiom JDK when using Astra Linux
  • Added the ability to set a custom value to the JAVA_HOME variable
  • Bug fixes
ADH 3.2.4.1_b3
  • Removed the need to install Axiom JDK when using Astra Linux
ADH 3.2.4.2_b1
  • Added a new service - Kyuubi
  • Added a new service - SSM
  • Upgraded Spark to 3.4.2
  • Added a new component for the Spark3 service - Spark Connect
  • Added Spark3 support for ADQM Spark connector
  • Added improvements related to the information security
ADH 3.2.4.1_b2
  • Patch release with bug fixes.
ADH 3.2.4.1_b1
  • Upgraded Hadoop to 3.2.4 and many other services
  • Added support for Astra Linux to ADH and ADPS
  • Added zstd support in HDFS
  • Excluded the vulnerability of the log4j library
  • Added the Spark3 Thrift Server component
  • Excluded Airflow1 from the bundle
ADH 3.1.2.1_b2
  • Patch release with bug fixes.
ADH 3.1.2.1
  • Added a new service - Apache Impala
  • HBase is upgraded to 2.2.7
  • Solr is upgraded to 8.11.2
  • Flink is upgraded to 1.16.2
  • Ranger is upgraded to 2.2.0
  • Introduced HA auto-management for ADH services
  • Hive is upgraded to 3.1.3_arenadata4 with some important fixes
  • Introduced the Maintenance mode, which allows removing any node from a cluster
ADH 2.1.10
  • Added the ability to select a TLS version for ADH services
  • Added support for custom Zeppelin interpreters
  • Spark version updated to 3.3.2
  • Added the new component Spark History Server for Spark3
  • Hive version updated to 3.1.3 with some important fixes
ADH 2.1.8
  • Airflow2: added the high availability mode
  • Airflow2: added LDAP authentication/authorization support
  • Airflow2: added support for external broker configuration
  • Hive version updated to 3.1.3 with some important fixes
ADH 2.1.7
  • Added the livy-spark3 component to the Spark3 service
  • Added the Apply configs from ADCM checkbox for all services
  • Flink build 1.15.1 is available
  • Added the ability to connect to Flink JobManager in the high availability mode
  • Optimized package checks during installation
ADH 2.1.6
  • Added support for Alt Linux 8.4
  • Added support for FreeIPA kerberization
  • Added support for customization of krb5.conf via ADCM
  • Added support for customization of ldap.conf via ADCM
ADH 2.1.4_b11
  • Added the ability to specify external nameservices
  • Added the ability to connect to HiveServer2 in the fault-tolerant mode
ADH 2.1.4_b10
  • The check box Rewrite current service SSL parameters is added for the Enable SSL action
  • Custom authentication (LDAP/AD) is enabled for Hive2Server
  • The Ranger plugin for Solr authorization is added
  • The ability to remove services from the cluster is added
  • The ability to customize configuration files via ADCM is added
  • The support of Kerberos REALM is added
ADH 2.1.4_b9
  • The Kerberos authentication is enabled for Web UI
  • The ability to configure SSL in the Hadoop clusters is added
ADH 2.1.4_b5
  • The ability to use Active Directory as Kerberos storage is implemented
  • The AD/LDAP/SIMPLE authorization is added for Zeppelin
ADH 2.1.4_b3
  • The MIT Kerberos integration is implemented in ADCM
  • The Ranger plugin is made operable on kerberized services
ADH 2.1.4_b2
  • Host actions are added
ADH 2.1.4_b1
  • The ability to use external PostgreSQL in Hive Metastore is added
  • Spark 3.1.1 is implemented for ADH 2.X
  • The offline installation is implemented for ADH
ADH 2.1.3
  • Implemented integration with Ranger 2.0.0
ADH 2.1.2.5
  • Client components for Flink are added
  • Client components for HDFS are added
  • Client components for YARN are added
ADH 2.1.2.3
  • The ADH bundle is divided into community and enterprise versions
  • The High Availability for NameNodes is implemented
ADH 2.1.2.2
  • The epel-release installation is disabled
  • Nginx is copied from the Epel repository to the ADH2 repository
ADH 2.1.2.1
  • Solr 8.2.0 is added for ADH 2.2
  • Sqoop is added into the ADH bundle
ADH 2.1.2.0
  • The ability to configure Hive ACID is added
  • Flink is added into the ADH bundle
  • GPU support is enabled for YARN
  • Airflow is added into the ADH bundle
ADH 2.1.1
  • YARN Scheduler configuration is implemented
  • HDFS mover is implemented
  • The cluster-wide Install button is added to the ADCM UI
ADH 2.1.0
Implemented service management for the following services:
  • Livy Server
  • Zeppelin
  • Spark Thrift Server
  • Spark Server
  • Phoenix Server
  • HBase Thrift
  • HBase Region Server
  • HBase Master
  • Node Manager
  • Resource Manager
  • Timeline Service
  • WebHCat
  • MySQL
  • Hive Metastore
  • Hive Server
  • DataNodes
  • Secondary NameNodes
  • NameNodes