Kognitio is an ultra-fast SQL engine with a number of different deployment options. Here are some of the most frequently asked questions that we receive from our users.
No, we have one version of Kognitio which comes with full functionality regardless of the deployment option.
We offer a range of support options for Kognitio, from web only to full enterprise support. For more information visit our support page.
There are large variations in the performance, flexibility and maturity of available SQL engines.
Some SQL implementations have been developed from scratch to run across multiple instances in the cloud. However, SQL is a very large, complex standard: it is difficult enough to implement on a serial platform, and implementing it in parallel is harder still and extremely time-consuming.
In contrast, Kognitio has been developing a parallel SQL engine for 25 years, so our SQL engine is mature and proven to scale out for the high concurrency required by business users. Kognitio is also a true in-memory engine.
If you are using Kognitio in the Cloud, the Kognitio Launcher will automatically deal with the software version, and you won’t have to install any Kognitio software yourself.
If you intend to use Kognitio on a Hadoop cluster under YARN, you should install the “on Hadoop” version. Kognitio on Hadoop runs on Hortonworks Data Platform, Cloudera Data Platform, Amazon EMR, and Azure HDInsight.
If you intend to use Kognitio on MapR, you should download the MapR version.
For all other options, deploy Kognitio Standalone.
Whichever platform you deploy Kognitio on, the product is fundamentally the same. Whether you are running in the cloud, or on a Hadoop/MapR cluster, or on a standalone cluster of commodity servers, anything you create will be fully functional on another Kognitio system regardless of what platform it is deployed on.
Kognitio works well with Tableau, Qlik, MicroStrategy and Power BI. We partner with Tableau, Qlik and MicroStrategy, and have a working connector available for Microsoft Power BI.
Kognitio will also work with any other tool that can use either an ODBC or JDBC driver to connect and run queries.
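As a minimal sketch, a Python client could connect over ODBC as follows. The DSN name, user and password are illustrative assumptions, and the example presumes an ODBC data source has already been configured for your Kognitio system:

```python
# Assemble a standard ODBC connection string for Kognitio.
# The DSN name and credentials below are illustrative only.
def build_connection_string(dsn, user, password):
    return f"DSN={dsn};UID={user};PWD={password}"

conn_str = build_connection_string("kognitio", "analyst", "secret")

# With the pyodbc package installed and an ODBC driver configured:
# import pyodbc
# conn = pyodbc.connect(conn_str)
# cursor = conn.cursor()
# cursor.execute("SELECT 1")
```

The same connection string keywords (DSN, UID, PWD) work from any ODBC-capable tool, which is why BI tools that speak ODBC or JDBC can query Kognitio without a bespoke connector.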
It depends on your specific use case.
If your existing SQL on Hadoop engine is working well for you, then you have no need to change. But if your incumbent platform is forcing you to compromise on the types of analysis you can perform on your data, then it’s not the right platform.
Kognitio is the ideal solution where delivery of knowledge needs to be very fast and for hundreds of concurrent users. If you have an idea for a data project that you’re not even attempting to run because you believe the results will arrive far too late, then you should try Kognitio.
Furthermore, Kognitio can be deployed as a YARN application alongside other SQL engines, handling the use cases where results need to arrive very quickly. It can therefore complement what you already have in place.
There are large variations in the performance, flexibility and maturity of available SQL engines. Hive, Impala and SparkSQL, for example, are new SQL implementations that were developed from scratch for Hadoop.
Yet SQL is a very large, complex standard: it is difficult enough to implement on a serial platform, and implementing it in parallel is harder still and extremely time-consuming. Kognitio, by contrast, has been developing parallel SQL for 25 years, so our SQL engine is much more mature and proven to scale out for the high concurrency required by business users. Kognitio is also a true in-memory engine.
Ecosystem integration is an important part of the Kognitio architecture and we provide metadata connectors to allow Hive and AWS Glue tables to be used directly without the need to define columns. Files that have internal metadata (ORC, Parquet and Avro) can also be used without specifying column names or types.
For delimited or JSON files we have a very flexible external table facility to simplify access whatever format the data is in.
In addition to the standard connectors, you can easily build your own connector for any data source that can be read on Linux. All connectors are massively parallel and will read from the appropriate data sources for your platform (S3, ADL, HDFS etc).
Kognitio has very rich ANSI standard SQL support, as well as many compatibility functions for Oracle and SQL Server syntax.
Although Kognitio is ACID compliant and capable of transaction processing, it is not designed for fast performance as an OLTP (online transaction processing) database. It is designed for excellent performance on analytical workloads, i.e. complex queries over large subsets of data at very high concurrency.
Kognitio’s primary language is SQL - with extensive SQL capability “out-of-the-box”.
If your analytics are too complex for SQL, additional languages can easily be used: any Linux executable can be set up as a Script Environment on Kognitio. When you deploy Kognitio on AWS, both R and Python come pre-installed and ready to use. Put your code into an External Script, a simple SQL wrapper, and let Kognitio manage the processing based on the available resources. This makes your complex algorithms available on demand to BI end users submitting SQL from standard tools.
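Because a Script Environment is simply a Linux executable, a common pattern is a filter that reads input rows on stdin and writes result rows to stdout. Here is a minimal sketch in Python; the comma-separated row format and the per-row logic are illustrative assumptions, not the documented interface:

```python
import sys

def transform(line):
    # Illustrative per-row logic: upper-case the second column of a
    # comma-separated input row (the row format is an assumption).
    cols = line.rstrip("\n").split(",")
    cols[1] = cols[1].upper()
    return ",".join(cols)

def main():
    # Kognitio would pipe rows in on stdin and collect rows from stdout.
    for line in sys.stdin:
        sys.stdout.write(transform(line) + "\n")

if __name__ == "__main__":
    main()
```

Wrapped in an External Script, a filter like this runs in parallel across the cluster while end users see only an ordinary SQL query.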
Kognitio is designed to be simple to install and administer.
Kognitio for the Cloud is managed from a web-based interface that allows you to easily launch, start, stop, administer and terminate instances. Kognitio Launcher can help you decide what infrastructure to deploy based on your data sizes and analytical workload.
Implementations on Hadoop and MapR can require some configuration of the Hadoop/MapR environment depending on the complexity of the platforms.
Standalone implementations are very straightforward.
Kognitio can scale both up and out. Increasing data volumes and workloads can be automatically accommodated by small increments in the size of the underlying hardware platform if you are using on-premise kit, or by launching a larger system in the cloud.
Any tool that uses JDBC or ODBC can connect to Kognitio. This includes all the major Business Intelligence tools and the vast majority of other query tools. See our Connecting to Kognitio documentation for details of how to do this on Linux, macOS and Windows.
Kognitio is deployed on a networked cluster of x86 (Intel/AMD) servers running a basic Linux kernel. The cluster can be as small as one server or run into hundreds of servers, and Kognitio therefore runs on virtually any of the mainstream cloud providers.
Kognitio for the Cloud uses the cloud provider’s native on-demand pricing mechanisms and has easy-to-use automated launching and setup. It is currently only supported on Amazon AWS, but Azure and Google Cloud versions are coming soon. The Amazon AWS version can be found on Amazon Marketplace.
Kognitio Standalone runs directly on a native cluster running Linux, while Kognitio on Hadoop also requires a running Hadoop distribution. Both the Standalone and on-Hadoop versions require the software to be installed manually, and Standalone also requires a license key for installations with more than 512 GB of memory.
Kognitio for the Cloud is charged by the hour based on the amount of compute resource being used. You only pay for what you use when you need it.
The hourly price for Kognitio depends on the number and type of cloud virtual machine instances which are part of that system. This will usually be the same as the cloud provider’s hourly charge for your instance type. You will also need to pay the cloud provider’s charges for running the instances in addition to the cost of running Kognitio.
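As a back-of-the-envelope sketch of the pricing model described above, the total hourly cost is roughly the Kognitio charge plus the provider's own instance charge; the per-instance rate below is a made-up example, not a real price list:

```python
def hourly_cost(num_instances, instance_rate):
    # Kognitio's hourly charge usually matches the provider's rate for
    # the instance type, so both components use the same rate here.
    kognitio_charge = num_instances * instance_rate
    provider_charge = num_instances * instance_rate
    return kognitio_charge + provider_charge

# e.g. four instances at a hypothetical $1.50/hour each
print(hourly_cost(4, 1.50))   # → 12.0
```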
Kognitio is an in-memory data analytics engine which is optimised for in-memory data while Redshift is a disk-based data warehouse optimised to query data on an instance’s local disk.
Both integrate easily into the AWS cloud environment, but provided the data fits into memory, Kognitio significantly outperforms Redshift and does not require the sort of tuning Redshift users have to do to get good performance. If you do not need the performance Kognitio offers, then Redshift has the advantage that you can provision a smaller system to query data slowly (e.g. queries taking minutes or hours instead of seconds or minutes). You can try Kognitio on the Cloud via Amazon Marketplace.
Querying S3 data from Kognitio is free beyond the standard S3 request charges. With Redshift Spectrum, an extra charge of $5 per TB scanned is incurred for every query.
Kognitio integrates with other compute engines (e.g. R, Python, Tensorflow), allowing users to embed other languages within SQL to get easy and seamless access to complex non-SQL analytics within queries. Redshift can’t do this so users need to do data extracts and invoke another computation engine externally to achieve the same results.
Kognitio is an analytics engine which runs on compute instances provisioned in the user’s cloud account. Athena is a serverless product that hands queries to Amazon’s infrastructure to run. There are advantages and disadvantages to each, and users can run them side by side to get the best that both products have to offer.
Athena bills by the query and the charges can be quite high: Athena costs $5 per TB scanned per query. With Kognitio you pay per hour based on the type and number of instances provisioned, and can run whatever queries you like, resulting in more predictable costs. You can try Kognitio via Amazon Marketplace.
For frequently queried data Kognitio will usually be cheaper: a Kognitio system holding 1 TB of data would typically cost around $30/hour, which buys only six full-scan queries in Athena. Athena is more cost-effective when users need the data to be available for querying all the time but only want to query it very occasionally.
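The break-even point in that comparison can be sketched with the figures quoted above ($5 per TB scanned on Athena versus roughly $30/hour for a 1 TB Kognitio system); the query rate and scan size used here are illustrative:

```python
ATHENA_RATE_PER_TB = 5.0    # $ per TB scanned, per query
KOGNITIO_HOURLY = 30.0      # approximate $/hour for a 1 TB system

def athena_cost_per_hour(queries_per_hour, tb_scanned_per_query):
    return queries_per_hour * tb_scanned_per_query * ATHENA_RATE_PER_TB

# Six full-scan queries per hour over 1 TB already matches Kognitio's price
print(athena_cost_per_hour(6, 1.0))   # → 30.0
```

Above that query rate, the fixed hourly price is cheaper; below it, pay-per-query wins.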
Queries on Athena have a fixed cost and performance for any given query. Kognitio users can select the type and number of instances to hit different price points or performance levels for their query workload.
Kognitio supports a wider range of standard SQL. Athena is relatively immature and sometimes requires users to rewrite queries in a different way. Kognitio also integrates with other compute engines (e.g. R, Python, Tensorflow), allowing users to embed other languages within their SQL to get easy and seamless access to complex non-SQL analytics within queries. Athena does not have any comparable feature.
Snowflake is a data warehouse as a service solution whereas Kognitio is an analytical database designed to fit into any data lake model.
Data resides outside of Kognitio in a data lake (typically S3, HDFS or similar). The data can then be processed using any number of applications which “drink from the lake”. Kognitio functions as one of these applications. This means users aren’t restricted to one tool for everything; they can use a range of different tools and select the best for each task.
Snowflake requires users to put all of their data into the Snowflake service using a standard ETL process. Once there the data can only be queried using Snowflake. Any processing which cannot be done inside Snowflake needs to be done using data extracts.
Kognitio supports a wide range of deployment types: cloud, on-premise and on Hadoop. This allows users to bring their computational tasks to their data wherever that may be. Snowflake users are limited to keeping their data in one of a small number of cloud providers and regions where Snowflake have chosen to set up their service.
Kognitio uses cloud resources which are provisioned in the user's account so the user is in full control of their data at all times. With Snowflake the user must put data into resources outside of their control without the ability to fully audit how and where the data is processed.
Kognitio integrates with other compute engines (e.g. R, Python, Tensorflow), allowing users to embed other languages within their SQL to get easy and seamless access to complex non-SQL analytics within queries. Snowflake does not have a similar capability and users often have to extract to Spark or a similar technology to do non-SQL analytics.
Kognitio for the Cloud, Kognitio Standalone and Kognitio for Hadoop are basically the same software with identical functionality and interfaces. To users and applications they all behave exactly the same. They only differ in the way they are deployed, administered and priced.