Hadoop engine rebuilds and replacement

This is a multi-part blog series that aims to describe SQL-on-Hadoop in detail. The first article in the series introduces storage engines and online transaction processing (OLTP), the second covers online analytical processing (OLAP), and the third covers Hadoop engine rebuilds and replacements, product selection, and related topics.

Data processing and online analytical processing (OLAP)

OLAP workloads are those that support business intelligence, reporting, and data mining and exploration. Examples include a retailer calculating store sales by region and quarter, a bank calculating mobile banking app installs by language and month, an equipment manufacturer identifying which components fail more often than expected, and a hospital studying which events cause distress in high-risk infants.

If the original data comes from an OLTP system, the typical approach is to copy it into an OLAP database and run the processing and analysis there, "offline". There are many reasons for doing so, but the main one is performance. Hypothetically, if a store ran its data analysis on its transaction processing system, a heavy query submitted by an analyst could slow down checkout for customers already waiting at the register.

In addition, transactional queries are fundamentally different from analytical queries. A typical transactional query is based on a single entity, such as a particular customer or a particular user.
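The two query shapes described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns, and the sample rows are hypothetical and do not come from the original article, and SQLite stands in for whatever OLTP or OLAP engine is actually used.

```python
import sqlite3

# Hypothetical in-memory table standing in for a store's order data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "region TEXT, quarter TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, quarter, amount) VALUES (?, ?, ?, ?)",
    [
        ("alice", "east", "Q1", 20.0),
        ("bob", "east", "Q1", 35.0),
        ("alice", "west", "Q2", 15.0),
        ("carol", "west", "Q2", 50.0),
    ],
)

# Transactional shape: fetch the rows that belong to one entity (one customer).
customer_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE customer = ? ORDER BY id", ("alice",)
).fetchall()

# Analytical shape: scan every order and summarize along two dimensions.
sales_by_region_quarter = conn.execute(
    "SELECT region, quarter, SUM(amount) FROM orders "
    "GROUP BY region, quarter ORDER BY region, quarter"
).fetchall()

print(customer_orders)
print(sales_by_region_quarter)
```

The first query touches only a handful of rows no matter how large the table grows, while the second has to read every order in the period being summarized.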
An online retail site building an order status page, for example, queries only the orders submitted by one specific customer. In analytical use cases, by contrast, analysts are mostly interested in queries that summarize order or user data across a dimension such as time. As mentioned earlier, computing store sales by region and quarter means querying all orders in the given period.

One final point: the databases discussed here often do not provide the CRUD operations that users of traditional relational databases expect. Unlike in transactional systems, analytical queries are mostly SELECT queries that touch millions or even hundreds of millions of rows. These databases are optimized mainly for that kind of load, and those optimizations make CRUD operations on small amounts of data expensive. Even when such a database has the interface and semantics of a relational database, it does not necessarily offer full support for inserting rows (INSERT), updating rows (UPDATE), and deleting rows (DELETE). Some readers may be asking about the recently added "ACID" feature in Hive; that capability is covered in detail later.

Apart from Hive's "ACID" feature, there are two alternative ways to handle updates. The first is to use the update capability that HBase itself provides. Although HBase is mainly used for OLTP workloads, some OLAP systems use HBase to store small tables, typically called dimension tables, that need to be updated periodically. The second way is to perform a merge. From an ETL developer's point of view, a merge process introduces additional work.
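The article does not spell out how such a merge works. The following is a minimal sketch under one common assumption: the existing data is an immutable base snapshot, changes arrive as a separate delta, and the ETL job writes a new snapshot with the delta applied. The function name `merge_snapshot`, the `"upsert"`/`"delete"` operation names, and the sample rows are all hypothetical.

```python
def merge_snapshot(base, delta):
    """Return a new snapshot with the delta applied to the base rows.

    base  : dict mapping row key -> row (the existing data, left untouched)
    delta : list of (op, key, row) tuples applied in order,
            where op is "upsert" or "delete"
    """
    merged = dict(base)  # shallow copy; the base snapshot is never edited in place
    for op, key, row in delta:
        if op == "upsert":
            merged[key] = row  # insert a new row or replace an existing one
        elif op == "delete":
            merged.pop(key, None)  # ignore deletes for keys that are absent
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


# Hypothetical dimension table and a batch of periodic changes.
base = {
    1: {"store": "A", "region": "east"},
    2: {"store": "B", "region": "west"},
}
delta = [
    ("upsert", 2, {"store": "B", "region": "north"}),
    ("upsert", 3, {"store": "C", "region": "east"}),
    ("delete", 1, None),
]

new_snapshot = merge_snapshot(base, delta)
print(new_snapshot)
```

The extra work mentioned above shows up here as a separate job that has to be scheduled, rerun on failure, and swapped in for the old snapshot, none of which is needed when the store supports updates directly.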
This raises an obvious question: since HBase already provides updates and makes the merge work unnecessary, why not use HBase for everything? The reason is scan query performance. To provide random updates on top of the HDFS file system, HBase has to perform a small merge on every read. This architecture gives it high write and random-read performance, but compared with plain HDFS files it offers poor scan and sequential-read performance. HBase is therefore suitable only for small tables that need frequent updates.

This area has several subcategories: Apache Hive, Dremel clones, and Spark SQL.

Apache Hive

Originally created at Facebook, Hive was the first SQL engine on Hadoop, and it is still the most mature. Hive was originally built on MapReduce and has since been adapted to run on Apache Tez. Work to adapt it to Apache Spark is now under way. That effort is known as Hive on Spark and should not be confused with the other SQL-on-Spark projects, which are discussed later.

So far, Hive has the most complete SQL support and the largest contributor base. Almost every Hadoop deployment includes Hive, and nearly all users of other SQL-on-Hadoop engines deploy Hive as well. In fact, most of these SQL engines depend on Hive in one way or another.
Most Hadoop vendors, including Cloudera and Hortonworks, agree that Hive is the only engine capable of handling large batch jobs and integrating a variety of non-standard data formats. Where Hortonworks and Cloudera disagree is on Hive's performance: Cloudera holds that Hive simply cannot compete with the Dremel clones, while Hortonworks believes Hive can go head to head with them.

Dremel Clones

Like the open source community, Google has created several internal SQL engines. It has a Hive-like SQL engine called Tenzing and another system called Dremel. Facebook, the company that created Hive, also built a Dremel clone called Presto. Cloudera Impala and Apache Drill are the two most prominent Dremel clones. Cloudera positions Impala as the most mature open source Dremel clone; Impala reached GA in mid-2013. MapR is the main sponsor behind Drill and positions it as the most flexible Dremel clone. Impala requires table metadata to be stored in the Hive metastore, whereas Drill can directly query JSON and self-describing file formats such as Apache Parquet and Avro.

Spark SQL

Although there are many other SQL engines on Hadoop, Spark SQL is the one that draws the broadest interest.
Spark SQL is the second SQL engine built on Spark; the first was Shark. Shark's development has been ended in favor of Spark SQL and the Hive on Spark project. Unlike Shark, which was close to a research project at the University of California, Berkeley, Spark SQL and Hive on Spark are established open source projects backed by Spark's sponsors.

Hive on Spark can be described simply as Hive on the front end with Spark as the back end. Users of Hive on MapReduce or Tez can easily switch to Hive on Spark; the switch only requires changing a configuration parameter.

Spark SQL is a completely new engine. Today it is most useful to developers who want to embed SQL in their Scala, Java, or Python Spark programs. Databricks, the main sponsor of Spark SQL, has greater ambitions, however, and hopes to extend the use of Spark SQL to developers who do not work with Spark.

SQL on Hadoop Truth (2)

