Recently, we created a MySQL data warehouse based on a
message queue.
Most companies that consider splitting their databases or tables
into many pieces must prepare for how particular queries will run
in their systems.
Some problems have to be solved in this situation:
1. How to get correct results in time
2. How to build a robust data warehouse for future analysis
YHD used the following policies:
They had already deployed a middleware layer between the web apps
and the databases to support these requests. Every aggregation SQL
statement was split into many small statements that ran on every
data node. The final result was the aggregation of all these
partial results. Throughout this procedure, everything was computed
in memory for high performance.
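The split-and-merge step described above can be sketched roughly as follows. This is only an illustration, not YHD's actual middleware: the shard contents, the `category`/`amount` columns, and the function names are all hypothetical, and plain Python lists stand in for the data nodes. It shows why an average has to be carried as a partial sum and a partial count, since averaging the per-node averages would give the wrong answer.

```python
from collections import defaultdict

# Hypothetical shards: each list stands in for one data node
# holding part of an orders table as (category, amount) rows.
SHARDS = [
    [("books", 10.0), ("toys", 5.0)],
    [("books", 20.0)],
    [("toys", 7.0), ("toys", 3.0)],
]


def partial_aggregate(rows):
    """Per-node step, roughly:
    SELECT category, SUM(amount), COUNT(*) ... GROUP BY category
    """
    partial = defaultdict(lambda: [0.0, 0])
    for category, amount in rows:
        partial[category][0] += amount
        partial[category][1] += 1
    return partial


def merge_partials(partials):
    """Middleware step: combine the per-node sums and counts in memory."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for category, (total, count) in partial.items():
            merged[category][0] += total
            merged[category][1] += count
    # The average is derived only after merging sums and counts.
    return {
        category: {"sum": total, "count": count, "avg": total / count}
        for category, (total, count) in merged.items()
    }


def scatter_gather(shards):
    """Run the small query on every shard, then merge the results."""
    return merge_partials(partial_aggregate(rows) for rows in shards)


if __name__ == "__main__":
    for category, stats in sorted(scatter_gather(SHARDS).items()):
        print(category, stats)
```

In a real deployment the per-node step would be a SQL statement sent to each database, and only the small partial results would travel back to the middleware.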
In the data warehouse layer, they used custom ETL tools to
extract data from the different databases into an Oracle Exadata
platform. Log-based data was put into Hadoop and HBase.
…