Every now and then, when we start a new project or app that has
some data storage requirement, we think hard about how best to
represent the data structure so that it supports a variety of
needs (ACID rules among them), including but not limited to:
1. Normalization
2. Reliability
3. Consistency
4. And many others
Below, I provide a set of steps which you can follow to arrive at
a data model that correctly suits your requirements.
Steps:
1. Identify the project or app requirements /
specifications and business rules, which tell you what your app
will be able to do when it is ready.
2. From these business rules, identify the possible objects
for each business rule and mark them on paper as rectangular
boxes, such as authors, posts, etc.
3. Once you have recognized the …
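As a rough illustration of step 2, the two example objects (authors and posts) could be turned into tables along these lines. This is only a sketch: the column names and the one-author-per-post rule are assumptions made for the example, and it uses Python's built-in sqlite3 module as a stand-in so it runs without a database server.

```python
import sqlite3

# Hypothetical schema for the "authors" and "posts" objects. The column
# names and the foreign key rule are illustrative assumptions, not taken
# from the original article.
SCHEMA = """
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title     TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding the two example tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def posts_by_author(conn, name):
    """Return the titles written by one author, ordered by post id."""
    rows = conn.execute(
        "SELECT p.title FROM posts p JOIN authors a ON a.id = p.author_id "
        "WHERE a.name = ? ORDER BY p.id",
        (name,),
    )
    return [title for (title,) in rows]


conn = build_demo_db()
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'First post')")
conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'Second post')")
print(posts_by_author(conn, "alice"))
```

Keeping the author's name in one place and pointing to it from posts is the kind of business rule ("every post has exactly one author") that the schema can then enforce for you.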
There may be times when you need to create a new table in MySQL and feed it with data from another database, the Internet, or combined data sources. MS Excel is commonly used as the bridge between those data sources and a target MySQL database because it makes it simple to organize the information and then dump it into a new MySQL table. Although that last bit sounds trivial, it can actually be a cumbersome step. Creating ODBC connections within Excel through Microsoft Query may not help, since these are normally created to extract data from MySQL into Excel, not the opposite. What if you could do this in a few clicks from within Excel after making your data ready for export to a MySQL database?
With MySQL for Excel you can do this, and this guide will show you how easy it is.
If you happen to work with personal data, chances are you are subject to SOX (Sarbanes-Oxley) whether you like it or not.
One of the worst aspects of this is that if you want to be able to analyse your data and you replicate out to another host, you have to find a way of anonymizing the information. There are of course lots of ways of doing this, but if you are replicating the data, why not anonymize it during the replication?
Of the many cool features in Tungsten Replicator, one of my favorites is filtering. This allows you to process the stream of changes that are coming from the data extracted from the master and perform operations on it. We use it a lot in the replicator for ignoring tables, schemas and columns, and for ensuring that we have the correct information within the THL.
Given this, let’s use it to anonymize the data as it is being replicated so that we don’t need to post-process it for analysis, and …
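To make the idea concrete, here is a minimal sketch of masking sensitive columns while a stream of row changes passes through. It does not use Tungsten Replicator's filter API; the row format (a plain dict per change), the column names and the salt are all assumptions made for illustration.

```python
import hashlib

# Hypothetical column names; a real deployment would list its own.
SENSITIVE_COLUMNS = {"email", "ssn"}


def anonymize_row(row, salt="demo-salt"):
    """Return a copy of a row with sensitive values replaced by a salted hash."""
    masked = {}
    for column, value in row.items():
        if column in SENSITIVE_COLUMNS and value is not None:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            masked[column] = digest[:16]  # shortened digest, still deterministic
        else:
            masked[column] = value
    return masked


def filter_stream(changes):
    """Apply anonymize_row to every change as it flows past."""
    for row in changes:
        yield anonymize_row(row)


changes = [{"id": 1, "email": "alice@example.com", "plan": "gold"}]
for change in filter_stream(changes):
    print(change)
```

Because the same input always produces the same digest, masked values can still be grouped and joined on the analysis host. Strictly speaking that makes this pseudonymization, so whether it is sufficient depends on your compliance requirements.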
The second article in a series covering Big Data and SQL interaction is available now:
“Big data” is a term that has been used regularly for almost a decade, and it, along with technologies like NoSQL, is seen as a replacement for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server, MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. Here in Part 2, we will concentrate on how to use HBase and Hive for exchanging data with your SQL data stores. From the outside, the two systems seem largely similar, but they have very different goals and aims. Let’s start by looking at how the two systems differ and how we can take advantage of that in our big data requirements.
SQL to Hadoop and back again, Part 2: …
3Ci processes over a billion transactions a month. More than 100 million unique U.S. consumers have engaged with a business through our platform. All that activity creates massive amounts of data. The Data Team at 3Ci is responsible for keeping our offerings running at optimal performance and for making sense of our data. They manage MySQL [...] …
Enabling Real-Time MySQL to HDFS Integration
Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, Hadoop itself is undergoing significant evolution. Technologies that allow real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by new generations of resource management with Apache YARN.
To support this growing emphasis on real-time operations, we are releasing a new …
Introduction
This article will explain how data is organized in the InnoDB storage engine. First we will look at the various files that are created by InnoDB, then we will look at the logical data organization: tablespaces, pages, segments and extents. We will explore each of them in some detail and discuss their relationships with each other. At the end of this article, the reader will have a high-level view of the data layout within the InnoDB storage engine.
The Files
MySQL stores all data within the data directory. The data directory can be specified using the command-line option --datadir or in the configuration file as datadir. Refer to the Server Command Options for complete details.
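As a small illustration of the configuration-file route, the sketch below reads datadir from the [mysqld] group of an option file using Python's configparser. The sample file contents and the path in it are assumptions for the example, and the approach is deliberately simplistic: it does not follow !include directives or merge several option files the way the server does.

```python
import configparser

# Sample option file contents; the path is an illustrative assumption.
SAMPLE_CNF = """
[mysqld]
datadir = /var/lib/mysql
skip-name-resolve
"""


def read_datadir(cnf_text):
    """Return the datadir set in the [mysqld] group, or None if it is absent."""
    # allow_no_value lets bare options such as skip-name-resolve parse cleanly.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    return parser.get("mysqld", "datadir", fallback=None)


print(read_datadir(SAMPLE_CNF))
```

On a running server, asking MySQL itself for the value of the datadir system variable is the more reliable check, since it reflects whatever the server actually started with.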
By default, when InnoDB is initialized, it creates 3 important files in the data directory – ibdata1, ib_logfile0 and …
Spatial data is being used and needed in a growing number of applications. This type of data is not always easy to manage or query, and sometimes calculations need to be done in the application code instead of at the server. Recently we added a new class to manage spatial data with Connector/Net, so our users have the option to handle spatial data operations in their application code.
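As a generic example of a spatial calculation done in application code, the sketch below computes the great-circle distance between two latitude/longitude points with the haversine formula. It is not the Connector/Net class mentioned above; the function name and the Earth radius constant are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Distance between two arbitrary example points.
print(haversine_km(48.8566, 2.3522, 51.5074, -0.1278))
```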
Problem: You've a large table (or two) in a database on a partition that's running out of space, and you want to see if you can move that table to another drive.
Solution: Well, several actually. No silver bullet, but several options, some with conditions and some that require preparation. Let's look at some background information first.
How MySQL Stores Data
OK, that's somewhat of an ambitious heading for an incidental paragraph or two, so to tone it back a bit, I'll summarise briefly.
- The data directory is where MySQL stores databases, and it's set by the datadir server option. Each database is stored in a subdirectory of the data directory. You can also save a considerable amount of space without moving data around, by …
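Since each database lives in its own subdirectory of the data directory, a quick way to see which one is eating the partition is to total the file sizes per subdirectory. The following is a sketch under that assumption only; the demo builds a throwaway directory with made-up file names rather than touching a real data directory.

```python
import os
import tempfile


def database_sizes(datadir):
    """Map each subdirectory of datadir to the total size in bytes of its files."""
    sizes = {}
    for entry in os.scandir(datadir):
        # Files at the top level (shared tablespace, logs) are skipped here.
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
            sizes[entry.name] = total
    return sizes


def demo():
    """Build a fake data directory and report per-database sizes."""
    with tempfile.TemporaryDirectory() as datadir:
        os.mkdir(os.path.join(datadir, "blog"))
        with open(os.path.join(datadir, "blog", "posts.ibd"), "wb") as fh:
            fh.write(b"\0" * 4096)
        with open(os.path.join(datadir, "ibdata1"), "wb") as fh:
            fh.write(b"\0" * 1024)
        return database_sizes(datadir)


print(demo())
```

Note that database directories that are symbolic links are skipped by this sketch, so anything already relocated to another drive would need follow_symlinks=True to be counted.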
I've previously written about AppArmor and MySQL, and how to change MySQL's default file locations on systems with AppArmor enabled. Ubuntu and SUSE ship with AppArmor enabled, but some other distributions, such as Oracle Linux and the related Red Hat, CentOS and Fedora, don't. Rather, these other distributions use another mandatory access control system called SELinux.
Here's some technical detail that might come in handy later.
SELinux uses concepts such as types and domains. Types belong to resources such as files and ports; these are the "objects" in SELinux. Domains contain the "subjects" (processes) and object types that are associated with each other in some way, for example because they are all related to …
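To see where the type sits in practice, an SELinux security context is written as colon-separated fields in the order user, role, type and an optional level. The sketch below splits such a label into its parts; the class and function names are hypothetical, and the sample label is only an example of what a MySQL data file might carry.

```python
from typing import NamedTuple


class SecurityContext(NamedTuple):
    user: str
    role: str
    type_: str
    level: str


def parse_context(label):
    """Split an SELinux label of the form user:role:type[:level]."""
    # The level may itself contain colons (e.g. s0:c0.c1023), so split at most 3 times.
    parts = label.split(":", 3)
    if len(parts) < 3:
        raise ValueError("expected at least user:role:type, got %r" % label)
    level = parts[3] if len(parts) == 4 else ""
    return SecurityContext(parts[0], parts[1], parts[2], level)


# Example label; the exact context on your system may differ.
ctx = parse_context("system_u:object_r:mysqld_db_t:s0")
print(ctx.type_)
```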