Showing entries 101 to 110 of 127
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: data (reset)
The SMAQ stack for big data

SMAQ report sections

→ MapReduce

→ Storage

→ Query

→ Conclusion

"Big data" is data that becomes large enough that it cannot be processed using conventional methods. Creators of web search engines were among the first to confront this problem. Today, social networks, mobile phones, sensors and science contribute to petabytes of data created daily.

To meet the challenge of processing such large data sets, Google created MapReduce. Google's work and Yahoo's creation of the Hadoop MapReduce implementation have spawned an ecosystem of big data processing tools.
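As a rough illustration of the MapReduce idea, here is a minimal single-process sketch in Python. It is not Google's or Hadoop's implementation; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are made up for this example. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input, for illustration only.
counts = word_count(["big data is big", "data about data"])
print(counts)
```

The map step looks at one document at a time and the reduce step looks at one key at a time, which is what lets a framework split the work across machines.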

As MapReduce has grown in popularity, a stack for big data systems …

[Read more]
MySQL GIS – Part 4

WHAT CAN YOU DO WITH GEO DATA?

Geospatial indexes are what make this type of data valuable.  With shape and point data you can find relationships between objects in our physical world.  How close is the lightning in the storm front?  What homes were hailed on? (WDT) What schools are in my city?  With a list of homes for sale, how far are they from their nearest school?  What pictures were taken in this area? (TwitPic)

Let's start with a simple grid of coordinates by creating a table for it called geom, adding our data points in and out of our grid, and then searching with a small bounding box. The grid looks like this.

0,0
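
The bounding-box search described above can be sketched in plain Python. This is only an illustration of the idea, not the post's MySQL code: the point names, coordinates and helper functions are hypothetical, and the sketch scans every point where a spatial index would avoid the full scan.

```python
# Hypothetical sample points; the grid in the original post is cut off.
points = {"a": (0, 0), "b": (1, 1), "c": (2, 2), "d": (5, 5), "e": (-1, 3)}

def in_bounding_box(point, min_x, min_y, max_x, max_y):
    """True if the point lies inside or on the edge of the box."""
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y

def search(points, min_x, min_y, max_x, max_y):
    """Return the names of all points inside the box, sorted by name."""
    return sorted(
        name
        for name, point in points.items()
        if in_bounding_box(point, min_x, min_y, max_x, max_y)
    )

hits = search(points, 0, 0, 2, 2)
print(hits)
```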
[Read more]
MySQL GIS – Part 3

What data is available?

GEO data is expensive to create, so it has been created by governments.  In the past, governments charged for this data.  In 1980 the USGS was charging $300 (USD) per county for Oklahoma GEO data. (I complained to my congressman.) Today, a quick Internet search turns up lots of free GIS data.

I was hoping to find a neat collection of basic GEO data.  It would be nice if there were one place where you could get world political borders (polygons), postal codes (polygons) and points of interest like hospitals and airports.  What you can find is lots of lists, often collections of odd data created for a variety of complex political purposes.  For example, the Global Change Master Directory is a large list of data sources on earth …

[Read more]
MySQL GIS – Part 1

In my business (weather) we use lots of map-based (Geo) information.  Almost every table has latitude and longitude. Working with this kind of data can be exciting and frustrating.  This should give you a quick start into GIS with MySQL.

“A geographic information system (GIS), or geographical information system, is any system that captures, stores, analyzes, manages, and presents data that are linked to location. In the simplest terms, GIS is the merging of cartography, statistical analysis, and database technology. GIS systems are used in cartography, remote sensing, land surveying, …

[Read more]
Does Size or Type Matter?

MySQL seems to be happy to convert types for you. Developers are rushed to complete their projects, and if the function works they just move on. But what are the costs of mixing your types? Does it matter if you are running across a million rows or more? Let's find out.

Here is what the programmers see.

mysql> select 1+1;
+-----+
| 1+1 |
+-----+
|   2 |
+-----+
1 row in set (0.00 sec)

mysql> select "1"+"1";
+---------+
| "1"+"1" |
+---------+
|       2 |
+---------+
1 row in set (0.00 sec)

Benchmark

What if we do a thousand simple loops?  How long does the looping itself take?

The BENCHMARK() function executes the expression expr repeatedly count times. It may be used to time how …
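
To make the idea of repeating an expression and timing it concrete, here is a small Python analogy. It is not MySQL's BENCHMARK() and says nothing about MySQL's actual timings; the benchmark helper and the loop count are assumptions for this sketch. It only shows the shape of the experiment: time the same addition with native integers and with strings that must be converted first.

```python
import time

def benchmark(count, expr):
    """Call expr() count times and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        expr()
    return time.perf_counter() - start

# Same arithmetic, once on integers and once on strings converted each time.
native = benchmark(100_000, lambda: 1 + 1)
converted = benchmark(100_000, lambda: int("1") + int("1"))

print(f"int + int:             {native:.4f}s")
print(f"int(str) + int(str):   {converted:.4f}s")
```

The absolute numbers depend on the machine, so compare the two results with each other rather than with anyone else's figures.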

[Read more]
Is OpenStack Cloud Computing Rocket Science?



There’s a real explosion of cloud platforms and management tools; it seems you can’t swing a dead cat without hitting one these days. In the commercial proprietary solutions space you have CA’s 3Tera AppLogic, Enomaly, Nimbula and RightScale. In open source there are EucalyptusCloud.com, Open Nebula and …

[Read more]
Federated Tables

You're searching for how to create a join across two databases on two different servers, and it can’t be done directly.  A query like select  d1.a, d2.b from db1@server1 join db2@server2 where db1.c = db2.c; does not work.

You learn about federated databases.  The federated storage engine allows access to data in tables of remote databases.  Now how do you make it work?

1) Check if the federated storage engine is supported.  Federation is OFF by default!

mysql> show engines;
+------------+---------+----------------------------------------------------------------+
| Engine     | Support | Comment                                                        |
+------------+---------+----------------------------------------------------------------+
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys     |
| MyISAM     | DEFAULT | Default engine as of …
[Read more]
mg_hot_replace_table.pl

Do you have MyISAM tables you reload with new data?

Do your queries, using that table, get blocked because the table is locked?

Do the waiting queries create idle connections slowing down the table load?

Do you wish you could just replace the table?

Years ago I was told you can replace CSV tables by simply replacing the CSV file. I figured this would also be true of a MyISAM file, and it is. I use this Perl script to replace the MyISAM tables that hold forecast and current-observation weather data. The processing is done and the tables are created on another computer. Weather forecasting is CPU and database expensive. I then copy (rsync) the files to the production system and run this script.

#!/usr/bin/perl
################################################################################
################################################################################
# mg_hot_replace_table.pl - Hot Replace a MySQL table.
#
# 2010-05-01 …
[Read more]
Looking just at the data

There are many areas you need to review when addressing MySQL performance, such as current database load, executed SQL statements, connections, configuration parameters, memory usage, disk to memory ratio, and hardware performance & bottlenecks, just to name a few.

If you were to just look at the data that is held in the database, what would you consider?
Here are my tips, when looking just at the data.

  1. What is the current database size?
  2. What is the growth of data over time, say daily, weekly?
  3. Which are the 2 largest tables now?
  4. What 2 tables are growing the fastest?
  5. What tables have the greatest churn, specifically DELETEs?
  6. How often do you optimize your tables?
  7. What is your archiving/purging strategy? Do you even have one?
  8. Review data types? I average 25% reduction in footprints, just by choosing optimal data types, generally with zero …
[Read more]
My favorite MySQL data type – DECIMAL(31,0)

It may seem hard to believe, but I have seen DECIMAL(31,0) in action on a production server. Not just in one column, but in 15 columns just in the largest 4 tables of one schema. Each column was being used to represent an integer primary or foreign key.

In a representative production instance (one of a dozen plus distributed production database servers) the overall database footprint was decreased from ~10 GB to ~2 GB, a 78% saving. In total, 15 columns across just 4 tables were changed from DECIMAL(31,0) to INT UNSIGNED.

One single table > 5GB was reduced to under 1GB (an 81% saving). This is my record for any GB+ table in my time working with the MySQL database.
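
To see where a saving of this size can come from, here is a small Python sketch of the per-value storage arithmetic. It assumes MySQL's documented binary DECIMAL format (5.0.3 and later), which packs each group of 9 digits into 4 bytes and uses 1 to 4 bytes for leftover digits; the function name decimal_storage_bytes is made up for this example. It does not model index or row overhead, so it will not reproduce the overall figures above exactly.

```python
def decimal_storage_bytes(precision, scale):
    """Bytes used per value by DECIMAL(precision, scale) in MySQL 5.0.3+."""
    # Bytes needed for 0..8 leftover digits (index = number of leftover digits).
    leftover_bytes = [0, 1, 1, 2, 2, 3, 3, 4, 4]

    def part(digits):
        # Each full group of 9 digits takes 4 bytes, plus the leftover digits.
        return (digits // 9) * 4 + leftover_bytes[digits % 9]

    # Integer and fractional parts are stored separately.
    return part(precision - scale) + part(scale)

INT_UNSIGNED_BYTES = 4  # INT UNSIGNED is always 4 bytes

decimal_bytes = decimal_storage_bytes(31, 0)
saving = 1 - INT_UNSIGNED_BYTES / decimal_bytes

print(f"DECIMAL(31,0): {decimal_bytes} bytes per value")
print(f"INT UNSIGNED:  {INT_UNSIGNED_BYTES} bytes per value")
print(f"Per-value saving: {saving:.0%}")
```

Under these assumptions each key value shrinks from 14 bytes to 4, and the same shrinkage applies to every index entry that contains the column.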

Had this server, for example, had 4GB of RAM with, say, 2.5GB allocated to innodb_buffer_pool_size, this one change would have moved the system from requiring more consistent disk access (4x data to memory) to being able to store all data in memory. Tests showed …

[Read more]