We all sometimes need to generate valid dummy data for our
use cases and applications when we start building them. One can
of course write scripts to generate random data, but it is much
better to have data that human beings can relate to, such as
names and addresses, instead of fields filled with random
"lorem ipsum" strings :)
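To make the contrast with random strings concrete, here is a minimal Python sketch of a home-grown generator that picks from small pools of realistic values. The pools and the names fake_person and fake_people are made up for illustration; they are not part of any tool mentioned here.

```python
import random

# Hypothetical sample pools; a real tool would ship far larger lists.
FIRST_NAMES = ["Alice", "Bob", "Carla", "Deepak"]
LAST_NAMES = ["Nguyen", "Smith", "Garcia", "Okafor"]
STREETS = ["Main St", "Oak Ave", "Hill Rd"]


def fake_person(rng):
    """Return one human-readable dummy record as a dict."""
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "address": f"{rng.randint(1, 999)} {rng.choice(STREETS)}",
    }


def fake_people(count, seed=0):
    """Return `count` records; the same seed always gives the same data."""
    rng = random.Random(seed)
    return [fake_person(rng) for _ in range(count)]


if __name__ == "__main__":
    for person in fake_people(3):
        print(person["name"], "|", person["address"])
```

Seeding the generator keeps test fixtures reproducible between runs, which is usually what you want for dummy data.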
While searching for such a tool, I found a site which does
exactly this: http://www.generatedata.com/
Documentation: http://benkeen.github.io/generatedata/
This can also be downloaded and installed locally. It supports
three types of installations:
- A single, anonymous user account
- A single user account, requires login
- Multiple accounts
Below is the wide variety of data types it supports for …
A list is simply a sequence of things. It has no structure,
except that in some cases the length of the list may be
known. A list may contain duplicate items. In the following
example the number 1 is included twice.
Example list:
1 2 3 1
A set is similar to a list, but has the following
differences:
- The size of the set is always known
- A set may not contain duplicates
You can convert a list to a set by creating a 'weighted
list'. The weighted list adds a count column, so each distinct
item appears once together with the number of times it occurs
in the original list. Written as item,count pairs:
1,2 2,1 3,1
Notice that the item 1 has a count of 2, because it appears
twice in the original list.
In order to make insertions into such a list scalable, consider
using partitioning to avoid large indexes.
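The list-to-weighted-list conversion described above can be sketched in a few lines of Python. This is an in-memory illustration only, with hypothetical function names; it says nothing about how the count column would be stored or partitioned in a database.

```python
from collections import Counter


def to_weighted_list(items):
    """Collapse a list into (item, count) pairs, in first-seen order."""
    return list(Counter(items).items())


def to_list(weighted):
    """Expand (item, count) pairs back into a flat list.

    The original ordering of the items is not recoverable.
    """
    return [item for item, count in weighted for _ in range(count)]


weighted = to_weighted_list([1, 2, 3, 1])
print(weighted)  # [(1, 2), (2, 1), (3, 1)]
```

The item column of the weighted list is a set (no duplicates, known size), while the count column keeps the information that a plain set would throw away.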
…
The most basic and most oft-repeated task that a DBA has to accomplish is to look at slow logs and filter out queries that are suboptimal, that consume lots of unnecessary resources and that hence slow down the database server. This post looks at why and how VIEWs can help against such suboptimal operations.
I need to generate large (1TB-3TB) synthetic MySQL datasets for
testing, with a number of requirements:
a) custom output formatting (SQL, CSV, fixed-length rows, etc.)
b) referential integrity support (i.e., child tables should
reference existing PK values, no orphans, etc.)
c) able to generate multiple tables in parallel
d) preferably able to operate without a GUI and/or manual
intervention
e) uses a well defined templating construct for data
generation
f) preferably open source
Does anyone out there know of a product that meets at least most
of these requirements?
*edit*
I found a PHP-based data generation script (www.generatedata.com)
that is extensible in its output formatting, so it should do
everything I need it to do.
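For readers with similar requirements, requirement (b) is the one that most often trips up ad hoc scripts. Below is a rough Python sketch of the idea, assuming a hypothetical customers/orders pair of tables: child rows only ever draw their foreign key from primary keys that were already generated, so no orphans can occur. It is an illustration of the principle, not how the script mentioned above works, and it is not tuned for terabyte-scale output.

```python
import csv
import io
import random


def generate_tables(num_parents, num_children, seed=0):
    """Return (parents, children); every child references a real parent PK."""
    rng = random.Random(seed)
    # Parent rows: (id, name) with sequential primary keys.
    parents = [(pk, f"customer_{pk}") for pk in range(1, num_parents + 1)]
    parent_ids = [pk for pk, _ in parents]
    # Child rows: (id, customer_id, amount); the FK is drawn from parent_ids.
    children = [
        (pk, rng.choice(parent_ids), rng.randint(1, 500))
        for pk in range(1, num_children + 1)
    ]
    return parents, children


def to_csv(header, rows):
    """Render rows as CSV text; other output formats would plug in here."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    customers, orders = generate_tables(num_parents=3, num_children=5)
    print(to_csv(["id", "name"], customers))
    print(to_csv(["id", "customer_id", "amount"], orders))
```

Keeping generation and formatting in separate functions is one way to get custom output formats (requirement a) without touching the integrity logic.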