Showing entries 131 to 140 of 164
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: hadoop (reset)
451 CAOS Links 2010.11.02

JCP election results. Funding for Acquia and Continuent. Fedora 14. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca, and daily at Paper.li/caostheory
“Tracking the open source news wires, so you don’t have to.”

# The Java Community Process election results are in.

# Acquia closed an $8.5m series C funding round and announced that it has tripled its customer base in 2010.

# Continuent appointed Robert Hodges CEO and confirmed details of $5m funding from Aura Capital.

# Red Hat …

[Read more]
Webinar: navigating the changing landscape of open source databases

When we published our 2008 report on the impact of open source on the database market the overall conclusion was that adoption had been widespread but shallow.

Since then we’ve seen increased adoption of open source software, as well as the acquisition of MySQL by Oracle. Perhaps the most significant shift in the market since early 2008 has been the explosion in the number of open source database and data management projects, including the various NoSQL data stores, and of course Hadoop and its associated projects.

On Tuesday, November 9, 2010 at 11:00 am EST I’ll be joining Robin Schumacher, Director of Product Strategy from EnterpriseDB to present a …

[Read more]
How Real is the Data Deluge?

It seems obvious that given the decreasing cost of storage and computation, there's going to be a significant increase in the volume of data that organizations accumulate over the next 10 years.  But the type of data being accumulated may be different from the areas where traditional DBMSs dominated.  It's not just about transactions; it's search patterns, on-line behavior, click-thru data, events fired off by smartphones, messages over Twitter & Facebook, log data of various kinds.

If an organization can figure out a better way to identify prospects, deliver more targeted ads, or optimize pricing decisions by analyzing terabytes of data, they'd be crazy not to. Over the long term, companies that don't develop these capabilities will be at a competitive disadvantage.

As to what the implications are from a …

[Read more]
The SMAQ stack for big data

SMAQ report sections

→ MapReduce

→ Storage

→ Query

→ Conclusion

"Big data" is data that becomes large enough that it cannot be processed using conventional methods. Creators of web search engines were among the first to confront this problem. Today, social networks, mobile phones, sensors and science contribute to petabytes of data created daily.

To meet the challenge of processing such large data sets, Google created MapReduce. Google's work and Yahoo's creation of the Hadoop MapReduce implementation have spawned an ecosystem of big data processing tools.

As MapReduce has grown in popularity, a stack for big data systems …

[Read more]
Do We Need a New Programming Language for Big Data?


 

I'm on the boards of two companies (Pentaho, Revolution Analytics) that are starting to see a lot of customer traction around Big Data. More and more companies in media, pharma, retail and finance are doing advanced analysis, reporting, graphing, etc. with massive data sets. It made me wonder what other areas of the technology stack might evolve with the trend towards Big Data. Obviously, there are new middleware layers like Hadoop and MapReduce, and we're also seeing the emergence of NoSQL data management layers with Cassandra, MongoDB, MemBase and others.  But what …

[Read more]
Open source in the clouds and in the debates

We continue to see more evidence of the themes we discuss in our latest CAOS special report, Seeding the Clouds, which examines the open source software used in cloud computing, the vendors backing open source, the cloud providers using it and the impact on the industry.

First, as usual, we are seeing consistencies between our own research — which indicates open source is a huge part of today’s cloud computing offerings from major providers like Amazon, Google, Rackspace, Terremark and VMware — and that of code analysis and management vendor Black Duck. In its analysis of code that runs the cloud, Black Duck also found a preponderance of open source pieces, in many cases the same projects we profile in our report.

Indeed, open source software is an important part of the infrastructure, …

[Read more]
Digg’s main competitor (Reddit) runs Cassandra, but Digg’s VP of Engineering was fired for the decision to switch.

Apparently, Digg performed a big migration from MySQL to Cassandra and a big migration to their new Digg v4 architecture and now their VP of Engineering has been shown the door:

Ever since Digg launched its new site design, it’s been plagued with all kinds of trouble, not least of which is that it keeps going down. The problems with the new architecture are so bad that VP of Engineering John Quinn is now gone, we’ve confirmed with sources close to Digg.

In a Diggnation video today, CEO Kevin Rose explained some of the technical issues the site is dealing with and why it can’t simply roll back to the previous architecture. The new version of Digg, v4, is based on a distributed database called Cassandra, which replaced the MySQL database the site ran on before. Cassandra is very advanced—it is supposed to be faster and scale …

[Read more]
Integrating MySQL and Hadoop - or - A different approach on using CSV files in MySQL

We use both MySQL and Hadoop a lot. If you utilize each system to its strengths, this is a powerful combination. One problem we constantly face is making data extracted from our Hadoop cluster available in MySQL.

The problem

Look at this simple example: Let’s say we have a table customer:

CREATE TABLE customer (

    id INT UNSIGNED NOT NULL,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL,

    PRIMARY KEY (id)
);

In addition, we store the orders that customers placed in Hadoop. An order includes: customerId, date, itemId, price. Note that these structures serve as a very simplified example.

Let’s say we want to find the first 50 customers that placed at least one order, sorted by firstname ascending. If both tables …
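As a minimal sketch of the query described above, assume purely for illustration that the orders were also available as a relational table. The snippet below uses Python's built-in SQLite module as a stand-in for MySQL. The `orders` table name, its column types, and the sample rows are assumptions made for this sketch, not part of the original post.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL, and the
# orders (kept in Hadoop in the post) are loaded into a hypothetical
# `orders` table with the four fields the post lists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id INTEGER NOT NULL PRIMARY KEY,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL
);
CREATE TABLE orders (
    customerId INTEGER NOT NULL,
    date TEXT NOT NULL,
    itemId INTEGER NOT NULL,
    price REAL NOT NULL
);
""")

# Made-up sample rows, only to exercise the query.
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (1, "Carol", "Jones", "Berlin"),
    (2, "Alice", "Smith", "Hamburg"),
    (3, "Bob", "Miller", "Munich"),   # no orders, so excluded below
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "2010-08-01", 10, 9.99),
    (2, "2010-08-02", 11, 19.99),
    (2, "2010-08-03", 12, 4.50),
])

# First 50 customers with at least one order, sorted by firstname ascending.
# EXISTS avoids duplicate customers when someone has several orders.
FIRST_CUSTOMERS_WITH_ORDERS = """
SELECT c.id, c.firstname, c.lastname, c.city
FROM customer AS c
WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customerId = c.id)
ORDER BY c.firstname ASC
LIMIT 50
"""

rows = conn.execute(FIRST_CUSTOMERS_WITH_ORDERS).fetchall()
for row in rows:
    print(row)
```

The SELECT statement itself uses only standard SQL plus `LIMIT`, so it should carry over to MySQL largely unchanged once the order data is reachable there. Getting the data there is the problem the post goes on to discuss.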

[Read more]
The number of Hadoop jobs continues to rise

While still a small fraction (1) of data management job postings, the number of job posts that mention "hadoop" continues to grow steadily. Year-over-year, there were 300% more such job posts (2) in the first seven months of 2010 than in the same period in 2009:





The fraction of "hadoop" jobs posted by California companies remains high, but is definitely lower than it was last year:





(1) Over the last three months, job posts that mention "hadoop" were inching towards 8-10% of the number of job posts that mention "mysql".

(2) Data for this post is for U.S. online job postings through 7/31/2010 and is maintained in partnership with SimplyHired.com. We …

[Read more]
451 CAOS Links 2010.06.29

Elephants on parade: Hadoop goes mainstream. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”

Elephants on parade
# Cloudera launched v3 of its Distribution for Hadoop and released v1 of Cloudera Enterprise.

# Karmasphere released new Professional and Analyst Editions of its Hadoop development and deployment studio.

# Talend announced that its Integration Suite now offers native support for Hadoop.

# Yahoo …

[Read more]