Showing entries 201 to 210 of 211
« 10 Newer Entries | 1 Older Entries »
Displaying posts with tag: big data (reset)
Big Data innovation marches on

With IBM intending to acquire Netezza, the predicted consolidation in the distributed analytics market is well underway.  Recent deals include EMC/Greenplum, Teradata/Kickfire and now IBM/Netezza.  A good breakdown of this deal is on Curt’s blog.  There is still more to go, of course, with one of the crown jewels, Vertica, still ripe for the picking.

What this indicates is that MPP analytics has moved from the innovative edge into the mainstream market, and the more risk-averse large caps are now willing to invest substantially in growing this market.  Interestingly, Microsoft made this move early with the …

[Read more]
Was Stonebraker right?

Back in 2008, Stonebraker & DeWitt published a paper and associated blog post titled “MapReduce: A major step backwards”.  Their key points were that MapReduce is:

  1. A giant step backward in the programming paradigm for large-scale data intensive applications
  2. A sub-optimal implementation, in that it uses brute force instead of indexing
  3. Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago
  4. Missing most of the features that are routinely included in current DBMS
  5. Incompatible with all of the tools DBMS users have come to depend …
[Read more]
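
To make point 2 above concrete, here is a minimal Python sketch of the contrast between a brute-force scan and an index lookup. It is not taken from the paper or from any MapReduce implementation; the records, field names and helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative assumptions.
records = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Boston"},
    {"id": 3, "city": "Austin"},
]


def scan(rows, city):
    """Brute force: examine every row on every query."""
    return [row["id"] for row in rows if row["city"] == city]


def build_index(rows):
    """Build a city -> ids mapping once, so later lookups skip the full scan."""
    index = defaultdict(list)
    for row in rows:
        index[row["city"]].append(row["id"])
    return index


city_index = build_index(records)

scan_result = scan(records, "Austin")           # touches all rows
index_result = city_index.get("Austin", [])     # one dictionary lookup
```

Both approaches return the same ids here. The difference is that the scan repeats its work on every query, while the index pays a one-time build cost up front.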
VLDB 2010

I will be at VLDB 2010 next week.  If anyone reading this blog is attending and wants to catch up to discuss startups and innovation in DB, NoSQL, Big Data, etc., drop me a line and I will try to meet up.

The number of Hadoop jobs continues to rise

While still a small fraction (1) of data management job postings, the number of job posts that mention "hadoop" continues to grow steadily. Year-over-year, there were 300% more such job posts (2) in the first seven months of 2010 compared to the same period in 2009:

The fraction of "hadoop" jobs posted by California companies remains high, but is definitely lower than it was last year:

(1) Over the last three months, job posts that mention "hadoop" were inching towards 8-10% of the number of job posts that mention "mysql".

(2) Data for this post is for U.S. online job postings through 7/31/2010 and is maintained in partnership with SimplyHired.com. We …

[Read more]
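
As a reading aid for the "300% more" figure above, here is a small Python sketch of the year-over-year arithmetic. The job-post counts are invented placeholders, not numbers from the post; the point is only that a 300% increase means four times the earlier count.

```python
def percent_increase(previous, current):
    """Return the percentage growth from previous to current."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
posts_2009 = 500
posts_2010 = 2000

increase = percent_increase(posts_2009, posts_2010)  # 300.0
multiple = posts_2010 / posts_2009                   # 4.0
```

So "300% more" job posts means the 2010 count is four times the 2009 count, not three times.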
Four short links: 1 July 2010
  1. Conflict Minerals and Blood Tech (Joey Devilla) -- electronic components have a human and environmental cost. I remember Saul Griffith asking me, "do you want to kill gorillas or dolphins?" for one component. Now we can add child militias and horrific rape to the list. (via Simon Willison)
  2. Meteor -- an open source HTTP server that serves streaming data feeds (for apps that need Comet-style persistent connections). (via gianouts on Delicious)
[Read more]
Four short links: 25 June 2010
  1. Membase -- an open-source (Apache 2.0 license) distributed, key-value database management system optimized for storing data behind interactive web applications. These applications must service many concurrent users; creating, storing, retrieving, aggregating, manipulating and presenting data in real-time. Supporting these requirements, membase processes data operations with quasi-deterministic low latency and high sustained throughput. (via Hacker News)
  2. Sergey's Search (Wired) -- Sergey Brin, one of the Google founders, learned he had a gene allele that gave him much higher odds of getting Parkinson's. His response has been to help medical research, both with money and through 23andme. …
[Read more]
Four short links: 4 November 2009
  1. ChipHacker -- collaborative FAQ site for electronics hacking. Based on the same StackExchange software as RedMonk's FOSS FAQ for open source software.
  2. Democracy Live -- BBC launch searchable coverage of parliamentary discussion, using speech-to-text. One aspect we're particularly proud of is that we've managed to deliver good results for speech-to-text in Welsh, which, we're told, is unique. I think of this as the start of a They Work For You for video coverage. I'd love to be able to scale this to local government coverage, which is disappearing as local newspapers turn into …
[Read more]
Four short links: 7 August 2009
  1. Defragging the Stimulus -- each [recovery] site has its own silo of data, and no site is complete. What we need is a unified point of access to all sources of information: firsthand reports from Recovery.gov and state portals, commentary from StimulusWatch and MetaCarta, and more. Suggests that Recovery.gov should be the hub for this presently-decentralised pile of recovery data.
  2. Memetracker -- site accompanying the research written up by the New York Times as Researchers at Cornell, using powerful computers and clever algorithms, studied the news cycle by looking for repeated phrases and tracking their appearances on 1.6 million …
[Read more]
Four short links: 26 May 2009
  1. Flare -- dynamically partitioning and reconstructing key-value server. Currently built on Tokyo Cabinet, but backend is theoretically pluggable. (via joshua on delicious)
  2. Implantable Device Offers Continuous Cancer Monitoring -- the sensor network begins to extend into our bodies. The cylindrical, 5-millimeter implant contains magnetic nanoparticles coated with antibodies specific to the target molecules. Target molecules enter the implant through a semipermeable membrane, bind to the particles and cause them to clump together. That clumping can be detected by MRI (magnetic resonance imaging). The device is made of a polymer called polyethylene, which is commonly used in orthopedic …
[Read more]
Big Data: SSD's, R, and Linked Data Streams

The Solid State Storage Revolution: If you haven't seen it, I recommend you watch Andy Bechtolsheim's keynote at the recent Mysqlconf. We covered SSDs in our just-published report on Big Data management technologies. Since then, we've gotten additional signals from our network of alpha geeks, and our interest in SSDs remains high.


R and Linked Data Streams: I had a chance to visit with Dataspora founder and blogger Mike Driscoll, an enthusiastic advocate for the use of the open source statistical computing …

[Read more]