How to Easily Deploy an IMDG in the Cloud

Cloud-based applications enjoy the unique elasticity that cloud infrastructures provide. As more computing resources are needed to handle a growing workload, virtual servers (also called cloud “instances”) can be added to take up the slack. For example, consider a Web server farm handling requests for Web users or mobile apps. Being able to add computing resources on demand keeps work queues small and ensures that Web users always see fast response times. And after a period of peak demand subsides, resources can be dialed back to minimize cost without compromising quality of service. Flexible pricing options on some public clouds, ranging from hourly to annual charges per instance, give organizations the ability to cost-effectively outsource hosting for their production applications. Continue reading

AppFabric Caching: Retry Later

We have spent a great deal of time at ScaleOut Software re-architecting the code base of our in-memory data grid (IMDG) to make the best use of many cores and large memory. For example, the IMDG must be able to efficiently create millions of objects in each server to make use of its huge storage capacity. Likewise, object access paths must be heavily multi-threaded and avoid lock contention to minimize access latency and maximize throughput. Also, load-balancing after membership changes must be both multi-threaded and pipelined to drive the network at maximum bandwidth.

Given all this, we thought it would be a good opportunity to see how we are doing relative to the competition, and in particular, relative to Microsoft’s AppFabric caching for Windows on-premises servers. In addition to looking at performance differences, we also want to compare ScaleOut StateServer (SOSS) to AppFabric on qualitative measures, such as features, ease of installation, and management. Continue reading

Reports of Scale-Out’s Demise Are Greatly Exaggerated

A recent blog post highlighted a Microsoft technical report that asserts that most Hadoop workloads are 100 GB or smaller, and that for almost all workloads except the very largest, “a single ‘scale-up’ server can process each of these jobs and do as well or better than a cluster in terms of performance, cost, power, and server density.” It’s certainly true that Hadoop MapReduce seems to have focused more on clustering issues than on single-server optimizations. But, to paraphrase Mark Twain, reports of scale-out’s demise for all but the largest workloads are greatly exaggerated. Continue reading

Using In-Memory Data Grids for ETL on Streaming Data

The Hadoop stack offers a compelling set of technologies and tools that can be deployed to serve as the core of next-generation data warehouses. The combination of scalable MapReduce to analyze petabyte data sets, parallel SQL query using Hive or Impala, and data visualization tools gives the analyst powerful resources for mining strategically important data. The Hadoop Distributed File System (HDFS) serves as a highly scalable data repository for hosting this data and efficiently feeding it into Hadoop’s parallel analysis engine. With industrial strength support from companies like Cloudera and others, the time is now right for deploying a Hadoop-based data warehouse: Continue reading

How Do In-Memory Data Grids Differ from Storm?

In last week’s blog post, we talked about the fact that our in-memory computing technology is often confused with other popular “big data” technologies, in particular Spark/Spark Streaming, Storm, and complex event processing (CEP). As we mentioned, these innovative technologies are great at what they’re built for, but in-memory data grids (IMDGs) were created for a distinct use case. In this blog post, we will take a look at how IMDGs differ from Storm. Continue reading

How Do In-Memory Data Grids Differ from Spark?

As an in-memory computing vendor, we’ve found that our products often get confused with some popular open-source, in-memory technologies. Perhaps the three technologies we are most often confused with are Spark/Spark Streaming, Storm, and complex event processing (CEP). These innovative technologies are great at what they’re built for, but in-memory data grids (IMDGs) were created for a distinct use case. In this blog post, we will take a look at how IMDGs differ from Spark and Spark Streaming. Continue reading

Transforming Retail with Real-Time Analytics

Real-time analytics has the potential to transform operational systems by providing instant feedback that dramatically enhances how these systems respond to fast-changing events. For example, in a previous blog we saw how a hedge fund tracking its equity portfolios can respond to market fluctuations in milliseconds instead of minutes. However, these benefits are not restricted to financial services. In discussions with both e-commerce and brick-and-mortar retail companies, we also have identified opportunities to enhance their operational systems with real-time analytics. Let’s take a look at a few examples after a quick review of in-memory data grids (IMDGs). Continue reading

How Object-Oriented Programming Simplifies Data-Parallel Analytics

In-memory computing enables real-time analytics to be integrated into operational systems so that fast-changing, “live” data can be instantly evaluated to provide feedback in milliseconds or seconds. As we have discussed in previous blogs, the key to scalable performance and fast response time lies in the use of data-parallel programming techniques. How can we structure these computations to ease their integration into operational systems? Continue reading

Creating Data-Parallel Computations for Real-Time Analytics

Real-time analytics offers enterprises the ability to examine “live,” fast-changing data within operational systems and obtain feedback in milliseconds to seconds. For example, a hedge fund in a financial services organization can track the effect of market fluctuations on its portfolios (“strategies”) of long and short equity positions in various market areas (high tech, real estate, etc.) and immediately identify strategies requiring rebalancing. As we have seen in previous blogs, the key to real-time performance, especially for growing workloads, is to use in-memory, data-parallel computing, which delivers scalable throughput and minimizes performance losses due to data motion. But how can we easily structure computations to take advantage of this “scale out” technology? Continue reading

Scaling Real-Time Analytics with an IMDG

In the last blog post, we discussed how in-memory data grids (IMDGs) share the same architecture as parallel supercomputers. Parallel supercomputers typically add computing power by scaling “out” across a cluster of servers. Likewise, IMDGs scale out their in-memory data storage and analytics engine across service processes running on a cluster of servers. Let’s take a closer look at the benefits of scaling out, especially for computations in real-time analytics. Continue reading
