LinkSmart’s Low-Cost, Big Data Plan with Linux and MapR


LinkSmart’s audience and link management platform for publishers was built with big data at its core. So when management decided to migrate the cloud-based application to the company’s own hardware, there was no question it would be powered entirely by Linux.

Linux-based infrastructure allows the 3-year-old startup to cut costs, both by avoiding the licensing fees of proprietary systems and by tapping the community’s collective knowledge base instead of paying for expensive support contracts, said CTO Manny Puentes.

“Linux is the operating system really helping to propel a lot of the startup community and innovation,” Puentes said.

LinkSmart uses mostly open source software, including Hadoop, to mine terabytes of audience-related data for digital publishers. Using information such as how long visitors stay on-site and what they’re interested in, LinkSmart’s platform can then steer a publisher’s content decisions as well as strategically insert links that drive more pageviews and sales.

Because much of the open source software the company uses is built for the Linux environment, it’s easier to find developers who are accustomed to working on Linux, Puentes said. The team can also contribute back to upstream projects that are important to the business, he said.

“That’s how we surface important information to our publishers at a low cost to the business,” Puentes said.

More IT cost savings with MapR

LinkSmart’s core data platform is the MapR Hadoop distribution, running on 10 Intel Xeon processors with 64GB of memory and four 2-terabyte SAS drives, all installed with Ubuntu 12.04. Other tools include Kafka, Cassandra and MySQL for storage.

Key to the company’s IT cost savings is MapR’s NFS mount, Puentes said. The feature lets system administrators access the Hadoop cluster directly using standard Linux commands. They can run sort to reorder the data, for example, or tail the output of a long-running batch job while it is still executing.

“In other distributions, a developer has to wait until the program completes before viewing results,” said Jack Norris, chief marketing officer at MapR. “If it is a long-running job — and many Hadoop jobs can execute for hours — this causes delays in determining if there are problems with output that can be detected much earlier through a tail command.”
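Because the NFS mount makes cluster files look like ordinary local files, the inspection Puentes and Norris describe can also be done from any program with plain file I/O, with no Hadoop-specific client involved. The Python sketch below mimics `tail -n`, `tail -f` and `sort`. It assumes the cluster is mounted at a path such as `/mapr/<cluster-name>`; the function names and the job-output path in the usage note are hypothetical, not part of MapR’s tooling.

```python
import os
import time
from collections import deque
from typing import Iterator, List, Optional


def tail(path: str, n: int = 10) -> List[str]:
    """Return the last n lines of a file, similar to `tail -n`."""
    with open(path, "r", encoding="utf-8") as f:
        # A bounded deque keeps only the final n lines in memory.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


def sorted_lines(path: str) -> List[str]:
    """Return every line of a file in lexical order, similar to `sort`."""
    with open(path, "r", encoding="utf-8") as f:
        return sorted(line.rstrip("\n") for line in f)


def follow(
    path: str,
    poll_seconds: float = 1.0,
    max_idle_polls: Optional[int] = None,
) -> Iterator[str]:
    """Yield lines appended to a file after watching starts, similar to `tail -f`.

    Stops after max_idle_polls consecutive polls with no new data
    (None means follow forever, like `tail -f`).
    """
    with open(path, "r", encoding="utf-8") as f:
        f.seek(0, os.SEEK_END)  # begin at the current end of the file
        buffer = ""
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            chunk = f.readline()
            if chunk:
                idle = 0
                buffer += chunk
                # The writer may not have finished the line yet; only
                # emit once a full newline-terminated line has arrived.
                if buffer.endswith("\n"):
                    yield buffer.rstrip("\n")
                    buffer = ""
            else:
                idle += 1
                time.sleep(poll_seconds)
```

For example, `for line in follow("/mapr/my.cluster/jobs/report/part-00000"): print(line)` (a hypothetical path) would print a running job’s output as it is written, which is the early look at results Norris describes.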

For a small company like LinkSmart, with only 20 employees, MapR’s Linux integration means the IT staff can maintain the distributed file system the same way it manages all its other file systems. And developers save time by accessing the cluster directly instead of writing a special script to pull data out of it.

“It’s not just a cool feature; it plays an economic role,” Puentes said. “I was able to put in a Hadoop cluster and I have one IT guy managing the file system and all other Linux file systems. For a startup, that’s incredible.”

MapR has deliberately worked over the past two years to integrate better with enterprise operating systems, including Linux, Norris said. As a result, its distribution now supports Linux commands on Red Hat, CentOS, Ubuntu and SUSE, as well as on OpenStack.

“It’s about defining the new stack for enterprises in terms of how they’re going to drive the applications of the future,” Norris said. “It’s a much simplified stack and at the foundation is Linux.”