Questions and answers on big data on Stack Overflow

Stack Overflow is a question-and-answer site. It is free to use and requires no registration.

Big Data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.

Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. What is considered "big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain (from Wikipedia, the free encyclopedia).

Why we are importing data into hive/hbase using sqoop? [on hold]

Fri, 15 December 2017 19:16 +0000 GMT

GraphStream - Visualizing Big Data fails

Fri, 15 December 2017 17:36 +0000 GMT

date_trunc in hive is working incorrectly

Fri, 15 December 2017 11:58 +0000 GMT

How to populate a Country-Region-City form in a database [on hold]

Fri, 15 December 2017 11:56 +0000 GMT

How to get a query result into a key value form in HiveQL

Fri, 15 December 2017 09:58 +0000 GMT

How to download Hadoop files (on HDFS) via FTP?

Fri, 15 December 2017 09:18 +0000 GMT

Python Spark: How to join 2 datasets containing >2 elements for each tuple

Fri, 15 December 2017 01:51 +0000 GMT

Inserting a value on a frozen set in cassandra 3

Thu, 14 December 2017 17:33 +0000 GMT

Copying from one directory in HDFS to another directory in HDFS using JAVA

Wed, 13 December 2017 23:02 +0000 GMT

how to load json file greater than 10gb in pandas/python of a particular pattern

Wed, 13 December 2017 17:58 +0000 GMT

What happends at backend when we alter a table in hive

Tue, 12 December 2017 15:23 +0000 GMT

Statistical analysis in R on big data(Amazon redshift)

Tue, 12 December 2017 14:33 +0000 GMT

CSV file is being created larger than the size of my original data in python/pycharm?

Tue, 12 December 2017 13:21 +0000 GMT

Elasticsearch partial bulk update

Tue, 12 December 2017 13:14 +0000 GMT

Optimizing pyspark config for 86GB data

Tue, 12 December 2017 12:02 +0000 GMT

Apache Flink writeAsCsv() method to write a tuple of objects

Tue, 12 December 2017 09:39 +0000 GMT

How to query array fields in range and count number conforming range condition in elasticsearch

Tue, 12 December 2017 01:56 +0000 GMT

Get Custom TB/Day Memory Utilization report in Cloudera 5.10.x

Mon, 11 December 2017 21:09 +0000 GMT

How to run Saved Searchs on Splunk CLI?

Mon, 11 December 2017 13:56 +0000 GMT

Big educational datasets for mining project

Mon, 11 December 2017 12:23 +0000 GMT

Cassandra OOM crash

Mon, 11 December 2017 06:58 +0000 GMT

Cloudera - Unable to connect to Impala

Sun, 10 December 2017 23:00 +0000 GMT

Iterating over very large dataframe efficiency in python pandas is too time consuming

Sun, 10 December 2017 20:35 +0000 GMT

SQL Server vs. MySQL

Sat, 09 December 2017 14:50 +0000 GMT

SequenceFileInputFormat for Hadoop

Fri, 08 December 2017 23:25 +0000 GMT

perl regex match pattern through sql.dump file larger than 100GB

Fri, 08 December 2017 21:10 +0000 GMT

Order by not work in cassandra

Fri, 08 December 2017 09:32 +0000 GMT

Split hive partition to create multiple partition

Fri, 08 December 2017 06:34 +0000 GMT

The fastest way to extract a identical data from two text files [duplicate]

Fri, 08 December 2017 05:00 +0000 GMT

Bulk DocumentDB insertion increase CPU usage

Thu, 07 December 2017 18:08 +0000 GMT