Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
2 votes • 0 answers • 32 views
Why is dropping a table from HBase a blocking operation?
After a lot of preparation work, today we were ready to drop our biggest HBase table (1 petabyte before replication) from the database. As required, we started by disabling the table, which took 4 minutes. After confirming nobody saw any issues, we proceeded to drop the table.
This process is still running, 70 minutes into the task. It looks like all the region files are being deleted from HDFS before the HBase shell session where the drop command was issued returns. This is affecting all the other tables in the database, none of which are able to return results. It appears to be mostly a problem with updates to the hbase:meta table.
Why does HBase not return within a few seconds and let the region servers do their cleanup work as a background task?
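For context, the equivalent sequence through the Java client Admin API looks roughly like this (a minimal sketch, assuming the HBase 2.x client; the table name is made up):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DropBigTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName table = TableName.valueOf("big_table"); // hypothetical table name

            // Disabling flushes the memstores and takes the regions offline.
            admin.disableTable(table);

            // deleteTable blocks the caller until the master has finished the
            // delete procedure, including the hbase:meta updates and file removal.
            admin.deleteTable(table);

            // An asynchronous variant that returns a Future also exists:
            // admin.deleteTableAsync(table);
        }
    }
}
```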
Paul
(21 rep)
Aug 2, 2024, 01:14 PM
2 votes • 1 answer • 330 views
Binary storage in Cassandra, HBase
I am looking at some implementations of Cassandra and HBase for medium-sized data sets (~1M resources) to be exposed to clients as graphs (via e.g. TinkerPop). I would also like to store binaries in the same data stores. While it seems like both systems support storing large binaries one way or another (HBase via HDFS), I wonder what the performance implications would be of using these versus flat file storage. Are these systems designed to store binaries at scale, or are they more targeted at metadata storage? I am talking about hundreds of TB of binary data.
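As a minimal sketch of the "one way or another" I have in mind for HBase, with each binary stored as a single cell value (the table, column family, and file names below are placeholders):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BinaryCellSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("blobs"))) {

            byte[] payload = Files.readAllBytes(Paths.get("resource-0001.bin"));

            // The whole binary lives in a single cell of the "b" family.
            Put put = new Put(Bytes.toBytes("resource-0001"));
            put.addColumn(Bytes.toBytes("b"), Bytes.toBytes("data"), payload);
            table.put(put);

            // Read it back the same way.
            byte[] back = table.get(new Get(Bytes.toBytes("resource-0001")))
                               .getValue(Bytes.toBytes("b"), Bytes.toBytes("data"));
            System.out.println("read back " + back.length + " bytes");
        }
    }
}
```

My understanding is that very large cells run into client/server key-value size limits, which is where MOB columns or plain HDFS files referenced from HBase usually come in.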
Thanks
.s
gattu marrudu
(21 rep)
Apr 1, 2017, 06:06 AM
• Last activity: Nov 29, 2019, 11:02 PM
2 votes • 0 answers • 182 views
Storing a large, sparse matrix in Apache HBase?
I'm currently testing out Apache HBase for our matrix analytics service. I'm using a managed cluster running on AWS EMR.
The matrices are sparse, with 50,000 columns and up to 10 million rows. The values are integers.
The main operation we'd like to support is random access of any 500 rows and 30 columns. We would like this operation to return in under a second, and ideally under half a second.
Apache HBase seemed like the ideal option, as it's advertised to support 'real-time' access to large sparse matrices with 'millions of columns and rows'.
The instance I'm running is sizable, with 4 nodes, each with 16 cores and 256GB of memory.
I've tried both 'wide and short' and 'tall and thin' table formats. The 'wide and short' format has 50,000 columns and is simply the matrix represented directly in HBase. The 'tall and thin' format uses a composite row key of the form '{row_name};{column_name}' (e.g. ';100507661') and a single column containing the value.
For both formats, I've reduced the block size to 8192 bytes and turned on the Bloom Filter, as well as adding SNAPPY compression.
The access pattern for rows and columns is completely random. Any 500 rows and 30 columns can be requested at any time. There is no easy way to group the rows and columns for faster access.
I'm still seeing a latency of up to 5 seconds, which is much too slow. Is this too fast a response time to expect from such a large dataset? Or am I making some basic error? Should I try encoding and compressing the row key?
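For the 'tall and thin' layout, the random access I'm describing looks roughly like the following with the Java client, batched into a single multi-get (a minimal sketch; the table name, column family, and row/column identifiers are placeholders):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TallThinLookup {
    static final byte[] CF = Bytes.toBytes("v");   // single column family
    static final byte[] COL = Bytes.toBytes("x");  // single value column

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("matrix_tall"))) {

            List<String> rows = List.of("r0001", "r0002");          // 500 in practice
            List<String> cols = List.of("100507661", "100133144");  // 30 in practice

            // One Get per (row, column) pair, sent to the cluster as a single batch.
            List<Get> gets = new ArrayList<>();
            for (String r : rows) {
                for (String c : cols) {
                    gets.add(new Get(Bytes.toBytes(r + ";" + c)).addColumn(CF, COL));
                }
            }
            Result[] results = table.get(gets);
            System.out.println("fetched " + results.length + " cells");
        }
    }
}
```

Note that 500 rows × 30 columns is 15,000 point gets per request in this layout, which may itself be contributing to the latency.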
gacharya
(21 rep)
Jul 2, 2019, 12:11 AM
1 vote • 0 answers • 17 views
Automating HBase Online Merges
Long story short: I have an HBase database that is divided into a huge number of small regions. I'm trying to merge them into larger regions, as the huge number of regions is drastically slowing performance.
I'm looking for a simple way to automate the merges; I thought the simplest approach would be to use the REST API to find neighboring regions and attempt online merges.
I'm having trouble finding examples of how to do this, though. Is there a proper REST endpoint that can do this?
If it makes a difference, the cluster is managed through a Cloudera 5.x interface.
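If the REST route doesn't pan out, here is a minimal sketch of the same idea against the Java Admin API (assuming the HBase 1.x client that ships with CDH 5.x; the table name is made up):

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MergeNeighbours {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName table = TableName.valueOf("my_table"); // hypothetical table name
            List<HRegionInfo> regions = admin.getTableRegions(table);

            // Merge each region with its right-hand neighbour, one pair at a time.
            for (int i = 0; i + 1 < regions.size(); i += 2) {
                admin.mergeRegions(
                        regions.get(i).getEncodedNameAsBytes(),
                        regions.get(i + 1).getEncodedNameAsBytes(),
                        false); // forcible = false: only merge adjacent regions
            }
        }
    }
}
```

In practice the merges would need to be paced and the region list re-read between passes, since the merge request itself is asynchronous.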
Bart Silverstrim
(153 rep)
Jun 28, 2019, 02:45 PM
1 vote • 0 answers • 107 views
Are big data solutions a good option for 500 million temporal records?
So I have about 250k items x 2000 days (2010 to 2019) ~= 500 million records.
For each item I have a variable number of fields. To begin with, we have 50 fields defined for each item, but in the future we want the ability to add more fields.
With that said, we thought we could pack all of the fields of each item in a JSON blob. Postgres allows us to query the JSON blobs. We did some prototyping and it took around 3 seconds for the QC queries on about 1 year of data with a smaller subset of items.
Since we are in the prototyping stage, I was thinking of building a prototype using HBase. Writing a Spark job to bulk-load the data into it would take me at least a couple of days, so I wanted to check whether HBase is a good option for this problem.
Also, are there any other database solutions we should consider for this problem? MongoDB is a no-go for bureaucratic reasons.
PS: our QC queries are mostly of the form: given a date, get all the items; given an item, get all the data available for that item across all dates; and, occasionally, get all the data for a particular field of a particular item.
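As a rough sketch of how this might map onto HBase, assuming a composite 'item#date' row key with the fields stored as columns in one family (all names below are placeholders), the "all data for one item" query becomes a prefix scan, while the "all items on one date" query would need either a date-first key or a second copy of the table:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ItemHistoryScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("items_daily"))) {

            // Row key layout: <item_id>#<yyyyMMdd>, with the fields as columns in family "f".
            // All dates for one item is a single prefix scan over that item's keys.
            Scan scan = new Scan();
            scan.setRowPrefixFilter(Bytes.toBytes("ITEM000123#"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```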
Aditya
(123 rep)
Feb 13, 2019, 09:11 PM
• Last activity: Feb 13, 2019, 09:29 PM
2 votes • 0 answers • 207 views
Incremental update in HBase using Sqoop
I am working on a POC where I need to implement something like SSIS, but from SQL to Hadoop.
Our DWs are in SQL Server, and we want to create an HBase store at a central location where the data from the different DWs will be posted. For the initial load I can use the "sqoop import" command, but after that, how do I do an incremental Sqoop load into HBase for the changed data?
Radhi
(323 rep)
Aug 25, 2015, 08:10 AM
• Last activity: Nov 16, 2018, 01:50 AM
0 votes • 1 answer • 816 views
Does HBase support spatial functionality?
I see mentions of spatial functionality in HBase, for example [*"HBaseSpatial: A Scalable Spatial Data Storage Based on HBase"*](https://ieeexplore.ieee.org/abstract/document/7011307).
What spatial functionality does HBase support and where is this documented?
e7lT2P
(175 rep)
Nov 12, 2018, 05:31 PM
• Last activity: Nov 16, 2018, 01:47 AM
Showing page 1 of 7 total questions