Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
1
votes
1
answers
1008
views
How to check how many keys are stored in Aerospike?
I am new to Aerospike and exploring AQL and asadm.
I executed the command "show sets" and got this response:
aql> show sets
+------------------+-------------+---------+-------------------+---------------+-------------------+-------------------+--------------+------------+
| disable-eviction | ns | objects | stop-writes-count | set | memory_data_bytes | device_data_bytes | truncate_lut | tombstones |
+------------------+-------------+---------+-------------------+---------------+-------------------+-------------------+--------------+------------+
| "false" | "tvpreprod" | "35249" | "0" | "data" | "1684642852" | "0" | "0" | "0" |
| "false" | "tvpreprod" | "12229" | "0" | "epg" | "3035957260" | "0" | "0" | "0" |
| "false" | "tvpreprod" | "6009" | "0" | "account" | "6288324" | "0" | "0" | "0" |
| "false" | "tvpreprod" | "24821" | "0" | "epg_account" | "59593681" | "0" | "0" | "0" |
| "false" | "tvstage" | "2956" | "0" | "data" | "66573412" | "0" | "0" | "0" |
| "false" | "tvstage" | "1873" | "0" | "account" | "1984140" | "0" | "0" | "0" |
| "false" | "tvstage" | "18060" | "0" | "epg_account" | "30209254" | "0" | "0" | "0" |
| "false" | "tvstage" | "5197" | "0" | "epg" | "1792880530" | "0" | "0" | "0" |
+------------------+-------------+---------+-------------------+---------------+-------------------+-------------------+--------------+------------+
[xx.xx.xx.xx:3000] 8 rows in set (0.781 secs)
But on executing select * from tvstage.data, I got this result:
138 rows in set (3.796 secs)
I am not able to understand why there is a difference in the counts.
**Show sets** reports 2956 records, but **select** returns only 138.
All the keys pushed have different TTLs, and expiry is set to no more than 24 hours.
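One effect worth ruling out can be sketched in plain Java (illustrative data only, not the Aerospike API): the per-set object count may still include records whose TTL has passed but which haven't been cleaned up yet, while a read or scan filters expired records out, so the two numbers need not match.

```java
import java.util.List;

public class ExpiryCount {
    // each record carries an absolute expiry timestamp (epoch seconds);
    // only records expiring strictly after "now" are readable
    static long liveCount(List<Long> expiries, long now) {
        return expiries.stream().filter(exp -> exp > now).count();
    }

    public static void main(String[] args) {
        List<Long> expiries = List.of(100L, 200L, 300L);
        System.out.println(liveCount(expiries, 250)); // 1: two of three records already expired
    }
}
```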
Vibhav Singh Rohilla
(199 rep)
Aug 25, 2022, 01:06 PM
• Last activity: Mar 18, 2025, 01:03 AM
3
votes
2
answers
2960
views
Choosing the right database for stock price history
The model is
[(stock_id, period, ts), open, high, low, close, last, volume]
We write new prices for all 120,000 stocks each minute and delete old ones when they fall out of the retention window. It doesn't matter whether retention cleanup happens automatically or via a daily cleanup process.
Periods are 1 minute, 10 minutes, 1 day, and 7 days, with roughly 1,000 to 10,000 most recent points retained per period.
Currently there are about 200,000,000 rows (40 GB) in a Postgres table, and performance is sometimes bad, especially when autovacuum is triggered.
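As a back-of-envelope check (assuming all four periods sit at the stated retention bounds), the steady-state row count works out to:

```java
public class RetentionEstimate {
    // rows retained at steady state = stocks x periods x points kept per period
    static long rows(long stocks, int periods, long pointsPerPeriod) {
        return stocks * periods * pointsPerPeriod;
    }

    public static void main(String[] args) {
        long stocks = 120_000;
        int periods = 4; // 1m, 10m, 1d, 7d
        System.out.println(rows(stocks, periods, 1_000));  // 480000000 (lower bound)
        System.out.println(rows(stocks, periods, 10_000)); // 4800000000 (upper bound)
    }
}
```

If those bounds hold, the current 200M rows sit below even the lower bound, so the table can still grow several-fold.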
When we query, we usually pull a whole period to show a chart, but sometimes we access specific dates.
We are trying to understand whether there are optimizations we can make in PostgreSQL itself, or whether some other database would work better.
The NoSQL databases I'm considering for testing are MongoDB and Aerospike, with the following document model, where the document key is the stock_id:
{
  1111: {
    "symbol": "aapl",
    "hist_1m": {
      12334: {
        "last": 123.1,
        "close": 123.2,
        ...
      }, ...
    },
    "hist_10m": {...},
    "hist_1d": {...},
    "hist_7d": {...}
  },
  2222: {...}
}
But I'm really not sure about the performance of such a model, where each sub-map will hold 1,000 to 10,000 entries, or maybe even more in the future. In Aerospike a record is capped at write-block-size
(1 MB by default); in Mongo the limit per document is 16 MB, but neither limit says anything about performance.
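A rough size estimate for the single-document model (the bytes-per-field figure below is an assumption, not a measurement) suggests the upper retention bound would blow past Aerospike's default record cap while still fitting under Mongo's:

```java
public class DocSizeEstimate {
    // assumed serialized size of one history point: 6 numeric fields
    // at ~12 bytes each (field name + value) -- an assumption, not a measurement
    static long entryBytes(int fields, int bytesPerField) {
        return (long) fields * bytesPerField;
    }

    static long docBytes(int periods, int entriesPerPeriod, long entryBytes) {
        return (long) periods * entriesPerPeriod * entryBytes;
    }

    public static void main(String[] args) {
        long perEntry = entryBytes(6, 12);           // ~72 bytes per point
        long perDoc = docBytes(4, 10_000, perEntry); // 2880000, i.e. ~2.9 MB per stock
        System.out.println(perDoc > 1_048_576);      // true: over the 1 MiB default write-block-size
    }
}
```

So at full retention one stock document would not fit in a default-configured Aerospike record, though it would still be well under Mongo's 16 MB document limit.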
How fast or efficient are individual additions to large collections in MongoDB or Aerospike? Do they happen in place, or do they require loading the whole collection into memory and rewriting it back to disk, as would happen with a PostgreSQL jsonb column?
In Postgres we just do thousands of inserts and it's very fast. The performance issue comes from the endless insert/delete churn: gaps in the tablespace and autovacuum. Global backfills also take quite a long time.
I even thought about time-series DBs like Prometheus or InfluxDB, but I really don't think they're designed for real-time, high-load querying.
Please suggest which database/model you think is ideal for this purpose.
I searched for an existing question with the same requirements (as I assume thousands of systems store similar historical data in some way).
Thanks.
Kaplan Ilya
(173 rep)
Jun 16, 2022, 01:22 PM
• Last activity: Dec 21, 2024, 09:03 AM
0
votes
1
answers
392
views
Aerospike memory usage increasing constantly with time
I am new to Aerospike and exploring TTL features.
I have a variety of keys stored in different sets and namespaces.
All the keys pushed have a TTL (from 10 minutes to 24 hours).
This is what the sets look like:
aql> show sets
+------------------+-------------+---------+-------------------+---------------+-------------------+-------------------+--------------+------------+
| disable-eviction | ns | objects | stop-writes-count | set | memory_data_bytes | device_data_bytes | truncate_lut | tombstones |
+------------------+-------------+---------+-------------------+---------------+-------------------+-------------------+--------------+------------+
| "false" | "tvpreprod" | "35255" | "0" | "data" | "1684670710" | "0" | "0" | "0" |
| "false" | "tvpreprod" | "12239" | "0" | "epg" | "3037722712" | "0" | "0" | "0" |
| "false" | "tvpreprod" | "6019" | "0" | "account" | "6300040" | "0" | "0" | "0" |
| "false" | "tvpreprod" | "24847" | "0" | "epg_account" | "59688725" | "0" | "0" | "0" |
| "false" | "tvstage" | "2958" | "0" | "data" | "66575414" | "0" | "0" | "0" |
| "false" | "tvstage" | "1877" | "0" | "account" | "1989341" | "0" | "0" | "0" |
| "false" | "tvstage" | "18090" | "0" | "epg_account" | "30313634" | "0" | "0" | "0" |
| "false" | "tvstage" | "5202" | "0" | "epg" | "1798086251" | "0" | "0" | "0" |
+------------------+-------------+---------+-------------------+---------------+-------------------+-------------------+--------------+------------+
[xx.xx.xx.xx:3000] 8 rows in set (0.848 secs)
I am using Java Client to put and get the data from Aerospike.
Code to push data to Aerospike -
public void setWithTTL(String key, String value, Integer ttl) throws AerospikeException {
    try {
        Key primaryKey = new Key(config.getNameSpace(), config.getSetName(), key);
        WritePolicy policy = new WritePolicy();
        policy.expiration = ttl; // TTL in seconds from now
        Bin keyBin = new Bin(config.getBinNameKey(), key);
        Bin valueBin = new Bin(config.getBinNameValue(), value);
        aerospikeClient.put(policy, primaryKey, keyBin, valueBin);
        logCacheOperation(OPERATION.PUT, primaryKey, valueBin);
    } catch (AerospikeException e) {
        log.error("AerospikeException @ PUT", e);
        throw e;
    }
}
Code to get the data -
public String get(String key) throws AerospikeException {
    try {
        Key primaryKey = new Key(config.getNameSpace(), config.getSetName(), key);
        Record record = aerospikeClient.get(aerospikeClient.getReadPolicyDefault(), primaryKey);
        logCacheOperation(OPERATION.GET, primaryKey, record);
        if (record == null) {
            return null; // key absent or already expired
        }
        return record.bins.get(config.getBinNameValue()).toString();
    } catch (AerospikeException e) {
        log.error("AerospikeException @ GET", e);
        throw e;
    }
}
Aerospike Client is created using -
ClientPolicy clientPolicy = new ClientPolicy();
clientPolicy.user = "test";
clientPolicy.password = "xxxx";
String[] nodes = "x.x.x.x:3000,y.y.y.y:3000".split(",");
Host[] hosts = new Host[nodes.length];
for (int i = 0; i < nodes.length; i++) {
    String[] parts = nodes[i].split(":");
    hosts[i] = new Host(parts[0], Integer.parseInt(parts[1]));
}
AerospikeClient aerospikeClient = new AerospikeClient(clientPolicy, hosts);
Aerospike configuration file looks like this -
# Aerospike database configuration file for use with systemd.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 20
    proto-fd-max 15000
    cluster-name tv-hu-prod
    node-id-interface ens5
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        access-address xx.xx.xx.xx
        port 3000
    }
    heartbeat {
        mode mesh
        address yy.yy.yy.yy
        port 3002
        mesh-seed-address-port yy.yy.yy.yy 3002
        mesh-seed-address-port yy.yy.yy.yy 3002
        mesh-seed-address-port yy.yy.yy.yy 3002
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace tvstage {
    replication-factor 1
    memory-size 60G
    default-ttl 86400
    storage-engine memory
    allow-ttl-without-nsup true
}
The problem is that Aerospike memory consumption is constantly increasing, and every 5-7 days memory is completely consumed and the node needs to be restarted.
I am not able to understand why memory keeps growing when every key has a TTL and expiry ranges from 10 minutes to 24 hours.
Please let me know if any more information is needed.
Aerospike version: C-5.4.0.4
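For reference, a variant of the namespace stanza with the expiration supervisor explicitly enabled might look like this (a sketch; `nsup-period` is the sweep interval in seconds and, on this server line, defaults to 0, i.e. disabled, with `allow-ttl-without-nsup true` only suppressing the related warning):

```
namespace tvstage {
    replication-factor 1
    memory-size 60G
    default-ttl 86400
    nsup-period 120               # run the expiration sweep every 120 seconds
    storage-engine memory
}
```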
Vibhav Singh Rohilla
(199 rep)
Aug 25, 2022, 01:25 PM
• Last activity: Aug 31, 2022, 03:43 AM
3
votes
2
answers
367
views
Implementing a user notifications list using Aerospike
I need to choose the right DB for a notifications system that needs to handle billions of notifications. The record structure is as follows:
[user_id, item_type, item_id, created_at, other_data]
The inserts will come in bulks of up to hundreds of thousands at spikes.
And it needs to support thousands of selects per minute of this kind:
select * from user_notifications where user_id=12345 order by created_at limit 10
select * from user_notifications where user_id=12345 and item_type='comment' order by created_at limit 10
--- and for the next page of the pagination:
select * from user_notifications where user_id=12345 and item_type='comment' and created_at>'2020-11-01 10:50' order by created_at limit 10
It should also allow quick updates and deletes and ideally have TTL on each record.
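The pagination queries above amount to a keyset cursor; the selection logic can be sketched database-independently (the names are illustrative, not any client API):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class KeysetPage {
    // one notification row: [user_id, item_type, item_id, created_at]
    record Note(long userId, String itemType, long itemId, long createdAt) {}

    // next page: the user's rows of the given type strictly after the cursor
    // timestamp, oldest first, capped at the page size
    static List<Note> page(List<Note> rows, long userId, String itemType, long afterTs, int limit) {
        return rows.stream()
                .filter(n -> n.userId() == userId
                        && n.itemType().equals(itemType)
                        && n.createdAt() > afterTs)
                .sorted(Comparator.comparingLong(Note::createdAt))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

The last `created_at` of each page becomes the `afterTs` cursor for the next request, which is exactly what the third SQL query above does.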
Right now it's implemented using MySQL; we only have 400M rows and it's already slow as hell. And bulk cleanup is just impossible.
Initially, I thought ScyllaDB/Cassandra would be ideal for this, setting the primary key to
[user_id, item_type, item_id]
(user_id being the partition key) for inserts/updates/deletes, with [user_id, item_type, created_at]
as a secondary index. CQL seems straightforward in this case and it should be fast (correct me if I'm wrong). The problem is that we are Ruby-on-Rails based and there's no good Ruby client library for it. The one listed in the ScyllaDB clients list (https://github.com/datastax/ruby-driver) is in maintenance mode and I'm not sure it will be updated for new Ruby versions, etc.
Recently I heard about Aerospike, and its benchmarks look really cool, but I couldn't figure out how to implement the above requirements with Aerospike's architecture, especially since its secondary indexes seem to always live in memory, which makes it impossible to index billions of rows.
This notifications schema seems like something very common, but still I couldn't find a good article describing the ideal ways to implement it.
Any suggestions are welcome.
Thanks
Kaplan Ilya
(173 rep)
Dec 2, 2020, 10:15 PM
• Last activity: Dec 3, 2020, 06:44 PM
1
votes
1
answers
124
views
Delete record in Aerospike?
I'm tuning Aerospike. I followed the "Recipe for an SSD Storage Engine" with this config:
namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    #storage-engine memory
    storage-engine device {    # Configure the storage-engine to use persistence
        device /dev/vdb        # raw device. Maximum size is 2 TiB
        # device /dev/         # (optional) another raw device.
        write-block-size 128K  # adjust block size to make it efficient for SSDs.
    }
}
After running the benchmarks for the first time, my SSD is completely full:
quanlm@quanlm2:/mnt/device$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            2.0G     0  2.0G   0% /dev
tmpfs           395M  3.1M  392M   1% /run
/dev/vda1        60G  2.8G   54G   5% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/vdb         64Z   64Z   59G 100% /mnt/device
tmpfs           395M     0  395M   0% /run/user/1000
So is there any way to delete the data on my SSD (/dev/vdb)?
Lê Minh Quân
(145 rep)
Dec 16, 2019, 09:24 AM
• Last activity: Dec 16, 2019, 07:48 PM
1
votes
1
answers
213
views
Setting up an Aerospike high-availability cluster?
Is there any guide to setting up Aerospike HA? I searched Google but found no useful results.
It looks like Aerospike is not very widely used.
P.S.: This is for lab purposes, so only 2-3 nodes/hosts are needed.
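For a small lab cluster, most of the work is pointing each node's mesh heartbeat at its peers; a sketch of the relevant stanza for a two-node cluster (addresses are placeholders) might be:

```
heartbeat {
    mode mesh
    port 3002
    mesh-seed-address-port 10.0.0.1 3002   # list every node, including this one
    mesh-seed-address-port 10.0.0.2 3002
    interval 150
    timeout 10
}
```

With `replication-factor 2` on the namespace, either node would hold a full copy of the data and can keep serving if the other fails.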
Lê Minh Quân
(145 rep)
Nov 29, 2019, 02:52 AM
• Last activity: Nov 30, 2019, 04:19 PM
3
votes
3
answers
4609
views
Deleting aerospike bins
I have an Aerospike instance and I messed up one of my namespaces. I didn't know there was a ~32k limit on bins, so I wrote unique bin names into my namespace. I hit that 32k limit, and now my whole namespace is hurting. How can I delete a bin from the entire namespace so I can free up my bin limit?
I've looked at the Aerospike documentation and I just can't seem to find anything. I've even looked for 3rd-party programs and I can't find anything there either.
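For what it's worth, the Aerospike clients remove a bin by writing a null value to it (`Bin.asNull` in the Java client); the record-level effect can be sketched in plain Java, with a map standing in for a record's bins (not the client API):

```java
import java.util.HashMap;
import java.util.Map;

public class BinDelete {
    // writing a null value to a bin removes the bin from the record;
    // any other value creates or overwrites it
    static Map<String, Object> putBin(Map<String, Object> bins, String name, Object value) {
        Map<String, Object> out = new HashMap<>(bins);
        if (value == null) {
            out.remove(name);
        } else {
            out.put(name, value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> rec = new HashMap<>(Map.of("a", 1, "b", 2));
        System.out.println(putBin(rec, "b", null).keySet()); // [a]
    }
}
```

Freeing the name from the 32k bin-name table itself is a separate matter, but nulling the bin across all records (e.g. via a scan) is the usual first step.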
Mr. MonoChrome
(131 rep)
May 1, 2014, 05:35 PM
• Last activity: Feb 2, 2018, 05:46 AM
2
votes
0
answers
97
views
Aerospike schema: which one is the better design?
We have database requirements as follows:
1. Huge number of records: 10-100 million per set.
2. Huge number of bins: around 100 bins.
3. Some point queries need to run within milliseconds.
Now, which of the following two schema design philosophies is better for Aerospike?
1. Have one set of each type with all possible bins (in the hundreds). But would that degrade Aerospike's performance?
2. Categorize bins and have multiple sets, each with around 10 bins max. But this means redundant keys in each set: high space cost, and it's hard to combine data for the same key across different sets.
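One concrete trade-off can be put in numbers, using the commonly cited 64 bytes of primary-index RAM per record (treat the figure as an assumption): splitting each logical record across N sets multiplies the number of index entries, and thus index memory, by roughly N.

```java
public class IndexMemory {
    // assumed primary-index cost per record, in bytes
    static final long BYTES_PER_RECORD = 64;

    // splitting one logical record across "sets" sets creates one
    // index entry (and one copy of the key) per set
    static long indexBytes(long records, int sets) {
        return records * sets * BYTES_PER_RECORD;
    }

    public static void main(String[] args) {
        System.out.println(indexBytes(100_000_000L, 1));  // 6400000000  (~6.4 GB, one wide set)
        System.out.println(indexBytes(100_000_000L, 10)); // 64000000000 (~64 GB, ten narrow sets)
    }
}
```

Under that assumption, option 2 pays roughly a 10x index-RAM premium before counting the duplicated key bins themselves.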
Mangat Rai Modi
(141 rep)
Mar 9, 2016, 11:18 AM
• Last activity: Mar 9, 2016, 11:25 AM
2
votes
1
answers
1049
views
Using Aerospike without SSD
Is there a drawback to using Aerospike on a non-SSD server?
Kokizzu
(1403 rep)
Nov 28, 2014, 01:04 AM
• Last activity: Nov 28, 2014, 07:15 AM