
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
45 views
Database / Search-Index recommendation: Match time ranges from different categories
* 300k+ videos
* 10+ million markers, pointing to time ranges in videos

```json
{
  "markerCategory": "something",
  "markerDesc": "something-more-specific",
  "frameIn": 120001,
  "frameOut": 140002
},
{
  "markerCategory": "something-else",
  "markerDesc": "something-else-more-specific",
  "frameIn": 130001,
  "frameOut": 135002
}
```
Any suggestions on which database / search index would perform best when searching for something along these lines:

> Videos having events of category A __AND__ category B in overlapping time ranges, sorted by amount of covered time

Videos are currently exported from a proprietary relational database and stored in an Apache Solr instance for searching.

* Is there a specific name for this kind of query ("inverted range queries" or something like that)?
* Any suggestions on which technology would perform best for these types of queries? I was thinking maybe Elasticsearch?
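For what it's worth, the core of such a search is the standard interval-overlap test: ranges [aIn, aOut] and [bIn, bOut] overlap when aIn < bOut and bIn < aOut. Below is a minimal relational sketch of the requested query, assuming a hypothetical markers(video_id, marker_category, frame_in, frame_out) table (the real data lives in Solr, so this only illustrates the logic; LEAST/GREATEST as in MySQL/PostgreSQL):

```sql
-- Videos that have a category-A marker overlapping a category-B marker,
-- ordered by the (approximate) number of overlapping frames.
SELECT a.video_id,
       SUM(LEAST(a.frame_out, b.frame_out) - GREATEST(a.frame_in, b.frame_in)) AS covered_frames
FROM markers AS a
JOIN markers AS b
  ON  b.video_id = a.video_id
  AND a.marker_category = 'A'
  AND b.marker_category = 'B'
  AND a.frame_in < b.frame_out   -- interval-overlap test
  AND b.frame_in < a.frame_out
GROUP BY a.video_id
ORDER BY covered_frames DESC;
```

The sum can double-count frames when several A/B pairs overlap each other, but it shows the shape of the query any candidate engine would need to support.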
gherkins (103 rep)
Mar 6, 2025, 06:45 AM • Last activity: Mar 6, 2025, 01:08 PM
-1 votes
1 answer
29 views
Indexing system vs raw DBMS connection
I have created a new dataset with 11,000,000+ rows and 4 pivot tables in MySQL. The tables are not that deep, only 6-12 columns each. I've set up Apache Solr (Lucene) to index the data, and it works great for searching, BUT I haven't noticed a dramatic improvement in search load time. If I run a raw SQL command, the difference between the two is negligible. At what point is it more beneficial to use an indexing engine vs scripting a raw query yourself? Granted, I have the ability to set "weight" and all that via Solr, but my impression was that I would have dramatically reduced overhead on the server. Is it just that my dataset isn't large enough / complicated enough to elicit these findings? Other than the obvious advantages of weighting, ranking, and sort/filter, what is the overhead advantage of using an indexing system such as Solr?
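As a rough illustration of where an inverted index starts to pay off over a raw query, compare a leading-wildcard scan with MySQL's built-in full-text index (hypothetical table and column names; Solr layers relevance weighting, faceting, and distributed search on top of the same inverted-index idea):

```sql
-- Hypothetical MySQL table: articles(id, body).
ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);

-- Leading wildcard: cannot use a B-tree index, so every row is scanned.
SELECT id FROM articles WHERE body LIKE '%solr%';

-- Uses the inverted (full-text) index instead of scanning the table.
SELECT id FROM articles
WHERE MATCH(body) AGAINST ('solr' IN NATURAL LANGUAGE MODE);
```

If the raw queries are simple indexed lookups rather than text searches, the gap between MySQL and an external engine will indeed be small.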
Zak (113 rep)
Sep 8, 2023, 08:31 PM • Last activity: Sep 8, 2023, 10:17 PM
2 votes
0 answers
52 views
List Solr replica properties
I'm looking for a way to list all properties on a replica. I checked this page: https://lucene.apache.org/solr/guide/7_6/collections-api.html but no such thing is mentioned there; only adding, editing, or deleting existing properties, with no way of listing all of them. How can I find out which properties have already been set on a replica?

EDIT: Found them in ZooKeeper. Use the command: zkCli.sh get /solr/collections/<collection_name>/state.json
KdgDev (211 rep)
Nov 1, 2019, 03:42 PM • Last activity: Nov 4, 2019, 09:23 AM
0 votes
1 answer
2430 views
WCS8 Solr Preprocess failed: Error SQLCODE=-601, SQLSTATE=42710, SQLERRMC=WCSSTGAP.XI_CATENTRY_PRICE_0
Our Solr pre-process is failing with the error below:

> [2018/08/28 14:31:36:104 CDT] 00000001 W com.ibm.commerce.foundation.dataimport.preprocess.AbstractDataPreProcessor:createDBTable(Connection, String, String) create XI_CATENTRY_PRICE_0 with error: DB2 SQL Error: SQLCODE=-601, SQLSTATE=42710, SQLERRMC=WCSSTGAP.XI_CATENTRY_PRICE_0;TABLE, DRIVER=4.19.49

I suspect that DROP TABLE XI_CATENTRY_PRICE_0 is failing, so the subsequent CREATE fails because the table already exists. How could I resolve this?
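SQLSTATE 42710 / SQLCODE -601 in DB2 means the object being created already exists, which fits the theory that the preceding DROP never succeeded. One hedged workaround (verify against a non-production environment first) is to remove the leftover work table by hand before re-running the pre-process:

```sql
-- DB2: drop the leftover pre-process work table named in SQLERRMC
-- so the utility can recreate it on the next run.
DROP TABLE WCSSTGAP.XI_CATENTRY_PRICE_0;
```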
Prardhan (1 rep)
Aug 29, 2018, 11:36 PM • Last activity: Jun 13, 2019, 01:02 PM
4 votes
2 answers
2988 views
How can we do full-text search and faceting on BigQuery?
In the future, we will have millions of records in our web product, so we have used BigQuery for data storage and analysis. We have to build filters on our search page like the product filters Flipkart and Amazon provide: different filters on the left side of the search page that narrow the results. Every filter has a count, i.e., the number of records in that category/term, so we have to build queries in such a way that they return both the results and the count for every category (term). Here, "category" means different types of filters on different columns, as seen on e-commerce sites such as Flipkart and Amazon.

> Faceted search (also called faceted navigation, guided navigation, or parametric search) breaks up search results into multiple categories, typically showing counts for each, and allows the user to "drill down" or further restrict their search results based on those facets.

Is there any framework/plugin available, like Solr, which can be used with BigQuery to provide the desired functionality above? Is BigQuery not suitable for this purpose? Or do we need to stick with an RDBMS (e.g., PostgreSQL, MySQL) plus a search engine (e.g., Solr, Elasticsearch) for this?
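Facet counts themselves can be emulated in BigQuery with plain aggregation; here is a hedged sketch of one facet panel's counts, assuming a hypothetical `my_project.shop.products` table with `brand` and `category` columns (each facet shown to the user would be one such aggregation, re-run with the filters already applied):

```sql
-- Hypothetical BigQuery (standard SQL): counts for the "brand" facet,
-- restricted by the category filter the user has already selected.
SELECT brand,
       COUNT(*) AS doc_count
FROM `my_project.shop.products`
WHERE category = 'laptops'
GROUP BY brand
ORDER BY doc_count DESC;
```

What BigQuery does not give you is low-latency, per-keystroke responses or text relevance ranking, which is where a dedicated engine like Solr or Elasticsearch usually comes in.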
Lalit Kumar Maurya (143 rep)
Jun 13, 2017, 04:22 AM • Last activity: Apr 8, 2019, 11:36 PM
2 votes
2 answers
297 views
SQL performance of selecting recently modified records for Solr
I have a SQL Server 2016 database and Solr. The Solr indexer often runs queries like this:

    select book_collection from books group by book_collection having max(updated_on) > '2017-04-04 09:50:05'

The column updated_on is updated by a trigger on insert/update; at every run of the Solr incremental indexer (every 10 minutes), many queries like the one above fetch the latest modified records and reindex them. This table, for instance, has about one million rows, and at any given time the query would return 10-20 rows maximum. These queries end up in the list of the most expensive queries run on the database, so I would like to optimize them. My questions:

1) Would a timestamp column perform better than a datetime column?
2) If I changed the query like this, would it be more efficient?

    select distinct book_collection from books where updated_on > '2017-04-04 09:50:05'

The first query plan is from the original query, the second from my modified query. The fact that the second plan asks for an index, and the first one doesn't, really suggests that the modified query can use an index, if present, and is therefore better.

[Query plan screenshot: Original]
[Query plan screenshot: Modified by me]
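If the rewritten query is used, a narrow index on updated_on that also carries book_collection would let SQL Server answer it with a small range seek instead of touching the whole table; a hedged sketch (index name and schema are illustrative):

```sql
-- SQL Server: supports "WHERE updated_on > @since" with a range seek;
-- INCLUDE avoids key lookups for the selected column.
CREATE NONCLUSTERED INDEX IX_books_updated_on
    ON dbo.books (updated_on)
    INCLUDE (book_collection);
```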
carlo.borreo (1477 rep)
Apr 4, 2017, 08:53 AM • Last activity: Apr 4, 2017, 04:10 PM
1 votes
0 answers
70 views
Where do custom Solr schemas go in Riak?
I have checked these two pages in the Riak docs
http://docs.basho.com/riak/kv/2.2.0/developing/usage/search/
http://docs.basho.com/riak/kv/2.2.0/developing/usage/search-schemas/
and they tell how to make a custom schema and how to attach it to an index, but they seem to expect the schema file to be in a certain folder for that to work. Whenever I try adding the schema through curl like this:

    curl -XPUT http://localhost:8098/search/schema/users \
      -H 'Content-Type:application/xml' \
      --data-binary @user_schema.xml

it throws an error saying the file isn't found. I've tried specifying the full path too. I've found solutions online saying that it goes somewhere in the data folder, but those were from back in 2015 and I'm using version 2.2 of Riak now. Has anyone else recently had success adding a custom Solr schema to Riak? PS: I'm using ubuntu/trusty64 in a VM.
AndrewK (11 rep)
Dec 29, 2016, 03:07 AM
2 votes
1 answer
231 views
DSE Cassandra decimal type
I generated a Solr schema with dsetool on a table named seriesdata (its key is composed of seriesmetadata_id and initialtime, two decimal fields). These fields are defined as DecimalStrField, but they are numeric valued (I have to execute range selections). I tried to change the schema, defining these fields as TrieLongField instead, but I receive this error:

> [cassandra@bigdatalin-03 ~]$ dsetool reload_core timeseriesks.seriesdata schema=schema_data.xml solrconfig=solr_config_data.xml reindex=true -l cassandra -p cassandra
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: com.datastax.bdp.search.solr.CassandraIndexSchema$ValidationException: Mismatch between Solr key field [seriesmetadata_id] with type {TrieLongField{class=org.apache.solr.schema.TrieLongField,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={class=org.apache.solr.schema.TrieLongField}}] and Cassandra key alias [seriesmetadata_id] with type [decimal]
> at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:665)
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:303)
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:294)
> at com.datastax.bdp.tools.SearchDseToolCommands.createOrReloadCore(SearchDseToolCommands.java:559)
> at com.datastax.bdp.tools.SearchDseToolCommands.access$200(SearchDseToolCommands.java:59)
> at com.datastax.bdp.tools.SearchDseToolCommands$ReloadCore.execute(SearchDseToolCommands.java:209)
> at com.datastax.bdp.tools.DseTool.run(DseTool.java:126)
> at com.datastax.bdp.tools.DseTool.run(DseTool.java:51)
> at com.datastax.bdp.tools.DseTool.main(DseTool.java:186)

Why? Thanks a lot!
afmulone (65 rep)
May 6, 2016, 02:41 PM • Last activity: May 10, 2016, 02:43 PM
2 votes
0 answers
190 views
Solr with SQL Server or NoSQL
I am working on a POC of using Solr with SQL Server. I have a very complex data model in SQL Server which requires a bunch of joins and scalar functions to strip some markup and other stuff. This is turning out to be a performance bottleneck. To address this issue, we have considered NoSQL (MongoDB) or Solr with SQL Server as our options.

Using MongoDB, we attach replication events for all CRUD operations, which carry the data over to MongoDB after a successful insert, update, or delete on SQL Server. When we have to perform a search, we do it directly on the Mongo collections. **This sounds very cool, as we have 32 tables joining in this search, which can be converted to 2 collections in MongoDB.**

On the other hand, we are also exploring our options using Solr with SQL Server with DataImport. My concern is that, based on this article http://www.codewrecks.com/blog/index.php/2013/04/29/loading-data-from-sql-server-to-solr-with-a-data-import-handler/ I have to do an import for each entity.

- How does the joining search work with Solr? Should I import each table from SQL to Solr and then write join logic against the Solr APIs?
- Can I import multiple entities at once? Should I create a view for the expected result set (denormalized) and import that view to Solr (see the sketch below)?
- Will these imports have to be done at regular intervals? After an import, if there is new data, does the Solr API reflect that change, or do I have to do an import first and then search against the Solr API?

Finally, can I compare Solr with MongoDB? If anyone has done this kind of evaluation, please share your thoughts.
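On the denormalized-view option mentioned above: a common pattern is to flatten the joins into one view and point a single DataImportHandler entity at it, keying delta imports off a last-modified column so only changed rows are re-sent to Solr between full imports. A minimal sketch with hypothetical table and column names:

```sql
-- Hypothetical denormalized view feeding a single DIH entity.
CREATE VIEW dbo.vw_solr_product_export AS
SELECT p.product_id,
       p.name,
       c.category_name,
       s.supplier_name,
       p.last_modified          -- delta imports can key off this column
FROM dbo.products   AS p
JOIN dbo.categories AS c ON c.category_id = p.category_id
JOIN dbo.suppliers  AS s ON s.supplier_id = p.supplier_id;
```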
HaBo (191 rep)
Apr 25, 2016, 05:47 AM • Last activity: Apr 25, 2016, 06:53 AM
1 votes
2 answers
853 views
SQL Server vs SOLR (Or any document db)
I have my "customer" data in a normalized sql server database. Getting out the customer data in my app is taking too long. This is because I have to go to 10+ tables to get all the data I need. My company has an installation of SOLR that I thought about storing a Json object that contains all the da...
I have my "customer" data in a normalized sql server database. Getting out the customer data in my app is taking too long. This is because I have to go to 10+ tables to get all the data I need. My company has an installation of SOLR that I thought about storing a Json object that contains all the data I need for a single "customer" already put together. I think that this would give me some significant speed improvements. However, it got me to wondering what the difference would be to me putting this data in a single table with a varchar(max) column that has the Json in it. I could index my 10ish searchable columns on the same table as the json. I know that document databases are very popular. **So I imagine there has to be benefits over just rolling my own using denormalized data in sql server. Can someone tell me what they are?**
Vaccano (2550 rep)
Apr 7, 2015, 10:10 PM • Last activity: Feb 8, 2016, 04:08 PM