Working with a large list

Chad_Richins · March 12, 2023, 11:48pm

I setup a database on firebase that is quite large, about 315,000 records. This is a list of vehicles by year. In my app I am trying to select the year, then that will show only the vehicle makes in that year. Then you select the model from that. When I was using a smaller test set of data roughly 10 different vehicles, the app was working perfect. I uploaded my full vehicle list to firebase, but now when I go into app inventor, and select year, the app is crashing and closing out. When it does this, I dont see an error created. I have a feeling its a fan of the size of the list its trying to display.

Im not using the actual firebase extension built into app inventor, I am using the web connectivity feature, and then putting in the url for json file from firebase to pull the data. Is there a best practice for working with large amounts of data like Im attempting?

Kevinkun · March 13, 2023, 12:00am

You can use web api to get filtered data, rather than retrieve all data to local.

Chad_Richins · March 13, 2023, 12:01am

How would I do something like that without knowing what the user is going to select?

Kevinkun · March 13, 2023, 12:10am

You have to know what users are going to search, other vise how to write the code?

Chad_Richins · March 13, 2023, 12:13am

I figured as much. That's the problem I'm going to run into. There won't be a way to know what they may select. So with that, is there a better way to handle the large list locally?

Kevinkun · March 13, 2023, 12:19am

User can search data by Year , or Maker,or Model as in your example.
But I am not familiar with Firebase. For me I will use MySQL.

ABG · March 13, 2023, 2:02am

This is called data design:

Basically, you have to flatten your data and store it in such a way that it can be approached in multiple ways, in small clumps, never all at once.

Also see

See if you can take advantage of Firebase indexes.

Once you have reduced the size of your incoming result set, AI2 has blocks that can filter it further:

dictionary walk thru path
nb191 map/reduce list blocks.

Chad_Richins · March 13, 2023, 4:38am

So I am using a walk through in a dictionary. Below is a screen shot of what the got text is doing after the json file is pulled. I read through the data structure things you posted, and how to flatten out the data. With this data set I have Im not sure how I would do that though. Is it something where I would have 3 different "tables"? one for year with a key, a second for the make with the same key, and third with the model, and that same key again. Then link it together similar to what I have the blocks doing now? I have also taken a screen shot of what the json data being stored in firebase looks like.

Kevinkun · March 13, 2023, 5:40am

use a url like this:

https://URL.TO.YOUR.JSON?orderBy="year"&equalTo=1996&print=pretty

for details, refer to their document:

ABG · March 13, 2023, 12:42pm

You need different "tables" to support query building into the cars prairie.

You should not have to download the prairie to fend off people looking for 2022 Studebakers or 2009 Teslas.

Only after you have year and model should you submit a filtered query to Firebase.

Your dictionary blocks make no sense to me.

TIMAI2 · March 13, 2023, 1:24pm

As suggested by others, you need to use the indexing feature in Firebase, get Firebase to do the grunt work of filtering your data, to return the filtered results

https://firebase.google.com/docs/database/security/indexing-data

ABG · March 13, 2023, 4:34pm

How are you loading Firebase with your data?

How often does it change?

Chad_Richins · April 24, 2023, 6:55am

Sorry it has taken me so long to respond, the normal 9-5 job has been crazy busy. Firebase gets updated with a json file that I am uploading. I wrote a small program through python that converts a csv file to a json file.

This data may get updated once a year with new vehicles that have been manufactured. So the start of 2024, there would be another upload made appending data to the existing table.

I have attached the json file that I am loading.

VehicleDataTestLoad.json (4.1 KB)

Chad_Richins · April 24, 2023, 8:41am

So after I read through the index documentation this is what I have setup in the rules. With this setup,

{
"rules": {
"VehicleData": {
".indexOn": ["year", "make", "model"]
}
}
}

if I try using the below URL in a browser, I get a message that I need to add indexes to my rules. Im not sure if it takes a bit for firebase to build the indexes, or if it doesnt like the way the rules are setup.

" https://firebaseUrl/FullVehicleList/VehicleData.json?orderBy="year" "

{
"error" : "Index not defined, add ".indexOn": "year", for path "/FullVehicleList/VehicleData", to the rules"
}

TIMAI2 · April 24, 2023, 9:52am

Using your data in the json file, I am guessing your firebase console looks something like this:

Therefore your rules need to look like this:

Then you can run queries like this:

Which should output:

Be aware that firebase will not return a sorted set of results.

With such a large dataset, I would suggest that you generate lists of years, makes and models from the data on first run, either store this also on firebase or on the device. This will help to make querying the data easier, and avoid having to download all the data to get results.

This could be done by pulling the data into a google sheet and generating the required lists there:

[EDIT] I have just remembered, there is a gotcha to importing the json to a google sheet, it can only handle 40,000 rows, so you would have to split the json into @ 9 files and use 9 sheets!

To be honest, if the data is only updated @ once a year, you might be better off loading the data into an sqlite database and running this locally on each device. This will allow for improved queries and performance.

https://puravidaapps.com/sqlite.php

https://ai2.metricrat.co.uk/guides/working-with-sqlite

I calculate that your json file will be between 72 and 110 mb in size !! (given 315,000 entries)

Chad_Richins · April 29, 2023, 8:38pm

I got it figured out. the json file is now only 7.43mb. I did some filtering through it. There were duplicate entries based on the model, sub model, and engine size. Also I added some concatenated fields so the year, make and model were all one word in a field. So now you open the app, hit year, and it only pulls the year table. Then the rest api sends a request for the year selected, and will come back with the makes that were build that year. In the app it then joins the year and make, and I used that value to send in the rest api. That concatenated value matches a year and and make in firebase, and only displays the models that are associated with that make for the year selected.

I appreciate all the help. It made me sit down rethink how I have the tables setup, and what should be used to pull that data out. The table sizes are much smaller, more optimized, and its actually doing what I expected.

TIMAI2 · April 29, 2023, 9:40pm

Pleased to hear you found a solution.

Be good, if you find the time, to provide an example of your blocks / firebase so that others might learn.

Chad_Richins · April 29, 2023, 10:57pm

Here are the blocks that I used, and the formatting that was done with the json file that was loaded into Firebase. Ive been testing it for the past couple of hours, and it seems to be pretty stable. Checking it against the data that pulls, and then using the url thats passed to the firebase rest api.

firebaseRestAPIexample.aia (760.6 KB)
jsonFirebaseExample.json (433 Bytes)

system · May 6, 2023, 10:57pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.