Technology
Frequently Asked Questions
Nerd Alert: some of the content here can get technical. If something isn't resonating with you, feel free to skip it, unless, of course, you’re of a technical mind. In that case, please connect with us to follow up.
What happens if a connector goes down?
Monitoring is part of the ongoing service FUSE provides. A problem can occur, for example, if a web service isn’t available or a vendor updates their API in a way that breaks FUSE’s connection to your content.
First, no worries: your users won’t notice anything amiss. FUSE will continue to seamlessly use the latest successful index of that content. However, the index won’t include any content added between the start of the connector outage and the time the search is conducted.
Next, we get alerted when this type of thing happens. Nine times out of ten, the problem fixes itself at the next indexing attempt. If something really is down, however, it’s up to the FUSE team to get it running again, and there are no fees or development costs for you. This is part of the ongoing service you’re paying for.
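For the technically minded, the fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not FUSE's actual code: `refresh_index`, `fetch`, and `alert` are hypothetical names, and we assume a failed connector signals its failure by raising an exception.

```python
from typing import Callable, List


def refresh_index(
    current_index: List[str],
    fetch: Callable[[], List[str]],
    alert: Callable[[str], None],
) -> List[str]:
    """Try to refresh a source's index; on failure, keep serving the last good one.

    All names here are hypothetical and only illustrate the fallback idea.
    """
    try:
        new_index = fetch()  # e.g. call the vendor's API
    except Exception as exc:  # web service down, API changed, etc.
        alert(f"connector failed: {exc}")  # notify the team
        return current_index  # users keep searching the last successful index
    return new_index
```

Because a failed refresh simply returns the previous index, searches keep working, and the next scheduled attempt gets another chance to succeed.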
Our content does not have an API. Will FUSE work?
FUSE’s numero uno choice for reaching content is via API, since that way we can get all of the nice juicy taxonomy and other metadata (which feeds our powerful filters). This isn’t always possible, however. As a last resort, we’ll crawl a content source just like Google does. We avoid this approach, also called “scraping,” when we can, because it often produces messy, inconsistent output that we then have to process and parse further.
No worries about authentication if the resource in question is behind a wall or is a mix of public and private content; we can handle both scenarios, no problem. Scraping/crawling sometimes makes sense when there’s a source you want to index but you aren’t interested in its metadata, or there’s really no metadata to be had.
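To see why scraping usually yields far less metadata than an API, here is a minimal page-metadata extractor built on Python's standard `html.parser`. It's a sketch, not FUSE's crawler: `PageMetadataParser` and `scrape_metadata` are hypothetical names, and the extractor pulls only the page `<title>` and any `<meta name="..." content="...">` tags, which is often all the structured metadata a scraped page offers.

```python
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Collect the <title> text and <meta name="..." content="..."> pairs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("name") and attr_map.get("content"):
            self.meta[attr_map["name"]] = attr_map["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # title text may arrive in several chunks


def scrape_metadata(html: str) -> dict:
    """Return the page title plus any named meta tags."""
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}
```

Compare this with an API response, which typically hands over categories, tags, dates, and authors as clean fields without any parsing.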
How is the performance of our website or our sources affected?
When FUSE first indexes your website or an external content source, there will be a performance hit, since we may be indexing thousands of content items going back several years. This is usually moot, however, since it's a one-time event never to be seen again (unless, for some reason, a complete re-indexing is required down the road).
After the initial indexing, FUSE indexes only new items, with no discernible performance penalty, since each request is essentially the same as a web user requesting a webpage. We figure out which content is new by performing a single-item query at the next scheduled pull: get the most recent item from this content source. Does it match the latest item we already have in our index? If yes, stop. If no, get the ten latest items. Do we have them all now? If not, rinse and repeat.
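Here is one way to express that new-item check in Python. It's a sketch under stated assumptions, not FUSE's implementation: `fetch_latest(n)` is a hypothetical call that returns the source's `n` most recent item IDs, newest first, and we interpret "rinse and repeat" as widening the request by another ten items each round.

```python
from typing import Callable, List, Set


def find_new_items(
    fetch_latest: Callable[[int], List[str]],
    known_ids: Set[str],
    batch: int = 10,
) -> List[str]:
    """Return IDs of items not yet in the index, newest first.

    fetch_latest(n) is hypothetical: it returns the n most recent item IDs,
    newest first. Growing the window by `batch` each round is an assumption.
    """
    # Single-item query: if the newest item is already indexed, we're done.
    newest = fetch_latest(1)
    if not newest or newest[0] in known_ids:
        return []

    n = batch
    while True:
        items = fetch_latest(n)
        new_items = []
        for item_id in items:
            if item_id in known_ids:
                # Reached content we already have; everything before it is new.
                return new_items
            new_items.append(item_id)
        if len(items) < n:
            # The source has no older items, so all of them are new.
            return new_items
        n += batch  # "Rinse and repeat" with a wider window.
```

In the common case where nothing has changed, the whole check costs a single one-item request, which is why ongoing indexing is so light on your source.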
As for your website, there is no performance concern, since we store your index in AWS and all of the computing power to run queries is used there. A good analogy is YouTube: you can embed a five-hour-long YouTube video on a page on your site with no performance issues, since the streaming and computing power comes from YouTube’s infrastructure.
We want to build our own feed for FUSE - what format should we use?
Can we use a development environment for testing?
Yes. Our embed code is platform-agnostic: we'll send you the code to embed, and you can put it anywhere you'd like. When you’re ready to go live, just place it on a page in your production environment.
In terms of indexing test/development content, we prefer to index production data when possible. This cuts down on repeat work on our side, and it's better for you, since test data often isn't as insightful as the real thing.
Where is FUSE hosted?
Amazon Web Services (AWS). AWS is the dominant cloud provider, running roughly a third of all cloud computing.
What's a processor?
On our website we refer to Processors as Integrations. A Processor is an ETL-like concept (Extract, Transform, Load) highly optimized for getting data into FUSE. Processors extract data from external sources (or from existing FUSE data), transform it into the FUSE Series concept, and load it into FUSE.
JSON-formatted objects store key settings such as URLs, passwords, and tokens. The processor code is written in Python and is generally 10 to 500 lines long, with an average of around 150. Most Processors use Schedules to run at the appropriate time, while others are run manually or through Uploaders. Processors write Series through Jobs.
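To make the Extract, Transform, Load idea concrete, here is a toy processor in Python. It is a hypothetical sketch, not FUSE's processor interface: the settings keys, the `https://example.org/api/items` URL, the simplified "series" shape (`key` plus `fields`), and the `fetch`/`write_series` callables are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical JSON settings object; the real FUSE schema isn't documented here.
SETTINGS_JSON = '{"url": "https://example.org/api/items", "token": "REPLACE_ME"}'


def extract(settings: Dict[str, str], fetch: Callable[[str, str], List[dict]]) -> List[dict]:
    """Extract: pull raw records from the external source."""
    return fetch(settings["url"], settings["token"])


def transform(records: List[dict]) -> List[dict]:
    """Transform: reshape raw records into a simplified, hypothetical series shape."""
    return [
        {"key": str(r["id"]), "fields": {"title": r.get("title", ""), "tags": r.get("tags", [])}}
        for r in records
    ]


def load(series: List[dict], write_series: Callable[[List[dict]], int]) -> int:
    """Load: hand the series to a job that writes them into FUSE."""
    return write_series(series)


def run_processor(fetch: Callable[[str, str], List[dict]],
                  write_series: Callable[[List[dict]], int]) -> int:
    settings = json.loads(SETTINGS_JSON)
    return load(transform(extract(settings, fetch)), write_series)
```

Keeping credentials in the JSON settings rather than in the code means the same processor code can be reused across sources by swapping its settings object.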
How do schedules work?
Processors can be run manually, through Uploaders, or, most frequently, through Schedules. Each processor can have multiple schedules. Each Schedule specifies how often it should run, using CRON syntax, and how it should run: as a Job or as a Trigger. As the names suggest, a Job schedule immediately starts a Job and tries to import data. A Trigger checks whether a Job should be started and, based on the logic in its code, either starts the Job or waits until the next time the Trigger runs.
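The Job-versus-Trigger distinction can be sketched in a few lines of Python. This is a hypothetical model, not FUSE's scheduler: `Schedule`, `on_schedule_fire`, `start_job`, and `should_start` are illustrative names, and evaluating the CRON expression itself is left to whatever scheduler calls `on_schedule_fire`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Schedule:
    cron: str  # standard five-field CRON expression, e.g. "0 * * * *" = hourly
    mode: str  # "job" or "trigger"


def on_schedule_fire(schedule: Schedule,
                     start_job: Callable[[], None],
                     should_start: Callable[[], bool]) -> bool:
    """Called each time the schedule's CRON time arrives; returns True if a Job started."""
    if schedule.mode == "job":
        start_job()  # Job mode: always start importing immediately
        return True
    if schedule.mode == "trigger":
        if should_start():  # Trigger mode: the code decides whether to start now
            start_job()
            return True
        return False  # ...or waits until the Trigger runs again
    raise ValueError(f"unknown schedule mode: {schedule.mode!r}")
```

A Trigger's `should_start` could, for instance, run a cheap check like the new-item query described earlier and only start a full Job when something has changed.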
What's an uploader?
An uploader is a way for a user to launch Processors that operate on one or more files. Most FUSE clients do NOT use an uploader since we typically get to content using a combination of code and schedules. FUSE sets up an Uploader Preset by:
1. associating it with a processor,
2. giving users or teams permission to use it,
3. overriding any processor settings for when this uploader is used to launch it,
4. adding descriptive language for users of the uploader.
A user then runs the uploader through FUSE Web by:
1. opening the uploader tool,
2. choosing the uploader Preset,
3. uploading the file(s) they want to use,
4. giving the upload job a descriptive name.
Keep in mind that uploaders launch processors that write series through Jobs. All data must be written through a job, and uploaders are not an exception.
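The preset setup and the upload flow can be modeled roughly as below. This is a hypothetical sketch, not FUSE Web's actual data model: `UploaderPreset`, `launch_upload`, the `csv-import` processor name, and the settings keys are illustrative assumptions. It shows the three ideas from the lists above: permissions gate who may launch the preset, preset overrides take precedence over the processor's normal settings, and every upload still ends up as a Job.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class UploaderPreset:
    """Hypothetical model of an Uploader Preset."""
    processor: str                                           # the processor it launches
    allowed: Set[str]                                        # users or teams permitted to use it
    overrides: Dict[str, str] = field(default_factory=dict)  # settings overridden on upload
    description: str = ""                                    # help text shown to users


def launch_upload(preset: UploaderPreset, user: str, user_teams: Set[str],
                  processor_settings: Dict[str, str], files: List[str],
                  job_name: str) -> dict:
    """Describe the Job an upload would create; raises if the user lacks permission."""
    if user not in preset.allowed and not (user_teams & preset.allowed):
        raise PermissionError(f"{user} may not use this uploader")
    # Preset overrides win over the processor's normal settings.
    settings = {**processor_settings, **preset.overrides}
    # Uploads are not an exception: the processor still writes through a Job.
    return {"name": job_name, "processor": preset.processor,
            "settings": settings, "files": files}
```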
How do you prevent version conflicts?
A series’ points or fields can only be changed by a job whose job ID is the same as or higher than that of the job that last changed them. This ensures that newer jobs always get preferential treatment in a conflict. Consider a case where two jobs started seconds apart try to import the same 10 series, but the second one has a slightly more up-to-date version of the data. FUSE guarantees that by the time both jobs are done, all 10 series will have the values set by the second job (the one with the higher job ID), even if the first job wrote the series last (because of slower processing, for example).
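The job-ID rule works like a simple version check on every write. Below is a minimal single-process sketch of that idea; `SeriesStore` is a hypothetical name, series values are simplified to strings, and a real multi-writer system would also need the check-and-write to happen atomically.

```python
from typing import Dict, Tuple


class SeriesStore:
    """Keep, per series, its value and the ID of the job that last wrote it.

    A write is accepted only if its job ID is the same as or higher than the
    stored one, so an older, slower job can't overwrite newer data.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, str]] = {}

    def write(self, series_key: str, value: str, job_id: int) -> bool:
        current = self._data.get(series_key)
        if current is not None and job_id < current[0]:
            return False  # stale job: ignore the write
        self._data[series_key] = (job_id, value)
        return True

    def read(self, series_key: str) -> str:
        return self._data[series_key][1]
```

Replaying the scenario above: job 2 writes first, then the slower job 1 finishes, but its writes are rejected, so all series keep job 2's values.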
What search engine is FUSE based on?
Elasticsearch, a search engine based on Lucene. Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
Why does the search’s embed code render strangely when put on our site?
FUSE’s embed code uses generic HTML tags with no CSS styling applied (e.g., <h3>, <p>). The idea is that the FUSE embed will adapt to your site’s existing styles (e.g., fonts, colors, sizes). Sometimes things can look ‘off’ when your site’s use of a particular tag doesn’t align with FUSE’s use of the same one. For example, <h3> on your site may be 24pt and yellow per your site’s CSS. FUSE uses <h3> tags for the title links in your search results. If you don’t want those to be 24pt and yellow, you’ll need to override your site’s CSS styles on the page where you place the embed.
Can we pass parameters to the search page so certain filters are pre-applied when it loads?
Yep. Something like this will work:
https://www.yoursite.org/search?query=vegetables&category=celery
You can also use commas to separate multiple values, and add as many filters as you like:
https://www.yoursite.org/search?query=vegetables&category=celery,carrots&color=green
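If you generate these links programmatically, Python's standard `urllib.parse` can build and read them. This is a sketch: `build_search_url` and `parse_filters` are hypothetical helpers, and how FUSE itself parses the parameters isn't documented here. One consequence of the comma convention is that a filter value containing a comma can't be expressed this way.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base: str, query: str, filters: dict) -> str:
    """Build a search URL with pre-applied filters; multiple values are comma-separated."""
    params = {"query": query}
    for name, values in filters.items():
        params[name] = ",".join(values)
    # safe="," keeps commas readable instead of encoding them as %2C.
    return f"{base}?{urlencode(params, safe=',')}"


def parse_filters(url: str) -> dict:
    """Split each non-query parameter back into its list of values."""
    params = parse_qs(urlsplit(url).query)
    return {name: values[0].split(",") for name, values in params.items() if name != "query"}
```

For example, `build_search_url("https://www.yoursite.org/search", "vegetables", {"category": ["celery", "carrots"], "color": ["green"]})` reproduces the second URL shown above.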
What are FUSE’s IP addresses?
You’ll need the following IP addresses if you want to whitelist our service:
54.208.151.205
54.209.49.63
52.70.169.233
52.72.43.69
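Whitelisting is usually configured in a firewall or web application firewall, but if you need an application-level check, a minimal sketch using Python's standard `ipaddress` module could look like this. `FUSE_IPS` and `is_fuse_request` are hypothetical names; the addresses are the four listed above.

```python
import ipaddress

# FUSE's published addresses, as listed above.
FUSE_IPS = {
    ipaddress.ip_address(addr)
    for addr in ("54.208.151.205", "54.209.49.63", "52.70.169.233", "52.72.43.69")
}


def is_fuse_request(remote_addr: str) -> bool:
    """Return True if the connecting IP address is one of FUSE's addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in FUSE_IPS
    except ValueError:
        return False  # not a valid IP address at all
```

Parsing with `ipaddress` rather than comparing raw strings rejects malformed input outright instead of silently failing to match it.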