Monday, November 23, 2015

(Rolling) restart of elasticsearch datanodes

elasticsearch 1.7.3

A planned restart of a data node should include disabling shard allocation (routing) to avoid unnecessary rebalancing and a prolonged recovery period when the node rejoins the cluster.

Example:
Stop the routing:
PUT /_cluster/settings
{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}

Should reply with:
{
  "persistent": {
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "enable": "none"
        }
      }
    }
  },
  "acknowledged": true
}
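
The same call can be issued from the command line; a minimal curl sketch, assuming a node listening on localhost:9200:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}'

The identical form with "all" instead of "none" is what re-enables allocation later.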


Stop the node and do whatever you need to do, then start the node again and wait for the cluster to report the rejoin in the logs:
[2015-11-23 01:18:32,623][INFO ][cluster.service          ] [servername] added {[servername2][2DwlAl3SAe-aijdas1336Ew][servername2][inet[/1.1.1.2:9300]],}, reason: zen-disco-receive(join from node[[servername2][2DwlAl3SAe-aijdas1336Ew][servername2][inet[/1.1.1.2:9300]]])
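
The rejoin can also be verified over the REST API, for example with the _cat interface (again assuming localhost:9200); the restarted node should reappear in the list:

curl 'http://localhost:9200/_cat/nodes?v'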

When to re-enable routing is a question of how busy the cluster is. Re-enabling will cause additional load and stress, and I have myself been delaying it for some hours until a suitable occasion appears. I must stress that the documents the rejoined node holds will not be visible to the cluster, nor will the rejoined node in any way offload the rest, since the data it has is not allocated (of course).

The cluster might also, depending on the shard and replication settings, not be redundant while routing allocation is off; as for the overall status, yellow is OK during the transition.

To reenable the routing
PUT /_cluster/settings
{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}

Watch the status until it becomes green, then continue with the next node if needed. The time for the cluster to become green, that is, in this context, for all shard and replication criteria to be met, varies highly with the capacity of each node, the overall load and, not least, the number of documents. E.g. a 3-node cluster with 200M documents should spend somewhere around 10 minutes on recovery, not more.

GET _cluster/health
{
  "cluster_name": "clustername",
  "active_primary_shards": 56,
  "active_shards": 112,
  "number_of_data_nodes": 3,
  "number_of_in_flight_fetch": 0,
  "number_of_nodes": 5,
  "unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "timed_out": false,
  "delayed_unassigned_shards": 0,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "status": "green"

}
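
Rather than polling by hand, the health call can also block until the cluster goes green; a small sketch, again assuming localhost:9200 and using a deliberately generous timeout:

curl 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=15m'

If the timeout expires before the criteria are met, the reply comes back with "timed_out": true and the then-current status, so it is safe to simply retry.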


Thursday, November 5, 2015

How to replace a headset

Replacing the headset is usually very straightforward and simple, given that you have access to the right tools, which you will see in the pictures below. The two main tools are the extractor and the press-fit compressor. Theoretically one can do without these tools, but at the risk of damaging the frame. The frame below is a carbon Scott CR1, and I would say that doing this job without the tools would probably result in a damaged frame. The press-fit cups tend to sit very hard since they, in contrast to the bottom bracket, are aluminium.

The reason for replacing the headset was a rusty lower bearing; the headset started to bind shortly after it was taken into use, and that led me to doubt the quality. I continued to use the bike for the rest of that season and this one, but it got worse and worse, affecting safety, especially noticeable during fast descents.


Dismantle the stem in the usual way, first by removing the spacer rings and the top cap.


Be sure to hold the fork while loosening the stem to avoid it falling to the floor.


The fork should come loose; some of the further service might require dismantling the front brake.


The top bearing is looking good: almost no rust, and it does not really need replacement.


Inspect the top and bottom to see if there is any wear, cracks etc. in the carbon.


I purchased a new, complete, high-quality sealed BBB headset; the original Ritchey was not sealed, and I believe that could have accelerated the wear and tear, since replacing the headset should only be necessary every 3-5 years.


This is the press-fit extractor; it enables you to apply force on the press-fit cup itself, and not the frame, when hammering it out.


Insert the extractor backwards


Be sure that the blades on the extractor engage the inside edges of the press-fit cup. If a rubber hammer doesn't do it, use a metal hammer, but be careful and take your time to avoid damaging the frame.


The bearing can usually be removed without force since it is only held in place by the fork, which is now removed. As you can see, the bearing was very rusty, and it is no wonder it caused binding.


The same process for removing the top press-fit cup; notice the metal hammer :)


Inspect the inside for the reasons mentioned earlier. The white stuff you see inside the carbon frame is normal and is residue from the molding.


Wipe and clean


Notice the remains of the old bearing; this is a seal and must be removed since it does not fit the new bearing.


Removal was hard in this case, since the rust had glued the seal to the carbon.


Careful prying got it moving after a while



There are different opinions on what to do with the surface between the press-fit cups and the carbon. Some say to keep it dry, but I prefer to lube it up, in this case using lithium grease for longevity and resistance to moisture.


The new press-fit cups lined up in the compressor; be careful when tightening to keep everything aligned.


New lower bearing inserted onto the fork


New top bearing inserted


Insert the spacers and tighten by hand, being careful not to tighten too hard. It should not bind when turning, and there should be no slack.


There, done!


Go biking!

Wednesday, November 4, 2015

Elasticsearch and stemming

The main use of Elasticsearch is storing logs, and lots of them. Searching through the data using the Kibana frontend is awesome, and one usually finds what one is looking for.

But let's have a bit of fun using the stemming in Elasticsearch.

Elasticsearch provides good support for stemming via numerous token filters, but let's focus on the Hunspell stemmer this time.

Elasticsearch has the stemmer already (v1.7.3), but does not ship with the words. So the first step would be to get the dictionaries and install them; I am not going into details here. Bounce the cluster (yep, all nodes) and be sure that the dictionaries load in nicely.
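
For reference, Elasticsearch picks Hunspell dictionaries up from a hunspell directory under the config path, one subdirectory per locale; a typical layout (exact paths vary with the distribution) would be:

<path.conf>/hunspell/en_US/en_US.aff
<path.conf>/hunspell/en_US/en_US.dic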

By default, newly created indices do not use stemming, so one has to set this up when creating the index.

put /skjetlein

{
  "settings": {
    "analysis": {
      "filter": {
        "en_US": {
          "type":     "hunspell",
          "language": "en_US" 
        }
      },
      "analyzer": {
        "en_US": {
          "tokenizer":  "standard",
          "filter":   [ "lowercase", "en_US" ]
        }
      }
    }
  }
}

If the dictionaries are missing from one or several nodes, you will receive a failure notice.

Otherwise:
{
  "acknowledged": true
}

Verify the settings

get /skjetlein/_settings

{
  "skjetlein": {
    "settings": {
      "index": {
        "uuid": "ny3n0uJMRKywpvy6OCRmLw",
        "number_of_replicas": "1",
        "analysis": {
          "filter": {
            "en_US": {
              "type": "hunspell",
              "language": "en_US"
            }
          },
          "analyzer": {
            "en_US": {
              "filter": [
                "lowercase",
                "en_US"
              ],
              "tokenizer": "standard"
...

Let's test the stemming:

get skjetlein/_analyze?analyzer=en_US -d "driving painting traveling"

...output should be something like this:
{
  "tokens": [
    {
      "token": "drive",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "painting",
      "start_offset": 8,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "paint",
      "start_offset": 8,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "traveling",
      "start_offset": 17,
      "end_offset": 26,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "travel",
      "start_offset": 17,
      "end_offset": 26,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}

I.e. the result is drive, paint and travel. Looks good. (Hunspell can return more than one stem for a token; painting is both a dictionary word in its own right and a form of paint, hence both tokens appear.)

So what could the use case be in the context of Elasticsearch, which usually stores vast amounts of logs (events)? Well, let's say I search through the logs for problems with filesystems. Elasticsearch as-is would require search strings that include every possible word related to filesystems, since logs, in this context e.g. syslog, do not provide the information in a way that lets the search be expressed consistently.

E.g. the various filesystem names can be stemmed to 'filesystem':

"custom_stem": {
  "type": "stemmer_override",
  "rules": [
    "ext2fs=>filesystem",
    "nfs=>filesystem",
    "btrfs=>filesystem"
...

or
           "postfix=>mail",
            "smtp=>mail",
            "qmail=>mail"



VoWifi leaking IMSI

This is mostly a copy of a post from the Working Group Two blog, which I worked for when the research was done into the field of IMSI leakage when using voice...