This repository has been archived by the owner on Nov 14, 2019. It is now read-only.

Indexing just stops #4

Open
jeacott opened this issue Jan 28, 2015 · 11 comments

jeacott commented Jan 28, 2015

Hi,
I'm using ES 1.4.2 with a cluster of 2 data nodes, 6 shards, and about 1M docs in the index I'm trying to reindex.
I invoke this reindexing plugin and it runs for a while, then stops. There is no output in the log at all.
It does not stop at the same place each time: sometimes it manages 72,000 docs, sometimes 200k docs, but it never finishes the entire index. A subsequent
GET _reindex
yields
{
"acknowledged": true,
"names": []
}

I have also noticed that if I invoke this plugin on a cluster that has shard allocation disabled, targeting an index that does not yet exist, ES creates the new index properly, but the reindexing plugin cannot write anything until shard allocation is enabled again (of course). Even after that it still fails; it just sits there doing nothing at all.

Hope these issues can be identified and fixed.
This plugin would be very useful if I could get it working properly.

Cheers

marevol (Contributor) commented Jan 28, 2015

Does reindexing work on a cluster with shard allocation enabled, i.e. if you do not change the shard allocation setting when invoking reindexing?

Could you check the Elasticsearch debug log?

jeacott (Author) commented Jan 29, 2015

No. As I said, it does not work even if I don't change the allocation status. Also, as I said, there are zero entries in the log. It just stops.

marevol (Contributor) commented Jan 29, 2015

Could you provide steps to reproduce it?
I do not have any problems with reindexing...

jeacott (Author) commented Jan 29, 2015

I wish I could. All I do is start reindexing with POST from/_reindex/to.

It starts.
Then it stops some time later, as though it had finished properly.

My documents are quite large if that makes any difference.

jeacott (Author) commented Feb 3, 2015

So I resolved this: after checking the source, I added scroll=10m&size=200 to the end of my reindex URL.
The scroll time was the likely cause, as the plugin defaults to 1000 documents and a 1-minute max scroll time.
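For anyone else hitting this, the full invocation could look roughly like the sketch below (Python with the requests library, assuming a cluster on localhost:9200 and placeholder index names old_index/new_index; only the scroll and size parameters come from the plugin):

import requests

ES_URL = "http://localhost:9200"  # adjust to your cluster

# POST {source}/_reindex/{destination} with a longer scroll window and a
# smaller batch size, so each scroll page is consumed before it expires.
resp = requests.post(
    f"{ES_URL}/old_index/_reindex/new_index",
    params={"scroll": "10m", "size": 200},
)
resp.raise_for_status()
print(resp.json())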

There are some places where errors can occur that are never logged, nor is anyone alerted to them, and I had to change a few things in order to run the tests locally too.

I'll make the changes available when I get a chance, if you're interested.

derjohn commented Apr 29, 2015

Hello,
I'm seeing the same effect, but adding ?scroll=10m&size=200 didn't make it work.

kaplun commented Dec 15, 2015

Hi. I think this might be related to what we are also seeing on our production servers.
For example, we found this in our Elasticsearch log.

Here is the full traceback:

[2015-12-15 14:53:25,814][ERROR][action.bulk              ] [Surge] unexpected error while replicating for action [indices:data/write/bulk[s]]. shard [[hep_v2][4]]. 
org.codelibs.elasticsearch.reindex.exception.ReindexingException: failure in bulk execution:
[95]: index [hep_v2], type [record], id [1398802], message [MergeMappingException[Merge failed with failures {[mapper [publication_info.recid] of different type, current_type [string], merged_type [long]]}]]
       at org.codelibs.elasticsearch.reindex.service.ReindexingService$ReindexingListener$2.onResponse(ReindexingService.java:214)
       at org.codelibs.elasticsearch.reindex.service.ReindexingService$ReindexingListener$2.onResponse(ReindexingService.java:209)
       at org.elasticsearch.action.bulk.TransportBulkAction$2.finishHim(TransportBulkAction.java:358)
       at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:330)
       at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:319)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doFinish(TransportReplicationAction.java:983)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doRun(TransportReplicationAction.java:836)
       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.finishAndMoveToReplication(TransportReplicationAction.java:530)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:608)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

but meanwhile querying for the status at _reindex/<name> was simply returning:

{
  "acknowledged": true,
  "name": "6bedbfdd-228b-4822-bfcf-a437ff394a76",
  "found": true
}

(This is with ES 2.1.0)

In our case the error was simply due to the fact that we initialized the new index with a mapping that was slightly incompatible with the previous index (which was using dynamic mapping for certain fields).

Could it be that @derjohn's error is similar?

Nevertheless, I think it would be great if the error state were not only logged but also reported in the JSON response to _reindex/<name>: that way, the responsibility for dealing with the failure could be passed on to clients.

Currently we have not found a way to distinguish between a very long reindexing and a failed one.

cc: @jalavik

marevol (Contributor) commented Dec 17, 2015

Merge failed with failures {[mapper [publication_info.recid] of different type, current_type [string], merged_type [long]]}]]

Mapping handling became strict in ES 2.x.
The "recid" property must have a single type (string or long) across all _types.

kaplun commented Dec 17, 2015

Yep. Indeed, we fixed it locally for our own specific problem. The only issue with the elasticsearch-reindexing plugin, though, was that there was no report of the error via the REST interface; the error is currently only logged in the Elasticsearch logs.

marevol (Contributor) commented Dec 17, 2015

Did you run it with wait_for_completion=true?

kaplun commented Dec 17, 2015

I can't, because the REST request would time out (the index is several gigabytes in size). So what I am doing is simply polling the /_reindex/<name> handler.
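A rough sketch of what that polling could look like (assuming, as observed above, that GET /_reindex/{name} returns "found": true while the task is still listed, and stops listing it once it is gone; as noted, this alone cannot tell a successful run from a silently failed one):

import time
import requests

ES_URL = "http://localhost:9200"  # adjust to your cluster

def wait_for_reindex(name, interval=30):
    """Poll /_reindex/{name} until the task is no longer listed."""
    while True:
        body = requests.get(f"{ES_URL}/_reindex/{name}").json()
        if not body.get("found", False):
            return  # no longer listed; check the ES logs to see whether it failed
        time.sleep(interval)

# e.g. wait_for_reindex("6bedbfdd-228b-4822-bfcf-a437ff394a76")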
