This repository has been archived by the owner on Nov 14, 2019. It is now read-only.

Indexing just stops #4

Open
jeacott opened this issue Jan 28, 2015 · 11 comments

jeacott commented Jan 28, 2015

Hi,
I'm using ES 1.4.2 with a cluster of 2 data nodes, 6 shards, and about 1M docs in the index I'm trying to reindex.
I invoke this reindexing plugin and it runs for a while, then stops. There is no output in the log at all.
It does not stop at the same place each time: sometimes it manages 72,000 docs, sometimes 200k docs, but it never finishes the entire index. A subsequent
GET _reindex
yields
{
"acknowledged": true,
"names": []
}

I have also noticed that if I invoke this plugin on a cluster that has shard allocation disabled, targeting an index that does not yet exist, ES creates the new index properly, but the reindexing plugin cannot write anything until shard allocation is enabled again (of course). Even after that it still fails; it just sits there doing nothing at all.

Hope these issues can be identified and fixed.
This plugin would be very useful if I could get it working properly.

Cheers

marevol (Contributor) commented Jan 28, 2015

Does reindexing work on a cluster with shard allocation enabled, i.e. if you do not change the shard allocation setting when invoking reindexing?

Could you check the Elasticsearch debug log?

jeacott (Author) commented Jan 29, 2015

No. As I said, it does not work even if I don't change the allocation status. Also, as I said, there are zero entries in the log. It just stops.

marevol (Contributor) commented Jan 29, 2015

Could you provide steps to reproduce it?
I do not have any problems with reindexing...

jeacott (Author) commented Jan 29, 2015

I wish I could. All I do is start reindexing with POST from/_reindex/to.

It starts.
Then it stops some time later, as though it had finished properly.

My documents are quite large if that makes any difference.

jeacott (Author) commented Feb 3, 2015

So I resolved this: after checking the source, I added scroll=10m&size=200 to the end of my reindex URL.
The scroll time was the likely cause, as the plugin defaults to 1000 documents and a 1-minute max scroll time.
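For anyone else hitting this, the full invocation could look roughly like the sketch below (Python with the requests library, assuming a cluster on localhost:9200 and placeholder index names old_index/new_index; only the scroll and size parameters come from the plugin):

import requests

ES_URL = "http://localhost:9200"  # adjust to your cluster

# POST {source}/_reindex/{destination} with a longer scroll window and a
# smaller batch size, so each scroll page is consumed before it expires.
resp = requests.post(
    f"{ES_URL}/old_index/_reindex/new_index",
    params={"scroll": "10m", "size": 200},
)
resp.raise_for_status()
print(resp.json())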

There are some places where errors can occur that are never logged, nor is anyone alerted to them, and I had to change a few things in order to run the tests locally too.

I'll make the changes available when I get a chance, if you're interested.

derjohn commented Apr 29, 2015

Hello,
I'm seeing the same effect, but adding ?scroll=10m&size=200 didn't make it work.

kaplun commented Dec 15, 2015

Hi. I think this might be related to what we are also seeing on our production servers.
For example, we found this in our Elasticsearch log.

Here is the full traceback:

[2015-12-15 14:53:25,814][ERROR][action.bulk              ] [Surge] unexpected error while replicating for action [indices:data/write/bulk[s]]. shard [[hep_v2][4]]. 
org.codelibs.elasticsearch.reindex.exception.ReindexingException: failure in bulk execution:
[95]: index [hep_v2], type [record], id [1398802], message [MergeMappingException[Merge failed with failures {[mapper [publication_info.recid] of different type, current_type [string], merged_type [long]]}]]
       at org.codelibs.elasticsearch.reindex.service.ReindexingService$ReindexingListener$2.onResponse(ReindexingService.java:214)
       at org.codelibs.elasticsearch.reindex.service.ReindexingService$ReindexingListener$2.onResponse(ReindexingService.java:209)
       at org.elasticsearch.action.bulk.TransportBulkAction$2.finishHim(TransportBulkAction.java:358)
       at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:330)
       at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:319)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doFinish(TransportReplicationAction.java:983)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doRun(TransportReplicationAction.java:836)
       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.finishAndMoveToReplication(TransportReplicationAction.java:530)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:608)
       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

but meanwhile querying for the status at _reindex/<name> was simply returning:

{
  "acknowledged": true,
  "name": "6bedbfdd-228b-4822-bfcf-a437ff394a76",
  "found": true
}

(This is with ES 2.1.0)

In our case the error was simply due to the fact that we initialized the new index with a mapping that was slightly incompatible with the previous index (which was using dynamic mapping for certain fields).

Could it be that @derjohn's error is similar?

Nevertheless, I think it would be great if the error state were not only logged but also reported in the JSON response to _reindex/<name>: that way, the responsibility for dealing with the failure could be passed on to clients.

Currently we have not found a way to distinguish between a very long reindexing and a failed one.

cc: @jalavik

marevol (Contributor) commented Dec 17, 2015

Merge failed with failures {[mapper [publication_info.recid] of different type, current_type [string], merged_type [long]]}]]

Mapping handling became strict in ES 2.x.
The "recid" property must have a single type (string or long) across all _types.

kaplun commented Dec 17, 2015

Yep. Indeed, we fixed it locally for our own specific problem. The only issue with the elasticsearch-reindexing plugin, though, was that there was no report of the error via the REST interface; the error is currently only logged in the Elasticsearch logs.

marevol (Contributor) commented Dec 17, 2015

Did you run it with wait_for_completion=true?

kaplun commented Dec 17, 2015

I can't, because the REST request would time out (the index is several gigabytes in size). So what I am doing is simply polling the /_reindex/<name> handler.
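A rough sketch of what that polling could look like (assuming, as observed above, that GET /_reindex/{name} returns "found": true while the task is still listed, and stops listing it once it is gone; as noted, this alone cannot tell a successful run from a silently failed one):

import time
import requests

ES_URL = "http://localhost:9200"  # adjust to your cluster

def wait_for_reindex(name, interval=30):
    """Poll /_reindex/{name} until the task is no longer listed."""
    while True:
        body = requests.get(f"{ES_URL}/_reindex/{name}").json()
        if not body.get("found", False):
            return  # no longer listed; check the ES logs to see whether it failed
        time.sleep(interval)

# e.g. wait_for_reindex("6bedbfdd-228b-4822-bfcf-a437ff394a76")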
